1 s2.0 S0360835219307107 Main
1 s2.0 S0360835219307107 Main
1 s2.0 S0360835219307107 Main
a
School of Business Administration, Southwestern University of Finance and Economics, Chengdu 610074, PR China
b
SMART, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, PR China
c
Department of Logistics and Maritime Studies, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
d
ECEE, College of Engineering & Applied Science, University of Colorado, Boulder, CO 80309, USA
Keywords: The multiple vehicle pickup and delivery problem is a generalization of the traveling salesman problem that has
Pickup and delivery problem many important applications in supply chain logistics. One of the most prominent variants requires the route
Routing durations and the capacity of each vehicle to lie within given limits, while performing the loading and unloading
Traveling salesman operations by a last-in-first-out (LIFO) protocol. We propose a learning-based memetic algorithm to solve this
Memetic algorithms
problem that incorporates a hybrid initial solution construction method, a learning-based local search procedure,
Hybrid heuristics
Learning mechanisms
an effective component-based crossover operator utilizing the concept of structured combinations, and a longest-
common-subsequence-based population updating strategy. Experimental results show that our approach is
highly effective in terms of both computational efficiency and solution quality in comparison with the current
state-of-the-art, improving the previous best-known results for 132 out of 158 problem instances, while matching
the best-known results for all but three of the remaining instances.
⁎
Corresponding author.
E-mail addresses: [email protected] (B. Peng), [email protected] (Z. Lü), [email protected] (T.C.E. Cheng).
https://doi.org/10.1016/j.cie.2019.106241
Received 31 May 2019; Received in revised form 17 December 2019; Accepted 19 December 2019
Available online 03 February 2020
0360-8352/ © 2020 Elsevier Ltd. All rights reserved.
B. Peng, et al. Computers & Industrial Engineering 142 (2020) 106241
from their industry partner. Koch, Bortfeldt, and Wäscher (2018) con- 2017; Montero, Miranda-Bront, & Mndez-Daz, 2017; Naccache, Côté, &
sidered vehicle routing problems with backhauls, time windows, si- Coelho, 2018; Veenstra, Cherkesly, Desaulniers, & Laporte, 2017).
multaneous delivery and pickup and three dimensional loading con- In this paper we focus on the pickup and delivery problem under the
straints (3L-VRPSDPTW), proposing a hybrid algorithm consisting of an LIFO policy, utilizing a general framework that can also be applied to
adaptive large neighborhood search for the routing and different address the problem under the FIFO policy. Specifically, we extend the
packing heuristics for the loading problem. Reil, Bortfeldt, and Mönch TSPPD problem to involve multiple vehicles, enabling the single vehicle
(2018) considered the vehicle routing problems with backhauls, time problem to be handled as a special case.
windows and three dimensional loading constraints, studying different Over the past decades, several state-of-the-art algorithms have been
backhaul variants, including clustered backhauls, mixed linehauls and proposed for solving TSPPD with the multiple vehicle extension.
backhauls, and variants with simultaneous delivery and pickup and Kachitvichyanukul, Sombuntham, and Kunnapapdeelert (2015) pro-
with divisible delivery and pickup. They employed another two-phase posed two new solution representations, SD2 and SD3 for PSO algo-
method to tackle this problem, in which the first phase of their method rithm which they used in conjunction with a variant of PSO involving
carried out the packing of goods for solving a 3D strip packing problem multiple social learning terms in application to the generalized multi-
for each customer using tabu search. The second phase frist used a depot vehicle routing problem with multiple pickup and delivery re-
multi-start evolutionary strategy to minimize the number of vehicles quests. Cherkesly, Desaulniers, and Laporte (2015) proposed a popu-
and then a tabu search to minimize the total travel distance. lation-based metaheuristic to address the multiple vehicle pickup and
delivery problem with LIFO loading and time windows, called the
1.2. Distinguishing characteristics of the TSPPD problem and its MPDPL with time windows. The authors combined local search with a
applications genetic algorithm to produce high-quality solutions within reasonable
computing times. Cheang, Gao, Lim, Qin, and Zhu (2012) considered
In the TSPPD problem, there exist two ways in which the loading the case where the route length of each vehicle cannot exceed a max-
and unloading operations corresponding to the pickup and delivery imum limit and the vehicles have unlimited capacity, called MPDPL
activities, respectively, are performed, namely first-in-first-out (FIFO) with distance constraints, abbreviated as PDPLD. They proposed a two-
and last-in-first-out (LIFO), which correspond to two variants of TSPPD, stage approach for solving the problem to minimize the total distance
called TSPPD with LIFO and TSPPD with FIFO. The FIFO policy implies and the number of vehicles, employing simulated annealing and ejec-
that when a pickup node is visited, its corresponding item is loaded in a tion pool in the first stage, and variable neighborhood search and
linear queue and an item can only be delivered if it is the first item of probabilistic tabu search in the second stage. Benavent, Landete, Mota,
the queue, while the LIFO policy utilizes the mechanism of stack instead and Tirado (2015) addressed MPDPL with distance constraints (PDPLD)
of queue, i.e., an item can be delivered if it is on the top of the stack. as a special case of MPDPL with maximum time (which is called the
Fig. 1 depicts the two different policies, in which 0+ and 0 represent pickup and delivery problem with limited time, abbreviated as PDPLT),
the depot at the beginning and end of the two routes, and i+ and i observing that minimizing the total distance is equivalent to mini-
represent the pickup and the delivery nodes for item i (and similarly for mizing the total time and that minimizing the number of vehicles as the
item j ), where Fig. 1(a) and (b) show the FIFO and LIFO loadings, re- primary objective can be addressed by adding a large number to the
spectively. travel times of the arcs leaving the depot. However, the exact method
In practice, TSPPD with FIFO exists in many real-life applications proposed by Benavent et al. (2015) can only solve instances with up to
such as the dial-a-ride system where the major concern is fairness, i.e., 60 nodes, while their proposed tabu search can solve larger instances
the passengers (such as patients) picked up earlier must be dropped off with up to 400 nodes.
earlier. Previous contributions to solve this problem include a branch- This difference between the exact and metaheuristic methods mo-
and-bound algorithm by Carrabs, Cerulli, and Cordeau (2007), a tivates us to employ a metaheuristic approach to tackle large-size in-
branch-and-cut algorithm by Cordeau, DellAmico, and Iori (2010) that stances of the PDPLT problem. The main contributions of our study are
can solve instances with up to 25 requests, and two effective heuristics as follows:
based on probabilistic tabu search and iterated local search by Erdogan,
Cordeau, and Laporte (2009). Recently, Lu, Benlic, and Wu (2018) •A learning-based memetic algorithm (LMA) for solving PDPLT,
proposed a multi-restart iterative search approach based on combined which introduces a hybrid initial solution construction method by
utilization of six move operators to tackle this problem. incorporating the splitting approach and the Lin-Kernighan heuristic
On the other hand, TSPPD with LIFO likewise occurs in many ap- (LKH) for the asymmetric travelling salesman problem (ATSP), a
plications, such as the transport of bulky, fragile, or hazardous items. subproblem of PDPLT, to generate a random initial population with
Cordeau, Iori, Laporte, and Salazar González (2010) proposed a branch- high quality.
and-cut algorithm that can solve instances with up to 17 requests, while • A reward and punishment mechanism inspired by reinforcement
Li, Lim, Oon, Qin, and Tu (2011) proposed a variable neighborhood learning to manage the multiple neighborhood moves and guide the
search heuristic based on a tree representation to improve the previous search.
results in the literature. In general, TSPPD and its variants have been • A component-based crossover operator and a longest-common-sub-
extensively researched in the literature, where the recent studies in- sequence-based (LCS-based) population updating strategy to obtain
clude (Chami, Manier, & Manier, 2017; Furtado, Munari, & Morabito, a better trade-off between intensification and diversification of the
search.
• Our experimental results demonstrate that the performance of our
LMA is highly effective compared to state-of-the-art approaches in
the literature by improving the previous best-known results for 131
out of 158 problem instances (including both PDPLD and PDPLT
instances), while matching the best-known results for all but three of
the remaining instances.
2
B. Peng, et al. Computers & Industrial Engineering 142 (2020) 106241
of the proposed algorithm against the current best performing algo- minimum total travel time as follows:
rithms for both PDPLT and PDPLD. In Section 5 we analyze the main S
strategic components of our algorithm. Finally, we conclude the paper Minimizef (S ) = DT (Ri ) + ST (Ri ),
and suggest topics for future research in Section 6. i=1 (1)
•V=P D O denotes the set of nodes, where P = {1+ , 2+, ...,n+} component. Fig. 2 is an example in which there are two components in
the route. The vertices in positions 1 and 7 identify the beginnings of
denotes the set of pickup nodes, D = {1 , 2 , ...,n } is the set of de-
the two components as their values are equal to 1, i.e., there are no
livery nodes, O denotes the set of starting and ending nodes {0+, 0 } ,
also called depots, and E = {(u, v ): u, v V , u)¯=v} is the edge set. requests transported by the vehicle before visiting these two nodes. The
• Each item must be picked at i+ P and delivered at i D , where delivery vertices in positions 6 and 10 respectively correspond to the
ends of the components as their values are equal to 0. There are no
the load of the item is denoted by di .
• The service time at each pickup node or delivery node u P D is requests for the vehicle after serving these two vertices. Hence, the
paths from positions 1 to 6 and positions 7 to 10 denote two different
denoted by stu , and the travel time to traverse the arc (u, v ) E is
denoted by ttu, v . components.
• The maximum capacity of each vehicle is .
• The maximum duration of each route including the service time and 3. Memetic algorithm
traversal time is MD .
3.1. Main framework
Let R be one route in a solution S and
R = {u0 = 0+, u1, u2 , , um = 0 } , where uk is the k th node visited in R A memetic algorithm is a general-purpose metaheuristic approach
(1 k m ). If the visited node is a pickup node (i.e., uk P ), the that typically combines a local search optimization procedure with a
corresponding load capacity of the vehicle after visiting it is population-based framework, which has been successfully applied to
l (uk ) = l (uk 1) + duk , while the corresponding load of the vehicle after tackle many classical combinatorial optimization problems, including
visiting it is equal to l (uk ) = l (uk 1) - duk if the visited node is a pickup the quadratic assignment problem (Benlic & Hao, 2015), which pro-
node (i.e., uk D ). For each node in the route, the corresponding load vides a different generalization of the traveling salesman problem. The
cannot exceed the given maximum capacity, i.e., l (uk ) MC . We de- purpose of combining local search and population-based strategies is to
note by DT (R) = k = 0 ttuk , uk +1 the total traversal time and
m 1
take advantage of both the crossover operator as a diversification me-
ST (R) = k = 1 stk the total service time. The corresponding duration of
m 1
chanism for discovering promising unexplored regions of the search
each route including the traversal time and service time cannot exceed space and the local optimization as an intensification procedure to
the given maximum duration, i.e., DT (R) + ST (R) MD . In addition, obtain high-quality solutions within a search region. We outline our
the LIFO policy is followed for both pickup and delivery operations. A proposed memetic algorithm for PDPLT in Algorithm 1. At the begin-
feasible solution to this problem is a set of vehicle routes that satisfy ning of the algorithm, we iteratively employ a hybrid heuristic method
three constraints, i.e., the maximum capacity, maximum duration, and to generate the initial population (line 1). Following this, we employ a
LIFO constraints. The objective is to find a feasible solution with the learning-based local search to optimize the solutions in the population
3
B. Peng, et al. Computers & Industrial Engineering 142 (2020) 106241
(lines 2–4). Later, we iteratively combine two parent solutions ran- construction procedure are presented in Algorithm 2 and can be sum-
domly selected from the population to generate offspring solutions marized in the following steps:
using a component-based crossover operator under the LIFO policy
until the stopping criterion, i.e., maximum computing time, is satisfied
Algorithm 2. Hybrid initial solution
(lines 5–7). After each use of the crossover operator, we improve the
generated offspring solution using a learning-based local search to Require: Benchmark instance (B)
guide the search to promising regions (line 8). During this process, S Ensure: The initial solution (S0 )
records the best solution found so far (lines 9–11). We then apply the / Step 1: Generate a set of components by randomly setting k couples as one
longest-common-sequence-based (LCS-based) population updating component, with satisfying the maximum capacity constraint /
1: t 0
strategy to possibly replace the worst individual in the population with 2: while The request set N is not empty, i.e., N do
the improved offspring solution (lines 12–15). 3: Construct each component by randomly selecting k requests (i.e., i1, , ik ) f-
rom the request set N , by satisfying the maximum capacity constraint, i.e.,
k
s = 1 dis < = MC
Algorithm 1. Framework of the memetic algorithm for solving PDPLT
4: N N {i1, , ik }
Require: Benchmark instance (B); the maximum computing time (Tmax ) 5: t t+1
Ensure: Best-found solution (S ) 6: Ct {i1+, , ik+, ik , , i1 }
/ Generate np feasible solutions as an initial population (Section 3.2) / 7: end while
1: Pc = {S1, , Snp} Hybird _inital _solution (B ) / Step 2: Construct a composite ATSP tour containing all the components gener-
ated in Step 1 by employing the well-known LKH as implemented /
/ Improve each individual Si in the population with a learning-based local search
(Section 3.3) / 8: S LKH_ATSP((C1, C2, , Ct ) ) / t denotes the number of components generated
2: for i = 1, , np do in Step 1 /
/ Step 3: Split the composite ATSP tour into several routes with satisfying the
3: Si Learning _based _localsearch (Si )
maximum duration constraint /
4: end for
9: Rm 0, m 1
5: while The maximum computing time Tmax is not reached do
10: for i = 1, , t do
6: Randomly select parent solutions Si and Sj from P where 1 i, j np and i j
11: if DT(Rm Ci ) + ST(Rm Ci ) MD then
/ Generate offspring Sc from Si and Sj (Section 3.4) /
12: Rm Rm Ci
7: Sc Si Sj = Component _based _crossover (Si , Sj ) 13: else
/ Improve Sc with a learning-based local search (Section 3.3) / 14: m m+1
8: Sc Learning _based _localsearch (Sc ) 15: Rm Ci
9: Sc is better than S 16: end if
10: S Sc 17: end for
11: end if / Step 4: Use LKH to optimize each route /
/ The longest-common-subsequence based population updating strategy (Sect- 18: for i = 1, , m do
ion 3.5) / 19: LKH_ATSP(Ri )
12: Determine the worst individual Sw where the goodness value 20: end for
GS (Sw, Pc ) = min{GS (Sk , Pc ) } , 1 k np (see Eq. (7)) 21: S0 (R1, , Rm )
13: if GS (Sc, Pc Sc ) > GS (Sw, Pc Sc ) then 22: return (S0 )
14: Pc Pc Sc Sw
15: end if
16: end while
17: return (S )
• Step 1. Generate a set of components C by randomly setting k
(1 k 3) couples {i1+, , ik+, ik , , i1 } as one component with sa-
tisfying both the LIFO constraint and maximum capacity constraint
3.2. Hybrid initial solutions (see lines 1–6 in Algorithm 2). For example, in Fig. 3, there exist six
components, i.e., C1-C6.
We construct the initial solutions by iteratively using a hybrid initial
procedure based on a splitting approach that is able to obtain high-
• Step 2. Construct a composite ATSP tour by employing the well-
known LKH procedure (Helsgaun, 2000) to optimize all the com-
quality initial solutions within short computing time. A similar hybrid ponents (C1, C2, , Ct ) generated in Step 1 (line 8). The reason why
initial procedure has been successfully employed to tackle various ve- we deploy the ATSP model to obtain high-quality route stems from
hicle routing problems (VRPs), e.g., multi-depot VRP (Escobar, Linfati, the fact that the traveling time from component C1 to component C2
Toth, & Baldoquin, 2014) and multi-route VRP (Azi, Gendreau, & is not necessarily equal to the traveling time from C2 to C1.
Potvin, 2014). In order to generate high-quality initial solutions, we
first adapt the splitting mechanism for our problem by employing the
• Step 3. Split the composite ATSP tour into several routes, by itera-
tively inserting each component C into the current route R if the
Lin-Kernighan heuristic (LKH) for the ATSP subproblem in PDPLT to current route satisfies the maximum duration constraint. Otherwise,
improve the solution quality of the initial solutions. The steps of the
4
B. Peng, et al. Computers & Industrial Engineering 142 (2020) 106241
Fig. 4. Illustration of the crossover operator with respect to the LIFO policy.
Table 1
Settings of some important parameters for LMA.
Parameter Description Candidate values Final value
The reaction factor that controls how quickly the score adjustment function reacts to changes according to the performance of the 0.1, 0.2, 0.3 0.1
moves
1 The reward parameter if a move produces a new best solution 1, 5, 10 5
2 The reward parameter if a move improves the current solution 1, 2, 5 1
The punishment parameter if a generated solution is worse than the current solution 0.8, 0.9, 0.99 0.9
The parameter of the ratio of the number of request nodes deleted in the perturbation strategy (n ) 2,4,6 4
The threshold used to employ the dedicated perturbation (n * ) 1, 5, 10 5
np The number of individuals in the population 10, 20, 30 10
The constant parameter to balance the objective value and the distance in the goodness value 0.5, 0.7, 0.9 0.7
we insert it into a new route (lines 9–17). For example, in Fig. 3, the • Request-Insertion (M ): A request (e.g., {i , i }) is removed from its
1
+
route of the whole ATSP tour is split into three routes to satisfy the current route and inserted either in a different position of the same
maximum duration constraint. route or in a different route.
• Step 4. For each route R , we obtain an ATSP tour by using LKH to • Request-Swap (M2 ): Two requests (in the same route or in different
minimize the total travelling cost for visiting the customers be- routes) exchange their positions.
longing to the route (lines 18–21). Finally, the generated solution S0 • Double-Request-Swap (M3 ): Two pairs of consecutive requests ex-
is returned as the initial solution (line 22). change their positions. This operator extends the Request-Swap
move by considering consecutive requests i and j defined to be
The hybrid initial solution construction procedure proposed above consecutive if the pickup node j+ of request j is the closest to pickup
is usually able to generate feasible solutions that satisfy the three node i+ of request i in the route.
constraints, i.e., the maximum duration, maximum capacity, and LIFO • Component-Insertion (M4 ): A component is removed from its cur-
constraints. In addition, high-quality solutions are generated within rent route and inserted either in a different position of the same
reasonable computing times. In Section 5.1 we demonstrate the efficacy route or in a different route.
of our procedure in comparison with other constructive methods pro- • Component-Swap (M5 ): Two components (in the same route or in
posed in the literature. different routes) exchange their positions.
• Double-Component-Swap (M6 ): Two pairs of consecutive compo-
3.3. Learning-based local search nents exchange their positions. This operator extends the
Component-Swap move by considering two pairs of consecutive
One of the key components of our memetic algorithm is the components.
learning-based local search procedure that plays the critical role of
intensifying the search. With the exception of tabu search, traditional In our local search, we only consider neighborhood moves that can
local search utilizes a set of moves to search the solution regions satisfy all the three constraints, i.e., the maximum capacity, maximum
without maintaining a memory of the process, while the local search duration, and LIFO constraints. Obviously, in cases where the con-
based on our reinforcement learning mechanism is able to effectively straints are tight, it is very difficult to find feasible solutions in a single
exploit memory to manage the neighborhood moves and guide the move as confirmed in benavent2015multiple. Hence, the main differ-
search to promising regions. ence of our approach from that proposed in Benavent et al. (2015),
where only the Request-Insertion move is used, is that we employ six
3.3.1. Neighborhood moves different neighborhood moves to expand the search space in order to
Our algorithm employs both intra-route moves (performed in the find more promising feasible solutions, instead of relaxing the max-
same route) and inter-route moves (performed between two different imum duration constraint and penalizing violations.
routes), as follows: The size and complexity of the neighborhood structure significantly
5
B. Peng, et al. Computers & Industrial Engineering 142 (2020) 106241
Fig. 5. Box-and-whisker plots of the gaps between the objective values and the best found solutions for the considered parameters.
affect the performance of the algorithm. The request-insertion neigh- 3.3.2. Learning mechanism
borhood operator (M1) has n candidate requests to be removed, and Reinforcement learning is an area of machine learning concerned
nk × (2nk 1) possible positions for each candidate pickup and its with how an agent should take actions in an environment so as to
corresponding delivery node, where nk denotes the number of vehicles maximize cumulative reward. The intuition underlying reinforcement
in route k . Hence the size of M1 neighborhood is learning is that actions that lead to large rewards should be made more
k=K
n × k = 1 (nk × (2nk 1)) , where K denotes the number of routes. For likely to recur.
request-swap neighborhood operator (M2 ), the size of the neighborhood We employ a reward and penalty strategy to dynamically manage
is the same as the number of two candidate requests combinations, the neighborhood moves and guide the search based on the expectation
which is equal to n × (n 1) 2 . For double-request-swap neighborhood that different neighborhood moves may be preferable for different
operator (M3 ), the size of the neighborhood is (n 2) × (n 3) 2 . As problem instances or search landscapes. Consequently, we keep track of
for the component based neighborhood operators (i.e., the component- a score for each neighborhood move, which measures how well the
insertion operator (M4 ), component-swap (M5 ) and double-component- move has performed for the current instance or landscape, adopting the
swap (M6 )n), the size of each neighborhood is no more than its corre- perspective that alternating among different moves based on the pro-
sponding request-based neighborhood operators, since a component posed learning mechanism may yield more robust performance.
usually consists of several requests. Therefore, the complexity of all the To select moves, we assign scores to different moves and use the
six neighborhood operators employed in our method is bounded by roulette wheel selection principle. If we have n moves with scores sci
O (n3) . (i 1, 2, ...,n ), move k is selected with probability k , where
6
B. Peng, et al. Computers & Industrial Engineering 142 (2020) 106241
Table 2
Results for solving public benchmark PDPLT instances with 100–110 nodes.
MS-ITS LMA MS-ITS LMA
Instances fbest Veh Time fbest Veh favg Time Instances fbest Veh Time fbest Veh favg Time
lc101 9867.61 10 65.91 9867.29 9 9868.01 16.31 lc201 9825.64 3 125.00 9824.76 3 9825.73 28.75
lc102 9862.70 9 59.19 9862.70 9 9862.70 27.87 lc202 9860.47 3 115.66 9860.24 3 9861.17 16.07
lc103 9867.84 9 56.47 9867.84 9 9867.98 10.69 lc203 9873.71 3 107.53 9873.71 3 9874.13 17.30
lc104 9857.95 10 62.22 9857.80 9 9857.95 41.32 lc204 9832.31 3 121.44 9830.38 3 9831.28 16.94
lc105 9848.78 9 54.31 9848.01 9 9848.83 4.94 lc205 9794.84 3 103.66 9794.84 3 9794.84 15.84
lc106 9866.18 9 60.28 9866.18 9 9866.18 10.71 lc206 9882.15 3 116.55 9882.15 3 9882.15 17.49
lc107 9874.48 9 62.25 9874.48 9 9876.24 33.08 lc207 9802.90 3 98.61 9802.90 3 9802.90 16.25
lc108 9862.42 9 60.89 9862.42 9 9863.11 48.03 lc208 9803.00 3 99.86 9803.00 3 9803.00 15.31
lc109 9855.31 9 52.73 9855.31 9 9855.32 22.76
lr101 2140.15 11 61.36 2110.53 10 2120.47 33.78 lr201 1967.24 3 188.69 1948.63 2 1955.15 16.49
lr102 2125.61 10 62.24 2114.12 10 2126.19 13.96 lr202 2130.35 3 188.00 2124.35 3 2129.76 10.86
lr103 2113.11 10 49.53 2111.05 10 2113.66 38.62 lr203 2102.09 3 208.83 2100.92 3 2101.99 18.68
lr104 2072.80 10 47.72 2064.99 10 2076.81 46.35 lr204 2184.12 3 189.06 2179.69 3 2185.68 12.80
lr105 2114.97 10 51.80 2109.91 10 2110.01 11.52 lr205 2079.14 3 202.53 2065.66 3 2085.43 23.56
lr106 2147.80 10 58.52 2126.12 10 2139.85 38.58 lr206 2111.64 3 168.78 2110.17 3 2111.32 19.20
lr107 2188.71 11 52.13 2173.23 10 2193.84 11.41 lr207 2146.86 3 188.17 2144.38 3 2148.59 75.53
lr108 2150.33 10 45.59 2148.11 10 2151.92 39.51 lr208 2090.03 3 170.94 2082.15 3 2093.70 16.28
lr109 2165.48 11 60.38 2154.84 10 2163.55 45.93 lr209 2066.29 3 211.49 2045.65 3 2058.91 21.55
lr110 2041.50 10 54.05 2041.50 10 2042.16 42.09 lr210 2073.73 3 193.95 2068.44 3 2070.08 97.72
lr111 2114.81 10 60.55 2110.61 10 2115.86 21.61 lr211 2027.38 3 216.20 2027.38 3 2027.98 16.87
lr112 2102.96 10 53.89 2098.09 10 2100.47 19.24
lrc101 2289.98 10 50.88 2265.37 10 2287.39 18.55 lrc201 2203.00 3 161.02 2168.34 3 2187.96 17.20
lrc102 2336.21 10 48.58 2331.00 10 2340.18 14.82 lrc202 2250.50 3 158.69 2204.66 3 2235.81 38.47
lrc103 2213.32 11 61.28 2195.80 10 2207.54 26.77 lrc203 2192.66 3 157.36 2174.51 3 2189.39 50.17
lrc104 2191.28 10 57.22 2190.22 10 2198.88 35.40 lrc204 2018.41 3 158.92 2018.41 3 2019.13 16.05
lrc105 2335.36 11 60.52 2305.44 10 2321.17 25.32 lrc205 2261.11 3 171.17 2225.98 3 2249.28 18.26
lrc106 2238.28 10 57.77 2226.83 10 2237.95 17.59 lrc206 2281.59 3 145.84 2263.27 3 2279.38 10.88
lrc107 2218.77 10 50.31 2204.83 10 2220.36 43.51 lrc207 2426.00 4 169.05 2366.94 3 2388.16 14.88
lrc108 2201.42 10 51.73 2201.10 10 2202.11 19.21 lrc208 2176.96 3 154.55 2104.36 3 2149.66 18.86
avg 4560.90 9.93 56.22 4553.30 9.69 4559.89 26.88 4424.60 3.04 158.95 4410.96 2.96 4420.09 24.38
#better 22 21
#equal 7 6
#worse 0 0
Table 3
Results for solving public benchmark PDPLT instances with 402–422 nodes.
MS-ITS LMA MS-ITS LMA
Instances fbest Veh Time fbest Veh favg Time Instances Best Veh Time fbest Veh favg Time
LC1_4_1 42336.75 30 405.08 42237.43 30 42340.19 307.99 LC2_4_1 40714.04 12 732.05 40670.71 12 40691.66 210.03
LC1_4_2 42366.74 30 382.77 42346.62 30 42410.56 357.37 LC2_4_2 41140.66 14 737.39 40866.52 12 40919.77 394.92
LC1_4_3 42599.50 30 402.63 42569.57 30 42578.89 386.35 LC2_4_3 41316.85 13 741.09 41014.62 12 41210.58 278.97
LC1_4_4 42482.09 30 417.38 42402.31 30 42456.06 355.22 LC2_4_4 41035.84 13 764.05 40870.53 12 40981.32 324.30
LC1_4_5 42312.18 30 395.00 42213.65 30 42289.31 358.27 LC2_4_5 40996.57 13 772.70 40735.31 12 40810.29 297.03
LC1_4_6 42284.78 29 437.97 42271.70 30 42299.72 388.61 LC2_4_6 40727.48 12 744.81 40594.96 12 40661.76 223.93
LC1_4_7 42257.38 30 391.61 42163.86 30 42231.43 303.67 LC2_4_7 41001.45 13 752.27 40687.18 12 40883.25 385.95
LC1_4_8 42409.92 30 400.83 42381.73 30 42399.15 385.35 LC2_4_8 40925.04 13 737.17 40667.13 12 40760.51 247.25
LC1_4_9 42657.31 30 377.06 42645.50 30 42660.04 371.78 LC2_4_9 40810.30 13 788.56 40745.74 12 40781.96 275.21
LC1_4_10 42539.79 30 406.77 42524.90 30 42539.11 376.14 LC2_4_10 41060.15 14 735.99 40873.01 13 40987.03 366.40
LR1_4_1 10502.64 18 840.63 9851.86 14 9917.68 274.60 LR2_4_1 12095.03 7 1598.11 11186.32 4 11199.84 231.22
LR1_4_2 10624.45 18 827.59 9845.99 14 9941.57 272.03 LR2_4_2 12087.73 10 1528.31 11126.38 4 11564.79 351.06
LR1_4_3 10515.92 18 805.61 9784.51 14 9856.29 284.36 LR2_4_3 11890.15 7 1578.91 10876.17 4 11239.41 314.20
LR1_4_4 9910.92 17 850.89 9147.95 14 9403.33 399.28 LR2_4_4 11445.21 10 1612.69 10371.95 4 10685.31 296.48
LR1_4_5 10190.46 16 762.41 9602.53 13 9820.47 374.79 LR2_4_5 11753.71 6 1622.72 11076.18 6 11542.49 399.52
LR1_4_6 10414.40 16 829.00 9952.60 14 10231.68 396.33 LR2_4_6 11549.06 6 1540.38 10666.13 4 10869.72 155.28
LR1_4_7 10321.30 20 851.63 9495.14 13 9853.49 298.65 LR2_4_7 11454.11 8 1658.30 10383.31 4 10534.97 398.30
LR1_4_8 9290.86 15 856.17 8922.60 12 9127.58 393.68 LR2_4_8 11684.98 6 1472.44 10688.82 5 10898.28 162.62
LR1_4_9 10447.86 18 827.13 9801.38 13 9910.65 373.73 LR2_4_9 11695.28 9 1574.20 10786.47 4 11349.26 136.39
LR1_4_10 10041.44 17 848.27 9417.07 13 9823.10 316.42 LR2_4_10 12228.19 8 1589.20 11064.19 4 11901.38 159.49
LRC1_4_1 9700.01 17 818.39 9178.79 14 9320.28 389.15 LRC2_4_1 10733.52 6 1617.91 9656.87 4 9810.79 349.17
LRC1_4_2 9651.76 16 838.09 8871.73 13 8913.31 305.77 LRC2_4_2 10444.30 5 1670.25 9515.38 4 9882.12 277.20
LRC1_4_3 9667.97 16 759.05 9220.59 14 9513.77 389.64 LRC2_4_3 10704.17 7 1577.23 9673.14 4 10106.65 361.75
LRC1_4_4 9015.70 14 772.53 8823.12 13 8948.09 399.86 LRC2_4_4 10857.97 12 1489.13 9477.39 5 9811.89 368.64
LRC1_4_5 9555.94 16 791.08 8879.86 13 9135.54 292.87 LRC2_4_5 10288.07 7 1612.36 9484.97 4 9879.17 256.31
LRC1_4_6 9363.88 17 814.67 9023.08 14 9136.72 365.37 LRC2_4_6 10596.35 10 1638.67 9275.77 4 9513.30 292.73
LRC1_4_7 9645.08 18 870.91 8828.10 13 9132.86 362.44 LRC2_4_7 11160.72 7 1551.36 9861.02 4 10391.22 385.31
(continued on next page)
7
B. Peng, et al. Computers & Industrial Engineering 142 (2020) 106241
Table 3 (continued)
Instances fbest Veh Time fbest Veh favg Time Instances Best Veh Time fbest Veh favg Time
LRC1_4_8 9528.92 16 829.53 8855.54 13 9111.02 386.94 LRC2_4_8 10882.58 11 1478.17 9441.73 4 9689.08 358.19
LRC1_4_9 9634.73 17 837.19 8949.42 13 9095.33 189.92 LRC2_4_9 11054.00 7 1578.86 9668.99 4 9910.38 206.83
LRC1_4_10 9639.88 16 838.55 8975.52 13 9110.15 398.25 LRC2_4_10 10855.83 7 1590.75 9786.51 4 10213.11 389.08
avg 20730.35 21.17 682.89 20306.15 18.90 20450.25 348.83 21172.98 9.53 1302.87 20393.11 6.83 20656.04 295.13
#better 30 30
#equal 0 0
#worse 0 0
8
B. Peng, et al. Computers & Industrial Engineering 142 (2020) 106241
Table 4
Performance comparisons among VNS, PTS, and LMA for solving PDPLD instances.
BKS VNS PTS LMA
Instance Size fbest Veh favg Gap(%) Time favg Gap(%) Time fbest Gap(%) Veh favg Gap(%) Time
brd14051 25 5086 2 5086.0 0 1.62 5087.2 0.02 0.54 5084 −0.04 2 5084 −0.04 0.41
51 9352 2 9352.0 0 7.27 9438.7 0.93 2.96 9354 0.02 2 9355.4 0.04 2.71
75 8543 2 8543.4 0 19.05 8544.8 0.02 6.84 8544 0.01 2 8544.2 0.01 2.48
101 11,169 2 11169.0 0 21.31 11169.0 0 6.42 11156 −0.12 2 11246.7 0.7 4.81
251 27195 8 28196.1 3.68 172.9 28730.2 5.65 119.51 26690 −1.86 5 27372.6 0.65 89.89
501 58028 8 59090.0 1.83 965.21 60699.8 4.6 907.25 57419 −1.05 8 57810.2 −0.38 330.32
751 96068 12 96421.5 0.37 1345.17 97976.6 1.99 4559.2 95167 −0.94 13 95597.1 −0.49 471.86
d15112 25 108207 2 108207.0 0 1.45 108439.0 0.21 0.31 108208 0 2 108211.3 0 1.09
51 178863 3 179086.0 0.12 5.29 179859.0 0.56 1.13 178866 0 3 178866.5 0 2.33
75 239511 4 244600.7 2.13 13.82 251313.9 4.93 3.71 239516 0 4 239516.1 0 8.29
101 325761 5 339504.3 4.22 14.19 346055.0 6.23 5.28 325761 0 5 325761.3 0 4.22
251 700366 10 720770.3 2.91 98.15 727627.5 3.89 200.22 697292 −0.44 10 703117.9 0.39 65.85
501 1145838 16 1165275.3 1.7 252.73 1188016.3 3.68 663.22 1143485 −0.21 17 1145970 0.01 330.32
751 1596048 22 1622964.0 1.69 1036.52 1637573.1 2.6 1765.95 1587880 −0.51 22 1600147.6 0.26 979.01
d18512 25 5086 2 5086.0 0 1.63 5087.2 0.02 0.54 5084 −0.04 2 5084 −0.04 1.38
51 9245 2 9256.8 0.13 6.67 9286.0 0.44 1.17 9250 0.05 2 9250.1 0.06 2.72
75 10147 2 10148.8 0.02 16.85 10180.7 0.33 8.41 10146 −0.01 2 10146 −0.01 2.92
101 11742 2 11765.6 0.2 31.13 11909.0 1.42 6.24 11614 −1.09 2 11747.2 0.04 18.53
251 27945 5 28933.5 3.54 229.93 29314.2 4.9 206.38 27757 −0.67 5 28757.9 2.91 204.08
501 56790 8 57787.4 1.76 733.32 58928.0 3.76 1241.01 56575 −0.38 8 58109.2 2.32 528.32
751 91670 11 94016.2 2.56 2346.58 95612.5 4.3 3765.94 90801 −0.95 12 91862.9 0.21 2970.85
fnl4461 25 2168 1 2168.0 0 2.35 2168.0 0 0.40 2170 0.09 2 2170 0 0.21
51 4830 2 4830.0 0 4.68 4830.0 0 1.11 4832 0.04 2 4832 0 2.49
75 7399 3 7432.2 0.45 15.83 7466.5 0.91 8.22 7406 0.09 2 7406 0.09 6.19
101 10608 4 10765.3 1.48 18.29 10897.8 2.73 7.32 10604 −0.04 4 10615.9 0.07 13.68
251 30651 4 31334.1 2.23 255.07 31782.4 3.69 210.30 30546 −0.34 4 30842.9 0.63 39.84
501 81994 7 83246.3 1.53 667.54 85081.8 3.77 1596.72 81454 −0.66 7 81702.6 −0.36 478.17
751 135940 12 138626.0 1.98 2163.77 139106.8 2.33 3172.93 135592 −0.26 12 138392.2 1.8 1908.04
nrw1379 25 3464 2 3464.0 0 2.00 3466.1 0.06 0.58 3465 0.03 2 3465 0 0.22
51 5398 2 5423.4 0.47 8.71 5499.3 1.88 2.43 5397 −0.02 2 5397 0 2.74
75 8207 3 8352.2 1.77 20.23 8403.4 2.39 11.84 8219 0.15 3 8220.5 0.16 11.66
101 11933 4 12220.9 2.41 33.06 12537.3 5.06 18.82 11733 −1.68 4 11989.7 0.48 23.78
251 31075 7 31635.0 1.8 96.03 32217.6 3.68 186.17 30957 −0.38 7 31348.5 0.88 76.51
501 68202 13 68962.5 1.12 293.56 70800.0 3.81 775.82 67684 −0.76 13 67914.8 −0.42 150.78
751 122587 21 124711.0 1.73 858.91 125859.0 2.67 1740.60 122179 −0.33 22 123753.9 0.95 771.56
pr1002 25 16221 1 16221.0 0 1.75 16221.0 0 0.21 16221 0 1 16221 0 0.28
51 47905 3 47989.0 0.18 4.60 47980.6 0.16 1.20 47129 −1.62 2 47293.5 −1.28 3.64
75 64102 4 64889.0 1.23 11.94 65197.8 1.71 7.49 63869 −0.36 4 63960.4 −0.22 8.98
101 87700 5 88260.8 0.64 13.78 88373.9 0.77 7.65 87692 −0.01 5 87717.8 0.02 19.01
251 25798 8 263657.4 2.51 60.59 264251.3 2.74 157.55 254460 −1.06 8 256864.3 −0.13 10.12
501 597464 14 611917.4 2.42 396.32 620169.5 3.8 776.39 595095 −0.4 15 599562.1 0.35 297.45
751 1008027 19 1031423.3 2.32 1045.75 1034424.5 2.62 1723.79 1000604 −0.74 20 1017101.2 0.9 375.67
avg 250.71 174422.21 6.40 177923.54 1.27 316.56 179942.44 2.27 568.57 173641.83 −0.39 6.45 174960.23 0.26 243.41
#better 29
#equal 2
#worse 11
Table 5
Comparisons of HIS with RCDP and RDPD on public benchmark PDPLT instances.
RCPD RPBD HIS
Instances Set fbest favg Vavg fbest favg Vavg fbest favg Vavg
9
B. Peng, et al. Computers & Industrial Engineering 142 (2020) 106241
where 2 n and lcs (Si, Sj ) denote the number of all the pickup and
delivery nodes and the length of the longest common subsequence
between Si and Sj , respectively. For example, given a solution S1 =
(i+, i , j+ , j , k +, k ), the candidate neighboring solution is represented
by S2 = ( j+ , j , k +, k , i+, i ) after a request-insert move by insert
request i after request k . The distance between S1 and S2 based on the
longest common subsequence is only 2 while the distance between S1
and S2 based on the classic Hamming distance can be 6. With respect to
the hamming distance of solutions S1 and S2 , these two solutions are
completely different solutions. However, both solutions can be
converted by using an insert neighborhood move. Hence, the longest
common subsequence mechanism can better match the insert and swap
moves.
Definition 2 ((Goodness score of a solution in the population).). The
goodness score GS (Si ) of a solution Si is defined by its objective function
value, as well as its distance to the population, as follows:
fmax fSi Dist (Si ) Distmin
GS (Si ) = × + (1 )× ,
fmax fmin + 1 Distmax Distmin + 1 (7)
where fmax and fmin denote the maximum and minimum objective
values of the individuals in the population PP , and Distmax and Distmin
are the maximum and minimum distances between a solution to the
population, respectively. The is a constant parameter. Since it could
be Distmax = Distmin , in the denominator, Distmax - Distmin + 1 was used.
In each generation, the offspring individual is inserted into the
population if the goodness score of the offspring is better than the worst
solution in the population according to the goodness score. Otherwise,
the offspring individual is discarded. It is clear that the greater the
Fig. 6. Comparative results of HIS with RCPD and RPBD in terms of (a)average
goodness score GS (Si ) is, the better is the solution Si . It is noted that this
total duration (b)the corresponding number of vehicles, respectively.
goodness score function simultaneously considers the factors of solution
quality and population diversity. On the one hand, we should maintain
avoid redundancy. The best insertion position of each component cp in a population of elite solutions. On the other hand, we have to empha-
the route is the one yielding the minimum value of f ' : size the importance of the diversity of the solutions to avoid premature
convergence of the population.
f '= (f ) , (5)
where (f ) represents the incremental objective value after in- 4. Computational studies
serting component cp in the route and denotes the number of couples
in component cp . As shown in Fig. 4, we insert component In this section we report extensive computational studies conducted
(1+ , 2+, 2 , 1 ) into route V1c and remove two couples (1+ , 1 ) and (2+, 2 ) to assess the performance of the proposed learning-based memetic al-
from I a and I b . Then, we insert the next component (3+ , 3 ) with the gorithm (LMA) with the state-of-the-art reference algorithms in solving
best value f ' into route V1c . Note that if all the cases for which one of the public benchmark instances of both PDPLD and PDPLT. Note that
remaining components is inserted into route V1c will violate the max- PDPLD is a special case of PDPLT where the latter simply considers very
imum time constraint, then the selected component (4+, 5+, 5 , 4 ) will high vehicle capacity and ignores the service times in the request nodes.
constitute one new route V2c .
4.1. Benchmark instances and experimental protocols
3.5. LCS-based population updating strategy
For experimental evaluations, we use two data sets in Benavent
To maintain healthy diversity of the population, we use the LCS- et al. (2015) and Cheang et al. (2012). The first set of benchmark in-
based population updating strategy introduced by Cheng and Peng stances was first proposed in Li and Lim (2003) for the pickup and
(2016) to solve the job shop scheduling problem, to decide whether the delivery problem with time windows (PDPTW), which consists of six
improved solution should be inserted into the population or discarded. classes of instances, namely lc1, lc2, lr1, lr2, lrc1, and lrc2, where the
This population updating strategy simultaneously considers the solu- nodes in the lc instances are in clusters, in the lr instances are randomly
tion quality and the distances among the individuals in the population distributed, and in the lrc instances are partially clustered and partially
10
B. Peng, et al. Computers & Industrial Engineering 142 (2020) 106241
Table 6
Comparison of the memetic algorithms with and without the learning-based mechanism on public benchmark PDPLT and PDPLD instances.
WLMA LMA
Fig. 7. Percentage of the chosen times for six neighborhood moves in LMA.
randomly distributed. Benavent et al. (2015) modified these original ignoring the service time in each request vertex. These instances were
instances to match the new problem PDPLT by setting the maximum derived from the six TSP instances fn14461, brd14051, d15112,
route duration equal to the width of the time window associated with d18512, nrw1379, and pr1002 extracted from TSPLIB (Reinelt, 1991).
the depot and ignoring all the other time windows. Hence, in our For each of these TSP instances, subsets of vertices were selected with
computational studies, we used the same benchmark instances as 25, 51, 75, 101, 251, 501, and 751 vertices. In addition, we imposed a
thosed used in Benavent et al. (2015). The instances can be divided into travel distance limit dmax on each instance, where
two subsets of 116 instances, including 56 small-size instances with dmax = max {d (0+, i+) + d (i+, i ) + d (i , 0 )} (i r ), which is the lar-
100–110 request nodes and 60 large-size instances with 402–422 re- gest distance for any route involving a single request. All the bench-
quest nodes. mark instances are publicly available on the website1, as well as the
The second set of benchmark instances was used in Cheang et al. executable files of our LMA method.
(2012) for PDPLD. Since PDPLD is a special case of PDPLT, we compare
our learning algorithm LMA with the reference algorithms in the lit-
erature by setting a very high vehicle capacity limit for each vehicle and 1
https://github.com/283224262/PDPLT.
11
B. Peng, et al. Computers & Industrial Engineering 142 (2020) 106241
Fig. 8. Comparisons of LMA with its variants that exclude six different neighborhoods.
Fig. 9. Comparisons of LMA with its variants that use two crossover operator (i.e., one-point crossover and two-point crossover).
Fig. 10. Comparisons of LMA with its variant that uses the Hamming distance in the population updating procedure.
In this study we coded our LMA algorithm in C++ and ran it on a proposed in Chen, Hao, and Glover (2016) to scale the computing times
PC with a AMD Athlon 3.0 GHz CPU and 2 Gb RAM operating under the reported for the mentioned heuristics in the corresponding studies. The
Windows 7 operating system. To evaluate the performance of LMA, we procedure is based on the assumption that the CPU speed is approxi-
compare it with the following state-of-the-art heuristics from the lit- mately linearly proportional to the CPU frequency. Specifically, we
erature: performed ten independent runs of our LMA for each instance, with the
maximum time limit per run set to equal the scaled CPU times by
• The multi-start iterated tabu search (MS-ITS) for PDPLT proposed by multiplying the computing time reported by the current best-per-
Benavent et al. (Benavent et al., 2015), who implemented it on a forming algorithms from Benavent et al. (2015) and Cheang et al.
2.66 GHz Core 2Quad processor with 3 Gb RAM under the Windows (2012) with the ratio values (2.66/3.0) and (2.26/3.0), respectively.
XP operating system.
• Eight heuristics including a Variable Neighborhood Search (VNS)
4.2. Parameter tuning and parameter sensitivity analysis
algorithm, six reduced versions of the VNS heuristic, and the
Probabilistic Tabu Search (PTS) algorithm for PDPLD proposed in
Cheang et al. (2012), who tested their performance on a Dell server Table 1 presents the settings of the LMA parameters used in the
with an Intel Xeon E5520 2.26 GHz CPU and 8 GB RAM operating reported experiments. We tuned the parameters ( , 1, 2 , , , np , ,
under the Linux operating system. The computing times reported are and ) with the Iterated F-race (IFR) proposed by Birattari, Yuan,
in CPU seconds on this server. Balaprakash, and Stützle (2010) and an automated configure method
that is part of the IRACE package from Lopez-Ibanez, Dubois Lacoste,
In order to achieve relatively fair comparisons, we apply the method Caceres, Birattari, and Stützle (2016). We performed the tuning process
on instances LC1, LR1, LRC1, brd14051, d15112, d18512, fnl4461,
12
B. Peng, et al. Computers & Industrial Engineering 142 (2020) 106241
Table 7
Summary for the key components in LMA.
Key components Description Motivation Influence and results
Hybrid initial solution (HIS) HIS adapts the splitting mechanism for PDPLT The similar initial generation method is Experimental results in Section 5.1 show
method by employing the Lin-Kernighan heuristic (LKH) successfully applied to other routing problems that HIS can generate better initial
for the ATSP sub- problem in PDPLT to improve (such as multi-depot VRP and multi-route VRP). solutions than two previous methods
the solution quality of the initial solutions. RCDP and RPBD.
Learning-based mechanism Reinforcement learning mechanism is able to (1) The structure of the problem instances Experimental results in Sections 5.2 and
effectively exploit memory to manage the affects the performance of neighborhood 5.3 show the effectiveness of the learning
neighborhood moves. moves; (2) Alternating among different moves based-mechanism and six neighborhood
based on the proposed learning mechanism moves.
yield more robust performance.
Components-based crossover Two steps based on the iterated greedy method (1) The classic one-point and two-point Better performance in comparison with
to generate the feasible offspring solution. crossover usually generate a infeasible offspring two crossover operators in Section 5.4
solution; (2) The dedicated component-based shows its importance.
crossover based on the problem structure may
obtain a better performance.
Longest common subsequence We define the distance and goodness score in the The similarity of two solutions based on the Compared with the Hamming distance
(LCS) based population population, which are based on the longest longest common subsequence could better based updating strategy, LCS-based
updating strategy common subsequence strategy match the neighborhood moves based on the updating strategy can obtain a slightly
insert and swap operations. better performance (see Section 5.5).
13
B. Peng, et al. Computers & Industrial Engineering 142 (2020) 106241
in percentage (%), and computing time Time obtained by the VNS and particular, in terms of the best and average objective values, LMA ob-
PTS algorithms in ten executions in columns 5–10. The last six columns tains better values than WLMA for all the tested instances, as illustrated
show that LMA obtains better results for 29 out of 42 instances, matches by WLMA's 66473.22 and 67290.66 versus LMA's 66241.09 and
the current best-known results for 2 instances, and is slightly worse 66757.84. As for the average running time, the two algorithms are very
than MS-ITS for 11 instances. In addition, LMA outperforms VNS and close to each other, i.e., WLMA's 223.81 versus LMA's 198.48. Specifi-
PTS in terms of the average objective value, the average gap and the cally, LMA takes shorter (longer) average computing times to reach the
average computing time for most tested instances. average best results on 13 (5) sets of the tested instances, respectively,
To summarize, the results of the above extensive computational compared with WLMA.
studies demonstrate the efficacy of LMA in tackling PDPLT in terms of In order to further study the characteristics of the learning me-
both solution quality and computational efficiency. chanism. We obtain the percentage of the chosen times for six neigh-
borhood moves during the search of LMA for ten representative in-
5. Analysis and discussions stances with different scales2. We run each instance once with the
maximum time limit of 360 s by using LMA. Fig. 7 shows the experi-
In this section we analyze several key components and one im- mental results for the percentage of the chosen moves for all the six
portant parameter of LMA, including the hybrid initial solution (HIS) neighborhood moves in LMA. The probability of choosing small moves
method, the learning-based mechanism to manage the neighborhood with no more than two requests (such as M1 and M2 ) clearly decreases as
moves, the six neighborhood moves, the component-based crossover the size of the instances increases (for sizes ranging from 25 to 501). In
operator, and the parameter for the goodness score, with a view to contrast, the probability of choosing large moves based on components
understanding their impacts on the performance of LMA. (such as M4 , M5 and M6 ) significantly increases. One can clearly observe
that the large moves are preferred in the large scale instances. In gen-
5.1. Comparisons between the hybrid initial solution method and the eral, the probability of each move being selected varies for difference
previous construction methods instances, which are significantly different from the token-ring fashion
for WLMA (with the same probability for each move) verifying the
To study the effectiveness of the proposed HIS method, we compare impact of our proposed learning mechanism.
it with the two effective reference methods, namely random solution The above results indicate that our learning-based mechanism plays
with consecutive pickups and deliveries (RCPD), and random solution a crucial role in boosting the performance of LMA in solving PDPLT.
with pickups before deliveries (RPBD), proposed in Benavent et al.
(2015) for PDPLT. Since all the initial solution generation methods only 5.3. Effectiveness of the neighborhoods
run within a very short time (in fact, less than one second) for all the
benchmark instances, we ignore the comparison in terms of computing As described in Section 3.3.1, our LMA procedure employs six
time. We performed ten runs of each method to solve each instance and dedicated neighborhood structures (M1-M6 ). In this section, we in-
recorded the best objective value (minimum duration) and the corre- vestigate the influence of each neighborhood on the performance of the
sponding number of vehicles. Table 5 and Fig. 6 present the compar- LMA algorithm. For this purpose, we propose six variants of LMA such
isons of the HIS method with the RCDP and RPBD methods on different that for each LMA variant, we disable one particular neighborhood
classes of instances. Table 5 presents for each instance set its name, the while keeping other components unchanged. Along with the standard
average results in terms of the best objective value fbest and the average LMA algorithm, six LMA variants are tested on ten representative in-
objective value favg for each instance set, and the average number of stances. Specifically, for each instance, all the reference methods were
vehicles Vavg corresponding to the best solution, which is rounded to an iteratively performed until a pre-fixed maximum time of LMA given in
integer. Fig. 6 graphically shows the comparisons of HIS with RCDP and Tables 2–4 is reached. The best and the average performances are
RPBD, where the horizontal axis denotes the sets of instances, and the plotted in Fig. 8, where the y-axis indicates the percentage gap to the
vertical axis denotes the total duration and the number of vehicles, best-known solutions (see Tables 2–4). Fig. 8 shows that the manner in
respectively. which removing a neighborhood deteriorates the search power of LMA,
From Table 5 and Fig. 6, we observe that HIS is able to obtain better and confirms that all six neighborhoods contribute to the performance
results than RCDP and RPBD in terms of the best objective value and the of the LMA algorithm. Apart from the usefulness of each individual
corresponding number of vehicles. As the size of the instances becomes neighborhood, their combined use within the LMA algorithm con-
larger, the advantage of LMA becomes more evident. In sum, HIS is a stitutes an important feature to ensure the performance of the whole
very effective initial solution generation method for PDPLT. algorithm.
14
B. Peng, et al. Computers & Industrial Engineering 142 (2020) 106241
point crossover operator, two cutting points are randomly selected to 6. Conclusion
divide the parent solutions. By reference to the two parent solutions S1
and S2 , the offspring is obtained by first transferring the sub-sequence Our learning-based memetic algorithm LMA for the pickup and
between the two cutting points from S1, and then copying in the same delivery problem under the LIFO loading policy demonstrates the ef-
order the remaining nodes from S2 . In contrast to the component-based fectiveness of its key features, which include a hybrid initial solution
operator, both two reference operators usually generate infeasible so- construction method, a learning-based local search procedure, a com-
lutions, violating the LIFO constraint. Hence, we employ the request- ponent-based crossover operator utilizing the ideas of structured com-
insertion neighborhood operator to reschedule the position of the binations, and a longest-common-subsequence-based population up-
conflicting requests to make the infeasible solution satisfy the max- dating strategy.
imum duration, maximum capacity, and LIFO constraints. For this Experimental evaluations on two sets of public benchmark instances
analysis, we repeat the same experimental procedure as in the previous show that our LMA performs very favourably compared to the current
section. The best and the average performances are plotted in Fig. 9. In state-of-the-art reference algorithm MS-ITS benavent2015multiple and
terms of the best and the average results, we observe a better perfor- the eight heuristics including a VNS algorithm, its six reduced versions,
mance of LMA when the component-based crossover operator is used, and PTS for PDPLD proposed in Cheang et al. (2012). In particular, LMA
which highlights its importance in LMA. is able to obtain highly competitive results in terms of both computa-
tional efficiency and solution quality for most of the PDPLT and PDPLD
instances, improving the best-known results for 132 out of 158 of the
5.5. Impact of longest common subsequence based updating strategy tested problem instances, while matching the best-known results for all
but three of the remaining instances. In addition, our computational
To investigate the merit of the longest common subsequence me- studies demonstrate the effectiveness of the key strategies incorporated
chanism, we compare LMA with a variant that uses the Hamming dis- into our proposed LMA.
tance in the population updating procedure, as described in Section 3.5. These outcomes motivate future research to extend our work in the
For this analysis, we repeat the same experimental procedure as in the following directions: First, it would be interesting to employ a powerful
previous section. The best and the average performances are plotted in tabu search method (such as granular tabu search) to improve the
Fig. 10. In terms of the best and the average results, we observe a search capability of the learning-based local search phase. Second, the
slightly better performance of LMA when the longest common sub- design of our approach invites the development of related procedures
sequence is used. In particular, LMA is able to obtain better values for that combine its strategies with those of other population-based fra-
the best results than the LMA variant that uses the Hamming distance meworks like path-relinking (Glover, Laguna, & Marti, 2000) should be
for two solutions, although both methods perform similarly in terms of very promising. Finally, the success of these ideas for tackling the
the average results, which show the effectiveness of the longest PDPLT problem suggests that it would be worthwhile to test their
common subsequence for LMA. performance in dealing with other variants of the vehicle routing pro-
blem.
The LCS-based population updating strategy considers both solution Bo Peng: Conceptualization, Methodology, Writing - original draft.
quality and the distances between solutions and the population in up- Yuan Zhang: Conceptualization, Methodology, Writing - original draft.
dating the search. The parameter is used to balance these two factors Zhipeng Lü: Writing - review & editing, Project administration,
in order to achieve a better trade-off between intensification and di- Funding acquisition. T.C.E. Cheng: Writing - review & editing, Project
versification of the search. In order to ascertain the importance of in administration, Funding acquisition. Fred Glover: Writing - review &
LMA, we conducted the following computational studies. editing.
. Specifically, we selected eight representative sets of instances with
different scales of the benchmark instances, namely lc2, lr2, lrc2, LC2, Acknowledgments
LR2, LRC2, fn14461 and pr1002. Taking into account the randomness
of the algorithm, we ran it ten times for each parameter setting (0, 0.1, This research was supported in part by the National Natural Science
, 1.0). Fig. 11 presents the average results over all the tested in- Foundation of China under grant numbers 71320107001, 61370183,
stances, where the horizontal axis denotes the value of the parameter 71871184, the Fundamental Research Funds under grant numbers
and the vertical axis denotes the computing time and the objective JBK2001013 for the Central Universities of China, and the Programme
values normalized by the following normalized function: for New Century Excellent Talents in Universities (NCET 2013).
z zmin
(z ) = . References
zmax zmin (8)
Azadian, F., Murat, A., & Chinnam, R. B. (2017). An unpaired pickup and delivery pro-
Fig. 11 shows the trajectories of the computing time and objective blem with time dependent assignment costs: Application in air cargo transportation.
value over different values. When is equal to 1, which gives all the European Journal of Operational Research, 263(1), 188–202.
Azi, N., Gendreau, M., & Potvin, J.-Y. (2014). An adaptive large neighborhood search for
weight to the objective value and none to the population distance, the
a vehicle routing problem with multiple routes. Computers & Operations Research, 41,
corresponding algorithm is the fastest but cannot find high-quality so- 167–173.
lutions because the algorithm does not consider the diversity of the Benavent, E., Landete, M., Mota, E., & Tirado, G. (2015). The multiple vehicle pickup and
population, while the algorithm can obtain the best values with set at delivery problem with LIFO constraints. European Journal of Operational Research,
243(3), 752–762.
0.6 to 0.8. The results demonstrate that premature convergence of the Benlic, U., & Hao, J. K. (2015). Memetic search for the quadratic assignment problem.
algorithm can be avoided by employing the LCS-based population up- Expert Systems with Applications, 42(1), 584–595.
dating strategy. We further observe that the best-performing parameter Birattari, M., Yuan, Z., Balaprakash, P., Stützle, T. (2010). F-race and iterated f-race: An
overview.
values are the same as the best parameter values analyzed in Fig. 5. Bortfeldt, A. (2012). A hybrid algorithm for the capacitated vehicle routing problem with
At last, we summarize and explain the key proposed components, as three-dimensional loading constraints. Computers & Operations Research, 39(9),
well as highlighting their corresponding motivations and influences in 2248–2257.
Bortfeldt, A., & Homberger, J. (2013). Packing first, routing seconda heuristic for the
our LMA in Table 7. vehicle routing and loading problem. Computers & Operations Research, 40(3),
15
B. Peng, et al. Computers & Industrial Engineering 142 (2020) 106241
16