This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
In recent years, with the booming development of e-commerce and the rapid popularity of mobile payment, online shopping has been developing at a high speed and the number of parcels has increased every year. Online shopping has grown at a quick pace in recent years, thanks to the blooming expansion of e-commerce and the rapid popularity of mobile payment. The number of deliveries has climbed every year. China’s e-commerce transactions were only 3.14 trillion yuan in 2008, but they have already risen to 31.63 billion yuan in 2018, a ten-year increase of tenfold, resulting in an explosive increase in the number of orders processed by e-commerce sorting centers per day [1]. However, as consumer demand continues to diversify and personalize, it is clear that the traditional picking center manual picking method is no longer able to control picking costs while responding quickly to customer order demands. Picking operations account for around 90% of the cost within the picking center, labor costs directly involved in picking operations account for approximately 0%, and picking time accounts for as much as 30% to 40% of the overall operating time. [2]. This shows that the real-time nature of picking operations is a key aspect of picking efficiency improvement and is crucial for e-commerce companies to enhance their core competitiveness. For example, Taobao’s “Double Eleven” and “Jingdong618’s” have raised the demand for real-time and efficient logistics sorting center solutions. Automated Guided Vehicles (AGVs) are increasingly being employed in logistics distribution centers as advanced automated logistics equipment in e-commerce, superstores, express delivery, and other industries. With the newest picking technology to manage finely orders and inventory items, a picking center can handle up to 2 million parcels in a single day and up to 4100 orders for custom clearance in one minute. The picking center’s real-time order picking efficiency places exceptionally high demands behind the massive order data.
There are currently two main types of logistics picking operation modes, namely, multi-picker synchronous picking mode and multi-picker asynchronous picking mode. Multi-picker synchronous picking mode means that all picking stations must start at the same time for the same batch of tasks, and the next batch of tasks can only be performed when all picking stations have completed the current batch of tasks. Multi-picker asynchronous picking mode means that all logistics AGVs are grouped together, with each group serving one picker, and each picker can complete its work independently. It can be seen that the multi-picker synchronous picking mode has the problem of wasted picker resources waiting due to synchronous processing, while the multi-picker asynchronous picking mode has the problem of uneven task allocation of AGV resources due to the global grouping of AGVs.
Therefore, this paper proposes a hybrid picking mode for the problem of uneven task allocation between synchronous picking waiting time consumption and asynchronous picking, i.e., a real-time picking operation mode according to the order reach time and the picking table idle rate, which solves the problem of time wastage caused by waiting in synchronous picking and the problem of uneven task allocation for nonglobal AGVs in asynchronous.
2. Literature Review
In essence, the picking path optimization problem with multiple picking stations studied in this paper belongs to the special vehicle path optimization problem (Vehicle Routing Problem, VRP). In 1959, the vehicle path problem (VRP) was proposed by Dantzig [3]. The traveling salesman problem (TSP) in vehicle routing Problem (VRP) was proved by Gaery [4] in his paper to be an NP-hard problem, so VRP problem is also an NP-hard problem. The NP-hard problem is not only of great significance in theoretical research but also of great significance in real life, as evidenced by the vehicle path optimization problem.
In recent years, Praitk [5] has studied the effect of shelf quantity and shelf length on picking efficiency and proposed a picking equipment event model. Chan and Chan [6] investigated the effects of warehouse storage allocation strategy, picking path strategy, and picking density on picking efficiency. Lu et al. [7] focused on the implementation process of the order batch picking method (GASM) and the basic genetic algorithm picking method (GABM), and compared and analyzed the S-type picking path strategy and the midpoint slewing picking path strategy. Zhi and Ailan [8] suggested that the order picking strategy and path optimization were studied according to the distribution of goods in the order in the warehouse. And, Yi [9] proposed to establish an order management information system and strengthen the picking operation and the docking of the overbank distribution system and other strategies to optimize the picking operation with words. Caunhye et al. [10] studied the path optimization model in emergency logistics and analyzed the methods of path optimization in terms of the distribution situation of rescue supplies, the location of facilities, and casualty transportation, respectively.
Some scholars have considered both order batching and picking path optimization together, in order to improve the efficiency of picking operations. Scholz et al. [11] proposed a variable neighborhood descent algorithm capable of handling large amounts of data by considering order binning, order assignment, batch sorting, and picking paths, and a new heuristic initial solution construction method was designed in order to speed up the solution over the vehicle. Chen et al. [12] combined the three problems of order batching, path optimization, and batch sorting to design a mixed integer programming model, and used a hybrid coded genetic algorithm and an ant colony optimization algorithm to solve the problem, and the experimental results proved that the solution speed and solution quality of the algorithm were due to the multi-objective genetic algorithm and the due date optimization algorithm. Matusiak et al. [13] received inspiration from a large shopping mall and used simulated annealing algorithm and A
In today’s era of rapid e-commerce development, how to respond quickly to customer needs is becoming a core issue for e-commerce and logistics companies. In order to improve the effectiveness of real-time picking, Amazon’s “KIVA” system and Jingdong’s “Tianwoldi Wolf System” have been born one after another. Various logistics companies in the industry have carried out innovation and exploration in picking system in picking route optimization. By combining the coordinate points of the storage area in the warehouse, the link path, using the ant colony algorithm and other intelligent algorithms to calculate the shortest picking path from the main channel to the picking table, improve the picking efficiency, and gradually improve the efficiency of real-time picking.
Using neural network to solve the problem of path optimization can be traced back to 1985. Hopfield [14] used Hopfield networks for solving small CVRP instances. In 2015, Vinyals et al. [15] proposed a Pointer Network (PN) based on Recurrent Neural Network (RNN), which made a qualitative leap in the study of combinatorial optimization problems. Kool et al. [16] drew on the transformer model for a variety of path optimization and proposed a transformer model with rollout baseline to remove position embedding for multiple path optimization problems, which was applied to the VRP example. Ibrahim et al. [17] developed a reinforcement learning technique to find the optimal path from the parking lot to the customer base. Zhang and Yin [18] studied the optimization of supply paths in emergency situations and solved the supply routes based on a hybrid particle swarm algorithm under the premise of obtaining emergency orders. Sun et al. [19] established a mathematical model with capacity constraints and optimized the picking path based on genetic algorithm for a dual-zone distribution center.
There are also some scholars to decompose the multi-pair multi-path optimization problem (MDVRP). David Pisinger and Ropke [20] proposed a heuristic algorithm for solving single-center VRP and applied it to solving MDVRP. Yu et al. [21] firstly simplified MDVRPTW into multiple VRPTW by clustering heuristic classification algorithm, and then used ant colony algorithm to solve each VRPTW. Jiali and Xiuping [22] designed a “two-stage adaptive genetic algorithm” based on multiple scanning operations and stage information feedback based on the idea of “segmented processing and overall optimization.”
In summary, path planning is the focus of scholars’ research to improve the efficiency of picking operations and reduce the picking cost of sorting centers. In order to improve the picking efficiency of the picking center while responding quickly to the customer’s order demand, this paper proposes a hybrid real-time picking model of AGV based on the real-time operation of logistics AGVs, and designs an improved transformer algorithm to solve the task mobilization problem, and finally verifies the effectiveness of the algorithm through simulation experiments. The rest of this paper is organized as follows: Section 3 describes the AGVs of picking center path optimization problem; Section 4 introduces the improved path optimization model based on deep reinforcement learning method; Section 5 performs the warehouse picking example validation; Section discusses the simulation experiment results; Section 6 and Section 7 give the conclusion and outlook of this paper.
3. A Mathematical Model of Mixed Real-Time Picking Mode with Multiple Picking Stations
3.1. Problem Description of the Hybrid Picking Model
An intelligent storage system is known to have multiple picking stations and a certain number of goods picking operations; each picking station can individually initiate picking tasks in real time, the system order portfolio arriving at the picking station is divided by order items, the AGV can visit multiple shelves of bays at a time to pick the goods required by the order according to the items and deliver these goods to the picking station. The shelves are stationary, and each shelf includes vertical levels. The AGVs are located in a uniform buffer zone when idle, have their own storage shelves and are capacity-constrained, so they can pick multiple items at once (e.g., the AGV TURO). When an order comes in, the AGV immediately picks the goods at the goods level according to the order requirements and sends them back to the picking table for sorting. The requirement is to design a reasonable picking path from the current scenario based on the order requirements and the location of the goods, and to efficiently complete the real-time picking task. A schematic of the picking path with capacity constraints in mixed picking mode is shown as Figure 1.
[figure(s) omitted; refer to PDF]
The actual problem is abstracted and the following assumptions are made.
(1) When an order comes in, priority is given to the picking table with a high degree of availability and a short distance from the center of gravity of the order item for picking.
(2) All AGVs are in a unified buffer zone when they are idle; when an order comes, all AGVs will start from the same starting point and return to the starting point after completing the pickup task.
(3) AGVs depart from the picking table to each goods node for one-way pickup (one round), and an order can be completed by multiple rounds (AGVs) together.
(4) The AGV comes with its own shelf and has the capacity to be idle; when the pickup is carried out, the current pickup round ends immediately when the AGV’s shelf is full.
(5) The AGVs can go back and forth between the goods cargo and the picking table many times, and each round of the task accepts only one picking service from one AGV, and each time it departs to the goods node, it needs to meet the demand of the current goods picking at one time.
(6) Once an order arrives, the AGV is immediately assigned to pick up the goods.
3.2. Matching Method of Idle Picking Stations and Orders
3.2.1. Calculating the Idle Degree of Picking Table
First, the idle degree of picking stations is calculated, and based on the idle degree of idle picking stations, picking orders are prioritized to the picking station with the highest idle degree. Idle degree
3.2.2. Calculation of the Distance between the Center of Gravity of the Order Item and the Picking Table
Next, the distance between the center of the order item and the picking table is calculated with the following formula.
3.3. CVRP Mathematical Model for Hybrid Real-Time Picking Mode
To facilitate model construction and subsequent calculations, the relevant variables are assumed as 3.3.1.
3.3.1. Symbol Description
(2) Decision-making variables.
3.3.2. Objective Function
3.3.3. Constraints
The objective function (4) indicates that the sum of the travel distances of all AGVs is the shortest; equation (5) indicates that the total number of AGVs required for order picking is less than the total number of AGVs; equation (6) indicates that the amount of goods picked by each AGV does not exceed its capacity limit; equation (7) indicates that the amount of goods required for this order picking does not exceed the sum of the capacity limits of all AGVs; equation (8) indicates that each goods position is guaranteed to be visited by only one AGV; equation (9) indicates that each cargo space is allowed to be visited only once in the same picking round; equation (10) indicates that each AGV is empty from the picking table; equation (11) indicates that the AGV has sufficient loading capacity for any cargo space before the AGV picks that space; equation (12) indicates the elimination of subloops.
4. Algorithm Design for Hybrid Picking Mode
In previous reinforcement learning algorithms, it is more effective to solve purely theoretical routing problems. However, in this paper, starting from a realistic picking scenario, the shortest picking path solution is sought while considering the two prerequisites of “the idleness of the picking table” and “the distance between the center of gravity of the order item and the picking table.” The algorithm design of the hybrid picking mode in this paper refers to Wouter’s transformer model [16], and on this basis, the initial screening of the picking table and the order length alignment processing are added, which means that the limitation of the order volume at the moment of order arrival can be broken, so that the resources of the picking table can be more fully utilized and the picking efficiency of the intelligent AGV can be improved. The following is a detailed description of the algorithm.
4.1. Model Construction of Hybrid Picking Model
In order to solve the hybrid picking path optimization problem, this paper improves the transformer model. The specific idea is: using the method of placeholder expansion of picking nodes, a new placeholder encoding mechanism for aligning the order length of item order groups is proposed, so that it is transformed into an array of equal length, thus allowing the input data in the input batch to be solved for different numbers of picking positions; subsequently, in the encoding process, the input data are mapped into a high-dimensional space to obtain vector features. What’s more, in the encoding process, the input data are mapped into a high-dimensional space to obtain vector features, and then, in the decoding process, the solution model is constructed based on Markov Decision Process (MDP), and the algorithm model is shown in Figure 2.
[figure(s) omitted; refer to PDF]
4.1.1. Coding Model for Hybrid Picking
(1) Target Picking Table Selection
(A) Calculate the picking table idle rate
Based on the work duration of each picking table in the picking center during the day, the idle degree of each picking table in the idle state is calculated as shown in equation (1), and the idle rate is sorted from highest to lowest, and the picking table with the highest idle rate is selected for picking.
(B) Calculate the distance between the target picking table and the order center of gravity of the item to be picked
According to the warehouse management system and the warehouse control system, the order items are automatically divided according to the order commodity labels. In order to minimize the walking distance of the logistics AGV, the distance between the target picking table and the center of gravity of the order group to be picked for each item is calculated as shown in equation (2), and the picking table for entering the picking work is selected based on the principle of the shortest distance.
(2) Item Order Processing. During e-commerce promotions (e.g., “Double 11” and“618 ”), the number of real-time orders arriving at the picking center management system increases sharply for a short period of time. To address the irregular nature of the real-time order quantity, this paper adds to the encoding structure of the transformer and proposes a new placeholder encoding mechanism for aligning the order lengths of item order groups into arrays of equal length to address the traditional reinforcement learning model in which the key to parallel computation is based on the input of the regularized batch tensor dimension. The principle is to determine the longest length of the required number of picks in the irregular batch order array, fill the placeholders for the blank, idle picking stations between the required picking orders and the longest length, and form structured data after tensor transformation, as shown in equation (13).
(3) Data Embedding Processing. First of all input the coordinates of the goods location
[figure(s) omitted; refer to PDF]
(4) Picking Network Feature Extraction
(A) Attention mechanism
In order to calculate the features of the network graph between picking points and picking points and between picking points and picking tables, the features,
First, by using the model parameters
Next, the weights are normalized. We use the
Finally, the normalized weights are used to sum up with the weight of V. Based on the
To facilitate the understanding of the attention mechanism, the detailed steps of the attention mechanism are shown in the Algorithm 1, where the model parameters are presented above.
(B) Multi-headed attention mechanism
Algorithm 1: Attention mechanism.
Inputs: each embedded node after embedding
Output: information about the correlation of each picking node with other picking nodes
(1) Initializing the transformer network parameters
(2) For i = 1: L do:
(3) Calculate the values of q, k, v
(4)
(5) For j = 1: L does:
(6) Calculate the correlation between picking nodes i and other picking nodes j
(7) Calculating attention weights
(8) Calculating attention
(9) End for
(10) End for
The multi-headed attention mechanism is an algorithm for weighted messaging between cargo locations, and consists of multiple attention mechanisms that are stitched together to calculate the correlation information between picking points based on different dimensions to obtain the picking network feature vector
4.1.2. Decoding Model for Hybrid Picking
(1) Picking Node Relevance Embedding. First, the picking point features are obtained by coding
(2) Attention Mechanism. Like the attention mechanism for encoding, the attention mechanism in this stage calculates the direct correlation between the last picking node in the identified path order, i.e., the picking point about to depart, and the outstanding picking point to calculate the individual probabilities of the next picking node and determine the next picking point to go to.
(3) Masking Mechanism. The masking mechanism is used to mask the next round of unsuitable cargo bits after the probability of the next visit to the node of interest is generated using the attention mechanism. And, the parameters of the masking process are set as shown in Table 1.
Table 1
Mask parameter representation.
Number | Parameters | Indicates | Examples |
1 | Visited node vector | ||
2 | Node demand vector | ||
3 | Car capacity already used | ||
4 | Broadcast value | ||
5 | Truncated vectors (greater than the car capacity is truncated) | ||
6 | Mask vector (indicates nodes that cannot be accessed) |
In the masking process,
[figure(s) omitted; refer to PDF]
4.2. Training Optimization Components for Hybrid Picking Patterns
Here, we choose a reinforcement learning algorithm with a rollout baseline [22] as the evaluation parameter to train the model, which is a hybrid algorithm combining value-based and policy gradient-based algorithms with two neural network models built in, i.e., the transformer network model and the rollout network model.
4.2.1. Reinforcement of the Learning Network Model
In a realistic picking center environment, it is generally a grid layout that presents a north-south or east-west orientation, thus using the Manhattan distance to design the objective function, see equation (14).
In this paper, the training network uses the above encoding and decoding model, i.e., the transformer model, with a θ-parameterized stochastic policy π, which generates the conditional probability distributions of the nodes according to the attention mechanism, and sequentially generates the solution sequence points, see equation (14). Based on the generated solution sequence points, the loss function is calculated, see the definition of equation (15), wherein
Based on the loss function, the gradient optimization parameters of the objective function are calculated, which are defined as shown in equation (16).
Finally, the traditional stochastic gradient descent (SGD) process is replaced by the Adam first-order optimization algorithm, which backpropagates the gradients and updates the neural network weights iteratively using the training data, see the defined equation (17).
4.2.2. Reinforcing the Learning Baseline Model
Inputs: picking table and shelf locations
Training: calculated according to the Adam optimization algorithm
Algorithm 2: Reinforcement learning algorithms under mixed models.
Inputs: coordinates of picking table p, position
Output: The picked sequential cargo position is the solution
(1) Initialize transformer and rollout evaluation network model parameters. Initialize optimizer parameters and initialize learning rate
(2) Calculation of picking table idleness
(3) Calculation of the distance between the pintle order and the center of gravity of the picking table
(4) For i = 1: E do-
(5) Initialization parameters.
(6)
(7)
(8)
(9) For
(10) Initialization time step.
(11) Repeat
(12) According to the probability distribution
(13) Observing the new state
(14)
(15) Until the termination conditions are met
(16) Calculate the objective function of the training
(17) Calculate the evaluation value of the rollout baseline
(18) To find the gradient for the transformer model network.
(19) Gradient updates to the transformer model network.
(20) End for
(21) If one-sided paired t-test test<
(22)
(23) End if
(24) End for
Step (1) evaluate the network model parameters with 0 initializing transformer and rollout; step (2) with step (3) is to select the best picking station; step (4) iterative module; steps (5)–(7) randomly initialize the network strategy; step (9) perform batch training; step (10) start decoding; steps (11)–(15) generate decoding sequence, steps (16)–(29) calculate update gradient values and optimize the neural network parameters based on Adam’s algorithm. Steps (21)–(23) whether to update the evaluation function parameters.
5. Example Simulation and Analysis
5.1. Example Simulation of Hybrid Real-Time Picking Mode
A picking center is arranged in a regional layout according to commodity items. There are multiple picking stations in the center. The AGVs are in a unified charging cache area, and they work together to complete picking. The intelligent warehouse is set to have 5 picking stations and 360 goods slots, and the demand for each task in each batch of order is less than 1. The quantity of AVG is not limited, which is enough to meet the assigned task of order picking. In order to increase the complexity of the task and improve the universality of the algorithm, 13/14/15 picking order tasks are randomly generated according to the Poisson distribution adhered to at the time of order arrival. Since the distance between the CHARGING buffer of AGV and the picking point is too small compared with the total picking path of batch orders, the simulation experiment in this paper ignores the distance between the charging buffer of AGV and the first picking task point, and regards it as the AGV starting from the picking station and returning to the picking station when the loading capacity is insufficient. The relevant simulation data are shown in Table 1. In order to display the distribution of data intuitively, it is drawn in the form of a scatter diagram, and the warehouse coordinate diagram is shown in Figure 6.
[figure(s) omitted; refer to PDF]
The calculation data of the picking station is shown in Table 2and Table 3.The bold value in Table 2 indicates that under the conditions of comprehensively considering the idleness of the picking table and the distance from the center of gravity of the order to the picking table, we will select the optimal selection of the picking table entering the working state.
Table 2
Picking table idleness and item order center of gravity distance.
Picking table | Location | Vacancy rate | Distance from the center of gravity of the first group of item orders | Distance to the center of gravity of the second group of item orders | Distance to the center of gravity of the third group of item orders |
Picking table1 | (6, 17) | 0.6 | 9.75 | 17.22 | 31.77 |
Picking table2 | (14, 17) | 0.3 | 13.39 | 11.35 | 24.42 |
Picking table3 | (22, 17) | 0.57 | 19.79 | 9.46 | 17.65 |
Picking table4 | (30, 17) | 0.4 | 27.05 | 13.34 | 12.44 |
Picking table5 | (38, 17) | 0.5 | 34.64 | 19.86 | 11.22 |
Table 3
Original data for picking nodes.
Num | Picking stations 1 | Picking stations 3 | Picking stations 5 | |||
Coordinates | Demand | Coordinates | Demand | Coordinates | Demand | |
1 | (1, 9) | 0.13 | (27, 12) | 0.13 | (31, 6) | 0.03 |
2 | (4, 14) | 0.03 | (19, 13) | 0.10 | (37, 5) | 0.07 |
3 | (9, 11) | 0.17 | (22, 8) | 0.26 | (31, 9) | 0.07 |
4 | (3, 11) | 0.20 | (15, 12) | 0.2 | (39, 14) | 0.23 |
5 | (13, 8) | 0.10 | (25, 2) | 0.23 | (31, 14) | 0.20 |
6 | (9, 0) | 0.23 | (21, 0) | 0.40 | (31, 2) | 0.13 |
7 | (6, 11) | 0.13 | (27, 5) | 0.23 | (40, 6) | 0.07 |
8 | (3, 13) | 0.03 | (19, 6) | 0.33 | (43, 12) | 0.30 |
9 | (10, 11) | 0.27 | (16, 14) | 0.16 | (39, 1) | 0.27 |
10 | (0, 8) | 0.27 | (27, 3) | 0.10 | (39, 3) | 0.23 |
11 | (10, 13) | 0.40 | (21, 5) | 0.13 | (34, 5) | 0.30 |
12 | (3, 5) | 0.02 | (24, 11) | 0.10 | (33, 6) | 0.10 |
13 | (0, 6) | 0.03 | (25, 5) | 0.26 | (39, 2) | 0.07 |
14 | — | — | (19, 1) | 0.20 | (36, 6) | 0.07 |
15 | — | — | — | — | (34, 6) | 0.23 |
Then the data are loaded into the model to solve the task assignment above. The picking road map obtained is shown in Figure 7.
[figure(s) omitted; refer to PDF]
5.2. Algorithm Comparison
The experiments were implemented using Python 3.6, pycharm2019.1.3 of pytorch0.4.1, and the computer configuration used was: Intel(R) Core(TM) i5-6300U [email protected] GHz, 2496Mhz, 22 cores, 41 logical processor, 4 GB of RAM, and a 64 bit of Windows 10. operating system.
To analyze the effect of the proposed model, we choose three algorithms such as OR_ARC, OR_CHR, and PATH_CHEAPEST_ARC (PATH-ARC) from Google OR-Tools to perform the algorithm comparison analysis of length dimension and time dimension on the public dataset (Augerat standard dataset set A) . The datasets can be found at the URL https://neo.lcc.uma.es/vrp/vrp-instances/capacitated-vrp-instances/download.Firstly, the advantages and disadvantages of the algorithm are evaluated from three aspects: the optimal distance dissociation of the example (best), the distance solution obtained by the algorithm model (min), and the error between the experimental solution and the optimal solution of the example (DEV%) (DEV = (min best)/best). The specific experimental results are shown in Table 4.
Table 4
Comparison table of node algorithms.
Instance | Best | PRL | OR_CHR | PATH-ARC | OR_ARC | |||||
Min (m) | DEV (%) | Min (m) | DEV (%) | Min (m) | DEV (%) | Min (m) | DEV (%) | |||
1 | A-n32-k5 | 784 | 795 | 1.40 | 889 | 13.35 | 787 | 0.37 | 847 | 8.00 |
2 | A-n33-k5 | 661 | 704 | 6.51 | 761 | 15.07 | 676 | 2.32 | 692 | 4.65 |
3 | A-n33-k6 | 742 | 762 | 2.70 | 743 | 0.20 | 768 | 3.51 | 791 | 6.62 |
4 | A-n34-k5 | 799 | 836 | 4.63 | 840 | 5.17 | 813 | 1.80 | 878 | 9.90 |
5 | A-n37-k5 | 730 | 757 | 3.70 | 790 | 8.22 | 822 | 12.65 | 767 | 5.08 |
6 | A-n37-k6 | 822 | 863 | 4.99 | 931 | 13.25 | 876 | 6.59 | 924 | 12.36 |
7 | A-n38-k5 | 831 | 863 | 3.85 | 893 | 7.41 | 857 | 3.08 | 919 | 10.56 |
8 | A-n39-k5 | 937 | 1014 | 8.22 | 1004 | 7.12 | 1042 | 11.26 | 1058 | 12.96 |
9 | A-n44-k7 | 1146 | 1173 | 2.36 | 1213 | 5.81 | 1162 | 1.38 | 1248 | 8.89 |
10 | A-n45-k6 | 914 | 990 | 8.32 | 1036 | 13.33 | 1045 | 14.29 | 983 | 7.55 |
11 | A-n45-k7 | 1073 | 1101 | 2.61 | 1258 | 17.24 | 1156 | 7.73 | 1227 | 14.31 |
12 | A-n46-k7 | 1010 | 1099 | 8.81 | 1147 | 13.55 | 1119 | 10.78 | 1036 | 2.62 |
13 | A-n48-k7 | 1167 | 1178 | 0.94 | 1241 | 6.34 | 1294 | 10.89 | 1334 | 14.34 |
14 | A-n53-k7 | 1408 | 1456 | 3.41 | 1757 | 24.81 | 1596 | 13.35 | 1623 | 15.24 |
15 | A-n54-k7 | 1035 | 1127 | 8.89 | 1157 | 11.80 | 1243 | 20.11 | 1092 | 5.54 |
16 | A-n55-k9 | 1290 | 1374 | 6.51 | 1434 | 11.16 | 1431 | 10.96 | 1414 | 9.61 |
17 | A-n62-k8 | 1168 | 1227 | 5.05 | 1264 | 8.24 | 1215 | 4.01 | 1183 | 1.30 |
18 | A-n63-k9 | 1764 | 1916 | 8.62 | 1987 | 12.63 | 1895 | 7.45 | 1856 | 5.21 |
19 | A-n63-10 | 784 | 795 | 1.40 | 889 | 13.35 | 787 | 0.37 | 847 | 8.00 |
20 | A-n64-k9 | 661 | 704 | 6.51 | 761 | 15.07 | 676 | 2.32 | 692 | 4.65 |
21 | A-n65-k9 | 742 | 762 | 2.70 | 743 | 0.20 | 768 | 3.51 | 791 | 6.62 |
22 | A-n69-k9 | 799 | 836 | 4.63 | 840 | 5.17 | 813 | 1.80 | 878 | 9.90 |
Subsequently, in order to summarize the overall evaluation of each algorithm in the 22 group instances, four comparative analyses were performed in terms of the mean length of the algorithms, the total time, the mean DEV, and the standard deviation of the DEV, as shown in Table 5.
Table 5
Algorithm comparison table.
Projects | Distance mean (m) | Distance mean error rate (%) | DEV mean value | DEV standard deviation | Total time (s) |
Best solution | 966.68 | 0 | 0 | 0 | — |
PRL | 1015.09 | 5.01 | 4.85 | 2.60 | 0.7883 |
OR_ARC | 1049.09 | 8.52 | 8.36 | 3.86 | 7.0147 |
OR_CHR | 1071.73 | 10.87 | 10.39 | 5.69 | 5.8683 |
PATH_ARC | 1038.23 | 7.40 | 6.84 | 5.46 | 6.8282 |
As can be seen from Table 5, compared with the three heuristics of ortools, PRL algorithm proposed in this paper is the closest to the exact solution from the perspective of distance and length. And, from the time dimension analysis, PRL algorithm has obvious advantages of parallel algorithm, shortens the time, and can be solved quickly.
Then, this paper conducted significance tests for best solution and RL, OR_CHR, PATHARC, and OR_ARC, respectively, as shown in Table 6.
Table 6
Inter-subject effect test.
Dependent variable: | Best | ||||
Source | Class III sum of squares | Degree of freedom | Mean square | F | Significance |
Correction model | 1571745.414a | 4 | 392936.353 | 1772.163 | 4.97709E-22 |
Intercept distance | 534.751 | 1 | 534.751 | 2.412 | 0.138845022 |
PRL | 14975.580 | 1 | 14975.580 | 67.541 | 2.52493E-07 |
OR_CHR | 4.489 | 1 | 4.489 | .020 | 0.888531759 |
PATHARC | 305.100 | 1 | 305.100 | 1.376 | 0.256955406 |
OR_ARC | 6074.522 | 1 | 6074.522 | 27.396 | 6.7336E-05 |
Error | 3769.359 | 17 | 221.727 | ||
Total | 22133937.000 | 22 | |||
Total after correction | 1575514.773 | 21 |
a. R-squared = 0.998 (adjusted R-squared = .997).
From Table 6, it can be seen that the significance of the RL method and OR_ARC method is less than 0.05 and passes the level of the significance test, while the significance of the OR_CHR method and PATHARC method is bigger than 0.05 and does not pass the level of the significance test, given the significance of 5%. And, it can also be seen that there is a significant difference between the results obtained by the OR_CHR method and PATHARC method and Best solution, while there is no significant difference between the RL method and OR_ARC method and Best solution, and the difference between the RL method and Best solution is much smaller than the OR_AHC difference. It can be seen that the RL algorithm in this paper can solve the storage picking path problem well.
Next, we analyzed the time and space complexity of the algorithm, and the results are shown in Table 7.
Table 7
Time complexity and space complexity.
Algorithm | Time complexity | Spatial complexity |
OR_CHR | — | |
PATHARC | — | |
OR_ARC | — | |
PRL |
Table 7 shows that the spatial complexity of both the heuristic algorithm and Transformer is
As we all know, Google orTools is developed in C++, and its statements are executed much more efficiently than Python as a scripting language. However, according to the experimental data, the Google ortools heuristic algorithm developed by C++ has no advantage in computing time and optimal path length, and is obviously weaker than the PRL algorithm. Therefore, the PRL algorithm can be regarded as a good choice when solving the problem of warehouse routing.
To sum up, although the average path length planned by the PRL method is slightly longer than the optimal solution accurately solved, it is closer to the optimal solution compared with the three heuristic algorithms of Google ortools. From the mean value of DEV, the error with the optimal solution is 5.1%. From the standard deviation of DEV, the error between the path length solved by the PRL method and the optimal solution is controlled within 7.45% with a probability of 0.8413, while the other three methods are 12.22%, 16.08%, and 12.3%, respectively. From the perspective of time dimension, PRL has strong parallel computing capability, and the solving time is 1/12 of Google OR_tools heuristic algorithm, or even shorter.
6. Conclusion
With the rapid development of the Internet of Things technology, optimize the warehouse management system, establish and improve the intelligent picking system, for unmanned, intelligent transformation is the key direction of reform and development of e-commerce enterprises, courier enterprises.
This thesis addresses the picking path planning problem of multiple picking stations in a warehouse in a zoned area, and designs a PRL algorithm using placeholder control to solve the path optimization problem in a shared mode where the number of picking orders is unequal and the picking stations do not need to work synchronously, taking into account two factors: the idleness of the picking stations and the distance of the center of gravity of the order items. The simulation results have shown that the PRL algorithm in this paper is very close to the optimal solution and far better than the three heuristic algorithms of Google’s ortools in terms of solving accuracy. In terms of solving speed, it has the advantage of fast solving and can almost realize real-time solving. In the dimension of structure, it has the advantage of variable input and avoiding retuning. Therefore, this method is very effective to solve the problem of multiple picking stations with different order numbers in intelligent warehouse. It can solve the picking path effectively and reduce the picking cost [23].
7. Outlook
This paper focuses on the picking center intelligent AGV picking path planning problem, cleverly converting the MDVRP problem into a CVRP problem by two-stage method, and designing an improved Transformer algorithm for path planning, thus realizing real-time picking in picking centers with improved picking efficiency compared to traditional methods. Meanwhile, this paper can be promoted for scenarios such as UAV take-out delivery, port container loading and unloading, and logistics distribution. Since this paper is carried out without considering the limitation of AVG number of picking centers, the following research direction is the optimal path planning of picking centers under the constraint of AVGs’ number.
Disclosure
Deyao Wang, Jun Jiang, and Ran Ma are the co-first authors.
Authors’ Contributions
Deyao Wang, Jun Jiang, and Ran Ma contributed equally to this work. Deyao Wang performed conceptualization (providing part ideas), performed formal analysis (some formal analysis, such as time & space analysis and correlational analyses), wrote the final draft, developed the methodology (the modification of MDVRP two-stage solution of picking routing), performed visualization, reviewed & edited the article, and provided the software (support of computer code). Jun Jiangperformed conceptualization, performed data curation, performed formal analysis, developed the methodology (MDVRP two-stage solution of picking routing), provided the software (algorithm design for hybrid picking mode, simulation example, and comparison of algorithms), performed validation, wrote the original and final drafts, performed visualization, and reviewed & edited the article. Ran Ma performed conceptualization (providing part ideas), developed the methodology (design of coding model in reinforcement learning algorithm model), provided the software (support of computer code, such as comparison between simulation examples and some algorithms), wrote the original draft, and performed visualization (visualization of simulation example results). Guicheng Shen performed conceptualization and supervision and was responsible for funding acquisition.
Acknowledgments
This research was supported by the National Natural Science Foundation of China (NO: 71831001) and Beijing Key Laboratory of Intelligent Logistics Systems (NO: BZ0211).
[1] K. Li, T. Liu, B. He, D. Xu, "Research on AGV path planning and scheduling in “goods-to-person” picking system," Open Journal of Social Sciences, vol. 37 no. 6,DOI: 10.4236/jss.2019.76001, 2021.
[2] X. Ma, "The development status and trend of warehouse picking technology," Logistics Technology and Applications, vol. 21 no. 6, pp. 111-114, 2016.
[3] C. Zhu, M. Liu, C. Wu, "Research progress and prospects of vehicle path problem in supply chain," Computer Integrated Manufacturing Systems, vol. 7 no. 11, 2001.
[4] D. L. Jiang, X. L. Yang, W. Du, "Genetic algorithms for vehicle path problems," Systems Engineering Theory and Practice, vol. 19 no. 6, pp. 40-45, 1999.
[5] P. J. Parikh, R. D. Meller, "A travel-time model for a person-onboard order picking system," European Journal of Operational Research, vol. 200 no. 2, pp. 385-394, DOI: 10.1016/j.ejor.2008.12.031, 2010.
[6] F. T. S. Chan, H. K. Chan, "Improving the productivity of order picking of a manual-pick and multi-level rack distribution warehouse through the implementation of class-based storage," Expert Systems with Applications, vol. 38 no. 3, pp. 2686-2700, DOI: 10.1016/j.eswa.2010.08.058, 2011.
[7] Z. Lu, Y. Han, S. Zhang, "A case study of order picking path optimization in distribution centers based on genetic algorithm," Logistics Technology, vol. 32 no. 9, 2013.
[8] Z. Jia, F. Ailan, "A review of research on manual order picking path optimization and decision making in distribution centers," Logistics Technology, vol. 33 no. 10, 2014.
[9] Y. He, "Order picking path optimization in logistics distribution centers with genetic algorithm," Business and Economic Research, vol. 34 no. 21, pp. 110-111, 2016.
[10] A. M. Caunhye, X. Nie, S. Pokharel, "Optimization models in emergency logistics: a literature review," Socio-Economic Planning Sciences, vol. 46 no. 1,DOI: 10.1016/j.seps.2011.04.004, 2012.
[11] A. Scholz, D. Schubert, G. Wäscher, "Order picking with multiple pickers and due dates - simultaneous solution of order batching, batch Assignment and sequencing, and picker routing problems," European Journal of Operational Research, vol. 263 no. 2, pp. 461-478, 2017.
[12] T.-L. Chen, C.-Y. Cheng, Y.-Y. Chen, L.-K. Chan, "An efficient hybrid algorithm for integrated order batching, sequencing and routing problem," International Journal of Production Economics, vol. 159 no. jan, pp. 158-167, DOI: 10.1016/j.ijpe.2014.09.029, 2015.
[13] M. Matusiak, L. Kroon, "A fast simulated annealing method for batching precedence-constrained customer orders in a warehouse," European Journal of Operational Research, vol. 236 no. 3, pp. 968-977, DOI: 10.1016/j.ejor.2013.06.001, 2014.
[14] J. H. John, W. T. David, "Neural computation of decisions in optimization problems," Biological Cybernetics, vol. 52 no. 3, pp. 141-152, 1985.
[15] O. Vinyals, M. Fortunato, N. Jaitly, "Pointer networks," Proceedings of the International Conference on Neural Information Processing Systems, pp. 2692-2700, .
[16] W. Kool, H. H. Van, M. Welling, "Attention, learn to solve routing problems!," . 2019, https://arxiv.org/abs/1803.08475v3
[17] A. A. Ibrahim, R. O. Abdulaziz, J. A. Ishaya, "Capacitated vehicle routing problem," International Journal of Regulation and Governance, vol. 7 no. 3, pp. 310-327, 2019.
[18] F. Zhang, X. Q. Yin, "Feeding path optimization considering order urgency," Journal of Shandong University of Technology: Natural Science Edition, vol. 29 no. 1, 2015.
[19] H. Sun, Ke Zhang, F. Zhang, "Genetic algorithm-based manual picking path optimization for dual-zone warehouses[J]," Journal of Qingdao University Engineering&Technology Edition, vol. 29 no. 1, 2014.
[20] D. Pisinger, S. Ropke, "A general heuristic for vehicle routingproblems," Computers & Operations Research, vol. 34, pp. 2403-2435, 2007.
[21] B. Yu, P. Jin, Z. Yang, "Two-stage heuristic algorithm for multi-center vehicle routing problem with time Windows [J]," Systems Engineering-Theory & Practice, vol. 32 no. 8, pp. 1793-1800, 2012.
[22] L Jiali, G Xiuping, "Multi-center open-loop vehicle routing problem with mutual exclusion and vehicle matching," Journal of Systems Management, vol. 25 no. 1, pp. 129-138, 2016.
[23] S. J. Rennie, E. Marcheret, Y. Mroueh, J. Ross, V. Goel, "Self-critical sequence training for image captioning," Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1179-1195, DOI: 10.1109/CVPR.2017.131, .
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Copyright © 2022 Deyao Wang et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0/
Abstract
In the order picking process of the warehouse center, considering the rapid increase in the volume of orders arriving at the picking center at the time of the promotional festival, a hybrid operation mode with multiple picking tables is used to meet the picking requirements of the huge number of real-time orders. Therefore, in this paper, a hybrid picking mode is proposed, taking into account both the idle degree of picking stations and their order item centers of gravity, and a new reinforcement learning algorithm embedding mechanism (PRL) with placeholder control is designed to solve the problem of a huge number of real-time item orders arriving at the picking center system on promotional holidays and in inconsistent quantities, and numerical simulations are performed for this algorithm. The experimental results show that the PRL algorithm in hybrid picking mode can handle a huge number of orders simultaneously and improve picking efficiency effectively.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer