Research on Hybrid Real-Time Picking Routing

Full text

Turn on search term navigation

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

In recent years, with the booming development of e-commerce and the rapid popularity of mobile payment, online shopping has been developing at a high speed and the number of parcels has increased every year. Online shopping has grown at a quick pace in recent years, thanks to the blooming expansion of e-commerce and the rapid popularity of mobile payment. The number of deliveries has climbed every year. China’s e-commerce transactions were only 3.14 trillion yuan in 2008, but they have already risen to 31.63 billion yuan in 2018, a ten-year increase of tenfold, resulting in an explosive increase in the number of orders processed by e-commerce sorting centers per day [1]. However, as consumer demand continues to diversify and personalize, it is clear that the traditional picking center manual picking method is no longer able to control picking costs while responding quickly to customer order demands. Picking operations account for around 90% of the cost within the picking center, labor costs directly involved in picking operations account for approximately 0%, and picking time accounts for as much as 30% to 40% of the overall operating time. [2]. This shows that the real-time nature of picking operations is a key aspect of picking efficiency improvement and is crucial for e-commerce companies to enhance their core competitiveness. For example, Taobao’s “Double Eleven” and “Jingdong618’s” have raised the demand for real-time and efficient logistics sorting center solutions. Automated Guided Vehicles (AGVs) are increasingly being employed in logistics distribution centers as advanced automated logistics equipment in e-commerce, superstores, express delivery, and other industries. With the newest picking technology to manage finely orders and inventory items, a picking center can handle up to 2 million parcels in a single day and up to 4100 orders for custom clearance in one minute. The picking center’s real-time order picking efficiency places exceptionally high demands behind the massive order data.

There are currently two main types of logistics picking operation modes, namely, multi-picker synchronous picking mode and multi-picker asynchronous picking mode. Multi-picker synchronous picking mode means that all picking stations must start at the same time for the same batch of tasks, and the next batch of tasks can only be performed when all picking stations have completed the current batch of tasks. Multi-picker asynchronous picking mode means that all logistics AGVs are grouped together, with each group serving one picker, and each picker can complete its work independently. It can be seen that the multi-picker synchronous picking mode has the problem of wasted picker resources waiting due to synchronous processing, while the multi-picker asynchronous picking mode has the problem of uneven task allocation of AGV resources due to the global grouping of AGVs.

Therefore, this paper proposes a hybrid picking mode for the problem of uneven task allocation between synchronous picking waiting time consumption and asynchronous picking, i.e., a real-time picking operation mode according to the order reach time and the picking table idle rate, which solves the problem of time wastage caused by waiting in synchronous picking and the problem of uneven task allocation for nonglobal AGVs in asynchronous.

2. Literature Review

In essence, the picking path optimization problem with multiple picking stations studied in this paper belongs to the special vehicle path optimization problem (Vehicle Routing Problem, VRP). In 1959, the vehicle path problem (VRP) was proposed by Dantzig [3]. The traveling salesman problem (TSP) in vehicle routing Problem (VRP) was proved by Gaery [4] in his paper to be an NP-hard problem, so VRP problem is also an NP-hard problem. The NP-hard problem is not only of great significance in theoretical research but also of great significance in real life, as evidenced by the vehicle path optimization problem.

In recent years, Praitk [5] has studied the effect of shelf quantity and shelf length on picking efficiency and proposed a picking equipment event model. Chan and Chan [6] investigated the effects of warehouse storage allocation strategy, picking path strategy, and picking density on picking efficiency. Lu et al. [7] focused on the implementation process of the order batch picking method (GASM) and the basic genetic algorithm picking method (GABM), and compared and analyzed the S-type picking path strategy and the midpoint slewing picking path strategy. Zhi and Ailan [8] suggested that the order picking strategy and path optimization were studied according to the distribution of goods in the order in the warehouse. And, Yi [9] proposed to establish an order management information system and strengthen the picking operation and the docking of the overbank distribution system and other strategies to optimize the picking operation with words. Caunhye et al. [10] studied the path optimization model in emergency logistics and analyzed the methods of path optimization in terms of the distribution situation of rescue supplies, the location of facilities, and casualty transportation, respectively.

Some scholars have considered both order batching and picking path optimization together, in order to improve the efficiency of picking operations. Scholz et al. [11] proposed a variable neighborhood descent algorithm capable of handling large amounts of data by considering order binning, order assignment, batch sorting, and picking paths, and a new heuristic initial solution construction method was designed in order to speed up the solution over the vehicle. Chen et al. [12] combined the three problems of order batching, path optimization, and batch sorting to design a mixed integer programming model, and used a hybrid coded genetic algorithm and an ant colony optimization algorithm to solve the problem, and the experimental results proved that the solution speed and solution quality of the algorithm were due to the multi-objective genetic algorithm and the due date optimization algorithm. Matusiak et al. [13] received inspiration from a large shopping mall and used simulated annealing algorithm and A $*$ algorithm to solve the order batching and path planning problems.

In today’s era of rapid e-commerce development, how to respond quickly to customer needs is becoming a core issue for e-commerce and logistics companies. In order to improve the effectiveness of real-time picking, Amazon’s “KIVA” system and Jingdong’s “Tianwoldi Wolf System” have been born one after another. Various logistics companies in the industry have carried out innovation and exploration in picking system in picking route optimization. By combining the coordinate points of the storage area in the warehouse, the link path, using the ant colony algorithm and other intelligent algorithms to calculate the shortest picking path from the main channel to the picking table, improve the picking efficiency, and gradually improve the efficiency of real-time picking.

Using neural network to solve the problem of path optimization can be traced back to 1985. Hopfield [14] used Hopfield networks for solving small CVRP instances. In 2015, Vinyals et al. [15] proposed a Pointer Network (PN) based on Recurrent Neural Network (RNN), which made a qualitative leap in the study of combinatorial optimization problems. Kool et al. [16] drew on the transformer model for a variety of path optimization and proposed a transformer model with rollout baseline to remove position embedding for multiple path optimization problems, which was applied to the VRP example. Ibrahim et al. [17] developed a reinforcement learning technique to find the optimal path from the parking lot to the customer base. Zhang and Yin [18] studied the optimization of supply paths in emergency situations and solved the supply routes based on a hybrid particle swarm algorithm under the premise of obtaining emergency orders. Sun et al. [19] established a mathematical model with capacity constraints and optimized the picking path based on genetic algorithm for a dual-zone distribution center.

There are also some scholars to decompose the multi-pair multi-path optimization problem (MDVRP). David Pisinger and Ropke [20] proposed a heuristic algorithm for solving single-center VRP and applied it to solving MDVRP. Yu et al. [21] firstly simplified MDVRPTW into multiple VRPTW by clustering heuristic classification algorithm, and then used ant colony algorithm to solve each VRPTW. Jiali and Xiuping [22] designed a “two-stage adaptive genetic algorithm” based on multiple scanning operations and stage information feedback based on the idea of “segmented processing and overall optimization.”

In summary, path planning is the focus of scholars’ research to improve the efficiency of picking operations and reduce the picking cost of sorting centers. In order to improve the picking efficiency of the picking center while responding quickly to the customer’s order demand, this paper proposes a hybrid real-time picking model of AGV based on the real-time operation of logistics AGVs, and designs an improved transformer algorithm to solve the task mobilization problem, and finally verifies the effectiveness of the algorithm through simulation experiments. The rest of this paper is organized as follows: Section 3 describes the AGVs of picking center path optimization problem; Section 4 introduces the improved path optimization model based on deep reinforcement learning method; Section 5 performs the warehouse picking example validation; Section discusses the simulation experiment results; Section 6 and Section 7 give the conclusion and outlook of this paper.

3. A Mathematical Model of Mixed Real-Time Picking Mode with Multiple Picking Stations

3.1. Problem Description of the Hybrid Picking Model

An intelligent storage system is known to have multiple picking stations and a certain number of goods picking operations; each picking station can individually initiate picking tasks in real time, the system order portfolio arriving at the picking station is divided by order items, the AGV can visit multiple shelves of bays at a time to pick the goods required by the order according to the items and deliver these goods to the picking station. The shelves are stationary, and each shelf includes vertical levels. The AGVs are located in a uniform buffer zone when idle, have their own storage shelves and are capacity-constrained, so they can pick multiple items at once (e.g., the AGV TURO). When an order comes in, the AGV immediately picks the goods at the goods level according to the order requirements and sends them back to the picking table for sorting. The requirement is to design a reasonable picking path from the current scenario based on the order requirements and the location of the goods, and to efficiently complete the real-time picking task. A schematic of the picking path with capacity constraints in mixed picking mode is shown as Figure 1.

[figure(s) omitted; refer to PDF]

The actual problem is abstracted and the following assumptions are made.

(1) When an order comes in, priority is given to the picking table with a high degree of availability and a short distance from the center of gravity of the order item for picking.

(2) All AGVs are in a unified buffer zone when they are idle; when an order comes, all AGVs will start from the same starting point and return to the starting point after completing the pickup task.

(3) AGVs depart from the picking table to each goods node for one-way pickup (one round), and an order can be completed by multiple rounds (AGVs) together.

(4) The AGV comes with its own shelf and has the capacity to be idle; when the pickup is carried out, the current pickup round ends immediately when the AGV’s shelf is full.

(5) The AGVs can go back and forth between the goods cargo and the picking table many times, and each round of the task accepts only one picking service from one AGV, and each time it departs to the goods node, it needs to meet the demand of the current goods picking at one time.

(6) Once an order arrives, the AGV is immediately assigned to pick up the goods.

3.2. Matching Method of Idle Picking Stations and Orders

3.2.1. Calculating the Idle Degree of Picking Table

First, the idle degree of picking stations is calculated, and based on the idle degree of idle picking stations, picking orders are prioritized to the picking station with the highest idle degree. Idle degree $t_{0 m p}$ The formula is shown in (1). $\begin{matrix} (1) & t_{0 m p} = \frac{t_{0 m f}}{t}, \end{matrix}$ where $t_{0 m f}$ denotes the idle time $v_{0 m}$ of idle picking stations, $t$ is the total working time, and M is the total number of picking stations in the idle state, where $m \in 1,2, \dots, M$ .

3.2.2. Calculation of the Distance between the Center of Gravity of the Order Item and the Picking Table

Next, the distance between the center of the order item and the picking table is calculated with the following formula. $\begin{matrix} (2) & \begin{cases} {\bar{x}}_{n} = \frac{\sum_{i = 1}^{L} x_{n 1}}{L}, {\bar{y}}_{n} = \frac{\sum_{i = 1}^{L} y_{n 1}}{L}, \\ L_{n m} = {\bar{x}}_{n} - V_{x m} + {\bar{x}}_{n} - V_{y m} . \end{cases} \end{matrix}$ where $x_{i}, y_{i}$ denote the x-axis coordinates and y-axis coordinates of the goods position where the i’th item order to be picked in the item order is located, and ${\bar{x}}_{n}, {\bar{y}}_{n}$ is the center of the nth item order, $L$ is the number of spaces used for the goods in the order, and $V_{x m}, V_{y m}$ are the horizontal and vertical coordinates of the free picking table.

3.3. CVRP Mathematical Model for Hybrid Real-Time Picking Mode

To facilitate model construction and subsequent calculations, the relevant variables are assumed as 3.3.1.

3.3.1. Symbol Description

$A = V, E$ : Picking network, where $V = v_{0}, v_{1}, v_{2}, \dots \dots, v_{L}$ represents the set of nodes and $v_{0}$ : represents the picking station

$K$ : the number of AGVs required for the current picking order, where $k \in 1,2, \dots, K$

$G$ : the total number of AGVs

$Q$ : maximum load capacity of the AGV

$q_{i}$ : the loading capacity required by cargo $j$

$c_{i j}$ : the picking cost from slot $i$ to slot $j$

$w_{i j k}$ : the AGV $k$ remaining loading capacity from cargo $i$ to cargo $j$

(2) Decision-making variables.

\begin{matrix} (3) & x_{i j k} = \begin{cases} 1, & the k^{'} th AGV will travel form cargo i to cargo j, \\ 0, & otherwise, \end{cases} \\ y_{i k} = \begin{cases} 1, & cargo i will be picked by the k^{'} th AGV, \\ 0, & otherwise . \end{cases} \end{matrix}

3.3.2. Objective Function

$\begin{matrix} (4) & \min H = \sum_{k = 1}^{K} \sum_{i = 0}^{L} \sum_{j = 0}^{L} x_{i j k} c_{i j} . \end{matrix}$

3.3.3. Constraints

$\begin{matrix} (5) & K \leq G, \\ (6) & \sum_{i = 1}^{L} q_{i} y_{i k} \leq Q, \forall k \in 1,2, \dots, K, \\ (7) & \sum_{k = 1}^{K} \sum_{i = 1}^{L} q_{i} y_{i k} \leq KQ, \\ (8) & \sum_{k = 1}^{K} y_{i k} = 1, \forall i \in 1,2, \dots, L, \\ (9) & \sum_{j = 1}^{L} x_{i j k} = y_{i k}, \forall i \in 1,2, \dots, L, \forall k \in 1,2, \dots, K, \\ (10) & w_{0 j k} = x_{0 j k} Q, \forall j \in 1,2, \dots, L, \forall k \in 1,2, \dots, K, \\ (11) & \sum_{k = 1}^{K} x_{i j k} w_{i j k} - q_{j} \geq 0, \forall i \in 1,2, ..., L, \forall k \in 1,2, ..., K, \\ (12) & \sum_{i, j \subset S \times S} x_{i j k} \leq S - 1, S \subset 1, \dots \dots, L S \neq \emptyset \forall k \in 1,2, \dots, K . \end{matrix}$

The objective function (4) indicates that the sum of the travel distances of all AGVs is the shortest; equation (5) indicates that the total number of AGVs required for order picking is less than the total number of AGVs; equation (6) indicates that the amount of goods picked by each AGV does not exceed its capacity limit; equation (7) indicates that the amount of goods required for this order picking does not exceed the sum of the capacity limits of all AGVs; equation (8) indicates that each goods position is guaranteed to be visited by only one AGV; equation (9) indicates that each cargo space is allowed to be visited only once in the same picking round; equation (10) indicates that each AGV is empty from the picking table; equation (11) indicates that the AGV has sufficient loading capacity for any cargo space before the AGV picks that space; equation (12) indicates the elimination of subloops.

4. Algorithm Design for Hybrid Picking Mode

In previous reinforcement learning algorithms, it is more effective to solve purely theoretical routing problems. However, in this paper, starting from a realistic picking scenario, the shortest picking path solution is sought while considering the two prerequisites of “the idleness of the picking table” and “the distance between the center of gravity of the order item and the picking table.” The algorithm design of the hybrid picking mode in this paper refers to Wouter’s transformer model [16], and on this basis, the initial screening of the picking table and the order length alignment processing are added, which means that the limitation of the order volume at the moment of order arrival can be broken, so that the resources of the picking table can be more fully utilized and the picking efficiency of the intelligent AGV can be improved. The following is a detailed description of the algorithm.

4.1. Model Construction of Hybrid Picking Model

In order to solve the hybrid picking path optimization problem, this paper improves the transformer model. The specific idea is: using the method of placeholder expansion of picking nodes, a new placeholder encoding mechanism for aligning the order length of item order groups is proposed, so that it is transformed into an array of equal length, thus allowing the input data in the input batch to be solved for different numbers of picking positions; subsequently, in the encoding process, the input data are mapped into a high-dimensional space to obtain vector features. What’s more, in the encoding process, the input data are mapped into a high-dimensional space to obtain vector features, and then, in the decoding process, the solution model is constructed based on Markov Decision Process (MDP), and the algorithm model is shown in Figure 2.

[figure(s) omitted; refer to PDF]

4.1.1. Coding Model for Hybrid Picking

(1) Target Picking Table Selection

(A) Calculate the picking table idle rate

Based on the work duration of each picking table in the picking center during the day, the idle degree of each picking table in the idle state is calculated as shown in equation (1), and the idle rate is sorted from highest to lowest, and the picking table with the highest idle rate is selected for picking.

(B) Calculate the distance between the target picking table and the order center of gravity of the item to be picked

According to the warehouse management system and the warehouse control system, the order items are automatically divided according to the order commodity labels. In order to minimize the walking distance of the logistics AGV, the distance between the target picking table and the center of gravity of the order group to be picked for each item is calculated as shown in equation (2), and the picking table for entering the picking work is selected based on the principle of the shortest distance.

(2) Item Order Processing. During e-commerce promotions (e.g., “Double 11” and“618 ”), the number of real-time orders arriving at the picking center management system increases sharply for a short period of time. To address the irregular nature of the real-time order quantity, this paper adds to the encoding structure of the transformer and proposes a new placeholder encoding mechanism for aligning the order lengths of item order groups into arrays of equal length to address the traditional reinforcement learning model in which the key to parallel computation is based on the input of the regularized batch tensor dimension. The principle is to determine the longest length of the required number of picks in the irregular batch order array, fill the placeholders for the blank, idle picking stations between the required picking orders and the longest length, and form structured data after tensor transformation, as shown in equation (13). $\begin{matrix} (13) & A = {\begin{matrix} v_{00} & v_{01} & v_{02} & \dots & \dots & v_{0 n_{0}} & v & v & \dots & \dots & v \\ v_{10} & v_{11} & v_{12} & \dots & \dots & v_{1 n 1} & v & v & \dots & \dots & v \\ v_{20} & v_{21} & v_{22} & \dots & \dots & v_{2 n_{2}} & v & v & \dots & \dots & v \\ \dots & \dots \\ v_{M 0} & v_{M 1} & v_{M 2} & \dots & \dots & v_{M n_{M - 3}} & v_{M n_{M - 2}} & v_{MN} & \dots & \dots \end{matrix}}_{M \times N}, \end{matrix}$ where $M$ is the total number of batches, $N$ is the length of the longest number of picks required in the batch order, $v_{i j}$ is the demand for picking spaces in each order, $n_{i}$ is the length of the number of picks required for each order, and $v$ is the placeholder.

(3) Data Embedding Processing. First of all input the coordinates of the goods location $s = x_{0}, y_{0}, x_{1}, y_{1}, \dots \dots$ and the demand for each good position in the order $D = d_{1}, d_{2}, \dots \dots$ to the coding model. Then, the coordinates of the demand point $x_{i}, y_{i}$ and the demand $d_{i}$ are stitched together to obtain $p_{i}$ （ where P is a three-dimensional data (x, y, d)）and the high-dimensional linear projection of $p_{i}$ . The feature vector of picking points is obtained by embedding $h_{i}^{0}$ . The feature vectors of the neural network at the $i$ ’th picking station are shown in Figure 3, where $o$ represents the layer of the neural network and $i$ indicates the picking points Briefly speaking, this vector represents the feature vector of the neural network of layer 0 in the ith picking table.

[figure(s) omitted; refer to PDF]

(4) Picking Network Feature Extraction

(A) Attention mechanism

In order to calculate the features of the network graph between picking points and picking points and between picking points and picking tables, the features, $h_{i}^{0}$ . The picking network feature vector is calculated by performing multiple high-dimensional mappings, $h_{g}^{N}$ .

First, by using the model parameters $W^{Q}, W^{K}, W^{V}$ of the matrix operation, the feature information $q, k, v$ of each picking node is calculated. Subsequently, the $q$ with $k$ correlation calculation is performed to obtain the weights. Similarly, we can calculate the picking node $i$ with other picking nodes $j$ correlation weights between $u_{i j}$ .

Next, the weights are normalized. We use the $softmax$ function to normalize the correlation $u_{i j}$ to normalize and calculate the attention weights $a_{i j} \in 0,1$ , which is the correlation coefficients received by the picked nodes.

Finally, the normalized weights are used to sum up with the weight of V. Based on the $a_{i j}$ and $v_{j}$ , calculate the current $q$ . The attention value under the value $h_{i}^{'}$ represents the relevance degree information between the current picking node and other picking nodes.

To facilitate the understanding of the attention mechanism, the detailed steps of the attention mechanism are shown in the Algorithm 1, where the model parameters are presented above.

(B) Multi-headed attention mechanism

Algorithm 1: Attention mechanism.

Inputs: each embedded node after embedding $h_{i}^{l} l = 0$ , picking nodes i, $j \in 1,2, \dots, L$

Output: information about the correlation of each picking node with other picking nodes $h_{i}^{'}$

(1) Initializing the transformer network parameters $W_{Q}, W_{K}, W_{V}$

(2) For i = 1: L do:

(3) Calculate the values of q, k, v $q_{i} = W^{Q} h_{i}^{l}, h_{i}^{'} = \sum_{j} a_{i j} v_{j}, v_{i} = W^{V} h_{i}^{l}$

(4) $l = 1 + 1$

(5) For j = 1: L does:

(6) Calculate the correlation between picking nodes i and other picking nodes j $u_{i j} = \begin{cases} q_{i}^{T} k_{j} / \sqrt{d_{k}} & If i and j are not related \\ - \infty & else \end{cases}$

(7) Calculating attention weights $a_{i j} = e^{u_{i j}} / \sum_{j} e^{u_{i j}}$

(8) Calculating attention $h_{i}^{'} = \sum_{j} a_{i j} v_{j}$

(9) End for

(10) End for

The multi-headed attention mechanism is an algorithm for weighted messaging between cargo locations, and consists of multiple attention mechanisms that are stitched together to calculate the correlation information between picking points based on different dimensions to obtain the picking network feature vector $h_{g}^{N}$ .

4.1.2. Decoding Model for Hybrid Picking

(1) Picking Node Relevance Embedding. First, the picking point features are obtained by coding $h_{i}^{N} = h_{0}^{N}, h_{1}^{N}, \dots \dots$ and picking network diagram features $h_{g}^{N}$ , and the coordinates and demand of the goods and picking stations as input for decoding. The information of the picking table and the remaining loading capacity of the AGV are marked to make a start for solving the path solution.

(2) Attention Mechanism. Like the attention mechanism for encoding, the attention mechanism in this stage calculates the direct correlation between the last picking node in the identified path order, i.e., the picking point about to depart, and the outstanding picking point to calculate the individual probabilities of the next picking node and determine the next picking point to go to.

(3) Masking Mechanism. The masking mechanism is used to mask the next round of unsuitable cargo bits after the probability of the next visit to the node of interest is generated using the attention mechanism. And, the parameters of the masking process are set as shown in Table 1.

Table 1

Mask parameter representation.

Number	Parameters	Indicates	Examples
1	$V_{e d}$	Visited node vector	$V_{e d} = 0,0,0,0$
2	$D$	Node demand vector	$D = 0.3, 0.5, 0.7, 0.1$
3	$U_{cap}$	Car capacity already used	$U_{cap} = 0.3$
4	$B_{cap}$	Broadcast value	$B_{cap} = Broadcasting V_{e d}, U_{cap}$
5	$A_{cap}$	Truncated vectors (greater than the car capacity is truncated)	$A_{cap} = 1,0,0,0$
6	$M$	Mask vector (indicates nodes that cannot be accessed)	$M = 1,0,0,0$

In the masking process, $V_{e d}$ uses 0 and 1 to represent accessible nodes, where the 1 represents two different meanings: (1) the nodes that have been accessed and (2) the nodes that are not suitable for access in this sub-loop, i.e., after accessing the point in the sub-loop, the cargo nodes will exceed the car capacity and do not meet the realistic constraints but have not yet been picked. The picking path can then be indexed by the sequence of the mask. The masking process is shown in Figure 4 and the whole decoding process is shown in Figure 5.

[figure(s) omitted; refer to PDF]

4.2. Training Optimization Components for Hybrid Picking Patterns

Here, we choose a reinforcement learning algorithm with a rollout baseline [22] as the evaluation parameter to train the model, which is a hybrid algorithm combining value-based and policy gradient-based algorithms with two neural network models built in, i.e., the transformer network model and the rollout network model.

4.2.1. Reinforcement of the Learning Network Model

In a realistic picking center environment, it is generally a grid layout that presents a north-south or east-west orientation, thus using the Manhattan distance to design the objective function, see equation (14). $\begin{matrix} (14) & r π | s = \sum_{t = 1}^{n} x_{π t} - x_{π t - 1} + x_{π n} - x_{π 1} + \sum_{t = 1}^{n} y_{π t} - y_{π t - 1} + y_{π n} - y_{π 1} . \end{matrix}$

In this paper, the training network uses the above encoding and decoding model, i.e., the transformer model, with a θ-parameterized stochastic policy π, which generates the conditional probability distributions of the nodes according to the attention mechanism, and sequentially generates the solution sequence points, see equation (14). Based on the generated solution sequence points, the loss function is calculated, see the definition of equation (15), wherein $L_{i}^{B L} X_{0}^{n}; σ$ is described in detail in the rollout, and is the time step of decoding. $\begin{matrix} (15) & \sum^{T \cdot} L_{t' = t} π_{i} - L_{i}^{BL} X_{0}^{n} : σ . \end{matrix}$

Based on the loss function, the gradient optimization parameters of the objective function are calculated, which are defined as shown in equation (16). $\begin{matrix} (16) & \nabla θ \leftarrow \frac{1}{N} \sum_{n = 1}^{N} \sum^{T_{n}} L_{t' = t} π_{i} - L_{i}^{BL} X_{0}^{n}; σ \nabla_{θ} \log P Y^{n} | X_{0}^{n} . \end{matrix}$

Finally, the traditional stochastic gradient descent (SGD) process is replaced by the Adam first-order optimization algorithm, which backpropagates the gradients and updates the neural network weights iteratively using the training data, see the defined equation (17). $\begin{matrix} (17) & θ \leftarrow A d a m θ, \nabla θ . \end{matrix}$

4.2.2. Reinforcing the Learning Baseline Model

Inputs: picking table and shelf locations $s^{i} = x^{i}, y^{i}, i = 0,1,2, \dots, N$ , the number of picking orders demanded is $d_{i}$ , the number of AGVs is $k$ , the loading capacity of the AGV is $D$ .

Training: calculated according to the Adam optimization algorithm $θ$ , update back propagation gradient, optimize transformer model parameters and update $rollout$ , evaluate the network parameters, the detailed steps are shown in Algorithm 2.

Algorithm 2: Reinforcement learning algorithms under mixed models.

Inputs: coordinates of picking table p, position $s$ , and the demand for the picking table $D$

Output: The picked sequential cargo position is the solution $π$

(1) Initialize transformer and rollout evaluation network model parameters. Initialize optimizer parameters and initialize learning rate

(2) Calculation of picking table idleness

(3) Calculation of the distance between the pintle order and the center of gravity of the picking table

(4) For i = 1: E do-

(5) Initialization parameters. $θ, θ^{B L} \leftarrow θ$

(6) $s_{i} \leftarrow random 1234 \forall i \in 1, \dots, Batch$

(7) $π_{i} \leftarrow rollout s_{i}, p_{θ} \forall i \in 1, \dots, Batch$

(8) $π_{i}^{B L} \leftarrow rollout s_{i}, p_{θ} \forall i \in 1, \dots, Batch$

(9) For $n = 1, \dots, N d o$

(10) Initialization time step. $t \leftarrow 0$

(11) Repeat

(12) According to the probability distribution $p y_{t + 1}^{n} | Y_{t}^{n}, X_{t}^{n}$ , select sequence points $y_{t + 1}^{n}$

(13) Observing the new state $X_{t + 1}^{n}$

(14) $t \leftarrow t + 1$

(15) Until the termination conditions are met

(16) Calculate the objective function of the training $L π_{i} = L Y^{n}, X_{0}^{n}$

(17) Calculate the evaluation value of the rollout baseline $L π_{i}^{B L}$

(18) To find the gradient for the transformer model network. $\nabla θ \leftarrow 1 / N \sum_{n = 1}^{N} \sum_{T_{n}} L_{t^{'} = t} π_{i} - L_{i}^{B L} X_{0}^{n}; σ \nabla_{θ} \log P Y^{n} | X_{0}^{n}$

(19) Gradient updates to the transformer model network. $θ \leftarrow Adam θ, \nabla θ$

(20) End for

(21) If one-sided paired t-test test< $α$ then

(22) $θ^{B L} \leftarrow θ$

(23) End if

(24) End for

Step (1) evaluate the network model parameters with 0 initializing transformer and rollout; step (2) with step (3) is to select the best picking station; step (4) iterative module; steps (5)–(7) randomly initialize the network strategy; step (9) perform batch training; step (10) start decoding; steps (11)–(15) generate decoding sequence, steps (16)–(29) calculate update gradient values and optimize the neural network parameters based on Adam’s algorithm. Steps (21)–(23) whether to update the evaluation function parameters.

5. Example Simulation and Analysis

5.1. Example Simulation of Hybrid Real-Time Picking Mode

A picking center is arranged in a regional layout according to commodity items. There are multiple picking stations in the center. The AGVs are in a unified charging cache area, and they work together to complete picking. The intelligent warehouse is set to have 5 picking stations and 360 goods slots, and the demand for each task in each batch of order is less than 1. The quantity of AVG is not limited, which is enough to meet the assigned task of order picking. In order to increase the complexity of the task and improve the universality of the algorithm, 13/14/15 picking order tasks are randomly generated according to the Poisson distribution adhered to at the time of order arrival. Since the distance between the CHARGING buffer of AGV and the picking point is too small compared with the total picking path of batch orders, the simulation experiment in this paper ignores the distance between the charging buffer of AGV and the first picking task point, and regards it as the AGV starting from the picking station and returning to the picking station when the loading capacity is insufficient. The relevant simulation data are shown in Table 1. In order to display the distribution of data intuitively, it is drawn in the form of a scatter diagram, and the warehouse coordinate diagram is shown in Figure 6.

[figure(s) omitted; refer to PDF]

The calculation data of the picking station is shown in Table 2and Table 3.The bold value in Table 2 indicates that under the conditions of comprehensively considering the idleness of the picking table and the distance from the center of gravity of the order to the picking table, we will select the optimal selection of the picking table entering the working state.

Table 2

Picking table idleness and item order center of gravity distance.

Picking table	Location	Vacancy rate	Distance from the center of gravity of the first group of item orders	Distance to the center of gravity of the second group of item orders	Distance to the center of gravity of the third group of item orders
Picking table1	(6, 17)	0.6	9.75	17.22	31.77
Picking table2	(14, 17)	0.3	13.39	11.35	24.42
Picking table3	(22, 17)	0.57	19.79	9.46	17.65
Picking table4	(30, 17)	0.4	27.05	13.34	12.44
Picking table5	(38, 17)	0.5	34.64	19.86	11.22

Table 3

Original data for picking nodes.

Num	Picking stations 1		Picking stations 3		Picking stations 5
Num	Coordinates	Demand	Coordinates	Demand	Coordinates	Demand
1	(1, 9)	0.13	(27, 12)	0.13	(31, 6)	0.03
2	(4, 14)	0.03	(19, 13)	0.10	(37, 5)	0.07
3	(9, 11)	0.17	(22, 8)	0.26	(31, 9)	0.07
4	(3, 11)	0.20	(15, 12)	0.2	(39, 14)	0.23
5	(13, 8)	0.10	(25, 2)	0.23	(31, 14)	0.20
6	(9, 0)	0.23	(21, 0)	0.40	(31, 2)	0.13
7	(6, 11)	0.13	(27, 5)	0.23	(40, 6)	0.07
8	(3, 13)	0.03	(19, 6)	0.33	(43, 12)	0.30
9	(10, 11)	0.27	(16, 14)	0.16	(39, 1)	0.27
10	(0, 8)	0.27	(27, 3)	0.10	(39, 3)	0.23
11	(10, 13)	0.40	(21, 5)	0.13	(34, 5)	0.30
12	(3, 5)	0.02	(24, 11)	0.10	(33, 6)	0.10
13	(0, 6)	0.03	(25, 5)	0.26	(39, 2)	0.07
14	—	—	(19, 1)	0.20	(36, 6)	0.07
15	—	—	—	—	(34, 6)	0.23

Then the data are loaded into the model to solve the task assignment above. The picking road map obtained is shown in Figure 7.

[figure(s) omitted; refer to PDF]

5.2. Algorithm Comparison

The experiments were implemented using Python 3.6, pycharm2019.1.3 of pytorch0.4.1, and the computer configuration used was: Intel(R) Core(TM) i5-6300U [email protected] GHz, 2496Mhz, 22 cores, 41 logical processor, 4 GB of RAM, and a 64 bit of Windows 10. operating system.

To analyze the effect of the proposed model, we choose three algorithms such as OR_ARC, OR_CHR, and PATH_CHEAPEST_ARC (PATH-ARC) from Google OR-Tools to perform the algorithm comparison analysis of length dimension and time dimension on the public dataset （Augerat standard dataset set A） . The datasets can be found at the URL https://neo.lcc.uma.es/vrp/vrp-instances/capacitated-vrp-instances/download.Firstly, the advantages and disadvantages of the algorithm are evaluated from three aspects: the optimal distance dissociation of the example (best), the distance solution obtained by the algorithm model (min), and the error between the experimental solution and the optimal solution of the example (DEV%) (DEV = (min best)/best). The specific experimental results are shown in Table 4.

Table 4

Comparison table of node algorithms.

	Instance	Best	PRL		OR_CHR		PATH-ARC		OR_ARC
	Instance	Best	Min (m)	DEV (%)	Min (m)	DEV (%)	Min (m)	DEV (%)	Min (m)	DEV (%)
1	A-n32-k5	784	795	1.40	889	13.35	787	0.37	847	8.00
2	A-n33-k5	661	704	6.51	761	15.07	676	2.32	692	4.65
3	A-n33-k6	742	762	2.70	743	0.20	768	3.51	791	6.62
4	A-n34-k5	799	836	4.63	840	5.17	813	1.80	878	9.90
5	A-n37-k5	730	757	3.70	790	8.22	822	12.65	767	5.08
6	A-n37-k6	822	863	4.99	931	13.25	876	6.59	924	12.36
7	A-n38-k5	831	863	3.85	893	7.41	857	3.08	919	10.56
8	A-n39-k5	937	1014	8.22	1004	7.12	1042	11.26	1058	12.96
9	A-n44-k7	1146	1173	2.36	1213	5.81	1162	1.38	1248	8.89
10	A-n45-k6	914	990	8.32	1036	13.33	1045	14.29	983	7.55
11	A-n45-k7	1073	1101	2.61	1258	17.24	1156	7.73	1227	14.31
12	A-n46-k7	1010	1099	8.81	1147	13.55	1119	10.78	1036	2.62
13	A-n48-k7	1167	1178	0.94	1241	6.34	1294	10.89	1334	14.34
14	A-n53-k7	1408	1456	3.41	1757	24.81	1596	13.35	1623	15.24
15	A-n54-k7	1035	1127	8.89	1157	11.80	1243	20.11	1092	5.54
16	A-n55-k9	1290	1374	6.51	1434	11.16	1431	10.96	1414	9.61
17	A-n62-k8	1168	1227	5.05	1264	8.24	1215	4.01	1183	1.30
18	A-n63-k9	1764	1916	8.62	1987	12.63	1895	7.45	1856	5.21
19	A-n63-10	784	795	1.40	889	13.35	787	0.37	847	8.00
20	A-n64-k9	661	704	6.51	761	15.07	676	2.32	692	4.65
21	A-n65-k9	742	762	2.70	743	0.20	768	3.51	791	6.62
22	A-n69-k9	799	836	4.63	840	5.17	813	1.80	878	9.90

Subsequently, in order to summarize the overall evaluation of each algorithm in the 22 group instances, four comparative analyses were performed in terms of the mean length of the algorithms, the total time, the mean DEV, and the standard deviation of the DEV, as shown in Table 5.

Table 5

Algorithm comparison table.

Projects	Distance mean (m)	Distance mean error rate (%)	DEV mean value	DEV standard deviation	Total time (s)
Best solution	966.68	0	0	0	—
PRL	1015.09	5.01	4.85	2.60	0.7883
OR_ARC	1049.09	8.52	8.36	3.86	7.0147
OR_CHR	1071.73	10.87	10.39	5.69	5.8683
PATH_ARC	1038.23	7.40	6.84	5.46	6.8282

As can be seen from Table 5, compared with the three heuristics of ortools, PRL algorithm proposed in this paper is the closest to the exact solution from the perspective of distance and length. And, from the time dimension analysis, PRL algorithm has obvious advantages of parallel algorithm, shortens the time, and can be solved quickly.

Then, this paper conducted significance tests for best solution and RL, OR_CHR, PATHARC, and OR_ARC, respectively, as shown in Table 6.

Table 6

Inter-subject effect test.

Dependent variable:	Best
Source	Class III sum of squares	Degree of freedom	Mean square	F	Significance
Correction model	1571745.414a	4	392936.353	1772.163	4.97709E-22
Intercept distance	534.751	1	534.751	2.412	0.138845022
PRL	14975.580	1	14975.580	67.541	2.52493E-07
OR_CHR	4.489	1	4.489	.020	0.888531759
PATHARC	305.100	1	305.100	1.376	0.256955406
OR_ARC	6074.522	1	6074.522	27.396	6.7336E-05
Error	3769.359	17	221.727
Total	22133937.000	22
Total after correction	1575514.773	21

a. R-squared = 0.998 (adjusted R-squared = .997).

From Table 6, it can be seen that the significance of the RL method and OR_ARC method is less than 0.05 and passes the level of the significance test, while the significance of the OR_CHR method and PATHARC method is bigger than 0.05 and does not pass the level of the significance test, given the significance of 5%. And, it can also be seen that there is a significant difference between the results obtained by the OR_CHR method and PATHARC method and Best solution, while there is no significant difference between the RL method and OR_ARC method and Best solution, and the difference between the RL method and Best solution is much smaller than the OR_AHC difference. It can be seen that the RL algorithm in this paper can solve the storage picking path problem well.

Next, we analyzed the time and space complexity of the algorithm, and the results are shown in Table 7.

Table 7

Time complexity and space complexity.

Algorithm	Time complexity	Spatial complexity
OR_CHR	—	$O 1$
PATHARC	—	$O 1$
OR_ARC	—	$O 1$
PRL	$O n^{2} d$	$O 1$

Table 7 shows that the spatial complexity of both the heuristic algorithm and Transformer is $O 1$ , and the spatial complexity is the same. As for time complexity, Google’s ortools heuristic algorithm uses simulated annealing algorithm, which is based on probability and is a general optimization algorithm derived from the solid annealing principle. Theoretically, the algorithm has global optimization performance, so its complexity cannot be calculated. As a result, PRL’s time complexity is determined in comparison to heuristic algorithms, which use probabilistic algorithms and may keep iterating because the optimal solution is not found.

As we all know, Google orTools is developed in C++, and its statements are executed much more efficiently than Python as a scripting language. However, according to the experimental data, the Google ortools heuristic algorithm developed by C++ has no advantage in computing time and optimal path length, and is obviously weaker than the PRL algorithm. Therefore, the PRL algorithm can be regarded as a good choice when solving the problem of warehouse routing.

To sum up, although the average path length planned by the PRL method is slightly longer than the optimal solution accurately solved, it is closer to the optimal solution compared with the three heuristic algorithms of Google ortools. From the mean value of DEV, the error with the optimal solution is 5.1%. From the standard deviation of DEV, the error between the path length solved by the PRL method and the optimal solution is controlled within 7.45% with a probability of 0.8413, while the other three methods are 12.22%, 16.08%, and 12.3%, respectively. From the perspective of time dimension, PRL has strong parallel computing capability, and the solving time is 1/12 of Google OR_tools heuristic algorithm, or even shorter.

6. Conclusion

With the rapid development of the Internet of Things technology, optimize the warehouse management system, establish and improve the intelligent picking system, for unmanned, intelligent transformation is the key direction of reform and development of e-commerce enterprises, courier enterprises.

This thesis addresses the picking path planning problem of multiple picking stations in a warehouse in a zoned area, and designs a PRL algorithm using placeholder control to solve the path optimization problem in a shared mode where the number of picking orders is unequal and the picking stations do not need to work synchronously, taking into account two factors: the idleness of the picking stations and the distance of the center of gravity of the order items. The simulation results have shown that the PRL algorithm in this paper is very close to the optimal solution and far better than the three heuristic algorithms of Google’s ortools in terms of solving accuracy. In terms of solving speed, it has the advantage of fast solving and can almost realize real-time solving. In the dimension of structure, it has the advantage of variable input and avoiding retuning. Therefore, this method is very effective to solve the problem of multiple picking stations with different order numbers in intelligent warehouse. It can solve the picking path effectively and reduce the picking cost [23].

7. Outlook

This paper focuses on the picking center intelligent AGV picking path planning problem, cleverly converting the MDVRP problem into a CVRP problem by two-stage method, and designing an improved Transformer algorithm for path planning, thus realizing real-time picking in picking centers with improved picking efficiency compared to traditional methods. Meanwhile, this paper can be promoted for scenarios such as UAV take-out delivery, port container loading and unloading, and logistics distribution. Since this paper is carried out without considering the limitation of AVG number of picking centers, the following research direction is the optimal path planning of picking centers under the constraint of AVGs’ number.

Disclosure

Deyao Wang, Jun Jiang, and Ran Ma are the co-first authors.

Authors’ Contributions

Deyao Wang, Jun Jiang, and Ran Ma contributed equally to this work. Deyao Wang performed conceptualization (providing part ideas), performed formal analysis (some formal analysis, such as time & space analysis and correlational analyses), wrote the final draft, developed the methodology (the modification of MDVRP two-stage solution of picking routing), performed visualization, reviewed & edited the article, and provided the software (support of computer code). Jun Jiangperformed conceptualization, performed data curation, performed formal analysis, developed the methodology (MDVRP two-stage solution of picking routing), provided the software (algorithm design for hybrid picking mode, simulation example, and comparison of algorithms), performed validation, wrote the original and final drafts, performed visualization, and reviewed & edited the article. Ran Ma performed conceptualization (providing part ideas), developed the methodology (design of coding model in reinforcement learning algorithm model), provided the software (support of computer code, such as comparison between simulation examples and some algorithms), wrote the original draft, and performed visualization (visualization of simulation example results). Guicheng Shen performed conceptualization and supervision and was responsible for funding acquisition.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (NO: 71831001) and Beijing Key Laboratory of Intelligent Logistics Systems (NO: BZ0211).

References

[1] K. Li, T. Liu, B. He, D. Xu, "Research on AGV path planning and scheduling in “goods-to-person” picking system," Open Journal of Social Sciences, vol. 37 no. 6,DOI: 10.4236/jss.2019.76001, 2021.

[2] X. Ma, "The development status and trend of warehouse picking technology," Logistics Technology and Applications, vol. 21 no. 6, pp. 111-114, 2016.

[3] C. Zhu, M. Liu, C. Wu, "Research progress and prospects of vehicle path problem in supply chain," Computer Integrated Manufacturing Systems, vol. 7 no. 11, 2001.

[4] D. L. Jiang, X. L. Yang, W. Du, "Genetic algorithms for vehicle path problems," Systems Engineering Theory and Practice, vol. 19 no. 6, pp. 40-45, 1999.

[5] P. J. Parikh, R. D. Meller, "A travel-time model for a person-onboard order picking system," European Journal of Operational Research, vol. 200 no. 2, pp. 385-394, DOI: 10.1016/j.ejor.2008.12.031, 2010.

[6] F. T. S. Chan, H. K. Chan, "Improving the productivity of order picking of a manual-pick and multi-level rack distribution warehouse through the implementation of class-based storage," Expert Systems with Applications, vol. 38 no. 3, pp. 2686-2700, DOI: 10.1016/j.eswa.2010.08.058, 2011.

[7] Z. Lu, Y. Han, S. Zhang, "A case study of order picking path optimization in distribution centers based on genetic algorithm," Logistics Technology, vol. 32 no. 9, 2013.

[8] Z. Jia, F. Ailan, "A review of research on manual order picking path optimization and decision making in distribution centers," Logistics Technology, vol. 33 no. 10, 2014.

[9] Y. He, "Order picking path optimization in logistics distribution centers with genetic algorithm," Business and Economic Research, vol. 34 no. 21, pp. 110-111, 2016.

[10] A. M. Caunhye, X. Nie, S. Pokharel, "Optimization models in emergency logistics: a literature review," Socio-Economic Planning Sciences, vol. 46 no. 1,DOI: 10.1016/j.seps.2011.04.004, 2012.

[11] A. Scholz, D. Schubert, G. Wäscher, "Order picking with multiple pickers and due dates - simultaneous solution of order batching, batch Assignment and sequencing, and picker routing problems," European Journal of Operational Research, vol. 263 no. 2, pp. 461-478, 2017.

[12] T.-L. Chen, C.-Y. Cheng, Y.-Y. Chen, L.-K. Chan, "An efficient hybrid algorithm for integrated order batching, sequencing and routing problem," International Journal of Production Economics, vol. 159 no. jan, pp. 158-167, DOI: 10.1016/j.ijpe.2014.09.029, 2015.

[13] M. Matusiak, L. Kroon, "A fast simulated annealing method for batching precedence-constrained customer orders in a warehouse," European Journal of Operational Research, vol. 236 no. 3, pp. 968-977, DOI: 10.1016/j.ejor.2013.06.001, 2014.

[14] J. H. John, W. T. David, "Neural computation of decisions in optimization problems," Biological Cybernetics, vol. 52 no. 3, pp. 141-152, 1985.

[15] O. Vinyals, M. Fortunato, N. Jaitly, "Pointer networks," Proceedings of the International Conference on Neural Information Processing Systems, pp. 2692-2700, .

[16] W. Kool, H. H. Van, M. Welling, "Attention, learn to solve routing problems!," . 2019, https://arxiv.org/abs/1803.08475v3

[17] A. A. Ibrahim, R. O. Abdulaziz, J. A. Ishaya, "Capacitated vehicle routing problem," International Journal of Regulation and Governance, vol. 7 no. 3, pp. 310-327, 2019.

[18] F. Zhang, X. Q. Yin, "Feeding path optimization considering order urgency," Journal of Shandong University of Technology: Natural Science Edition, vol. 29 no. 1, 2015.

[19] H. Sun, Ke Zhang, F. Zhang, "Genetic algorithm-based manual picking path optimization for dual-zone warehouses[J]," Journal of Qingdao University Engineering&Technology Edition, vol. 29 no. 1, 2014.

[20] D. Pisinger, S. Ropke, "A general heuristic for vehicle routingproblems," Computers & Operations Research, vol. 34, pp. 2403-2435, 2007.

[21] B. Yu, P. Jin, Z. Yang, "Two-stage heuristic algorithm for multi-center vehicle routing problem with time Windows [J]," Systems Engineering-Theory & Practice, vol. 32 no. 8, pp. 1793-1800, 2012.

[22] L Jiali, G Xiuping, "Multi-center open-loop vehicle routing problem with mutual exclusion and vehicle matching," Journal of Systems Management, vol. 25 no. 1, pp. 129-138, 2016.

[23] S. J. Rennie, E. Marcheret, Y. Mroueh, J. Ross, V. Goel, "Self-critical sequence training for image captioning," Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1179-1195, DOI: 10.1109/CVPR.2017.131, .

Word count: 6755

Show less

Copyright © 2022 Deyao Wang et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0/

Abstract

Translate

In the order picking process of the warehouse center, considering the rapid increase in the volume of orders arriving at the picking center at the time of the promotional festival, a hybrid operation mode with multiple picking tables is used to meet the picking requirements of the huge number of real-time orders. Therefore, in this paper, a hybrid picking mode is proposed, taking into account both the idle degree of picking stations and their order item centers of gravity, and a new reinforcement learning algorithm embedding mechanism (PRL) with placeholder control is designed to solve the problem of a huge number of real-time item orders arriving at the picking center system on promotional holidays and in inconsistent quantities, and numerical simulations are performed for this algorithm. The experimental results show that the PRL algorithm in hybrid picking mode can handle a huge number of orders simultaneously and improve picking efficiency effectively.

Details

Title

Research on Hybrid Real-Time Picking Routing Optimization Based on Multiple Picking Stations

Author

Wang, Deyao¹; Jiang, Jun²; Ma, Ran³; Shen, Guicheng³

¹ School of Maritime Economics and Management, Dalian Maritime University, Dalian 116026, China
² Beijing Jingyibeifang Instrument Co., Ltd.,, Beijing 102600, China
³ School of Information, Beijing Wuzi University, Beijing 101149, China

Editor

Emiliano Mucchi

Publication year

2022

Publication date

2022

Publisher

John Wiley & Sons, Inc.

ISSN

1024123X

e-ISSN

15635147

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1155/2022/5510749

ProQuest document ID

2651402328

Research on Hybrid Real-Time Picking Routing Optimization Based on Multiple Picking Stations

Jump to:

Full text

Abstract

Details

Suggested sources