1 Introduction
In recent years, with the acceleration of urbanization, urban road traffic resources cannot meet the growing traffic demand; as a result, the Intelligent Transportation System (ITS) [1] came into being. The automatic train operation (ATO) system [2], which replaces manual driving in many settings at low cost, has become an important part of ITS. While ATO systems have been increasingly embraced by many metro systems over the past decades due to their low cost and practicality, they fall short in several critical areas. Firstly, the intelligence of these systems is limited: they often rely on predefined operational strategies and lack the dynamic adaptability to respond effectively to complex and unforeseen circumstances. Secondly, the absence of self-learning capabilities restricts their potential to improve efficiency and safety over time through the accumulation and analysis of operational data. Lastly, the generalization of these systems is constrained: they are typically tailored to specific lines and struggle to adapt to diverse line conditions, such as varying speed limits, gradients, and traffic flows, which limits their broader application. These limitations underscore the need for more advanced, intelligent, and adaptable train operation systems that can enhance operational efficiency, safety, and passenger comfort.
The speed control of metro train operation can be represented as a multi-objective optimization problem with constraints. In order to satisfy these constraints and optimize the objectives, the train must make driving decisions based on real-time information. Under normal conditions, the ATO is responsible for all train traction and braking control commands to make the train run on time, regulate its speed and stop exactly at its destination [3]. ATO is traditionally divided into two sub-modules. The first one is dedicated to the calculation of the speed profile of the future train operation. Under this module, offline optimization algorithms are used to calculate the optimal speed profile in terms of performance and energy consumption. The second sub-module works mainly to ensure that the train accurately tracks the given speed profile.
Recently, many studies have been devoted to designing offline optimized train trajectories to improve energy efficiency. For example, Khmelnitsky [4] devised a numerical algorithm to obtain the optimal velocity profile, taking into account variable gradients and arbitrary speed limits. Furthermore, train operation involves a variety of additional factors, such as trip comfort and punctuality. Yang et al. [5] created a genetic algorithm based on binary coding, developing a two-objective integer programming model with headway time control and dwell time management to find the optimal solution in terms of energy savings and service quality. Wang et al. [6] introduced an iterative convex programming (ICP) technique to solve the train scheduling problem, obtaining ideal departure times, running times, and dwell times that minimize travel time and energy consumption. Using optimal speed trajectory searching methodologies under diverse track parameters, ShangGuan et al. [7] created a multi-objective optimization model for the speed trajectory, with energy consumption and travel time as the key optimization objectives. With the development of artificial intelligence, many intelligent algorithms have been applied to train operation. Açıkbaş and Söylemez [8] employed an artificial neural network with a genetic algorithm to optimize the coasting points of the velocity-distance trajectory and obtain minimum energy expenditure for a given travel time. Yang et al. [9] combined a simulation-based approach and a genetic algorithm to find an approximately optimal coasting control strategy. Yin et al. [10] developed the ITOR algorithm for intelligent train operation, capable of satisfying multiple objectives by using expert experience and Q-learning. Zhang et al. [11] used manual driving data to train three well-known algorithms (K-NN, bagging CART, and AdaBoost CART) to predict the driver's output control. Recently, Zhou et al. [12] proposed the STO algorithms, using the deep deterministic policy gradient (DDPG) and normalized advantage function (NAF) algorithms to further optimize energy consumption and comfort metrics during train operation.
After generating the optimal recommended speed profile, the ATO's task is to develop an efficient method to control the train under different train models and operating conditions (e.g., tunnels, curves, steep gradients) so that the train can accurately track the speed profile and operate safely and smoothly. Ke et al. [13] proposed a fuzzy PID gain method to track the recommended speed profile, which was optimally generated by the MAX-MIN ant system. Song et al. [14] investigated the consequences of time-varying failures in both the traction and braking phases of the train and proposed a fully parameter-dependent adaptive backstepping control scheme that achieved good speed tracking performance. Liu et al. [15] proposed a high-speed railway control system based on the fuzzy control method and designed the control system in MATLAB. Gu et al. [16] proposed a new energy-efficient train operation model based on real-time traffic information from a geometric and topographic perspective. Two robust adaptive control approaches considering actuator saturation and unknown system parameters were proposed by Gao et al. [17]. Recently, Pu et al. [18] proposed a model-free adaptive speed controller based on neural network (NN) and PID algorithms, and the effectiveness of the proposed algorithms in precisely tracking the speed-distance (SD) trajectory was demonstrated by numerical experiments and real-line applications.
The previous research and applications have greatly improved the operational performance of metro trains. However, some basic problems remain unsolved and hinder the development of ATO systems. Firstly, most existing ATO systems achieve their train operation goals by focusing separately on energy-efficient trajectory calculation, real-time tracking methods, and station parking algorithms. In particular, ATO algorithms are designed to track offline optimized speed profiles and therefore lack intelligence, flexibility, and robustness. Few studies have comprehensively considered multiple objectives such as driving comfort, punctuality, parking accuracy, and energy consumption. Meanwhile, complex control methods are difficult to implement in real operation when faced with system non-linearity, unknown resistance, and variable in-train forces. Secondly, modern metro trains are capable of outputting continuous traction and braking forces, but few studies have designed continuous control models considering complex line conditions; for example, the intelligent train operation algorithm based on reinforcement learning (ITOR) proposed in [10] can only achieve discrete control of the train under simple line conditions. Finally, some metro sections exhibit more complex speed limit and gradient changes, while most proposed train models only consider operation in intervals with simple speed limit and gradient conditions; for instance, the smart train operation algorithm based on the normalized advantage function (STON) proposed in [12] is difficult to apply to long distances between two consecutive stations and to lines with complex speed limits.
Facing these problems, new intelligent driving algorithms with a higher level of intelligence need to be investigated, which are called enhanced intelligent train operation algorithms (EITOE and EITOP) in this paper. On the one hand, experienced drivers, drawing on their long-term accumulated maneuvering experience, can control the train effectively in real time so that the train operation meets the requirements of several control objectives; moreover, they can adapt well to different railroad line conditions. On the other hand, reinforcement learning (RL) has been used as a powerful decision tool [19] to tackle optimal control problems in many domains, such as micro-drone control [20] and robot control [21], with good results in the field of intelligent train driving [10,12]. Meanwhile, deep reinforcement learning [22] is considered useful for the control of continuous actions [23]; a detailed analysis is given in Sect 3.2.
Therefore, we consider combining expert (experienced driver) experience with deep reinforcement learning algorithms to achieve better and more intelligent operation. The necessity of proposing both EITOE and EITOP lies in their complementary strengths: EITOE leverages expert knowledge and heuristic rules to provide a robust baseline for intelligent train operation, while EITOP uses deep reinforcement learning (PPO) to optimize multiple objectives dynamically. By presenting both algorithms, we aim to demonstrate how expert knowledge can be effectively integrated with advanced machine learning techniques to enhance train operation performance. This dual approach allows a comprehensive evaluation of their effectiveness under varying conditions, showcasing the versatility and adaptability of our proposed solutions. As can be seen from the above analysis, the contributions of this paper are as follows:
1. Integration of expert knowledge with deep reinforcement learning: We introduce a novel approach that integrates expert system-based rules, distilled from experienced drivers, with the Proximal Policy Optimization (PPO) algorithm. This integration results in the EITOE and EITOP algorithms, which not only provide a robust operational baseline for intelligent train operation but also dynamically optimize multiple objectives, enhancing the adaptability and efficiency of train control systems.
2. Development of the EITOE algorithm: The EITOE algorithm is developed by encapsulating heuristic rules and inference methods from expert drivers within an expert system framework. This innovation allows the generation of control strategies independent of offline speed profiles, thereby offering a flexible and adaptive operational approach that is responsive to real-time train operation requirements.
3. EITOP algorithm for multi-objective optimization: Extending the capabilities of EITOE, the EITOP algorithm utilizes PPO to optimize key operational objectives, including safety, punctuality, energy efficiency, and passenger comfort. A significant contribution of EITOP is its real-time adjustment of acceleration and braking strategies based on current train conditions and speed limits, which is crucial for maintaining energy efficiency and punctuality in metro train operations.
The rest of the paper is organized as follows. In Sect 2, we define the necessary mathematical notation and performance indicators for metro train operation and then describe the metro train operation problem. Sect 3 presents the design of the EITO algorithms based on the expert system and PPO. In Sect 4, we construct an EITO simulation platform and give three numerical examples using real data from the Yizhuang line of the Beijing Subway (YLBS). We conclude the paper in Sect 5.
2 Problem formulation and objectives
2.1 Problem statement
This section first formulates the train operation problem and then clearly states the objectives that the proposed algorithms aim to achieve.
The train control problem is formulated as an optimal control problem, focusing on finding an optimal control strategy for the traction and braking forces during the travel time. First, the minimum time interval and the travel time of the train are defined in Eqs (1) and (2), respectively:

$\Delta t_i = t_i - t_{i-1}, \quad i = 1, 2, \ldots, m$  (1)

For $i = 1, 2, \ldots, m$, the total travel time $T$ is defined as:

$T = \sum_{i=1}^{m} \Delta t_i = t_m - t_0$  (2)

where the initial running time is $t_0 = 0$ s and the minimum time interval is $\Delta t_i$.
The train motion model, which incorporates a multi-point mass and signal coordinate model, is used to simulate the electric multiple unit (EMU) of the train. This model takes into account the interaction effects between vehicles, offering an advantage over the traditional single-point train model. The model is expressed as:
(3)
where $M = \sum_{i=1}^{n} m_i$ denotes the total weight of the EMU and $m_i$ denotes the weight of the $i$-th vehicle; $f_{in,i}$ denotes the interaction force between adjacent vehicles [24]; $u_i$ denotes the control force of each vehicle in the moving train, and $\rho_i$ is the distribution constant that determines the acceleration/braking force of the $i$-th vehicle; $\Delta x_i$ denotes the variation of the spring deformation of the coupler; $f_r$ denotes the total drag force. In particular, $f_f = a + bv + cv^2$ describes the drag force caused by friction, where $a$, $b$, and $c$ are vehicle-specific factors; $f_c$ is the curve drag force, defined as $f_c = 6.3M/(r(s)-55)$, where $r(s)$ is the radius of the curve [11]; and $f_g = Mg\sin\theta$ is the drag force caused by the gradient, where $\theta$ is the gradient angle.
The multi-point model in Eq (3) captures inter-vehicle dynamics (e.g., coupler forces, mass distribution) and therefore simulates real-world EMUs better than a single-point model. While the control force u is computed centrally, its distribution across vehicles is governed by the force allocation constants $\rho_i$. For simplicity, we assume a uniform distribution in the simulations, as fine-grained force allocation is hardware-dependent and beyond this paper's scope.
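To make the motion model concrete, the following minimal sketch integrates a per-unit-mass version of Eq (3) with forward Euler, assuming the uniform force distribution stated above. The Davis friction coefficients, gradient, and curve-resistance scaling used here are illustrative assumptions, not the DKZ32 parameters.

```python
import math

# Minimal per-unit-mass sketch of the longitudinal dynamics behind Eq (3).
# All coefficient values are illustrative assumptions, not the DKZ32 values.
def step_dynamics(v, s, u, dt, a=0.012, b=0.0006, c=0.00012,
                  grade=0.0, curve_radius=None):
    """One forward-Euler step; u is the commanded accel/brake rate (m/s^2)."""
    f_friction = a + b * v + c * v * v          # Davis-type running resistance (m/s^2)
    f_gradient = 9.81 * math.sin(grade)         # gravity component along the track
    # empirical curve resistance, per-unit-mass form of 6.3M/(r(s)-55)
    f_curve = 6.3 / (curve_radius - 55.0) if curve_radius else 0.0
    dv = (u - f_friction - f_gradient - f_curve) * dt
    v_next = max(v + dv, 0.0)                   # the train does not roll backwards
    s_next = s + v * dt
    return v_next, s_next
```

Because the same commanded rate u is applied to every vehicle (uniform $\rho_i$), this per-unit-mass form is sufficient for illustration.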
Furthermore, the train acceleration (or braking) system in this study exhibits nonlinearities and time delays. The transfer function of the simulated acceleration/braking system is:

$G(s) = \dfrac{k\, e^{-\tau s}}{T_b s + 1}$  (4)

where $G(s)$ maps the commanded to the actual output acceleration, $k$ is the system performance gain, and $\tau$ and $T_b$ are the delay and time constant of the train acceleration/braking model, respectively.
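A discrete-time sketch of this actuator model is given below: a pure delay followed by a first-order lag. The gain, delay, and time constant values are illustrative assumptions.

```python
from collections import deque

# Minimal sketch of the acceleration/braking actuator in Eq (4):
# a pure time delay followed by a first-order lag, discretized with step dt.
class Actuator:
    def __init__(self, k=1.0, tau=0.6, Tb=0.4, dt=0.02):
        self.k, self.Tb, self.dt = k, Tb, dt
        self.buffer = deque([0.0] * max(1, round(tau / dt)))  # models e^{-tau*s}
        self.y = 0.0                                          # actual output acceleration

    def step(self, u_cmd):
        self.buffer.append(u_cmd)
        delayed = self.buffer.popleft()
        # first-order lag: Tb * dy/dt + y = k * u(t - tau)
        self.y += self.dt * (self.k * delayed - self.y) / self.Tb
        return self.y
```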
Metro train operation control models are generally evaluated in terms of five aspects: safety, punctuality, energy consumption, passenger comfort, and parking accuracy.
* Safety: There may be multiple speed limit points between two consecutive metro stations, as shown in Fig 1, where $v_{lim}^1$, $v_{lim}^2$ and $v_{lim}^3$ denote the speed limits of the different sections between the two stations. This means that, throughout the trip, the speed of the train must remain below the current speed limit of the railroad section to ensure safety. The safety evaluation index $I_s$ is defined as:

$I_s = \begin{cases} 0, & v_i \le v_{lim}(s_i) \text{ for all } i \\ 1, & \text{otherwise} \end{cases}$  (5)

Note that the intention of Eq (5) is to ensure that the train's speed remains within the designated limits: if the speed exceeds the limit, the index registers a violation, thereby discouraging overspeeding.

* Punctuality: Punctuality is an important indicator of metro train operation that affects passenger interchanges and the entire schedule. We first define the running time error as:

$\Delta T = |T_a - T_p|$  (6)

where $T_a$ is the actual running time of the train and $T_p$ is the planned trip time. In this paper, if the running time error is greater than the threshold $T_0$, the metro is not running on time. Therefore, the punctuality evaluation index $I_t$ is defined as:

$I_t = \begin{cases} 0, & \Delta T \le T_0 \\ 1, & \text{otherwise} \end{cases}$  (7)

* Energy efficiency: Energy consumption accounts for a large portion of train operating costs. The energy consumed is described as:

$E = \int_0^{T} F_t(t)\, v(t)\, dt$  (8)

where $F_t$ is the traction force, and the unit-mass energy efficiency evaluation index between the two stations is defined as:

$I_e = E/M$  (9)

* Comfort: Comfort is a direct evaluation criterion for the quality of train service; it requires the instantaneous change in acceleration or deceleration to stay below a certain threshold. We define the rate of change of the acceleration $u$ as:

$\Delta u_i = \dfrac{u_i - u_{i-1}}{\Delta t_i}$  (10)

Therefore, the ride comfort evaluation index $I_c$ can be defined as:

$I_c = \sum_{i=1}^{m} \mathbb{1}\big(|\Delta u_i| > u_c\big)$  (11)

where $u_c$ is the threshold for acceleration change.

* Parking accuracy: This index assesses the parking accuracy, expressed as:

$e_p = |s_D - s_i|$  (12)

where $s_D$ is the length of the segment between adjacent stations and $s_i$ is the current running distance of the train. Note that the parking error of the metro is generally required to stay within a small tolerance $\epsilon_p$ [21] so that the metro barrier doors can be opened. Therefore, the parking accuracy index can be defined as:

$I_p = \begin{cases} 0, & e_p \le \epsilon_p \\ 1, & \text{otherwise} \end{cases}$  (13)

A minimal code sketch that computes these five indices from a logged trajectory is given after this list.
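As referenced above, the following sketch computes the five indices from logged trajectory arrays; the binary forms of $I_s$, $I_t$, $I_p$ and the threshold values follow the reconstructed Eqs (5)-(13) and should be read as assumptions.

```python
import numpy as np

def evaluate(t, s, v, u, v_lim, M, T_p, s_D, T0=3.0, jerk_max=0.3, eps_p=0.3):
    """Compute the five evaluation indices of Sect 2.1 from logged arrays."""
    dt = np.diff(t)
    I_s = float(np.any(v > v_lim))                        # Eq (5): any overspeed is a violation
    I_t = float(abs(t[-1] - T_p) > T0)                    # Eqs (6)-(7): on time within T0 seconds
    E = np.sum(np.maximum(u[:-1], 0.0) * M * np.diff(s))  # Eq (8): traction work (no regeneration assumed)
    I_e = E / M                                           # Eq (9): energy per unit mass
    jerk = np.diff(u) / dt                                # Eq (10): rate of change of acceleration
    I_c = int(np.sum(np.abs(jerk) > jerk_max))            # Eq (11): count of comfort violations
    I_p = float(abs(s_D - s[-1]) > eps_p)                 # Eqs (12)-(13): parking accuracy
    return {"I_s": I_s, "I_t": I_t, "I_e": I_e, "I_c": I_c, "I_p": I_p}
```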
2.2 Problem objectives
This section clearly articulates the problems that this study aims to address. The two proposed EITO algorithms (EITOE and EITOP) aim to achieve the following objectives, corresponding to the above-mentioned problems:
1. Meeting multi-objective requirements: The EITO algorithms should be able to provide control strategies for traction and braking forces that can meet the requirements of multiple objectives such as safety, comfort, punctuality, parking accuracy, and energy efficiency of metro operation. Given the definitions of safety (Is), punctuality (It), energy efficiency (Ie), comfort (Ic), and parking accuracy (Ip), the algorithms need to ensure that the train operation satisfies all these evaluation indices simultaneously.
2. Independent of offline speed profile and continuous force control: The EITO algorithms should be able to perform normal operations without considering the speed distribution of the offline design and achieve the control of continuous forces. As existing ATO systems mainly rely on offline-designed speed profiles and current intelligent driving algorithms have limitations in continuous force control, the EITO algorithms aim to overcome these drawbacks.
3. Outperforming existing methods in energy-efficiency and comfort: The control strategy output by the EITO algorithms should outperform experienced metro drivers and current intelligent driving algorithms in terms of energy efficiency while ensuring good ride comfort. By comparing with manual driving and existing intelligent driving algorithms (such as ITOR and STON), the EITO algorithms should achieve lower energy consumption (Ie) and better comfort (Ic) performance.
4. Adapting to different situations: The EITO algorithms should be able to flexibly adapt to different situations, including different trip times, temporary disturbances (leading to earlier or later arrival), and speed limit and gradient conditions (simple or complex). Considering the complex and variable operating conditions of metro trains, the algorithms need to adjust their control strategies accordingly to ensure stable and efficient operation.
Existing ATO systems must track a designed offline speed profile, and current intelligent driving algorithms either cannot achieve control of continuous forces or cannot adapt to complex and variable line conditions, which is the driving force behind this paper. Moreover, RL has been applied in many fields to deal with model-free problems [24], and expert knowledge has been widely used to improve control strategies [10],[11]. Therefore, in this paper, two intelligent algorithms are proposed, namely EITOE and EITOP, where EITOE is a heuristic algorithm based on an expert system that addresses multiple performance objectives of metro train operation. In addition, we develop EITOP based on EITOE, using PPO to comprehensively optimize the multi-objective requirements of safety, comfort, punctuality, parking accuracy, and energy efficiency.
Following the problem statement outlined above, the next section will provide a detailed introduction to the specific control models and methodologies employed to achieve these objectives.
3 EITO algorithm design
The application of expert experience-based control methods to automatic train operation control is motivated by two considerations. On the one hand, the train operation control system is a highly complex, multi-objective, nonlinear dynamical system [23,25], which poses great difficulties for traditional control methods that require a precise mathematical model; on the other hand, experienced drivers can combine their long-term accumulated maneuvering experience to control the train effectively in real time so that the operation meets the requirements of several control objectives [26].
Therefore, we first developed an expert system-based algorithm, EITOE. This expert system contains expert rules and a heuristic inference system. The expert rules were summarized from our communication with metro drivers and from analyzing data from YLBS and the literature. In addition, we developed a heuristic inference method, based on the drivers' operating strategies, that works without an offline speed profile reference. The appropriate EITOE output is then obtained by combining the speed limit and the current state of the train.
Both EITOE and EITOP ensure punctuality through real-time adjustments based on current train conditions and speed limits. EITOE uses expert rules to allocate trip times effectively, while EITOP employs reinforcement learning to dynamically adjust acceleration and braking strategies. The algorithms continuously monitor the train's position and speed, allowing them to make timely decisions that keep the train on schedule. Specifically, the reward function in EITOP penalizes deviations from planned trip times, reinforcing behaviors that promote punctuality.
3.1 EITOE
Experienced drivers can meet multiple objectives well. By observing drivers' behavior, we found that an experienced driver can keep the train in the correct position, allocate the reserved time reasonably, avoid unnecessary braking, limit the train speed to prevent over-speeding, and reduce the number of switches in the controller output. Based on the studies in [10],[12], we derived IF-THEN rules using position, speed, and running time as inputs and the acceleration/braking rate as output. These rules can be described as follows (a minimal sketch encoding them is given after the list).
1. Energy-efficient trains operate in three states, namely acceleration, coasting, and braking. The train does not transition directly from the acceleration state to the braking state or vice versa unless a special incident is encountered; transitions between any other two states are allowed.
2. The acceleration during the train's starting process should be moderate for comfort (usually less than 0.6 m/s²).
3. For better comfort, the rate of change of acceleration in each time interval should not be too large (usually less than 0.3 m/s³).
4. Determine the next operation mode in advance according to the current speed and the next speed limit value, to avoid triggering automatic train protection.
5. Allocate the total trip time to each interval according to the speed limit, and try to operate according to the allocated time in each interval.
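The sketch below encodes the first four rules as simple guards; the state names, transition table, and helper signatures are an illustrative encoding rather than the deployed rule base.

```python
# Rule 1: no direct switch between acceleration and braking.
ALLOWED = {
    "accelerate": {"accelerate", "coast"},
    "coast":      {"accelerate", "coast", "brake"},
    "brake":      {"coast", "brake"},
}

def admissible_u(u_prev, u_cand, dt, u_max=0.6, jerk_max=0.3):
    """Rules 2-3: cap the starting acceleration and the jerk for comfort."""
    u_cand = min(u_cand, u_max)
    du = (u_cand - u_prev) / dt
    if abs(du) > jerk_max:                                # limit the change of acceleration
        u_cand = u_prev + (jerk_max if du > 0 else -jerk_max) * dt
    return u_cand

def next_mode(mode, v, v_lim_next, dist_to_next, brake_dist):
    """Rule 4: look ahead to the next speed limit to avoid triggering ATP."""
    target = "brake" if (v > v_lim_next and dist_to_next <= brake_dist) else mode
    return target if target in ALLOWED[mode] else "coast"  # Rule 1 as a fallback
```

Rule 5 (allocating the total trip time across speed limit intervals) is handled by the DMTD inference method described next.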
As mentioned before, experienced drivers consider the train's reserved time, reserved distance, speed limit, and current speed: if the train's speed is too low to arrive on time, the train accelerates; conversely, if the train's speed is high enough, the train coasts. We designed a data-driven inference method, DMTD, to determine the coasting or accelerating time. This inference method is derived from manual driving data and applies the minimum travel time (MTD) calculation twice (double MTD, hence DMTD) to compute the desired speed range $[v_{low}, v_{high}]$ for the current speed limit interval. As shown in Fig 3, the speed range calculated by this method enables energy-efficient driving by making the train coast as much as possible while ensuring punctuality.
Using the online data of the train, we first use the DMTD algorithm (see Algorithm 1) to obtain the appropriate reserved trip times $t_{max}$ and $t_{min}$ for the current speed limit interval. Then, the estimated velocity range for the speed limit of each segment is calculated from the formula below.
Algorithm 1. DMTD algorithm.
1: Get online and offline data, including the current train position $s_i$, speed $v_i$ (the marked point in Fig 2), and speed limit. Obtain the reserved travel time, assuming the train is already in the current speed limit interval.
2: Every time the train enters a speed limit interval, as shown by the red dot in Fig 2 (indicating that the train enters the second speed limit interval), draw the maximum traction speed curve from the train position for each speed limit section. Then, from the left end of each speed limit segment, draw the maximum braking speed curve to obtain the minimum travel time curve.
3: From the minimum travel time curve, calculate the minimum reserved times between the current position $s_i$ and (a) leaving the current speed limit interval and (b) reaching the destination.
4: Calculate the reserved times $t_{max}$ and $t_{min}$ for the current speed limit interval.
5: Return $t_{max}$ and $t_{min}$.
$v_{low} = \dfrac{d_k}{t_{max}}, \qquad v_{high} = \dfrac{d_k}{t_{min}}$  (14)

where $d_k$ is the remaining distance in the current speed limit interval. The DMTD algorithm and Eq (14) indicate that if the train's speed lies within $[v_{low}, v_{high}]$, the train can reach its destination on time (arrival within 3 s of the planned trip time counts as on schedule, so $T_0 = 3$ in Algorithm 1). Therefore, if the train's speed is lower than $v_{low}$, the train needs to accelerate; if the train's speed is higher than $v_{high}$, the train should coast. The reasoning for determining the mode of operation (coasting or accelerating) is summarized as follows.
1) If $v_i < v_{low}$, the train should accelerate, and the output of the expert system is defined as Eq (15):

$u_{i+1} = \min\{u_i + \Delta\tilde{u}\,\Delta t_i,\; u_{max}\}$  (15)

where $u_{max}$ is the maximum acceleration, and $\Delta\tilde{u}$ is the rate of variation of the acceleration in the time interval $\Delta t_i$. Note that the parameter $\Delta\tilde{u}$ is set to a constant according to expert experience. If unknown disturbances are considered, such as the resistance and gradient of the line, a constant $\Delta\tilde{u}$ is no longer appropriate, and the value of this parameter will be adjusted by PPO in the next section.
2) If $v_i > v_{high}$, the train should coast, and the output of the expert system can be described as Eq (16):

$u_{i+1} = \max\{u_i - \Delta\tilde{u}\,\Delta t_i,\; 0\}$  (16)
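The sketch below implements both rules using the DMTD speed range $[v_{low}, v_{high}]$; the ramp form of the update follows the reconstructed Eqs (15)-(16) above, and the constants are assumptions.

```python
def eitoe_output(u, v, v_low, v_high, dt, u_max=0.6, du=0.3):
    """Expert-system output of Eqs (15)-(16): accelerate, coast, or hold."""
    if v < v_low:                          # Eq (15): too slow to arrive on time
        return min(u + du * dt, u_max)     # ramp traction up, capped for comfort
    if v > v_high:                         # Eq (16): fast enough to coast
        return max(u - du * dt, 0.0)       # ramp traction down to coasting
    return u                               # inside the desired range: hold output
```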
In addition, when the speed limit of the next section is lower than that of the current section, as shown in Fig 3, the train may need to brake to a reasonable speed to ensure safety; in other words, the speed of the train should always remain below the speed limit. In this case, we define the safe speed $v_{safe}$ to monitor the speed of the train:

$v_{safe} = \lambda \sqrt{(v_{lim}^{next})^2 + 2\,u_{b,max}\,(s_{next} - s_i)}$  (17)

where $s_i$ is the current position of the train, $s_{next}$ is the starting position of the next section, $\lambda$ is the speed scaling factor accounting for the time delay and friction of the railroad (taken as 0.95 in this paper), and $u_{b,max}$ is the maximum deceleration. When the train reaches the position indicated by mark 1 in Fig 3, i.e., when the current speed is higher than or equal to $v_{safe}$, the train should immediately apply the maximum deceleration $u_{b,max}$. In addition, if the current speed limit interval is long enough, $v_{safe}$ may exceed the current speed limit, which could cause the train to run beyond the limit (Fig 3, mark 2), so we redefine the safe speed to ensure safe driving:

$\tilde{v}_{safe} = \min\{v_{safe},\; \lambda\, v_{lim}\}$  (18)
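A minimal sketch of this safe-speed supervision follows; the braking-curve form mirrors the reconstructed Eqs (17)-(18), and the maximum deceleration value is an assumption.

```python
import math

def safe_speed(s_i, s_next, v_lim_next, v_lim_cur, u_b_max=1.0, lam=0.95):
    """Eqs (17)-(18): braking-curve safe speed, capped by the current limit."""
    v_brake = math.sqrt(v_lim_next**2 + 2.0 * u_b_max * max(s_next - s_i, 0.0))
    return min(lam * v_brake, lam * v_lim_cur)

def supervise(u, v, v_safe, u_b_max=1.0):
    """Apply maximum deceleration as soon as the speed reaches v_safe."""
    return -u_b_max if v >= v_safe else u
```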
We defined the parking accuracy (parking error less than $\epsilon_p$) in Eq (13), and all three train automatic stop control (TASC) algorithms proposed in our previous work can achieve accurate parking of trains; for details of TASC, see [25]. Therefore, we apply the heuristic online learning algorithm (HOA) of TASC at the location shown by mark 3 in Fig 3 to ensure the parking accuracy of the train. EITOE's expert system is implemented once the expert rules and heuristic inference methods have been designed. As illustrated in Fig 3, EITOE can make appropriate acceleration, coasting, or deceleration decisions based on online and offline data, such as speed limits and gradients, together with the expert reasoning methods; its speed profile can be divided into an acceleration phase, multiple coasting phases, a safety braking phase, and a parking phase. Furthermore, the output is constrained by the expert criteria to ensure comfort and punctuality.
However, EITOE cannot optimize energy consumption online, because $\Delta\tilde{u}$ is specified as a constant value. Therefore, PPO is introduced to improve the performance of EITOE.
3.2 EITOP
RL is a machine learning paradigm that aims to learn to control systems in environments so as to maximize a numerical performance measure associated with long-term goals [27]. Three reasons motivate us to adopt deep reinforcement learning in train control tasks:
(1) The EITO algorithms do not require a reference target speed profile, and RL does not require external supervision. (2) During train control, an action affects not only the immediate reward but also the rewards of future states, which plays to the strength of RL. (3) Deep reinforcement learning can extend the discrete-action control strategies used in current ATO systems to continuous actions; we call the resulting algorithm EITOP. The algorithmic process of EITOP is presented in Algorithm 2.
Markov Decision Process: Before applying the reinforcement learning algorithm, we formulate our problem as a Markov Decision Process (MDP), which provides a mathematical framework for decision making. The key elements of reinforcement learning include its state, action, policy, and reward, which are defined as follows.
* State $x_i$. In this case, the train state, comprising the current position, speed, and reserved trip time, can be described as:

$x_i = (s_i, v_i, t_{r,i})$  (19)

Let $x_0$ denote the initial train state and $x_m$ denote the final state. Obviously, the following equations should hold:

$x_0 = (0, 0, T_p)$  (20)

$x_m = (s_D, 0, 0)$  (21)

* Action $a_i$. In contrast to the constant $\Delta\tilde{u}$ in Eq (15) of EITOE, EITOP allows the variation of acceleration to differ in each state. As shown in Eqs (22)–(24), we use a variable $\Delta u_i$ instead of $\Delta\tilde{u}$. Meanwhile, we define the ranges of $u_i$ and $\Delta u_i$ as [−1, 1] and [−0.3, 0.3], respectively. Therefore, the action $a_i$ of EITOP can be defined as:

$a_i = \Delta u_i$  (22)

$u_{i+1} = \min\{u_i + \Delta u_i\,\Delta t_i,\; u_{max}\}$ when $v_i < v_{low}$  (23)

$u_{i+1} = \max\{u_i - \Delta u_i\,\Delta t_i,\; 0\}$ when $v_i > v_{high}$  (24)

* Policy. For a discrete action task, the policy represents the probability of taking each action. In this paper, since EITO addresses continuous action control tasks, the policy is a probability distribution, expressed as Eq (25):

$a_i \sim \pi_\theta(\cdot \mid x_i) = \mathcal{N}\big(\mu_\theta(x_i),\, \sigma_\theta(x_i)^2\big)$  (25)

where $\theta$ is the network weight.

* Reward function. This function defines the reward the train receives when it takes an action in a given state. In this case, our reward function combines the time error $t_e$, the passenger comfort $|\Delta u_i|$, and the energy consumed per unit mass $\Delta E_i$ in the time interval in which the train takes action $a_i$ in state $x_i$:

$r_i = -(\omega_1 t_e + \omega_2 |\Delta u_i| + \omega_3 \Delta E_i)$  (26)

where the weights $\omega_1$, $\omega_2$, $\omega_3$ are determined by expert experience, and $t_e$ is the running time error term defined in Eq (27):

(27)

The role of $\omega_1$ and $\omega_2$ is to ensure that the agent optimizes energy consumption while preserving punctuality and comfort as much as possible, rather than merely reducing energy consumption. A minimal sketch of these MDP elements is given after this list.
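As referenced above, the following sketch illustrates the MDP elements; the reward weights, clipping ranges, and the per-step time-error argument are assumptions consistent with the reconstructed Eqs (19)-(26).

```python
import numpy as np

def make_state(s_i, v_i, t_reserved):
    """Eq (19): state = (position, speed, reserved trip time)."""
    return np.array([s_i, v_i, t_reserved], dtype=np.float32)

def apply_action(u, a, dt, u_lo=-1.0, u_hi=1.0, a_max=0.3):
    """Eqs (22)-(24): the action is the acceleration change, both ranges clipped."""
    a = float(np.clip(a, -a_max, a_max))
    return float(np.clip(u + a * dt, u_lo, u_hi))

def reward(t_err, jerk, dE, w=(1.0, 1.0, 1.0)):
    """Eq (26): penalize time error, discomfort (jerk), and unit-mass energy."""
    w1, w2, w3 = w                        # weights chosen by expert experience
    return -(w1 * t_err + w2 * abs(jerk) + w3 * dE)
```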
The EITOP algorithm is based on the PPO algorithm [28]. PPO is a deep reinforcement learning algorithm based on the policy gradient (PG), and it adopts the Actor-Critic framework, which can handle continuous action control and model-free problems. The PPO algorithm limits the update magnitude of the new policy according to the ratio of the old and new policies, so that the PG algorithm can be trained and converge at a larger learning rate. The objective function of the policy gradient algorithm is:
$L^{PG}(\theta) = \hat{\mathbb{E}}_i\big[\log \pi_\theta(a_i \mid x_i)\, \hat{A}_i\big]$  (28)

where $\pi_\theta$ denotes the policy function; $\theta$ is the network parameter of the Actor; $i$ denotes the state or action of the $i$-th step; $\hat{A}_i$ is the estimate of the advantage function at the $i$-th step, as shown in Eq (29); and $\hat{\mathbb{E}}$ denotes the empirical expectation over time steps. The advantage function compares the return obtained at a state with the average return: if the return is above average, the advantage is positive; otherwise, it is negative. The gradient ascent method is used to update the parameters.

$\hat{A}_i = Q^{\pi}(x_i, a_i) - V^{\pi}(x_i)$  (29)

where $Q^{\pi}(x_i, a_i)$ is the state-action value function, representing the expected return of the agent following policy $\pi$ after performing action $a_i$ in state $x_i$ until the end of the episode. Similarly, the state value function $V^{\pi}(x_i)$ represents the expected return of the agent following policy $\pi$ from state $x_i$ to the end of the episode.
Because the PG algorithm updates its policy online and must resample after every parameter update, its learning rate is not easy to determine. The PPO algorithm converts the online update strategy into an offline one, i.e., old and new Actor policies are used: the training data for the new Actor are obtained from the old Actor, and the new policy weight is expressed using the ratio of action probabilities of the old and new policies, as in Eq (30):

$L^{IS}(\theta) = \hat{\mathbb{E}}_i\left[\dfrac{\pi_\theta(a_i \mid x_i)}{\pi_{\theta'}(a_i \mid x_i)}\, \hat{A}_i\right]$  (30)

where $\theta'$ is the parameter of the sampling (old) network. If the probability distributions produced by the two parameter sets $\theta$ and $\theta'$ in the same state differ greatly, under-sampling leads to a large variance between them. Therefore, the PPO algorithm adds a CLIP function to the objective to limit the deviation between $\theta$ and $\theta'$, given as follows:

$L^{CLIP}(\theta) = \hat{\mathbb{E}}_i\Big[\min\big(r_i(\theta)\hat{A}_i,\; \mathrm{clip}(r_i(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_i\big)\Big], \quad r_i(\theta) = \dfrac{\pi_\theta(a_i \mid x_i)}{\pi_{\theta'}(a_i \mid x_i)}$  (31)
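The clipped surrogate in Eq (31) can be written compactly in TensorFlow (the framework used in Sect 4); the clipping parameter below is an assumption.

```python
import tensorflow as tf

def ppo_clip_loss(new_log_prob, old_log_prob, advantage, eps=0.2):
    """Eqs (30)-(31): clipped surrogate objective for a Gaussian policy (Eq (25))."""
    ratio = tf.exp(new_log_prob - old_log_prob)          # pi_theta / pi_theta'
    clipped = tf.clip_by_value(ratio, 1.0 - eps, 1.0 + eps)
    surrogate = tf.minimum(ratio * advantage, clipped * advantage)
    return -tf.reduce_mean(surrogate)                    # minimize the negative objective
```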
The term "continuous control task" in our work refers to real-time optimization of continuous traction/braking forces without relying on predefined discrete actions or offline speed profiles. Unlike traditional methods that track fixed trajectories, our algorithms dynamically adjust acceleration/braking rates (Eqs (22)–(24)) based on real-time states (position, speed, remaining time) and environmental conditions (speed limits, gradients). This is enabled by:
* EITOE: Expert rules (Sect 3.1) and the DMTD heuristic (Algorithm 1) generate smooth, continuous force adjustments.
* EITOP: The PPO-based reinforcement learning framework (Sect 3.2) optimizes continuous actions ($\Delta u_i$) in a policy gradient manner (Eq (25)), allowing fine-grained control over acceleration/deceleration.
This approach eliminates abrupt state transitions (e.g., discrete coasting points in prior works [10,12]) and ensures seamless adaptation to varying line conditions (Sect 4.3).
While the core control logic is detailed in Algorithms 1 (EITOE) and 2 (EITOP), we acknowledge that the convergence analysis of PPO training could be elaborated further. For clarity: Control steps: EITOP iteratively samples actions from a Gaussian policy (Eq (25)) and updates the actor-critic networks using the clipped surrogate objective (Eq (31)); the reward function (Eq (26)) penalizes energy consumption, comfort violations, and time deviations, ensuring balanced optimization. Convergence: Figs 5 and 6 (training curves) show that energy consumption and running time stabilize after about 80 episodes, indicating policy convergence.
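The training loop can be sketched as follows; the environment interface, the actor's sample method, and update_networks are hypothetical stand-ins for the components described in Algorithm 2.

```python
def train(env, actor, critic, update_networks, episodes=100):
    """Minimal EITOP training loop: collect one episode, then a PPO update."""
    for _ in range(episodes):
        x, done, batch = env.reset(), False, []
        while not done:
            a, logp = actor.sample(x)            # Gaussian policy, Eq (25)
            x_next, r, done = env.step(a)        # reward from Eq (26)
            batch.append((x, a, r, logp))
            x = x_next
        update_networks(batch, actor, critic)    # clipped surrogate, Eq (31)
```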
4 Simulations
To verify the intelligence, flexibility, and robustness of EITOE and EITOP, we designed three numerical simulation experiments based on field data collected on YLBS. YLBS started operation in Beijing on December 30, 2010, with a total length of 23.3 km, starting from Songjiazhuang station and ending at Ciqu station. The train type used on YLBS is the DKZ32 EMU with 6 vehicles, whose parameters are shown in Table 1. To rigorously validate the suitability of EITOP for online control, we conducted the experiments on a workstation with the following specifications: CPU: Intel i9-10900K (10 cores, 3.7 GHz); GPU: NVIDIA RTX 3090 (24 GB VRAM); memory: 64 GB DDR4; software: Python 3.8, TensorFlow 2.6.
Three simulation cases are presented in this section. The manual driving dataset used in this section was collected on YLBS from May 1, 2015, to May 27, 2015, and includes 100 groups of up trains and down trains. We select the manual driving data with the best generalized performance from the recorded dataset as the manual baseline, EITOM. In Case 1, we compare the results of all algorithms (EITOM, ITOR, STON, EITOE, and EITOP). In Case 2, we test the intelligence and flexibility of all algorithms by varying the planned trip time on the same rail segment. In Case 3, we test the operational performance of the EITO models under complex gradients and speed limits to verify the robustness of the proposed EITOE and EITOP.
4.1 Case 1
Taking the interval between Rongjing station (RJ) and Wanyuanjie station (WYJ) as an example, the speed limit and gradient of this interval are shown in Fig 4. The planned trip time Tp = 101 s is the same as in actual operation, and the distance between the two stations is 1280 m.
Figs 5 and 6 show the energy consumption E and the running time (s) during the online learning process of EITOP. The results show that during deep reinforcement learning, the energy consumption is reduced from 380 to 364 after about 80 rounds of training and gradually approaches the optimal value. In addition, the running time stays within 100 s to 102 s (Tp = 101 s), which satisfies the definition of punctuality, i.e., a time error of less than 3 s is allowed. This indicates that applying the PPO algorithm in EITOP can reduce energy consumption online while satisfying the running time error constraint.
It can be seen from Fig 7 that EITOE and EITOP start to coast after accelerating to the desired speed range in the first two speed limit intervals, and EITOP's coasting point is more advanced. The coasting distance of EITOE is 899.78 m, and that of EITOP is longer. Note that in the last speed limit interval, EITOP did not choose to coast at the position where EITOE started to coast; instead, it decelerated slightly based on its current position, speed, and remaining trip time. As a result, EITOP's Ie and Ic are further reduced compared with EITOE, which shows that EITOP considers the multi-objective constraints in a more integrated way.
Figs 7 and 8 show the speed-distance curves of the five algorithms for the 101 s planned trip time. The speed curve of EITOM can be divided into a full acceleration phase, a coasting phase, and a full braking phase. EITOE has the highest maximum speed of 19.49 m/s, with four phases in its speed-distance curve: the acceleration phase, multiple coasting phases, the safety braking phase, and the parking phase. In addition, the coasting distances of EITOM, ITOR, and STON are 677.99 m, 398.78 m, and 661.46 m, respectively, which are significantly shorter than those of EITOE and EITOP, indicating that the proposed EITO algorithms have lower energy consumption.
We can see from Table 2 that all five algorithms meet the requirements of YLBS in terms of safety, punctuality, and parking accuracy. Among these algorithms, ITOR has the highest energy consumption. Compared with EITOM, ITOR's energy consumption is 1.7% higher, STON's is 11.7% lower, and EITOE's is 34.9% lower; EITOP further reduces energy consumption by 4.3% relative to EITOE. In terms of comfort, EITOM has the highest Ic, indicating the worst passenger comfort, while the remaining algorithms have similar comfort index values, much lower than EITOM's. EITOP achieves the best Ie and Ic for the 101 s trip time.
4.2 Case 2
In this case, we verified the flexibility of the five algorithms under a 95 s planned trip time and a 115 s planned trip time by simulating different planned trip times on the same railroad section. Since an ATO system generally requires offline speed recommendation curves, it is difficult for it to adjust the trip time dynamically. Furthermore, increasing regenerative energy requires real-time reprogramming of the planned trip time for each train on the metro line. If the train model can adjust the arrival time in real time according to the notification, the regenerative energy can be better utilized to achieve energy-saving operation of the metro [29],[30].
Therefore, we also carried out two examples of dynamically adjusting the trip time (extending or reducing it) using EITOP to overcome the above shortcomings. In our simulations, such examples are called EITO with flexibly adjustable trip times.
Fig 9 shows the speed-distance curves of the five algorithms for a trip time of 95 s. ITOR has the highest maximum velocity of 22.22 m/s and a shorter coasting distance, indicating that ITOR may have higher energy consumption and worse passenger comfort. STON has the lowest maximum velocity, but it decelerates too early in the second speed limit interval, resulting in a shorter coasting distance and a higher energy cost. The speed-distance curves of EITOE and EITOP are similar, as both are smoother and have longer coasting distances, indicating that both algorithms may perform better in terms of comfort and energy consumption. In addition, EITOP accelerates slightly in the last speed-limited section where EITOE coasts, indicating that EITOP can adjust its arrival time, which further illustrates the effectiveness of EITOP.
It can be seen from Table 2 that, compared with EITOM, the energy consumption of ITOR is 5.2% higher, while the energy cost of STON is 8.8% lower; the EITO algorithms perform even better, both saving more than 45% in energy cost compared with EITOM. In addition, regarding the riding comfort of the five methods, EITOM has the largest Ic, while ITOR, STON, EITOE, and EITOP have similar Ic values, much smaller than EITOM's. EITOP has the best comfort, with an Ic of 2.92.
Fig 10 shows the speed-distance curves of the five algorithms for a trip time of 115 s. The maximum speed of EITOM is 14.73 m/s, and the maximum speeds of ITOR and STON are 14.69 m/s and 14.65 m/s, respectively; the maximum speeds of EITOE and EITOP are 16.15 m/s and 16.10 m/s, respectively. Compared with the speed-distance curves for the 95 s and 101 s planned trip times, the maximum velocities are much lower, indicating lower average velocities and lower energy consumption.
Furthermore, we can learn from Table 2 that all five algorithms meet the requirements in terms of safety, punctuality, and parking accuracy. Compared with EITOM, the energy consumption of ITOR is 0.5% higher and that of EITOE is 4.8% higher, while the energy consumption of STON is 1.5% lower and that of EITOP is 2.7% lower. It is easy to see that EITOE has the highest energy consumption here. Meanwhile, EITOE's Ic is also higher than those of the other three intelligent operation models, due to its multiple large changes in u during the acceleration phase. In addition, although EITOP arrived 1 s earlier than the expected arrival time, its Ic and Ie are better than EITOE's. This result further illustrates that EITOP can dynamically optimize the train's operating state while comprehensively considering the constraints of multiple objectives. In this instance, EITOP outperforms the other four algorithms in Ic and Ie.
Fig 11 shows the running curves of EITOP with dynamically adjusted (earlier or later) arrival times on the RJ to WYJ rail section, where the originally planned trip time is 101 s. Note that "15 s Later" is the speed curve when the train is informed of a 15 s later arrival, and "10 s Earlier" is the speed curve when the train is informed of a 10 s earlier arrival. "Constant trip" is the speed curve when the train runs normally within the 101 s planned trip time.
As can be seen from Fig 11, in the first example (15 s Later), the train is informed after running for 30 s that it will arrive at the next station 15 s later. The results are shown in Fig 11 and Table 3. In Fig 11, as the current remaining trip time is extended from 71 s to 86 s, the train immediately stops accelerating and starts coasting; the train then continuously reduces its operating speed by braking. In addition, Table 3 summarizes the detailed performance of EITOP after dynamically adjusting the trip time for the inter-station operation. The final running time of 15 s Later is 114 s, meeting the punctuality index (note that Tp has been changed to 116 s). However, the sudden delay in the train's arrival time caused the train to decelerate with a larger u at the beginning of the last speed-limited section of the interval; as a result, passengers may feel some discomfort, reflected in the increase of EITOP's Ic.
The second example is the situation where the train is informed, after running for 10 s, that it should arrive at the station 10 s earlier, the converse of the first example. This means that the train's current remaining trip time suddenly drops from 91 s to 81 s. By comparing the 10 s Earlier and Constant trip curves in Fig 11, we can see that if the trip time decreases by 10 s due to an incident, EITOP intelligently changes its driving strategy and accelerates for the rest of the trip. Moreover, Table 3 shows that the final running time is 88 s (note that Tp has been changed to 91 s), which is close to the limit of the punctuality requirement. This implies that, despite the application of PPO and the improvement of the overall performance of metro operation, the punctuality of train operation is still affected by sudden changes in arrival times. Overall, EITOP can flexibly cope with variable trip times.
Concerning unexpected speed limit changes: during operation we also simulate temporary speed restrictions (e.g., due to track maintenance). Preliminary results confirm that EITOP adjusts its braking/coasting strategies in real time to comply with the new limits while minimizing energy consumption. This scenario further validates the algorithm's effectiveness in handling unforeseen circumstances commonly encountered in real-world train operations, thus enhancing the reliability and practicality of our research findings.
In our study, we have already tested dynamic trip time adjustments, as presented in Case 2 of Sect 4.2. In these tests, EITOP showed a remarkable ability to adapt to sudden changes in the remaining journey time, as evidenced by Fig 11 and Table 3. Specifically, for a sudden time reduction, if the train is notified midway to arrive 10 s earlier, EITOP promptly responds by dynamically increasing the acceleration; as shown by the "10 s Earlier" curve in Fig 11, this adjustment enables the train to meet the revised schedule, demonstrating EITOP's ability to react quickly in time-constrained scenarios and to ensure punctuality even under tight time pressure.
Regarding a sudden time extension, when the train is informed that it can arrive 15 s later, EITOP takes the appropriate action: as depicted in the "15 s Later" curve of Fig 11, the algorithm reduces the train's speed, saving energy while still maintaining punctuality. This showcases not only the adaptability of EITOP but also its capacity to strike a balance between energy consumption and punctuality, two critical factors in train operation.
In conclusion, ITOR, STON, EITOE, and EITOP can all generate reasonable control strategies and meet the operating requirements for different planned trip times, demonstrating their flexibility. Moreover, EITOP can dynamically adjust the train operation strategy in real time when informed of different arrival times (earlier or later), indicating that EITOP also possesses a degree of intelligence.
4.3 Case 3
Most current studies have tested models within a single interval; few have tested the robustness of algorithms over continuous line intervals with complex speed limits and gradients. Here, we take the continuous station intervals from Songjiazhuang station (SJZ) to Xiaocun station (XC) and then from XC to Xiaohongmen station (XHM) as an example to test the robustness of EITOE and EITOP. As can be seen from Figs 4 and 12, the maximum gradient in Fig 12 (Case 3) is 500% of that in Fig 4 (Case 1), and the speed limits in Case 3 also change more dramatically.
The length from SJZ to XC is 2,631 m with a planned trip time of 190 s, and the length from XC to XHM is 1,274 m with a planned trip time of 108 s; i.e., the total trip time and total length are 298 s and 3,905 m. In this case, we ignore the stopping time, i.e., the train departs immediately after arriving at the intermediate station and drives on to the next station.
Fig 13 shows the speed-distance curves of the algorithms running from SJZ to XHM. We can see that the trajectories of the two curves are similar, and the maximum speed of EITOP is slightly lower than that of EITOE, which indicates that EITOP may consume less energy than EITOE. The average inference time for EITOP to generate a control action (acceleration/deceleration) at each time step (0.02 s) is 2.1 ms, nearly 10 times faster than the required control interval. This ensures real-time applicability even under strict operational deadlines.
In addition, when the train accelerates to the coasting point in the [1161, 2501] m interval and starts coasting, it still gains positive acceleration, because this interval contains a downhill section with a gradient value of -0.008; accelerating beyond the speed limit would mean dangerous driving. However, owing to the supervision of the safe speed defined in Sect 3.1, the train immediately applies the maximum deceleration when it is about to exceed the safe speed. This situation verifies that the proposed EITO algorithms can ensure the safety of train trips even with complicated speed limits and gradients.
The performance of the EITO algorithms on complex continuous lines is shown in Table 4. The Ic values of both algorithms are larger because of the longer and more complex line conditions, and because the train has to perform a stop-and-restart operation at XC. However, both algorithms ensure that the train arrives at its destination safely and on time. The arrival times of EITOE at XC and XHM are 188 s and 112 s, respectively, while the arrival times of EITOP are 189 s and 109 s, respectively; the total travel times of the two algorithms are 300 s and 298 s, respectively. It can be seen that EITOP outperforms EITOE in terms of both inter-station and total-trip punctuality. Furthermore, EITOP's Ic and Ie are also lower than EITOE's. This indicates that applying the PPO algorithm on top of EITOE can effectively optimize the punctuality, energy-saving, and comfort indicators of EITOE.
It can be seen from the above analysis that the energy-saving and comfort performance of both EITOE and EITOP decreases moderately under complex line conditions. However, both algorithms ensure that the train operates safely and punctually, indicating that the proposed algorithms have good robustness. Regarding dynamic time adjustment, Case 2 (Sect 4.2) demonstrates EITOP's ability to adapt to sudden trip time changes (e.g., a 10 s reduction or a 15 s extension mid-journey); the reward function (Eq (26)) penalizes time deviations through the term $t_e$, incentivizing the agent to adjust the acceleration/coasting phases dynamically (Fig 11). For disturbance suppression (e.g., resistance uncertainty), EITOP's model-free PPO framework inherently adapts to unmodeled dynamics; we will include a dedicated robustness test (e.g., sudden resistance changes) in future work. The current experiments (Cases 1-3) validate EITOP's adaptability to variable trip times (95 s, 101 s, 115 s), complex gradients and speed limits (Fig 12), and mid-journey schedule updates (Fig 11). These scenarios inherently cover unpredictable conditions by testing the algorithm's ability to replan trajectories in real time without prior offline profiles.
5 Conclusion
In this study, two EITO algorithms for intelligent train operation are proposed for addressing continuous metro operation control tasks, showcasing their ability to operate without the need for tracking offline speed profiles or relying on exact train model information. Our approach leverages an expert system to generate EITOE outputs based on driver experience and redefines key elements of the Proximal Policy Optimization (PPO) algorithm to develop EITOP, which optimizes multiple operational objectives online. Through comparative analysis with existing intelligent driving algorithms and manual driving data, we demonstrated the superiority of our proposed algorithms in terms of safety, punctuality, energy efficiency, and passenger comfort. From the research and results of this work, several key conclusions can be drawn:
Flexible Intelligent Control: The EITO algorithms demonstrate the viability of intelligent train operations adaptable to real-time conditions, enhancing efficiency and adaptability beyond traditional, preset profiles.
Expert System Collaboration: Integrating expert system insights with EITOE significantly bolsters performance, underscoring the collaboration between human expertise and machine learning in intelligent train operations.
Multi-Objective Optimization: EITOP utilizes PPO and stands out as it comprehensively manages safety, punctuality, energy efficiency, and comfort, which are often conflicting objectives, through an overall control strategy.
Robustness in Complexity: Both EITOE and EITOP showcase robust performance in complex operational environments, with EITOP particularly adept at adjusting to varying trip times, crucial for real-world operational efficiency.
Energy and Comfort Superiority: Our algorithms surpass current methods in energy conservation and passenger comfort, tackling critical urban rail transit challenges and validating their practical application.
While our algorithms are promising, future work will focus on enhancing EITOP’s dynamic adjustment capabilities and exploring cooperative control strategies for energy savings, including optimizing train schedules for regenerative energy utilization.
References
1. Yin J, Tang T, Yang L, Xun J, Huang Y, Gao Z. Research and development of automatic train operation for railway transportation systems: A survey. Transp Res Part C: Emerg Technol. 2017;85:548–72.
2. Tu Y, Lin S, Qiao J. Deep traffic congestion prediction model based on road segment grouping. Appl Intell. 2021;51:8519–41.
3. Yu L, Cui M, Dai S. Deviation of peak hours for metro stations based on least square support vector machine. PLoS One. 2023;18(9):e0291497.
4. Khmelnitsky E. On an optimal control problem of train operation. IEEE Trans Automat Contr. 2000;45(7):1257–66.
5. Yang X, Ning B, Li X, Tang T. A two-objective timetable optimization model in subway systems. IEEE Trans Intell Transport Syst. 2014;15(5):1913–21.
6. Wang Y, Ning B, Tang T, van den Boom TJJ, De Schutter B. Efficient real-time train scheduling for urban rail transit systems using iterative convex programming. IEEE Trans Intell Transport Syst. 2015;16(6):3337–52.
7. ShangGuan W, Yan X-H, Cai B-G, Wang J. Multiobjective optimization for train speed trajectory in CTCS high-speed railway with hybrid evolutionary algorithm. IEEE Trans Intell Transport Syst. 2015;16(4):2215–25.
8. Açıkbaş S, Söylemez MT. Coasting point optimisation for mass rail transit lines using artificial neural networks and genetic algorithms. IET Electr Power Appl. 2008;2(3):172–82.
9. Yang L, Li K, Gao Z, Li X. Optimizing trains movement on a railway network. Omega. 2012;40(5):619–33.
10. Yin J, Chen D, Li L. Intelligent train operation algorithms for subway by expert system and reinforcement learning. IEEE Trans Intell Transp Syst. 2014;15(6):2561–71.
11. Zhang C, Chen D, Yin J, Chen L. Data-driven train operation models based on data mining and driving experience for the diesel-electric locomotive. Adv Eng Inform. 2016;30(3):553–63.
12. Zhou K, Song S, Xue A, You K, Wu H. Smart train operation algorithms based on expert knowledge and reinforcement learning. IEEE Trans Syst Man Cybern Syst. 2020.
13. Ke B-R, Lin C-L, Lai C-W. Optimization of train-speed trajectory and control for mass rapid transit systems. Control Eng Pract. 2011;19(7):675–87.
14. Song Y-D, Song Q, Cai W-C. Fault-tolerant adaptive control of high-speed trains under traction/braking failures: A virtual parameter-based approach. IEEE Trans Intell Transport Syst. 2014;15(2):737–48.
15. Liu WY, Han JG, Lu XN. A high speed railway control system based on the fuzzy control method. Expert Syst Appl. 2013;40(15):6115–24.
16. Gu Q, Tang T, Cao F, Song Y. Energy-efficient train operation in urban rail transit using real-time traffic information. IEEE Trans Intell Transp Syst. 2014;15(3):1216–33.
17. Gao S, Dong H, Chen Y, Ning B, Chen G, Yang X. Approximation-based robust adaptive automatic train control: An approach for actuator saturation. IEEE Trans Intell Transp Syst. 2013;14(4):1733–42.
18. Pu Q, Zhu X, Zhang R, Liu J, Cai D, Fu G. Speed profile tracking by an adaptive controller for subway train based on neural network and PID algorithm. IEEE Trans Veh Technol. 2020;69(10):10656–67.
19. Wen Y, Si J, Brandt A, Gao X, Huang HH. Online reinforcement learning control for the personalization of a robotic knee prosthesis. IEEE Trans Cybern. 2020;50(6):2346–56.
20. Xian B, Zhang X, Zhang H, Gu X. Robust adaptive control for a small unmanned helicopter using reinforcement learning. IEEE Trans Neural Netw Learn Syst. 2022;33(12):7589–97.
21. Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA. Deep reinforcement learning: A brief survey. IEEE Signal Process Mag. 2017;34(6):26–38.
22. Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, et al. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971; 2015.
23. Song Q, Song Y, Tang T, Ning B. Computationally inexpensive tracking control of high-speed trains with traction/braking saturation. IEEE Trans Intell Transp Syst. 2011;12(4):1116–25.
24. Ruelens F, Iacovella S, Claessens B, Belmans R. Learning agent for a heat-pump thermostat with a set-back strategy using model-free reinforcement learning. Energies. 2015;8(8):8300–18.
25. Song Q, Song Y, Cai W. Adaptive backstepping control of train systems with traction/braking dynamics and uncertain resistive forces. Veh Syst Dyn. 2011;49(9):1441–54.
26. Murrell S, Plant RT. A survey of tools for the validation and verification of knowledge-based systems: 1985–1995. Decis Support Syst. 1997;21(4):307–23.
27. Li G, Gomez R, Nakamura K, He B. Human-centered reinforcement learning: A survey. IEEE Trans Human-Mach Syst. 2019;49(4):337–49.
28. Gu Y, Cheng Y, Chen CLP, Wang X. Proximal policy optimization with policy feedback. IEEE Trans Syst Man Cybern Syst. 2022;52(7):4600–10.
29. Chen W, Yang J, Khasawneh MT, Fu J, Sun B. Rules of incidental operation risk propagation in metro networks under fully automatic operations mode. PLoS One. 2021;16(12):e0261436.
30. Chen D, Chen R, Li Y, Tang T. Online learning algorithms for train automatic stop control using precise location data of balises. IEEE Trans Intell Transp Syst. 2013;14(3):1526–35.
Citation: Huang Y, Lai W, Chen D, Lin G, Yin J (2025) Enhanced intelligent train operation algorithms for metro train based on expert system and deep reinforcement learning. PLoS One 20(5): e0323478. https://doi.org/10.1371/journal.pone.0323478
About the Authors:
Yunhu Huang
Contributed equally to this work with: Yunhu Huang, Wenzhu Lai
Roles: Conceptualization, Formal analysis, Investigation, Writing – original draft
Affiliation: College of Computer and Data Science, Minjiang University, Fuzhou, Fujian, China
ORCID: https://orcid.org/0000-0002-6265-0274
Wenzhu Lai
Contributed equally to this work with: Yunhu Huang, Wenzhu Lai
Roles: Writing – original draft, Writing – review & editing
Affiliation: Digital Banking Laboratory of Industrial and Commercial Bank of China Software Development Center, Zhuhai, Guangdong, China
Dewang Chen
Roles: Conceptualization, Methodology
E-mail: [email protected], [email protected], [email protected], [email protected]
Affiliations: School of Transportation, Fujian University of Technology, Fuzhou, Fujian, China; Fujian BeiDou Navigation and Intelligent Transportation Collaborative Innovation Center, Fuzhou, Fujian, China
Geng Lin
Roles: Writing – review & editing
E-mail: [email protected], [email protected], [email protected], [email protected]
Affiliation: College of Computer and Data Science, Minjiang University, Fuzhou, Fujian, China
Jiateng Yin
Roles: Validation, Writing – review & editing
E-mail: [email protected], [email protected], [email protected], [email protected]
Affiliation: State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
© 2025 Huang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
In recent decades, automatic train operation (ATO) systems have been gradually adopted by many metro systems, primarily due to their cost-effectiveness and practicality. However, a critical examination reveals limitations in computational efficiency, adaptability to unforeseen conditions, and multi-objective balancing, which our research aims to address. In this paper, expert knowledge is combined with a deep reinforcement learning algorithm, Proximal Policy Optimization (PPO), and two enhanced intelligent train operation (EITO) algorithms are proposed. The first algorithm, EITOE, is based on an expert system containing expert rules and a heuristic expert inference method. Building on EITOE, we propose the EITOP algorithm, which uses PPO to optimize multiple objectives through the design of reinforcement learning strategies, rewards, and value functions. We also develop the double minimal-time distribution (DMTD) calculation method in the EITO implementation to achieve longer coasting distances and further reduce energy consumption. Compared with previous works, EITO enables continuous train operation control without reference to offline speed profiles and optimizes several key performance indicators online. Finally, we conducted comparative tests of manual driving, intelligent driving algorithms (ITOR, STON), and the algorithms proposed in this paper, using real line data from the Yizhuang Line of Beijing Metro (YLBS). The test results show that the EITO algorithms outperform the current intelligent driving algorithms and manual driving in terms of energy consumption and passenger comfort. In addition, we further validated the robustness of EITO by testing on complex sections of the YLBS with speed limits, gradients, and different running times. Overall, the EITOP algorithm has the best performance.
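For illustration only (this code is not from the paper): the abstract states that EITOP optimizes multiple objectives by designing reinforcement learning rewards. The minimal Python sketch below shows one plausible form such a per-step reward could take, combining energy, comfort, and punctuality terms; all weights, signal names, and penalty forms are our assumptions, not the authors' design.

```python
# Hypothetical multi-objective per-step reward of the kind the abstract
# describes for EITOP. Weights and signal names are illustrative only.
def step_reward(traction_power_kw: float,
                jerk_mps3: float,
                remaining_time_s: float,
                remaining_dist_m: float,
                v_limit_mps: float,
                v_mps: float,
                w_energy: float = 1e-3,
                w_comfort: float = 0.5,
                w_punctual: float = 0.1) -> float:
    """Scalar reward: penalize energy use, jerk, and pace mismatch."""
    # Energy term: penalize positive traction power (coasting costs nothing).
    r_energy = -w_energy * max(traction_power_kw, 0.0)
    # Comfort term: large |jerk| degrades ride quality.
    r_comfort = -w_comfort * abs(jerk_mps3)
    # Punctuality term: compare current speed with the average speed still
    # required to arrive on time (guarded against division by zero).
    v_required = remaining_dist_m / max(remaining_time_s, 1.0)
    r_punctual = -w_punctual * abs(v_mps - v_required)
    # Hard safety penalty for overspeed; in the paper's setting the expert
    # rules would presumably keep the agent from reaching this state.
    r_safety = -100.0 if v_mps > v_limit_mps else 0.0
    return r_energy + r_comfort + r_punctual + r_safety

# Example: coasting slightly below the required pace yields a small penalty.
print(step_reward(traction_power_kw=0.0, jerk_mps3=0.2,
                  remaining_time_s=120.0, remaining_dist_m=1800.0,
                  v_limit_mps=22.0, v_mps=14.0))
```

The sketch only illustrates how several objectives can be folded into one scalar signal for a PPO learner; in the algorithms the paper proposes, the expert system additionally constrains which actions are available before the learned policy chooses among them.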
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer