As urban traffic complexity continues to rise, challenges related to traffic efficiency, fuel consumption, and safety are becoming increasingly critical. These issues underline the need for multi-objective trajectory optimization models, particularly in environments where both automated and human-driven vehicles coexist. Therefore, this paper develops a multi-objective trajectory planning model based on the TD3 algorithm. We design the state space, action space, and reward function: the state space comprises speed, relative speed, distance to the stop line, relative position, phase state, and remaining phase duration, and the action space outputs the optimal acceleration or deceleration. The reward function integrates multiple objectives, including safety, fuel consumption, and traffic efficiency. The model is verified using SUMO under different levels of CAV penetration and varying traffic flows. The results demonstrate that as CAV penetration increases, vehicle trajectories become increasingly smooth, leading to reductions in average travel time, fuel consumption, and queue length. Specifically, at 100% CAV penetration with a traffic flow of 600 pcu/h, the highest optimization rate for average travel time reaches 15.38%. For average fuel consumption, the peak optimization rate of 19.53% occurs at a traffic flow of 800 pcu/h. Furthermore, at traffic flows of 300 pcu/h and 400 pcu/h, 100% CAV penetration eliminates queues entirely. Beyond 400 pcu/h, only minimal queues form with 100% CAV penetration. These results indicate that autonomous driving technology can effectively enhance the efficiency and sustainability of transportation systems, providing robust support for urban traffic management strategies.
In particular, under high-density and mixed traffic conditions, the trajectory optimization model significantly improves traffic flow, reduces congestion, decreases energy consumption, and lowers the incidence of traffic accidents, thereby offering a theoretical foundation for the implementation of intelligent transportation systems.
Introduction
Motor vehicle exhaust is a principal contributor to air pollution. According to the China Mobile Source Environmental Management Annual Report [1], total carbon emissions from automobiles in China reached 7.43 million tons in 2022, while hydrocarbon and nitrogen oxide emissions amounted to 1.912 million tons and 5.267 million tons, respectively. The environmental impact of automobile fuel consumption and exhaust emissions is significant and cannot be overlooked. Traffic lights at urban intersections cause frequent vehicle stops, starts, accelerations, and decelerations, which in turn increase fuel consumption, exhaust emissions, and travel time. Therefore, it is crucial to investigate strategies for optimizing vehicle passage through signalized intersections to maximize efficiency and minimize fuel consumption.
In recent years, the rapid development of cooperative vehicle-infrastructure systems (CVIS) and connected automated vehicle (CAV) technologies has positioned vehicle trajectory planning as an effective strategy for enhancing traffic efficiency, reducing fuel consumption, and minimizing air pollutant emissions [2]. Trajectory planning entails providing vehicles with real-time optimal driving speeds, thus enabling them to pass through intersections without stopping and improving traffic flow at these locations [3–5]. However, much of the current research on trajectory planning primarily addresses the reduction of fuel consumption or the enhancement of traffic efficiency [6–8], often overlooking the critical issue of inter-vehicle safety. In such situations, vehicles may exhibit behaviors such as sudden braking or rapid acceleration in efforts to minimize fuel use and navigate intersections without stopping, which can significantly compromise the safety of surrounding vehicles [9].
Furthermore, trajectory optimization models typically rely on fixed acceleration and deceleration rates or cruising speeds, which limits their real-world applicability. The rapid advancement of vehicular communication technologies has facilitated effective information exchange and sharing among vehicles, providing viable solutions to traffic congestion. By depending on accurate and predictable driving behaviors, CAVs can significantly reduce collision risks, traffic delays, and energy consumption [10]. However, a 100% CAV traffic environment is unlikely to be established in the near future; thus, mixed traffic scenarios that include both automated and human-driven vehicles will continue to exist [11, 12]. Therefore, the study of trajectory optimization in mixed traffic environments with varying CAV penetration rates holds substantial significance.
In response to the challenges identified, this paper proposes a multi-objective trajectory optimization control method for mixed environments that include both automated and human-driven vehicles. The method uses the twin delayed deep deterministic policy gradient (TD3) algorithm to develop a trajectory planning model for CAVs. The implementation involves the design of the state space, action space, and reward function. The reward function incorporates multiple objectives (safety, fuel consumption, and traffic efficiency), ensuring driving safety while simultaneously reducing fuel consumption and enhancing traffic efficiency.
Trajectory planning relies on the perception and prediction of traffic information utilizing vehicle sensors and remote infrastructure, thereby delivering real-time speed recommendations that enhance traffic operational efficiency. Recent advancements in vehicular communication technologies have facilitated more effective information exchange and sharing among vehicles, providing a technological foundation for precise control and trajectory planning of automated vehicles.
To optimize the utilization of CAVs, numerous trajectory planning optimization models have been developed by researchers. For instance, Wu et al. [13] introduced a trajectory optimization method aimed at minimizing a weighted combination of total bus delay and car delay, taking into account the significant passenger volume and the large size of buses. Ma et al. [14] proposed a self-organizing CAVs (SOCAVs) trajectory planning strategy, allowing vehicles to operate independently of traffic signals, which enables SOCAVs to effectively guide platoons through intersections. Additionally, Yu et al. [15] presented a mixed-integer linear programming model to cooperatively optimize the trajectories of CAVs along a corridor for system optimality. Results indicated that this model significantly reduces vehicle delays and increases throughput compared to fixed signal timing. However, the aforementioned studies primarily focus on vehicle delays in the trajectory planning process, often overlooking the effects of speed variations (acceleration and deceleration) on energy consumption. This oversight may result in decreased economic and ecological efficiency of the optimization models.
Eco-driving effectively mitigates the increase in energy consumption associated with frequent vehicle acceleration and deceleration. By providing real-time driving suggestions that account for both vehicle delays and fuel consumption, this approach enhances the economic and ecological efficiency of transportation systems. Feng et al. [16] introduced an adaptive coupling control (ACC) method based on vehicle platooning, aimed at optimizing signal timing and vehicle trajectories under mixed traffic conditions, which demonstrated notable improvements in fuel consumption and delays. Hou et al. [17] developed an eco-driving strategy that optimizes intersection signal timing by considering both vehicle arrival times and fuel consumption. Wang et al. [18] proposed a reinforcement learning-based trajectory prediction method designed to reduce traffic oscillation caused by queuing during red light periods by incorporating downstream vehicle trajectory predictions into the reward function. Yao et al. [19] developed a two-level optimization method based on CAV scheduling and trajectory planning to minimize trip delays and energy consumption, while implementing a rolling optimization strategy to enhance model applicability. Numerous studies with similar objectives have also been conducted [20, 21]. Although these methods address delays and energy consumption, they primarily focus on longitudinal trajectory planning for CAVs (e.g., acceleration profiles), often neglecting lateral trajectory changes, such as lane changes. To fill this gap, Ma et al. [22] optimized the acceleration profiles and lane choices of CAVs within the planning horizon in a centralized way while considering interactions between vehicle driving behaviors.
With the ongoing development of trajectory planning strategies, attention has increasingly shifted toward safety concerns within the trajectory planning process. Ying et al. [23] proposed an optimization framework that integrates a comprehensive CAV trajectory planning model with traffic signal control. This framework establishes a discrete-time two-level optimization model designed to minimize total vehicle delays while incorporating constraints related to traffic signals, lane assignments, and safety. In most studies on trajectory planning, safety has received some attention; however, it has typically been treated merely as a constraint rather than incorporated into the objective function, which fails to capture the benefits associated with safety measures. Therefore, designing a reward function that comprehensively considers delays, energy consumption, and safety is essential to ensure optimal model performance [24].
Meanwhile, recent advancements in artificial intelligence technology have led to increased scholarly attention on trajectory optimization methods based on reinforcement learning [25, 26]. Such methods allow for continuous interaction with the surrounding environment during training, enabling the real-time generation of optimal trajectories for autonomous vehicles based on varying state information and reward signals. To optimize the trajectories of autonomous vehicles in mixed traffic scenarios, Guo et al. [27] developed a robust framework known as DP-TP3 (dynamic programming with trajectory planning utilizing piecewise polynomials as a subroutine). Additionally, a real-time learning and control framework for signalized intersections, termed DRL-TP3, was proposed by Guo et al. [28], leveraging simulation data from diverse mixed traffic conditions to train the model. This framework employs the TD3 algorithm for the physical modeling of CAV trajectories, resulting in optimal control strategies that significantly enhance vehicle throughput and reduce delays across different penetration rates. Furthermore, Gao et al. [29] developed a deep learning-based lane-level prediction model for mixed traffic flow on urban expressways, which accurately predicts traffic conditions at varying penetration rates and facilitates effective vehicle lane guidance.
In light of the above considerations, this paper presents a multi-objective reward function that addresses safety, fuel consumption, and traffic efficiency, treating CAVs as agents within mixed traffic scenarios and employing the TD3 algorithm for optimal trajectory planning. The primary contributions of this research are summarized as follows:
1. The research incorporates multiple objectives, including safety, fuel consumption, and traffic efficiency, into the design of the reward function. By considering these diverse factors, a comprehensive approach to optimizing CAV trajectories is ensured, effectively addressing the complex challenges associated with mixed traffic flow.
2. The study validates the effectiveness of the proposed algorithm using SUMO to simulate various scenarios with different CAV penetration rates and traffic flow conditions. By demonstrating the model's capability to reduce average travel time, fuel consumption, and queue lengths, valuable insights are provided for advancing the field of CAV speed control and traffic optimization strategies.
Methods
Trajectory planning model
The challenges of queuing, congestion, and frequent speed adjustments lead to extended travel durations and increased fuel consumption for vehicles. Accordingly, the current study focuses on the mixed traffic flow scenario involving CAVs and human-driven vehicles (HVs) at signalized intersections. The TD3 algorithm is employed to optimize CAV trajectories. Average travel time, average fuel consumption, and average queue length are utilized as evaluation metrics to assess the effectiveness of the TD3 algorithm. Figure 1 depicts the system architecture of the TD3 algorithm, which further illustrates its application within the trajectory optimization process.
[See PDF for image]
Fig. 1
System architecture of the TD3 algorithm
Model operation process
Figure 2 illustrates the workflow of the CAV trajectory planning model. The parameters of the data input layer in the workflow diagram primarily encompass traffic flow, CAV penetration rate, signal timing information, vehicle status, and various environmental factors (such as road conditions and types). These input data serve as the foundation for the model's trajectory optimization decisions. By utilizing precise input data, the system facilitates informed decisions regarding acceleration, deceleration, and path planning based on varying environmental states, traffic conditions, and vehicle statuses.
[See PDF for image]
Fig. 2
CAV trajectory planning model
Traffic flow is a critical input parameter in any traffic model, as it directly influences road congestion, average vehicle speeds, and overall traffic system efficiency. In this study, fluctuations in traffic flow significantly impact the trajectory optimization of mixed traffic flows that include both CAVs and HVs. Under different flow conditions, the behaviors of CAVs and HVs may diverge, necessitating consideration of this factor to capture dynamic changes across various traffic scenarios.
The CAV penetration rate indicates the proportion of CAVs within the total vehicle count at any given time. Variations in penetration rates have direct implications for roadway traffic flow and safety. At low penetration rates, the behavior of CAVs is constrained by HVs, while higher penetration rates allow CAVs to communicate and cooperate more effectively, enhancing traffic efficiency and safety.
Additionally, the state of traffic signals directly influences vehicle behavior at intersections, particularly in mixed scenarios involving both CAVs and HVs, as traffic signal states determine decision-making timing. For instance, CAVs may efficiently adjust their speeds through interaction with traffic signal systems, whereas human drivers typically rely on visual cues and personal experience.
Vehicle status includes critical information such as position, speed, acceleration, relative speed, and relative position, forming the basis for optimizing CAV trajectories. A vehicle's current position and speed dictate its actions at intersections (e.g., acceleration, deceleration, turning), while relative speed and position assist in collision avoidance and maintaining appropriate following distances. In mixed traffic environments, notable behavioral differences exist between CAVs and HVs. CAVs can compute acceleration and deceleration decisions with greater precision, while human drivers may respond based on experience. Therefore, capturing vehicle status information is essential for simulating these behavioral differences and enhancing simulation accuracy.
Other environmental factors (e.g., road conditions and types) play a significant role in traffic modeling. Different road types (such as urban roads and highways) affect vehicle behavior in distinct ways. For example, vehicles may achieve higher maximum speeds on highways, whereas the influences of traffic flow and signals are more pronounced on urban roads. Furthermore, varying road conditions, including road width, the number of lanes, and road markings, also impact vehicle decision-making regarding acceleration, deceleration, and lane changes.
A single CAV is treated as an agent within the road network, with the intersection representing the environment. The state space includes a series of variables that influence the trajectory of the CAV. The agent receives the current state space from the environment, selects an output action based on this state space, and formulates the subsequent output action in response to the reward obtained after executing the action.
State and action space
In a road network, the driving decisions of a CAV are influenced by various factors, including the driving conditions of vehicles in the same lane, the current signal status, and the remaining phase duration. Incorporating all these factors as state variables would result in an excessively large state space and slow computational speeds, potentially causing convergence issues when certain factors exert minimal influence on the CAV’s driving decisions. Therefore, identifying and filtering out the key influencing factors is essential.
During the operation of a CAV, one of the critical factors directly influencing its decision-making process is the driving state of the preceding vehicle. The driving states of surrounding vehicles also indirectly affect the decisions made by the CAV. The state space encompasses the driving states of both the CAV and the leading vehicle. Among the various driving state data available, factors such as headway and spacing can be derived from the relative speed and position of the two vehicles, while acceleration and deceleration can be inferred from the speeds at two consecutive time points, eliminating the need for these variables to be input separately. The variations in the trajectories of CAVs are illustrated in Fig. 3. Moreover, when navigating a signalized intersection, the current signal phase and its remaining duration play a direct role in shaping the CAV’s decision-making process.
[See PDF for image]
Fig. 3
Trajectories of CAVs
In summary, the state space for the CAV is defined as

$$s_t = \left[v_i,\ \Delta v_i,\ d_i,\ \Delta x_i,\ p,\ t_r\right] \tag{1}$$

where $v_i$ is the speed of vehicle $i$, measured in m/s; $\Delta v_i$ represents the relative speed to the preceding vehicle, measured in m/s; $d_i$ represents the distance from vehicle $i$ to the stop line, measured in m; $\Delta x_i$ represents the relative position to the preceding vehicle, measured in m; $p$ is the current phase state, expressed by one-hot encoding, with red encoded as (1, 0, 0), yellow as (0, 1, 0), and green as (0, 0, 1); and $t_r$ represents the remaining duration of the current phase, measured in s.

Setting the output action to the vehicle's speed may lead to impractical speed fluctuations during simulations. Therefore, the acceleration and deceleration of the vehicle are selected as the output actions, with a range of −3 m/s² to 3 m/s². This range is primarily based on two considerations.
Actual driving behavior: the acceleration of conventional vehicles typically ranges from 0.5 m/s² to 2.5 m/s² under normal conditions, while during overtaking maneuvers it can reach up to 3 m/s². The normal deceleration of vehicles usually falls between −2 m/s² and −4 m/s². To mitigate the risk of rear-end collisions caused by emergency braking, a minimum deceleration of −3 m/s² has been selected for this study.
Traffic simulation software: the acceleration and deceleration ranges were also informed by commonly used traffic simulation software, such as the Simulation of Urban Mobility (SUMO), which typically employs similar values for acceleration and deceleration parameters in its simulations.
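As a minimal sketch of the state and action definitions above, the state can be assembled as a flat vector of eight entries (five scalars plus the three-element one-hot phase code), and the actor output can be clamped to the admissible acceleration range. The function names `build_state` and `clip_action` and the ordering of the entries are illustrative assumptions, not taken from the original implementation.

```python
# One-hot phase encoding as given in the text: red, yellow, green.
PHASE_ONE_HOT = {
    "red": (1.0, 0.0, 0.0),
    "yellow": (0.0, 1.0, 0.0),
    "green": (0.0, 0.0, 1.0),
}

# Action bounds in m/s^2, as stated in the text.
A_MIN, A_MAX = -3.0, 3.0


def build_state(speed, rel_speed, dist_to_stop, rel_pos, phase, remaining):
    """Assemble the state vector (hypothetical ordering).

    speed, rel_speed in m/s; dist_to_stop, rel_pos in m; remaining in s;
    phase is one of "red", "yellow", "green".
    """
    return [speed, rel_speed, dist_to_stop, rel_pos,
            *PHASE_ONE_HOT[phase], remaining]


def clip_action(raw_accel):
    """Clamp a raw actor output to the admissible acceleration range."""
    return max(A_MIN, min(A_MAX, raw_accel))
```

For example, `build_state(10.0, -1.5, 120.0, 25.0, "green", 12.0)` yields an eight-element list whose fifth to seventh entries carry the green code (0, 0, 1).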
Reward function
The design of the reward function should comprehensively consider safety, fuel consumption, and traffic efficiency.
(1) Safety.
The safety-oriented reward function encompasses the maintenance of safe speeds and distances between vehicles, as well as the prevention of vehicles from running red lights. Time to contact (TTC) is utilized as an assessment tool for traffic safety, representing the time required for a collision to potentially occur with an obstacle, such as the vehicle ahead, given the current speed and position. A decreased TTC indicates an increased risk of collision and a diminished level of safety. TTC is defined as follows:
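$$\mathrm{TTC}_i = \frac{\Delta x_i}{\Delta v_i} \tag{2}$$

where $\Delta x_i$ and $\Delta v_i$ are the relative position and relative speed of vehicle $i$ with respect to the preceding vehicle, as used in the state space.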
The reward function based on the TTC settings is presented as follows:
[See PDF for equation (3)]
Maintaining a short TTC can enhance traffic efficiency under safe conditions, as a reduced TTC allows a greater number of vehicles to operate on the roadway. However, a short TTC also increases the likelihood of vehicle collisions. To balance traffic efficiency and safety, the reward value is shaped by the TTC. When the TTC to the leading vehicle falls below 0.5 s, a reward value of −1 is assigned to discourage close proximity and reduce the potential for collisions. As the TTC increases, the reward value rises following a logarithmic relationship and converges toward 0. This design promotes the maintenance of a safe following distance without penalizing vehicles that are already well separated.
To prevent CAVs from running red lights in the interest of traffic efficiency, a red-light running reward function is defined. When a CAV runs a red light, a reward value of −50 is assigned; conversely, if the CAV complies with the traffic signal and stops at the red light, a reward value of 0 is assigned.
In conclusion, the reward function pertaining to vehicle safety is outlined as follows:
[See PDF for equation (4)]
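The safety reward described above can be sketched as follows. The threshold of 0.5 s and the penalties of −1 and −50 come from the text; the exact logarithmic curve, the horizon `TTC_SAFE` at which the reward reaches 0, and the choice to add the two terms are assumptions made only for illustration, since the original expressions are not reproduced here.

```python
import math

TTC_MIN = 0.5              # s, threshold stated in the text
TTC_SAFE = 4.0             # s, ASSUMED horizon at which the reward reaches 0
RED_LIGHT_PENALTY = -50.0  # stated in the text


def time_to_collision(gap, closing_speed):
    """TTC in s; gap in m, closing_speed = follower speed - leader speed (m/s).

    Returns infinity when the follower is not closing in on the leader.
    """
    if closing_speed <= 0.0:
        return math.inf
    return gap / closing_speed


def ttc_reward(ttc):
    """-1 below TTC_MIN, then a logarithmic rise that reaches 0 at TTC_SAFE."""
    if ttc < TTC_MIN:
        return -1.0
    if ttc >= TTC_SAFE:
        return 0.0
    # ASSUMED shape: equals -1 at TTC_MIN and 0 at TTC_SAFE.
    return math.log(ttc / TTC_SAFE) / math.log(TTC_SAFE / TTC_MIN)


def safety_reward(gap, closing_speed, ran_red_light):
    """ASSUMED combination: TTC term plus red-light term."""
    r_red = RED_LIGHT_PENALTY if ran_red_light else 0.0
    return ttc_reward(time_to_collision(gap, closing_speed)) + r_red
```

With these assumed constants, a gap of 10 m closed at 5 m/s gives a TTC of 2 s and a TTC reward of −1/3.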
(2) Fuel Consumption.
The primary factor contributing to high fuel consumption is the frequent acceleration and deceleration of vehicles. Consequently, both the fuel consumption model and the rates of acceleration and deceleration are considered when defining the fuel consumption reward function.
1) Fuel consumption model.
In this study, the Virginia Tech Microscopic Model (VT-Micro model) developed by Rakha et al. [30] is utilized as the fuel consumption model. The VT-Micro model incorporates the vehicle's instantaneous speed and acceleration/deceleration for estimating fuel consumption, as illustrated in the following equation:
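$$
\mathrm{MOE}_e =
\begin{cases}
\exp\left(\sum_{m=0}^{3}\sum_{n=0}^{3} L^{e}_{m,n}\, v^{m} a^{n}\right), & a \ge 0 \\
\exp\left(\sum_{m=0}^{3}\sum_{n=0}^{3} M^{e}_{m,n}\, v^{m} a^{n}\right), & a < 0
\end{cases}
\tag{5}
$$

where $\mathrm{MOE}_e$ represents the vehicle's instantaneous fuel consumption rate (L/s) or emission rate (mg/s); $v$ denotes the instantaneous speed of the vehicle (m/s); $a$ is the instantaneous acceleration/deceleration (m/s²); $L^{e}_{m,n}$ and $M^{e}_{m,n}$ are the coefficients for modeling acceleration and deceleration, respectively; $m$ is the power of speed; and $n$ is the power of acceleration/deceleration.

Subsequently, the instantaneous fuel consumption reward function is established as follows: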
[See PDF for equation (6)]
2) Rate of acceleration/deceleration change.
Frequent acceleration and deceleration of vehicles not only increase fuel consumption but also adversely affect passenger comfort. Therefore, it is essential to establish a reward function for the rate of acceleration and deceleration changes.
[See PDF for equation (7)]

Equation (7) is expressed in terms of the rate of acceleration/deceleration change of the vehicle (jerk), the acceleration/deceleration of the vehicle, and the time step.

In conclusion, the reward function based on fuel consumption is formulated as follows:
[See PDF for equation (8)]
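A possible way to compute the two ingredients of the fuel-related reward is sketched below. The VT-Micro rate is the exponential of a polynomial in speed and acceleration with separate coefficient sets for acceleration and deceleration, and the rate of acceleration change is a finite difference over one time step. The coefficient matrices must come from the calibrated VT-Micro tables [30] and are deliberately not reproduced here; the function `fuel_reward`, its weights, and its negative weighted-sum form are hypothetical.

```python
import math


def vt_micro_rate(speed, accel, coef_accel, coef_decel):
    """Instantaneous fuel or emission rate from the VT-Micro structure.

    coef_accel and coef_decel are 4x4 nested lists indexed as
    [power of speed][power of acceleration]; calibrated values are
    NOT supplied here and must be taken from the model's tables.
    """
    coef = coef_accel if accel >= 0.0 else coef_decel
    exponent = sum(
        coef[m][n] * speed ** m * accel ** n
        for m in range(4)
        for n in range(4)
    )
    return math.exp(exponent)


def jerk(accel_now, accel_prev, dt):
    """Rate of acceleration change over one time step (m/s^3)."""
    return (accel_now - accel_prev) / dt


def fuel_reward(rate, jerk_value, w_fuel=1.0, w_jerk=1.0):
    """ASSUMED form: penalize both fuel rate and jerk magnitude."""
    return -(w_fuel * rate + w_jerk * abs(jerk_value))
```

Passing the coefficient matrices as arguments keeps the sketch independent of any particular vehicle calibration.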
(3) Traffic efficiency.
The speed of a vehicle primarily influences traffic efficiency. Additionally, the stop-and-restart actions of vehicles prior to reaching the stop line can also affect traffic flow. Therefore, CAVs should aim to minimize stops before the stop line to enhance traffic efficiency.
1) Speed.
The speed of the CAV is used as a reward term, with the lane speed limit set at 16.67 m/s. A higher CAV speed corresponds to a greater reward value. However, if the CAV exceeds the speed limit or travels in reverse, the reward value is assigned as −10.
[See PDF for equation (9)]
2) Estimated arrival time.
The reward function integrates the estimated arrival time of the CAV. A reward value of 1 is assigned when the projected arrival time falls within the green light duration; conversely, a reward value of −1 is assigned when it does not. The calculation of the expected arrival time for the vehicle can be simplified as follows:
[See PDF for equation (10)]
Subsequently, the reward function for the vehicle's expected arrival time is established as follows:
[See PDF for equation (11)]
The efficiency-based reward function is established as follows:
[See PDF for equation (12)]
The integrated reward function is built as follows:
[See PDF for equation (13)]
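The efficiency terms and the integrated reward can be sketched as below. The speed limit of 16.67 m/s, the penalty of −10, and the arrival rewards of 1 and −1 come from the text. The normalization of the speed reward, the constant-speed estimate of the arrival time, and the weighted-sum combination with unit weights are assumptions for illustration only.

```python
import math

V_MAX = 16.67  # m/s, lane speed limit stated in the text


def speed_reward(speed):
    """Higher speed gives a higher reward; -10 outside [0, V_MAX]."""
    if speed < 0.0 or speed > V_MAX:
        return -10.0
    return speed / V_MAX  # ASSUMED normalization to [0, 1]


def estimated_arrival_time(dist_to_stop, speed):
    """ASSUMED constant-speed estimate of the time to reach the stop line (s)."""
    if speed <= 0.0:
        return math.inf
    return dist_to_stop / speed


def arrival_reward(dist_to_stop, speed, green_start, green_end):
    """+1 if the estimated arrival falls inside the green window, else -1.

    green_start and green_end are measured in seconds from the current time.
    """
    eta = estimated_arrival_time(dist_to_stop, speed)
    return 1.0 if green_start <= eta <= green_end else -1.0


def total_reward(r_safe, r_fuel, r_eff, weights=(1.0, 1.0, 1.0)):
    """ASSUMED weighted sum of the three objective terms."""
    w_safe, w_fuel, w_eff = weights
    return w_safe * r_safe + w_fuel * r_fuel + w_eff * r_eff
```

For instance, a CAV 100 m from the stop line travelling at 10 m/s is expected to arrive in 10 s, which earns the arrival reward of 1 if the green window spans 5 s to 35 s from now.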
TD3 Algorithm
CAVs must make smooth control decisions based on real-time traffic conditions and road information during operation. Such decisions are continuous in nature, and the TD3 algorithm is explicitly designed for decision-making in continuous action spaces. Additionally, multiple objectives (safety, fuel consumption, and traffic efficiency) are incorporated into the model, which requires the reinforcement learning algorithm to employ an efficient policy search and maintain a stable learning process. The TD3 algorithm enhances stability by implementing delayed updates and a double Q-network, effectively mitigating the risks of value overestimation and instability in complex, multi-objective environments.
Furthermore, given the high precision required for trajectory optimization in CAVs, the TD3 algorithm leverages deep neural networks to improve control accuracy, thereby satisfying the high-performance requirements of autonomous driving systems. Compared to the DDPG algorithm, the double Q-network and target smoothing mechanism of TD3 significantly enhance the stability and accuracy of trajectory planning, thus preventing unreasonable decisions such as excessive acceleration or sudden braking. The delayed update strategy for the policy network also ensures stability during multi-objective optimization, improving convergence speed. In contrast to the PPO algorithm, TD3 converges to high-quality policies more rapidly through the use of deterministic strategies and delayed updates. Its high stability and accuracy in continuous control tasks enable better balancing of various objectives, including safety, fuel consumption, and traffic efficiency, as discussed in this study.
In summary, the TD3 algorithm has been selected to address the CAV trajectory planning problem, comprising the following key steps:
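1. Initialization: initialize the actor network $\pi_{\phi}$, the two critic networks $Q_{\theta_1}$ and $Q_{\theta_2}$, and the experience pool.
2. Sampling and storing data: the agent interacts with the environment using the current policy $\pi_{\phi}$, obtains the state $s$, action $a$, reward $r$, and next state $s'$, and stores the transition $(s, a, r, s')$ in the experience pool.
3. Updating the critic networks: when the number of samples in the experience pool exceeds the batch size $N$, a mini-batch is randomly drawn from the experience pool to update the parameters of the two critic networks $Q_{\theta_1}$ and $Q_{\theta_2}$ using the following formulas.

$$y = r + \gamma \min_{k=1,2} Q_{\theta'_k}\left(s',\ \pi_{\phi'}(s') + \epsilon\right), \quad \epsilon \sim \mathrm{clip}\left(\mathcal{N}(0, \sigma),\ -c,\ c\right) \tag{14}$$

$$\theta_k \leftarrow \arg\min_{\theta_k} \frac{1}{N} \sum \left(y - Q_{\theta_k}(s, a)\right)^2, \quad k = 1, 2 \tag{15}$$

4. Delayed update of the actor network: at regular intervals, the actor network parameters are updated from the current mini-batch using gradient ascent, that is, in the direction of the policy gradient.

$$\nabla_{\phi} J(\phi) = \frac{1}{N} \sum \nabla_{a} Q_{\theta_1}(s, a)\Big|_{a = \pi_{\phi}(s)}\ \nabla_{\phi} \pi_{\phi}(s) \tag{16}$$

where $\nabla_{\phi} J(\phi)$ represents the gradient of the objective function $J$ with respect to the parameter $\phi$; $\frac{1}{N} \sum \nabla_{a} Q_{\theta_1}(s, a)$ is the mini-batch average of the gradient of the critic network $Q_{\theta_1}$ with respect to the action $a$; and $\nabla_{\phi} \pi_{\phi}(s)$ represents the gradient of the actor network with respect to the parameter $\phi$.
5. Target network update: at regular intervals, the parameters of the current critic and actor networks are blended into the target critic and actor networks through a soft update with parameter $\tau$, as follows:

$$\theta'_1 \leftarrow \tau \theta_1 + (1 - \tau)\, \theta'_1 \tag{17}$$

$$\theta'_2 \leftarrow \tau \theta_2 + (1 - \tau)\, \theta'_2 \tag{18}$$

$$\phi' \leftarrow \tau \phi + (1 - \tau)\, \phi' \tag{19}$$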
6. The above steps are repeated until the predefined training time is reached or the preset training objectives are achieved.
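The core arithmetic of the steps above can be illustrated without a deep learning framework. The sketch below covers target policy smoothing, the clipped double-Q target, the soft target update, and the delayed actor schedule for scalar actions and flat parameter lists. The default values for the discount factor, soft update parameter, strategy noise, and strategy delay follow Table 2; the noise clip of 0.5 and all function names are assumptions, and the network training itself is omitted.

```python
import random


def smoothed_target_action(target_action, sigma=0.2, noise_clip=0.5,
                           a_min=-3.0, a_max=3.0, rng=random):
    """Add clipped Gaussian noise to the target actor's action.

    noise_clip is an ASSUMED value; sigma follows the strategy noise in Table 2.
    """
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
    return max(a_min, min(a_max, target_action + noise))


def td3_target(reward, q1_next, q2_next, gamma=0.9, done=False):
    """Clipped double-Q target: use the smaller of the two target critics."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)


def soft_update(target_params, online_params, tau=0.005):
    """Blend online parameters into the target parameters."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


def should_update_actor(step, policy_delay=2):
    """The actor and targets are updated once every policy_delay critic updates."""
    return step % policy_delay == 0
```

Taking the minimum of the two critic estimates is what counteracts value overestimation, while the small `tau` keeps the targets changing slowly between updates.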
Results and discussions
Simulation setup
The simulation platform utilized in this study is SUMO, a well-established open-source transportation simulation tool. SUMO can interface with Python through TraCI, allowing advanced Python algorithms to be integrated into the simulation. Simulations are conducted on hardware equipped with an Nvidia GTX 1060 GPU. The TD3 algorithm is implemented using the TensorFlow deep learning framework, and all simulations are performed using SUMO version 1.16.0.
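A control loop of the kind described above might look like the following sketch. It relies on TraCI calls from the SUMO Python client (`traci.start`, `traci.simulationStep`, `traci.simulation.getTime`, `traci.vehicle.getIDList`, `traci.vehicle.getTypeID`, `traci.vehicle.getSpeed`, `traci.vehicle.setSpeed`, `traci.close`). The vehicle type name `"cav"`, the `policy` callable, the configuration file name, and the choice to apply the acceleration by setting the next speed are all hypothetical and are not taken from the original implementation.

```python
def next_speed(speed, accel, dt=1.0, v_max=16.67):
    """Integrate the commanded acceleration over one step, kept in [0, v_max]."""
    return max(0.0, min(v_max, speed + accel * dt))


def run_episode(sumo_cmd, policy, cav_type="cav", horizon=3600):
    """Run one simulation episode and apply the policy to every CAV.

    sumo_cmd: e.g. ["sumo", "-c", "intersection.sumocfg"] (hypothetical file).
    policy:   hypothetical callable mapping a vehicle id to an acceleration.
    """
    import traci  # ships with SUMO; imported lazily so the helper above works without it

    traci.start(sumo_cmd)
    try:
        while traci.simulation.getTime() < horizon:
            traci.simulationStep()
            for veh_id in traci.vehicle.getIDList():
                if traci.vehicle.getTypeID(veh_id) != cav_type:
                    continue  # human-driven vehicles keep SUMO's own driver model
                speed = traci.vehicle.getSpeed(veh_id)
                accel = policy(veh_id)
                traci.vehicle.setSpeed(veh_id, next_speed(speed, accel))
    finally:
        traci.close()
```

The helper `next_speed` assumes a simulation step of 1 s; a different step length would have to be passed as `dt`.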
The simulation scenario presented in this paper involves a two-way, four-lane signalized intersection, as illustrated in Fig. 4. Each approach extends 800 m in length. A fixed 4-phase signal scheme is depicted in Fig. 5 and includes the following phases: north–south straight, east–west straight, north–south left turn, and east–west left turn. Each green phase lasts 30 s, followed by a 3-s yellow light interval. Each training simulation is conducted over 3600 s, with vehicles measuring 5 m in length. The initial speeds of vehicles entering the simulated roadway are randomly distributed between 5.5 m/s and 13.8 m/s. The traffic flow for each approach is set at 600 pcu/h. To ensure adequate exploration of diverse traffic state characteristics, vehicle arrivals are generated according to a Weibull distribution. This distribution captures peak and off-peak arrival probabilities, aligning with real-world traffic variations.
[See PDF for image]
Fig. 4
Two-way four-lane signalized intersection
[See PDF for image]
Fig. 5
Phase scheme
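Weibull-distributed arrivals at a given flow can be generated as in the sketch below, which draws successive headways from a Weibull distribution whose mean equals 3600 divided by the flow. Whether the original study applied the distribution to headways or to arrival times is not stated, and the shape parameter of 1.5 is an arbitrary illustrative choice; only the flow of 600 pcu/h and the 3600 s horizon come from the text.

```python
import math
import random


def weibull_arrival_times(flow_pcu_per_h, horizon_s, shape=1.5, seed=0):
    """Return sorted arrival times (s) with Weibull-distributed headways.

    The scale is chosen so that the mean headway equals 3600 / flow.
    shape is an ASSUMED value.
    """
    rng = random.Random(seed)
    mean_headway = 3600.0 / flow_pcu_per_h
    # Mean of a Weibull distribution is scale * Gamma(1 + 1 / shape).
    scale = mean_headway / math.gamma(1.0 + 1.0 / shape)
    times = []
    t = 0.0
    while True:
        t += rng.weibullvariate(scale, shape)
        if t >= horizon_s:
            break
        times.append(t)
    return times
```

At 600 pcu/h the mean headway is 6 s, so a 3600 s horizon produces roughly 600 arrivals per approach.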
The simulation parameters are configured as shown in Table 1. It presents the key parameter settings required for the simulation, including length of vehicle, minimum spacing of vehicles, maximum speed of vehicles, maximum acceleration of vehicles, minimum deceleration of vehicles, safe headway of vehicles, and simulation time.
Table 1. Simulation parameters
| Parameter | Value | Unit |
|---|---|---|
| Length of vehicle | 5 | m |
| Speed of vehicle | – | m/s |
| Acceleration/deceleration of the vehicle | – | m/s² |
| Minimum spacing of vehicles | 2.5 | m |
| Maximum speed of vehicles | 16.67 | m/s |
| Maximum acceleration of vehicles | 3 | m/s² |
| Minimum deceleration of vehicles | −3 | m/s² |
| Safe headway of vehicles | 1.2 | s |
| Simulation time | 3600 | s |
Length of vehicle is typically determined based on vehicle type. In this research, a vehicle length common to passenger vehicles, typically ranging from 4 to 5 m, is selected. This length aids in accurately estimating following distances, ensuring that vehicle behavior in the simulation aligns with real traffic conditions. Furthermore, variations in vehicle length minimally impact acceleration and deceleration behaviors, without significantly affecting overall traffic efficiency or fuel consumption. Consequently, the default vehicle length of 5 m in the simulation software is adopted.
The minimum spacing of vehicles is determined by traffic safety standards and vehicle performance. An increase in this distance may result in longer queues, potentially increasing average travel time; however, a short spacing raises the risk of collisions, necessitating a balance between safety and efficiency. In this study, the minimum spacing of vehicles is set at 2.5 m, based on established road traffic safety requirements, which is suitable for scenarios involving both CAVs and HVs.
In urban traffic, speed is influenced by factors such as traffic signals and road conditions; thus, the maximum speed primarily defines vehicle behavioral boundaries. Accordingly, the maximum vehicle speed is set at 16.67 m/s (60 km/h) based on standard urban road speed limits.
Appropriate acceleration and deceleration rates enhance traffic flow; however, excessive acceleration can increase energy consumption, while high deceleration may lead to emergency stops and traffic accidents. Thus, the maximum acceleration and minimum deceleration are set at 3 m/s² and −3 m/s², respectively, aligning with the performance standards of most modern vehicles, including CAVs.
An increase in safe headway leads to greater distances between vehicles, enhancing traffic stability, although it may also result in longer average queue lengths. An appropriate headway effectively balances safety and traffic efficiency. Following recommendations from the simulation software (SUMO) and traffic safety research, the safe headway is established at 2 s to ensure adequate reaction time between vehicles.
The simulation time impacts the stability of statistical results. Shorter durations may produce unstable outcomes that inadequately reflect long-term traffic trends. In contrast, a longer simulation time enables a more accurate evaluation of the model’s long-term effects while imposing higher demands for real-time simulation. Therefore, the simulation time is set at 3600 s, covering an extended traffic flow operating period and facilitating the assessment of model performance under varying flow conditions.
To ensure that the CAV gathers sufficient empirical data for learning during training, a ratio of 1:1 is established between CAVs and HVs in each training iteration, with a total of 300 predefined iterations.
Model parameters
The parameter configurations in deep reinforcement learning (DRL) significantly influence the performance of the algorithm. Improper parameter settings may lead to issues such as unstable performance, failure to converge, reduced fitting capability, or overfitting. Following extensive evaluations, the parameter combinations selected for this study are presented in Table 2.
Table 2. Model parameters
Parameters | Symbol | Value |
|---|---|---|
The distance from the vehicle to the stop line () | – | |
The current phase state | – | |
The remaining duration of the current phase | – | |
Actor network learning rate | – | 0.0001 |
Critic network learning rate | – | 0.001 |
Discount factor | – | 0.9 |
Soft update parameter | – | 0.005 |
Strategy noise | – | 0.2 |
Strategy delay | – | 2 |
Experience pool capacity | – | 100,000 |
Batch size | – | 300 |
Actor network hidden layer | – | (256,256,256) |
Critic network hidden layer | – | (256,256,256) |
Number of training iterations | – | 300 |
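To make the role of the Table 2 hyperparameters concrete, the sketch below shows where each one enters the standard TD3 update (clipped double-Q target, target policy smoothing, delayed actor update, soft target update). It uses NumPy only and is not the training code of this study. The noise clip of 0.5, the interpretation of the strategy noise as a standard deviation in action units, and all function names are assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class TD3Config:
    actor_lr: float = 1e-4        # actor network learning rate
    critic_lr: float = 1e-3       # critic network learning rate
    gamma: float = 0.9            # discount factor
    tau: float = 0.005            # soft update parameter
    policy_noise: float = 0.2     # strategy noise (std of smoothing noise)
    policy_delay: int = 2         # strategy delay (critic steps per actor step)
    buffer_size: int = 100_000    # experience pool capacity
    batch_size: int = 300
    hidden: tuple = (256, 256, 256)
    noise_clip: float = 0.5       # ASSUMPTION: not listed in Table 2
    max_action: float = 3.0       # m/s^2, acceleration bound from Table 1


def soft_update(target_params, online_params, tau):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


def smoothed_target_action(action, cfg, rng):
    """Target policy smoothing: add clipped Gaussian noise, then clip to bounds."""
    noise = rng.normal(0.0, cfg.policy_noise, size=np.shape(action))
    noise = np.clip(noise, -cfg.noise_clip, cfg.noise_clip)
    return np.clip(action + noise, -cfg.max_action, cfg.max_action)


def td3_target(reward, done, q1_next, q2_next, cfg):
    """Clipped double-Q target: r + gamma * (1 - done) * min(Q1', Q2')."""
    return reward + cfg.gamma * (1.0 - done) * np.minimum(q1_next, q2_next)


def should_update_actor(step, cfg):
    """The actor and target networks are updated every policy_delay critic steps."""
    return step % cfg.policy_delay == 0
```

With a discount factor of 0.9, rewards beyond roughly ten steps contribute little to the target, which suits the short decision horizon of an intersection approach.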
The feedback of the vehicle's total reward function during each training iteration is illustrated in Fig. 6. The trend depicted in the figure indicates significant oscillations in the vehicle’s reward function during the initial 30 iterations, followed by a gradual increase. Convergence is observed after approximately 150 iterations, at which point the final total reward function stabilizes around 450.
Fig. 6. Iteration of the reward function
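One simple way to locate a convergence point such as the one read from the reward curve is to smooth the per-iteration reward with a moving average and find the first iteration after which the average stays within a tolerance band around its final value. The sketch below illustrates this idea. The window of 20 iterations, the 2% band, and the function names are assumptions, and the paper does not state that convergence was determined this way.

```python
def moving_average(values, window):
    """Trailing moving average; one value per full window."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def first_stable_iteration(rewards, window=20, tolerance=0.02):
    """Return the 1-based iteration after which the smoothed reward stays
    within `tolerance` (relative) of its final value, or None if the
    series is shorter than one window."""
    avg = moving_average(rewards, window)
    if not avg:
        return None
    final = avg[-1]
    band = tolerance * abs(final)
    first = len(avg) - 1
    # Walk backwards until the smoothed reward leaves the band.
    for j in range(len(avg) - 1, -1, -1):
        if abs(avg[j] - final) > band:
            break
        first = j
    # avg[j] covers iterations j+1 .. j+window (1-based).
    return first + window
```

Applied to a logged reward series of 300 iterations, the function returns the iteration index from which the smoothed reward no longer leaves the band.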
This section presents an analysis of the simulation results generated by SUMO. The effects of varying CAV penetration rates and traffic flows on vehicle trajectories at intersections are compared, using average travel time, average fuel consumption, and average queue length as evaluation metrics. Simulations are conducted to examine different CAV penetration rates under various traffic flow conditions. Traffic flow for each approach ranges from 300 pcu/h to 800 pcu/h, with CAV penetration rates set at 0%, 20%, 40%, 60%, 80%, and 100%.
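The experimental design described above amounts to a full grid over traffic flow and CAV penetration. As a rough sketch, the 36 scenarios can be enumerated as follows. The 100 pcu/h step is taken from the result tables, and the dictionary keys and function name are illustrative assumptions.

```python
from itertools import product

FLOWS_PCU_PER_H = range(300, 801, 100)   # 300, 400, ..., 800
PENETRATIONS_PCT = range(0, 101, 20)     # 0, 20, ..., 100


def scenario_grid():
    """Enumerate every (traffic flow, CAV share) combination to simulate."""
    return [
        {"flow_pcu_per_h": flow, "cav_share": pct / 100.0}
        for flow, pct in product(FLOWS_PCU_PER_H, PENETRATIONS_PCT)
    ]
```

Each entry would drive one 3600 s simulation run, giving six flows times six penetration rates.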
Average travel time
As illustrated in Table 3 and Fig. 7, under a constant CAV penetration rate, average travel time for vehicles increases as traffic flow rises. Conversely, with consistent traffic flow, average travel time decreases as the CAV penetration rate increases.
Table 3. Average travel time with different traffic flow and CAV penetration rates (s)
Traffic flow (pcu/h) \ CAV penetration | 0% | 20% | 40% | 60% | 80% | 100% |
|---|---|---|---|---|---|---|
300 | 90.23 | 88.78 | 88.21 | 88.13 | 87.95 | 87.95 |
400 | 92.31 | 90.43 | 89.63 | 88.76 | 89.12 | 89.45 |
500 | 110.57 | 106.32 | 102.58 | 102.14 | 101.26 | 100.87 |
600 | 138.34 | 129.63 | 127.63 | 124.45 | 120.14 | 117.06 |
700 | 152.22 | 140.28 | 137.35 | 134.58 | 133.83 | 130.80 |
800 | 170.79 | 158.65 | 155.74 | 153.35 | 151.68 | 149.83 |
Fig. 7. Rate of decrease in average travel time with different traffic flow and CAV penetration rates
At traffic flow rates of 300 pcu/h and 400 pcu/h, the lower vehicle volume allows for smooth traffic flow, enabling vehicles to maintain higher travel speeds. Consequently, under these conditions, the impact of CAV penetration rate on vehicle travel time is relatively minimal. In the absence of CAV intervention, the average travel times are 90.23 s and 92.31 s. Upon achieving a CAV penetration rate of 100%, the optimization effect on average vehicle travel time results in improvements of only 2.56% and 3.85%, respectively.
At a traffic flow of 600 pcu/h, congestion commences. In the absence of CAV intervention, vehicles experience increased travel times, resulting in an average travel time of 138.34 s. Notably, even with a CAV penetration rate of only 20%, significant optimization is observed in the overall average travel time at intersections, with a reduction of 8.71 s, corresponding to an optimization rate of 6.3%. Furthermore, as the CAV penetration rate increases, the average travel time continues to decrease. When the CAV penetration rate reaches 80% and 100%, CAVs dominate the roadway, resulting in reductions in average travel time of 18.20 s and 21.28 s, with optimization rates of 13.16% and 15.38%, respectively.
At traffic flow levels of 700 pcu/h and 800 pcu/h, the roadway experiences congestion with low service levels. In the absence of CAV intervention, vehicle travel times increase to 152.22 s and 170.79 s, respectively. With a CAV penetration rate of 20%, average travel times decrease by 7.84% and 7.11%, respectively. When a CAV penetration rate of 100% is achieved, average travel times further decrease by 14.07% and 12.27%, respectively. Compared to the 600 pcu/h flow rate, the improvement in average travel times shows a slight decline. This observation indicates that, at high vehicle volumes, the effectiveness of a basic speed optimization model may have reached its limits. Thus, simply optimizing speed is insufficient to improve vehicle efficiency at intersections under high-flow conditions.
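The optimization rates quoted for 600 to 800 pcu/h follow from Table 3 as the relative reduction against the 0% penetration baseline. A short sketch of that calculation, using the Table 3 values for these three flows (the function and variable names are illustrative):

```python
PENETRATION_PCT = [0, 20, 40, 60, 80, 100]

# Average travel time (s) from Table 3, one row per traffic flow (pcu/h).
TRAVEL_TIME_S = {
    600: [138.34, 129.63, 127.63, 124.45, 120.14, 117.06],
    700: [152.22, 140.28, 137.35, 134.58, 133.83, 130.80],
    800: [170.79, 158.65, 155.74, 153.35, 151.68, 149.83],
}


def optimization_rate(baseline, value):
    """Relative reduction in percent with respect to the baseline."""
    return (baseline - value) / baseline * 100.0


def rates_for_flow(flow):
    """Optimization rate at each penetration level for one traffic flow."""
    row = TRAVEL_TIME_S[flow]
    return {
        pct: round(optimization_rate(row[0], value), 2)
        for pct, value in zip(PENETRATION_PCT, row)
    }
```

`rates_for_flow(600)[100]` reproduces the 15.38% peak reported above, and the 700 and 800 pcu/h rows give 14.07% and 12.27% at full penetration.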
Average fuel consumption
As illustrated in Table 4 and Fig. 8, at a fixed CAV penetration rate, average fuel consumption increases as traffic flow rises. Conversely, at a fixed traffic flow, average fuel consumption generally decreases as the CAV penetration rate increases.
Table 4. Average fuel consumption with different traffic flow and CAV penetration rates (ml)
Traffic flow (pcu/h) \ CAV penetration | 0% | 20% | 40% | 60% | 80% | 100% |
|---|---|---|---|---|---|---|
300 | 112.43 | 110.42 | 108.83 | 108.56 | 108.12 | 107.83 |
400 | 117.25 | 114.50 | 113.10 | 112.42 | 112.34 | 112.18 |
500 | 128.67 | 119.75 | 116.13 | 115.08 | 115.41 | 115.06 |
600 | 142.31 | 130.48 | 124.11 | 120.92 | 119.95 | 119.57 |
700 | 160.45 | 145.44 | 137.32 | 134.84 | 131.85 | 131.04 |
800 | 186.52 | 165.17 | 154.75 | 152.71 | 151.03 | 150.10 |
Fig. 8. Rate of decrease in average fuel consumption with different traffic flow and CAV penetration rates
When traffic flow rates are 300 pcu/h and 400 pcu/h, road traffic flows smoothly, leading to stable vehicle speeds and a decrease in idle instances. Consequently, fuel consumption remains relatively low, at 112.43 ml and 117.25 ml, respectively. The CAV trajectory planning model primarily reduces fuel consumption by facilitating uninterrupted vehicle passage through intersections, allowing vehicles to maintain consistent speeds and accelerations. In scenarios where traffic flows are stable and minimal delays are encountered at intersections, the effects of CAV trajectory planning are less pronounced. However, an increase in CAV penetration rates is associated with a reduction in average vehicle fuel consumption. At a CAV penetration rate of 20%, optimization rates are only 1.79% and 2.35%, respectively. When the CAV penetration rate reaches 100%, the optimization rates increase to 4.09% and 4.32%, respectively.
At a traffic flow rate of 600 pcu/h, mild congestion is encountered, resulting in a noticeable increase in fuel consumption. In the absence of CAV intervention, the average fuel consumption of vehicles is 142.31 ml. With a CAV penetration rate of 20%, the average fuel consumption decreases by 6.91%, and at a penetration rate of 100%, it decreases by 13.87%.
As the traffic flow increases to 800 pcu/h, substantial congestion occurs. Without CAV intervention, the average fuel consumption rises to 186.52 ml. At a CAV penetration rate of 20%, the average fuel consumption decreases by 11.45%, while a 100% CAV penetration rate achieves a reduction of 19.53%. These findings emphasize that significant optimization effects can be realized under congested road conditions, even with a low CAV penetration rate.
Average queue length
As shown in Table 5 and Fig. 9, at a fixed CAV penetration rate, the average queue length increases as traffic flow intensifies. Conversely, at a fixed traffic flow, the average queue length decreases as the CAV penetration rate rises.
Table 5. Average queue length with different traffic flow and CAV penetration rates (m)
Traffic flow (pcu/h) \ CAV penetration | 0% | 20% | 40% | 60% | 80% | 100% |
|---|---|---|---|---|---|---|
300 | 35.19 | 28.03 | 14.41 | 9.09 | 7.00 | 0.00 |
400 | 44.88 | 29.92 | 18.87 | 16.68 | 10.13 | 0.00 |
500 | 65.60 | 43.35 | 34.96 | 29.84 | 17.49 | 9.06 |
600 | 77.62 | 55.78 | 44.64 | 39.32 | 23.59 | 10.61 |
700 | 106.22 | 81.11 | 57.26 | 44.52 | 27.52 | 15.53 |
800 | 123.59 | 88.52 | 69.05 | 45.05 | 33.83 | 23.41 |
Fig. 9. Rate of decrease in average queue length with different traffic flow and CAV penetration rates
At traffic flow rates of 300 pcu/h and 400 pcu/h, indicative of low vehicle density, queuing is not prevalent. In the absence of CAV intervention, the queue lengths are measured at only 35.19 m and 44.88 m, respectively. However, with a 100% CAV penetration rate, queuing at intersections is completely eliminated.
For traffic volumes exceeding 400 pcu/h, even with a 100% CAV penetration rate, not all vehicles can traverse the intersection without stopping. This limitation arises from the design of the reward function, which accounts for fuel consumption. Crawling toward the stop line at low speed solely to maintain continuous passage would increase fuel consumption, so the learned policy accepts some stops.
At a traffic flow of 600 pcu/h, slight congestion is observed. Without CAV intervention, the average queue length extends to 77.62 m. With a 20% CAV penetration rate, the average queue length decreases by 28.14%. Upon reaching a 100% CAV penetration rate, the average queue length is reduced by 86.33%.
At a traffic flow of 800 pcu/h, severe congestion occurs, with an average queue length of 123.59 m. Even with a 100% CAV penetration rate, the average queue length decreases by 81.06%, yet remains at 23.41 m.
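To give the queue lengths a more intuitive scale, they can be converted into an approximate number of queued vehicles. The sketch below assumes that each queued vehicle occupies its length (5 m) plus the minimum spacing (2.5 m), both taken from the simulation parameters. This per-vehicle footprint is an assumption for illustration, and the way the simulator measures queue length may differ.

```python
VEHICLE_LENGTH_M = 5.0  # vehicle length used in the simulation
MIN_GAP_M = 2.5         # minimum spacing of vehicles


def queued_vehicles(queue_length_m):
    """Approximate number of vehicles in a queue of the given length,
    ASSUMING each vehicle occupies length + minimum spacing (7.5 m)."""
    return queue_length_m / (VEHICLE_LENGTH_M + MIN_GAP_M)
```

Under this assumption, the 123.59 m baseline queue at 800 pcu/h corresponds to roughly 16 vehicles, and the remaining 23.41 m at 100% penetration to about 3 vehicles.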
Vehicle trajectory
This section analyzes the spatiotemporal trajectories of vehicles. Figure 10 presents the simulation results illustrating vehicle trajectories under various CAV penetration rates for the first 400 s of simulation time.
Fig. 10. Vehicle trajectory planning with different CAV penetration rates
In scenarios where the CAV penetration rate is 0%, indicating that all vehicles in the traffic network are HVs, vehicles are unable to adapt their trajectories based on signal timing and the status of preceding vehicles. This results in a situation where most vehicles decelerate and halt at intersections, thereby compromising traffic efficiency.
With the integration of CAVs into the traffic network, information from preceding vehicles and traffic signals can be utilized by CAVs to adjust their trajectories. Additionally, HVs are indirectly influenced by CAVs, leading to modifications in their own trajectories. This interaction significantly reduces the total number of queuing vehicles and minimizes instances of vehicle halting. As queuing times decrease, the duration required for deceleration, stopping, and acceleration diminishes, resulting in a marked improvement in overall traffic efficiency. Consequently, a greater volume of vehicles can pass through signalized intersections within a single signal cycle. Thus, even with a CAV penetration rate of only 20%, a noticeable optimization effect is achieved.
At a CAV penetration rate of 40%, the trajectories of most vehicles within the traffic network are optimized, with only a small number of HVs experiencing stopping events. When the CAV penetration rate reaches 100%, the optimization effect is amplified further, although an occasional queuing vehicle can still be observed. This outcome is related to the reward function's emphasis on fuel consumption: crawling toward the intersection at low speed to avoid a stop would increase fuel consumption, so the learned policy sometimes accepts a stop instead.
The trajectory optimization model proposed in this study is compared with the model developed by Fang et al. [31] under a high-traffic scenario, with results summarized in Table 6. The values for the proposed model correspond to the 800 pcu/h rows of Tables 3 to 5. At CAV penetration rates of 20%, 60%, and 100%, the proposed model yields lower average travel time, average fuel consumption, and average queue length, with improvements ranging from 4.02% to 9.82%, thereby further validating its effectiveness.
Table 6. Comparison of proposed method with method by Fang et al. [31]
CAV penetration | Index | Proposed method in this study | Method proposed by Fang et al. [31] | Optimization |
|---|---|---|---|---|
20% | Average travel time (s) | 158.65 | 171.03 | 7.24% |
20% | Average fuel consumption (ml) | 165.17 | 183.75 | 9.82% |
20% | Average queue length (m) | 88.52 | 95.20 | 7.01% |
60% | Average travel time (s) | 153.35 | 167.66 | 8.53% |
60% | Average fuel consumption (ml) | 152.71 | 168.98 | 9.63% |
60% | Average queue length (m) | 45.05 | 48.45 | 7.02% |
100% | Average travel time (s) | 149.83 | 160.11 | 6.42% |
100% | Average fuel consumption (ml) | 150.10 | 162.06 | 7.38% |
100% | Average queue length (m) | 23.41 | 24.39 | 4.02% |
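The optimization column of Table 6 appears to be the relative reduction with respect to the value obtained by the method of Fang et al. [31]. As a worked example for the average travel time at 20% CAV penetration, with the symbols $x_{\text{Fang}}$ and $x_{\text{prop}}$ introduced here only for illustration:

```latex
\text{Optimization}
  = \frac{x_{\text{Fang}} - x_{\text{prop}}}{x_{\text{Fang}}} \times 100\%
  = \frac{171.03 - 158.65}{171.03} \times 100\%
  \approx 7.24\%
```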
Conclusions
This study investigates the scenario of mixed traffic flow involving CAVs and HVs at signalized intersections. A multi-objective CAV trajectory model, based on the TD3 algorithm, is proposed to enhance travel efficiency and reduce fuel consumption by optimizing CAV trajectories. The model utilizes simplified vehicle driving states as inputs to the state space. Factors such as safety, fuel consumption, and traffic efficiency are incorporated into the reward function design, with acceleration and deceleration as the output variables. The effectiveness of the algorithm is validated using the SUMO platform. Simulation results demonstrate that the proposed algorithm significantly reduces average travel time, average fuel consumption, and average vehicle queue lengths. These findings not only provide theoretical support for the trajectory optimization of CAVs but also offer valuable insights for their future widespread application in complex traffic environments. To fully exploit the advantages of the CAV trajectory optimization model and facilitate algorithm implementation in real-world traffic systems, integration with existing intelligent traffic signal systems, such as adaptive signal control systems, is recommended. By collecting real-time data on traffic flow and signal states and combining these data with the CAV model's outputs, dynamic cooperation between traffic signals and vehicles can be established, thereby optimizing overall intersection capacity.
Future works
This study applies the TD3 algorithm to the trajectory optimization problem for CAVs, leveraging its powerful training and learning capabilities to achieve precise speed control and trajectory optimization. However, this study has some limitations. First, while the existing TD3 algorithm is applied to the trajectory optimization problem, no optimizations to the TD3 algorithm itself are explored. Future work will focus on improving and optimizing the TD3 algorithm in accordance with the specific characteristics of autonomous trajectory optimization. This may include the incorporation of a multi-task learning framework, enabling the agent to handle multiple optimization objectives simultaneously and effectively learn the trade-offs among them. Combining TD3 with classical control methods, such as Model Predictive Control (MPC), will utilize MPC's efficiency in dynamic environments alongside the benefits of reinforcement learning in long-term optimization, thus creating a more robust trajectory planning and control framework. This approach aims to enhance the decision-making accuracy and system response speed of CAVs in complex traffic situations.
Second, this research concentrates solely on the trajectory optimization of CAVs without considering variations in traffic signals at intersections. Future research will integrate signal priority strategies with trajectory optimization to achieve bidirectional coordinated optimization between vehicle trajectories and traffic signals, thereby further enhancing traffic efficiency.
Acknowledgements
Not applicable.
Authors’ contributions
All the authors confirm contribution to the paper as follows: study conception and design: HL and YFG. Data collection: YHG and YG. Analysis and interpretation of results: HL, YFG, YHG, and YG. Draft manuscript preparation: HL, YFG, and YHG. Manuscript review: HL and XZ. All authors reviewed the results and approved the final version of the manuscript.
Funding
This research was funded by the Henan Province Undergraduate Innovation and Entrepreneurship Training Program (202410463058), the Tackle Key Problems in Science and Technology Project of Henan Province (222102240052), the Foundation for High-Level Talents of Henan University of Technology (2018BS029), the Doctoral Foundation of Henan University of Technology for Xu Zhang (31400348), and Research Funds for Xu Zhang, Chief Expert of Traffic Engineering (21410003).
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Declarations
Competing interests
The authors declare that they have no competing interests.
Abbreviations
TD3: Twin delayed deep deterministic policy gradient
SUMO: Simulation of urban mobility
CAV: Connected automated vehicle
CVIS: Cooperative vehicle infrastructure systems
Self-organizing CAVs
Adaptive coupling control
TTC: Time to contact
VT-Micro: Virginia Tech microscopic model
DDPG: Deep deterministic policy gradient
PPO: Proximal policy optimization
Dynamic programming with trajectory planning utilizing piecewise polynomials as a subroutine
HV: Human-driven vehicle
DRL: Deep reinforcement learning
MPC: Model predictive control
References
1. Ministry of Ecology and Environment of the People's Republic of China. China Mobile Source Environmental Management Annual Report (2023). Ministry of Ecology and Environment of the People's Republic of China, Beijing, China; 2023.
2. Shan, X; Wan, C; Hao, P et al. A novel dynamic bus lane control strategy with Eco-driving under partially connected vehicle environment. IEEE Trans Intell Transp Syst; 2024; 25,
3. Li, J; Fotouhi, A; Liu, Y et al. Review on Eco-driving control for connected and automated vehicles. Renew Sust Energ Rev; 2024; 189, [DOI: https://dx.doi.org/10.1016/j.rser.2023.114025] 114025.
4. Shi, D; Liu, S; Cai, Y et al. Pontryagin’s minimum principle based fuzzy adaptive energy management for hybrid electric vehicle using real-time traffic information. Appl Energy; 2021; 286, [DOI: https://dx.doi.org/10.1016/j.apenergy.2021.116467] 116467.
5. Mohamed, M; Elmitiny, N et al. A simulation-based evaluation of BRT systems in over-crowded travel corridors: a case study of Cairo, Egypt. J Eng Appl Sci; 2022; 69,
6. Meng, X; Cassandras, C. Trajectory optimization of autonomous agents with spatio-temporal constraints. IEEE Trans Control Network Syst; 2020; 7,
7. Ma, F; Yang, Y; Wang, J et al. Eco-driving-based cooperative adaptive cruise control of connected vehicles platoon at signalized intersections. Transport Res D; 2021; 92, [DOI: https://dx.doi.org/10.1016/j.trd.2021.102746] 102746.
8. Tajalli, M; Hajbabaie, A. Dynamic speed harmonization in connected urban street networks. Comput-Aided Civ Inf; 2018; 33,
9. Teng, K; Liu, H; Liu, Q et al. A cooperative control method combining signal control and speed control for transit with connected vehicle environment. IET Control Theory A; 2024; 18,
10. Li, D; Zhu, F; Wu, J et al. Managing mixed traffic at signalized intersections: An adaptive signal control and CAV coordination system based on deep reinforcement learning. Expert Syst App; 2024; 238, [DOI: https://dx.doi.org/10.1016/j.eswa.2023.121959] 121959.
11. Chen, S; Zong, S; Chen, T et al. A taxonomy for autonomous vehicles considering ambient road infrastructure. Sustainability; 2023; 15,
12. Shi, H; Zhou, Y; Wu, K et al. Physics-informed deep reinforcement learning-based integrated two-dimensional car-following control strategy for connected automated vehicles. Knowl Based Syst; 2023; 269, [DOI: https://dx.doi.org/10.1016/j.knosys.2023.110485] 110485.
13. Wu, W; Meng, F; Liu, T et al. Optimizing bus operations at autonomous intersection with trajectory planning and priority control. IEEE Trans Intell; 2024; 25(10), pp. 14876-14889.
14. Ma, C; Yu, C; Zhang, C et al. Signal timing at an isolated intersection under mixed traffic environment with self-organizing connected and automated vehicles. Comput-Aided Civ Inf; 2023; 38,
15. Yu, C; Feng, Y; Liu, HX et al. Corridor level cooperative trajectory optimization with connected and automated vehicles. Transport Res C; 2019; 105, pp. 405-421. [DOI: https://dx.doi.org/10.1016/j.trc.2019.06.002]
16. Feng, L; Zhao, X; Chen, Z et al. An adaptive coupled control method based on vehicles platooning for intersection controller and vehicle trajectories in mixed traffic. IET Intell Transp Sy; 2024; 18,
17. Hou, Y; Seliman, S; Wang, E et al. Cooperative and integrated vehicle and intersection control for energy efficiency (CIVIC-E2). IEEE Trans Intell; 2018; 19,
18. Wang, S; Wang, Z; Jiang, R et al. Trajectory jerking suppression for mixed traffic flow at a signalized intersection: a trajectory prediction based deep reinforcement learning method. IEEE Trans Intell; 2022; 23,
19. Yao, Z; Jiang, H; Cheng, Y et al. Integrated schedule and trajectory optimization for connected automated vehicles in a conflict zone. IEEE Trans Intell; 2020; 23,
20. Yuan, W; Frey, C; Wei, T. Fuel use and emission rates reduction potential for light-duty gasoline vehicle eco-driving. Transport Res D; 2022; 109, [DOI: https://dx.doi.org/10.1016/j.trd.2022.103394] 103394.
21. Shi, X; Zhang, J; Jiang, X et al. Learning Eco-driving strategies from human driving trajectories. Physica A; 2024; 633, [DOI: https://dx.doi.org/10.1016/j.physa.2023.129353] 129353.
22. Ma, W; Chen, B; Yu, C et al. Trajectory planning for connected and autonomous vehicles at freeway work zones under mixed traffic environment. Transp Res Rec; 2022; 2676,
23. Ying, J; Feng, Y. Infrastructure-assisted cooperative driving and intersection management in mixed traffic conditions. Transport Res C; 2024; 158, [DOI: https://dx.doi.org/10.1016/j.trc.2023.104443] 104443.
24. Liu, C; Sheng, Z; Chen, S et al. Longitudinal control of connected and automated vehicles among signalized intersections in mixed traffic flow with deep reinforcement learning approach. Physica A; 2023; 629, 4671780 [DOI: https://dx.doi.org/10.1016/j.physa.2023.129189] 129189.
25. Chen, S; Dong, J; Ha, P et al. Graph neural network and reinforcement learning for multi-agent cooperative control of connected autonomous vehicles. Comput-Aided Civ Inf; 2021; 36,
26. Dong, J; Chen, S; Li, Y et al. Space-weighted information fusion using deep reinforcement learning: The context of tactical control of lane-changing autonomous vehicles and connectivity range assessment. Transport Res C; 2021; 128, [DOI: https://dx.doi.org/10.1016/j.trc.2021.103192] 103192.
27. Guo, Y; Ma, J; Xiong, C et al. Joint optimization of vehicle trajectories and intersection controllers with connected automated vehicles: combined dynamic programming and shooting heuristic approach. Transport Res C; 2019; 98, pp. 54-72. [DOI: https://dx.doi.org/10.1016/j.trc.2018.11.010]
28. Guo, Y; Ma, J. DRL-TP3: a learning and control framework for signalized intersections with mixed connected automated traffic. Transport Res C; 2021; 132, [DOI: https://dx.doi.org/10.1016/j.trc.2021.103416] 103416.
29. Gao, H; Jia, H; Huang, Q et al. A hybrid deep learning model for urban expressway lane-level mixed traffic flow prediction. Eng Appl Artif Intel; 2024; 133, [DOI: https://dx.doi.org/10.1016/j.engappai.2024.108242] 108242.
30. Rakha, H; Ahn, K; Trani, A. Development of VT-Micro model for estimating hot stabilized light duty vehicle and truck emissions. Transp Res D; 2004; 9, pp. 49-74. [DOI: https://dx.doi.org/10.1016/S1361-9209(03)00054-3]
31. Fang, S; Yang, L; Wang, T et al. Trajectory planning method for mixed vehicles considering traffic stability and fuel consumption at the signalized intersection. J Adv Transport; 2020; 2020, 1456207. [DOI: https://dx.doi.org/10.1155/2020/1456207]
Copyright Springer Nature B.V. Dec 2025