1. Introduction

Transportation plays a substantial role in global energy consumption and greenhouse gas emissions. In the pursuit of energy conservation and emission reduction, the transportation sector is prioritizing the implementation of stricter emission standards and the advancement of new energy vehicles [1]. Electric vehicles have garnered widespread attention from various countries due to their high energy conversion efficiency and lower environmental impact [2,3,4]. Eco-driving, widely recognized by scholars, offers a notable energy-saving and emission-reducing effect. Achieved through appropriate vehicle speed and acceleration, well-chosen routes, and suitable vehicle maintenance, eco-driving reduces fuel consumption and tailpipe emissions, ultimately mitigating energy waste and environmental pollution [5,6,7]. Currently, research on eco-driving primarily focuses on the following areas: maintaining vehicle speed under different driving conditions [8], optimizing acceleration and deceleration [9,10], car following [5], and route planning [11,12].

Vehicle-following technology is an effective means of alleviating traffic congestion: it optimizes urban traffic flow while enhancing vehicle operational efficiency. Vehicle following describes the interactions between vehicles in the same lane [13]. Treiber et al. [14] integrated previous findings and proposed the intelligent driver model (IDM), a unified mathematical model capable of describing vehicle-following behavior from free-flow to jammed-flow conditions. Kesting et al. [15,16] focused their research on parameter calibration for the following model, treating headway and trajectory as measures of performance. They conducted a thorough analysis of various issues during the validation process. However, the study lacked an evaluation and validation of the selected performance indicators. In the context of hybrid electric vehicle fleet eco-driving, Wang et al. [17] established a multi-objective optimization function for hybrid electric vehicle fleet queue speed. As research on vehicle-following models has deepened, it has been found that the human driver and the vehicle, taken as a unified entity, can exhibit distinct vehicle-following characteristics. Investigating human driving styles, Hu et al. [18] developed a car-following driver model that accurately captures the driving characteristics of humans and demonstrated its effectiveness in adapting to various driving styles. Saifuzzaman et al. [19] combined the IDM model and Gipps model with a driving difficulty module to establish the TDIDM model and TDGipps model, respectively. They demonstrated that the TDIDM model maintains stable following behavior even in intricate driving scenarios.

Furthermore, as intelligent connected technology continues to advance, vehicles are increasingly able to access real-time traffic information while in operation. Several studies have emphasized the significance of incorporating the influence of diverse road conditions when developing car-following models [20,21,22,23]. With the rise of artificial intelligence technology, deep reinforcement learning has become a popular research area capable of solving numerous challenging problems [24,25,26], and it is widely applied in the domain of car-following. For instance, researchers have proposed several car-following models based on deep reinforcement learning. These include a human-like car-following model utilizing deep reinforcement learning techniques [27], a personalized car-following model incorporating memory-based deep reinforcement learning [28], and a vehicle-following model that combines deep deterministic policy gradients (DDPG) with stacked denoising autoencoders (SDAE) [29].

In vehicle-following models, time headway (THW) is a critical parameter used to describe vehicle-following behavior, defined as the time interval between consecutive vehicles passing the same section of the road [30]. Many scholars have studied headway in car-following models, and this work falls mainly into two categories: fixed headway and variable headway. In studies of fixed time headway, some scholars have used gray correlation analysis to examine the correlation between time headway and vehicle-following behavior [31] and have subsequently devised a car-following model grounded in fixed time headway. Other researchers have used time headway as a metric for understanding the behavior of human drivers [32]. Research on variable time headway is limited. Yuan et al. [33] designed a novel car-following model based on dynamic safety headway, effectively preventing collisions and improving driving performance and traffic flow efficiency in emergency situations. However, current research mainly focuses on minimizing energy consumption or setting a fixed desired time headway in car-following models, with fewer studies on multi-objective optimization of car-following models. Additionally, existing research often sets the desired time headway to a constant value, although vehicles are subject to varying traffic conditions during operation, so a fixed desired time headway cannot accurately reflect the actual traffic flow. Moreover, there is limited research applying the intelligent driver model (IDM) to eco-driving. Thus, this study integrates the IDM with the deep deterministic policy gradient (DDPG) algorithm, presenting an IDM car-following model with a dynamic desired time headway that incorporates eco-driving principles to attain multi-objective optimization for economy, safety, and comfort.

The contributions of this study can be summarized as follows: Firstly, an electric vehicle (EV) model is established using MATLAB/Simulink, and the relevant parameters of the intelligent driver model (IDM) and the multi-objective function for eco-driving are introduced. Secondly, an eco-driving IDM car-following strategy based on the deep deterministic policy gradient (DDPG) algorithm is proposed. Lastly, simulation verification is conducted by comparing the improved IDM model with dynamic desired headway based on the DDPG agent strategy to the IDM model with a fixed expected headway. The results confirm that the proposed method exhibits superior economy, comfort, and safety performance. Additionally, the proposed strategy is validated under different driving conditions, demonstrating its generalization capability.

2. Materials and Methods

2.1. IDM-Based Vehicle-Following Model

Electric vehicles (EVs) are characterized by their simple structure and high energy conversion efficiency. To investigate eco-driving car-following strategies, it is essential to establish a vehicle model for electric vehicles.

2.1.1. Vehicle Model

In accordance with Equations (1) and (2) governing vehicle motion, a vehicle model for electric vehicles (EVs) is constructed. The main parameters of the vehicle are presented in Table 1.

(1) $F_{t} = F_{f} + F_{w} + F_{i} + F_{j}$

(2) $\frac{T_{t q} i η}{r} = m g f \cos θ + \frac{C_{D} A}{2} ρ u^{2} + m g \sin θ + m \frac{d u}{d t}$

where F_t represents the driving force, F_f represents the rolling resistance, F_w represents the aerodynamic drag, F_i represents the grade resistance, F_j represents the acceleration resistance, T_tq represents the engine torque, i represents the transmission ratio, η represents the transmission system efficiency, r represents the wheel radius, m represents the vehicle mass, g represents the gravitational acceleration (g = 9.8 m·s⁻²), f represents the rolling resistance coefficient, θ represents the road grade, C_D represents the air resistance coefficient, A represents the frontal area, ρ represents the air density, u represents the vehicle speed, and $\frac{d u}{d t}$ represents the longitudinal acceleration during travel.
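The force balance in Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' Simulink model: the parameter values come from Table 1, while the function names `resistance_forces` and `required_torque` are hypothetical, and the sketch assumes SI units throughout (speed in m·s⁻¹).

```python
import math

# Parameter values taken from Table 1; names are illustrative only.
MASS = 826.0          # vehicle mass m, kg
G = 9.8               # gravitational acceleration g, m/s^2
F_ROLL = 0.021        # rolling resistance coefficient f
C_D = 0.25            # air resistance coefficient
AREA = 1.87           # frontal area A, m^2
RHO = 1.206           # air density, kg/m^3
WHEEL_RADIUS = 0.284  # wheel radius r, m
GEAR_RATIO = 6.515    # transmission ratio i
ETA = 0.95            # transmission efficiency


def resistance_forces(u, accel, theta=0.0):
    """Return (F_f, F_w, F_i, F_j) in newtons for speed u (m/s),
    acceleration accel (m/s^2), and road grade theta (rad)."""
    rolling = MASS * G * F_ROLL * math.cos(theta)
    aero = 0.5 * C_D * AREA * RHO * u ** 2
    grade = MASS * G * math.sin(theta)
    inertia = MASS * accel
    return rolling, aero, grade, inertia


def required_torque(u, accel, theta=0.0):
    """Solve Equation (2) for the torque T_tq (N*m) at the drive source."""
    total_force = sum(resistance_forces(u, accel, theta))
    return total_force * WHEEL_RADIUS / (GEAR_RATIO * ETA)


if __name__ == "__main__":
    print(resistance_forces(20.0, 0.5))
    print(required_torque(20.0, 0.5))
```

On a flat road at standstill only the rolling term remains, which gives a quick sanity check on the parameter values.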

2.1.2. IDM Model

Treiber et al. proposed the intelligent driver model (IDM), which has been developed through the integration of various interdisciplinary theories, such as physics, psychology, automatic control, and vehicle engineering. These interdisciplinary theories together form the core of the IDM. Based on modern artificial intelligence algorithms and extensive data training, the IDM can simulate the decision-making process of human drivers, enabling autonomous vehicle control and rational decision-making in complex traffic environments. Therefore, the development of the IDM holds significant importance for the future advancement of intelligent transportation. The classic IDM is formulated as follows:

(3) $a_{n} (t) = a [1 - {(\frac{v_{n} (t)}{v_{0}})}^{φ} - {(\frac{s^{*} (v_{n} (t), Δ v_{n} (t))}{Δ s_{n} (t)})}^{2}]$

(4) $s^{*} (v_{n} (t), Δ v_{n} (t)) = s_{1} + τ v_{n} (t) + \frac{v_{n} (t) \cdot Δ v_{n} (t)}{2 \sqrt{a b}}$

Table 2 below explains the parameters in Equations (3) and (4).

Table 3 below shows the parameter calibration results for the IDM model in this study.
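Equations (3) and (4) translate directly into code. The sketch below is illustrative rather than the authors' implementation. It uses the calibrated values from Table 3 with two assumptions stated loudly: the deceleration b is used as a positive magnitude (3 m·s⁻²) so that the square root in Equation (4) is real, and Δv is taken as the ego speed minus the preceding vehicle's speed (the approach rate). The function names are hypothetical.

```python
import math

# Calibrated values from Table 3 (b used as a magnitude; see lead-in).
V0 = 30.0      # desired speed v0, m/s
A_MAX = 3.0    # desired maximum acceleration a, m/s^2
B_COMF = 3.0   # desired deceleration magnitude |b|, m/s^2 (assumption)
S1 = 2.0       # minimum safe distance when stationary s1, m
PHI = 4.0      # acceleration exponent (Table 2: typically 4)


def desired_gap(v, dv, tau):
    """Equation (4): desired distance s* for ego speed v (m/s),
    approach rate dv = v_ego - v_lead (m/s), and time headway tau (s)."""
    return S1 + tau * v + v * dv / (2.0 * math.sqrt(A_MAX * B_COMF))


def idm_acceleration(v, dv, gap, tau):
    """Equation (3): IDM acceleration for a positive space headway gap (m)."""
    s_star = desired_gap(v, dv, tau)
    return A_MAX * (1.0 - (v / V0) ** PHI - (s_star / gap) ** 2)


if __name__ == "__main__":
    # Same traffic situation, two fixed headways used later in the paper.
    for tau in (2.0, 3.5):
        print(tau, idm_acceleration(v=20.0, dv=0.0, gap=40.0, tau=tau))
```

With the gap and speeds held equal, a larger desired time headway yields a larger s* and therefore stronger braking, which is the lever the DDPG agent adjusts in Section 2.2.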

2.1.3. Eco-Driving Multi-Objective Function

Eco-driving has become a prominent area of research in the field of intelligent connected vehicles. Its primary goal is to enhance driving behavior to reduce energy consumption and improve traffic conditions, all while ensuring driving safety. To achieve this objective, this paper builds upon the IDM car-following model and formulates a comprehensive multi-objective eco-driving function that takes into account economy, safety, and comfort.

(5) $F = \int_{0}^{T} [α Δ S O C + β {(v (t) - v_{l e a d} (t))}^{2} + γ {(a (t))}^{2}] d t$

where F denotes the eco-driving objective function, T represents the travel time, ΔSOC represents the energy consumption, v(t) represents the speed of the following vehicle, v_lead(t) represents the speed of the preceding vehicle, and a(t) represents the acceleration of the following vehicle. Additionally, α, β, and γ are the weighting coefficients for the three indicators, respectively.

2.2. DDPG-Based Eco-Driving Car-Following Strategy

2.2.1. Car-Following Model Design

Existing car-following models often set the desired time headway as a fixed value, but vehicles in actual operation are influenced by traffic flow, which is time-varying, and a static time headway cannot accurately reflect this influence. The DDPG algorithm is a reinforcement learning method designed for continuous action spaces. It consists of two neural networks: one for estimating the action-value function (Critic network) and the other for generating actions (Actor network). The Actor network outputs an action a based on the current state, while the Critic network takes this action and the current state as inputs and estimates the value of that state-action pair. The Actor and Critic networks are trained cooperatively: the Critic is updated by minimizing the error in the action-value function, and the Actor is updated using the Critic’s estimate. The training process of the DDPG algorithm involves two stages: sampling and learning. During the sampling stage, the agent interacts with the environment and stores the newly acquired experience data. In the learning stage, the DDPG algorithm samples data from the experience pool and uses it to optimize the parameters of the Actor and Critic networks.

The optimization framework in this study is as follows: (1) state selection: preceding vehicle speed, inter-vehicle distance, and ego vehicle speed; (2) action selection: desired time headway; and (3) rewards: aimed at vehicle fuel economy, comfort, and safety performance. The reinforcement learning strategy employed in this study is illustrated in Figure 1.

2.2.2. Environment and Reward Configuration

As indicated by the IDM car-following model mentioned earlier, the driver’s car-following behavior is influenced by the distance between the two vehicles, relative velocity, and desired time headway. Consequently, in the scope of this research, the preceding vehicle’s velocity, the distance between the two vehicles, and the ego vehicle’s velocity are considered as input states, while the desired time headway is regarded as the output of the driver’s model.
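The state and action definitions above can be illustrated with a single environment transition. The sketch below is an assumption-laden illustration, not the authors' Simulink environment: it uses forward-Euler integration with a hypothetical step size of 0.1 s, the Table 3 parameters with b as a positive magnitude, and a hypothetical function name `step`. The state is the tuple (preceding vehicle speed, inter-vehicle distance, ego speed) and the action is the desired time headway.

```python
import math

# Table 3 values; B_COMF is the magnitude of b (assumption).
V0, A_MAX, B_COMF, S1, PHI = 30.0, 3.0, 3.0, 2.0, 4.0


def idm_acceleration(v_ego, v_lead, gap, tau):
    """IDM acceleration (Equations (3) and (4)) for a positive gap in metres."""
    dv = v_ego - v_lead  # approach rate (sign convention is an assumption)
    s_star = S1 + tau * v_ego + v_ego * dv / (2.0 * math.sqrt(A_MAX * B_COMF))
    return A_MAX * (1.0 - (v_ego / V0) ** PHI - (s_star / gap) ** 2)


def step(state, tau, lead_accel=0.0, dt=0.1):
    """Advance the car-following environment by one time step.

    state: (v_lead, gap, v_ego); tau: desired time headway chosen by the agent.
    Returns the next state and the ego acceleration that was applied.
    """
    v_lead, gap, v_ego = state
    a_ego = idm_acceleration(v_ego, v_lead, gap, tau)
    new_v_ego = max(0.0, v_ego + a_ego * dt)        # no reversing
    new_v_lead = max(0.0, v_lead + lead_accel * dt)
    new_gap = gap + (v_lead - v_ego) * dt           # Euler update of the gap
    return (new_v_lead, new_gap, new_v_ego), a_ego


if __name__ == "__main__":
    state = (10.0, 30.0, 12.0)
    for _ in range(5):
        state, accel = step(state, tau=2.0)
        print(state, accel)
```

In a training loop the agent would observe the returned state, pick the next headway, and receive a reward computed from the applied acceleration, the speed difference, and the energy consumed.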

In reinforcement learning, the task of the agent is to acquire current state information from the environment and select actions from the action space based on a policy. After executing an action in the environment, the agent transitions to the next state and receives corresponding rewards (or penalties). This process continues iteratively until a termination condition is met. The objective of the agent is to maximize the cumulative rewards it receives, and the measurement of the goodness of the agent’s actions is typically represented by a reward function. Designing a suitable reward function is crucial in reinforcement learning algorithms as it guides and constrains the agent’s behavior, leading to improvements in autonomous learning and adaptability. Therefore, the design of the reward function must carefully consider practical problems and be flexibly adjusted according to specific application scenarios to enhance the performance and effectiveness of the reinforcement learning algorithm.

In real-world car-following scenarios, drivers adjust their vehicle’s state based on changes in the environment, taking appropriate actions accordingly. Building upon the eco-driving function mentioned earlier, this paper designs the reward function as follows:

(6) $R = \begin{cases} R_{1} = ω_{1} {(v_{2} - v_{1})}^{2} + ω_{2} a^{2} + ω_{3} Δ S O C, & \text{otherwise} \\ R_{1} + R_{2}, \quad R_{2} = - 1, & \text{if } {(v_{2} - v_{1})}^{2} \leq 0.25 \end{cases}$

where R represents the reward function and v₁ and v₂, respectively, denote the speeds of the preceding and following vehicles. The agent receives the additional score R₂ when the speed difference between the two vehicles is no greater than 0.5 m∙s⁻¹, that is, when (v₂ − v₁)² ≤ 0.25. ω₁, ω₂, and ω₃ are the weight coefficients for the three indicators, respectively.
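A direct transcription of Equation (6) may help make the two cases concrete. This is only a sketch: the source does not report the values or signs of ω₁, ω₂, and ω₃, so the unit weights below are placeholders, the sign of R₂ follows the equation as printed, and the function name `reward` is hypothetical.

```python
def reward(v_lead, v_ego, accel, delta_soc, weights=(1.0, 1.0, 1.0)):
    """Evaluate Equation (6) for one time step.

    v_lead, v_ego: speeds of the preceding and following vehicles (m/s)
    accel: acceleration of the following vehicle (m/s^2)
    delta_soc: state-of-charge consumed during the step
    weights: (w1, w2, w3), placeholder values not taken from the source
    """
    w1, w2, w3 = weights
    speed_error_sq = (v_ego - v_lead) ** 2
    r1 = w1 * speed_error_sq + w2 * accel ** 2 + w3 * delta_soc
    # R2 applies only when the speed difference is at most 0.5 m/s.
    r2 = -1.0 if speed_error_sq <= 0.25 else 0.0
    return r1 + r2


if __name__ == "__main__":
    print(reward(10.0, 10.3, 0.2, 0.0001))  # within the 0.5 m/s band
    print(reward(10.0, 12.0, 0.2, 0.0001))  # outside the band
```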

2.2.3. Parameter Updating

The relevant parameters for the DDPG algorithm are provided as follows: the target smoothing factor is set to 1.0 × 10⁻³, the experience replay buffer length is 1.0 × 10⁶, the minimum sample size is 256, and the discount factor is 0.99.
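The buffer length and sample size listed above correspond to the sampling and learning stages described in Section 2.2.1. The following is a minimal, library-free sketch of such an experience replay buffer. It is not the toolbox implementation used in the study, the class name `ReplayBuffer` is hypothetical, and it assumes that the stated sample size of 256 is the mini-batch drawn at each learning step.

```python
import random
from collections import deque

BUFFER_LENGTH = 1_000_000  # experience replay buffer length from the text
MINI_BATCH = 256           # sample size from the text (interpreted as mini-batch)


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done)."""

    def __init__(self, capacity=BUFFER_LENGTH):
        # A deque with maxlen silently discards the oldest entry when full.
        self._data = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self._data.append((state, action, reward, next_state, done))

    def ready(self, batch_size=MINI_BATCH):
        """True once enough transitions exist to draw one mini-batch."""
        return len(self._data) >= batch_size

    def sample(self, batch_size=MINI_BATCH):
        """Draw batch_size distinct transitions uniformly at random."""
        indices = random.sample(range(len(self._data)), batch_size)
        return [self._data[i] for i in indices]

    def __len__(self):
        return len(self._data)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=1000)
    for k in range(300):
        buffer.add((10.0, 30.0, 9.0), 2.0, -0.5, (10.0, 30.1, 9.1), False)
    if buffer.ready():
        print(len(buffer.sample()))
```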

To achieve better convergence speed, the hyperbolic tangent activation function tanh(x) is used to approximate the transformation relationship between input and output signals in the hidden layer, ensuring that the output acceleration falls within the range of [−1, 1]. The expression for tanh(x) is as follows:

(7) $\tanh (x) = \frac{e^{x} - e^{- x}}{e^{x} + e^{- x}}$
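As a small numerical illustration of Equation (7), the sketch below evaluates the hyperbolic tangent from its definition and then rescales the bounded output to a headway range. The rescaling is purely an assumption for illustration: the source does not describe how the network output is mapped to the action, and the bounds of 1.5 s and 3.5 s are borrowed from the typical headway range quoted in Section 3.2. The names `tanh_from_definition` and `to_headway` are hypothetical.

```python
import math

TAU_MIN, TAU_MAX = 1.5, 3.5  # typical headway range cited in Section 3.2


def tanh_from_definition(x):
    """Equation (7). Fine for moderate x; math.tanh is safer for large |x|."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def to_headway(raw_output):
    """Squash a raw value into [-1, 1], then map it affinely to
    [TAU_MIN, TAU_MAX]. This mapping is an assumption, not from the source."""
    squashed = tanh_from_definition(raw_output)
    return TAU_MIN + 0.5 * (squashed + 1.0) * (TAU_MAX - TAU_MIN)


if __name__ == "__main__":
    for raw in (-3.0, 0.0, 3.0):
        print(raw, tanh_from_definition(raw), to_headway(raw))
```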

After receiving experience samples from the experience replay buffer, the Critic network updates its parameters by minimizing the loss function. The loss function can be represented as follows:

(8) $L_{Q} = E ({(y_{t} - Q (S_{t}, a_{t} | θ^{Q}))}^{2})$

where θ^Q represents the parameters of the Critic network; S_t and a_t are the state and action at time t, respectively; $Q (S_{t}, a_{t} | θ^{Q})$ is the output of the Critic network; and y_t is the target Q-value.

(9) $y_{t} = r_{t} + γ Q^{'} (s_{t + 1}, μ^{'} (s_{t + 1} ∣ θ^{μ'}) ∣ θ^{Q'})$

where r_t is the reward value at time t, Q′ and μ′ are the target Critic network and target Actor network, respectively, and γ is the discount factor.
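Equations (8) and (9) can be checked numerically on a small batch. In the sketch below the networks are replaced by plain Python callables so the arithmetic is visible; this is an illustration, not a training implementation. The handling of terminal transitions (dropping the bootstrap term when `done` is true) is common practice and an addition not stated in the source, and the function names are hypothetical.

```python
GAMMA = 0.99  # discount factor from Section 2.2.3


def td_targets(batch, target_actor, target_critic, gamma=GAMMA):
    """Equation (9): y_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1})).

    batch: list of (state, action, reward, next_state, done) tuples
    target_actor(state) -> action; target_critic(state, action) -> value
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        if done:
            # Assumption: no bootstrapping past the end of an episode.
            targets.append(reward)
        else:
            next_action = target_actor(next_state)
            targets.append(reward + gamma * target_critic(next_state, next_action))
    return targets


def critic_loss(batch, targets, critic):
    """Equation (8): mean squared error between y_t and Q(s_t, a_t)."""
    errors = [
        (y - critic(state, action)) ** 2
        for (state, action, _r, _s2, _d), y in zip(batch, targets)
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    batch = [((10.0, 30.0, 9.0), 2.0, -0.5, (10.0, 30.1, 9.1), False)]
    y = td_targets(batch, lambda s: 2.0, lambda s, a: -20.0)
    print(y, critic_loss(batch, y, lambda s, a: -19.0))
```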

The parameters $θ^{μ}$ of the Actor network are updated by minimizing the loss function:

(10) $L_{μ} = - E (Q (s_{t}, μ (s_{t} | θ^{μ}) | θ^{Q}))$

The update of the target network parameters $θ^{Q'}, θ^{μ'}$ is carried out using the following method:

(11) $\begin{cases} θ^{Q'} = ζ θ^{Q} + (1 - ζ) θ^{Q'} \\ θ^{μ'} = ζ θ^{μ} + (1 - ζ) θ^{μ'} \end{cases}$

where ζ represents the soft update rate.
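The soft update in Equation (11) is a simple convex blend of the online and target parameters. The sketch below applies it to flat lists of floats standing in for network weights; the function name `soft_update` is hypothetical, and the default rate is the target smoothing factor of 1.0 × 10⁻³ stated earlier in this section.

```python
ZETA = 1.0e-3  # target smoothing factor from Section 2.2.3


def soft_update(target_params, online_params, zeta=ZETA):
    """Equation (11): theta' <- zeta * theta + (1 - zeta) * theta'.

    Both arguments are equal-length lists of floats; a new list is returned.
    """
    return [
        zeta * online + (1.0 - zeta) * target
        for online, target in zip(online_params, target_params)
    ]


if __name__ == "__main__":
    target = [0.0, 0.5]
    online = [1.0, 0.5]
    for _ in range(3):
        target = soft_update(target, online)
    print(target)  # the first entry creeps slowly toward 1.0
```

A small ζ makes the target networks trail the online networks slowly, which keeps the target value in Equation (9) from shifting abruptly between updates.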

3. Results

3.1. Algorithm Validation

To validate the eco-driving car-following performance of the IDM model integrated with the DDPG strategy, simulations were conducted using MATLAB/Simulink. The velocity profile of the United States Federal Test Procedure FTP72 was selected as the velocity curve for the preceding vehicle. Compared with the New European Driving Cycle (NEDC), FTP72 is a more complex driving cycle, with more variable speeds and a longer test duration. Additionally, the FTP72 driving cycle covers both high-speed and low-speed operating conditions. The cumulative reward plot of the DDPG agent is shown in Figure 2, indicating that the agent’s rewards tend to converge after 512 episodes.

3.2. Analytical Results

The action optimized by deep reinforcement learning in this study is the desired time headway, which is known to vary within a relatively small range for specific drivers, typically between 1.5 and 3.5 (s∙veh⁻¹) [34]. Therefore, in this study, we compare the proposed strategy with the results obtained by fixing the desired time headway at 2 (s∙veh⁻¹) and 3.5 (s∙veh⁻¹), respectively. Figure 3, Figure 4 and Figure 5 depict the speed profiles of the leading and following vehicles for the IDM model with the DDPG intelligent agent strategy, with a fixed desired time headway of 2 (s∙veh⁻¹), and with a fixed desired time headway of 3.5 (s∙veh⁻¹), respectively. Additionally, Figure 6 shows the relative distance between the two vehicles for the three approaches: Method 1, where the desired time headway is determined by the DDPG intelligent agent; Method 2, with a fixed desired time headway of 2 (s∙veh⁻¹); and Method 3, with a fixed desired time headway of 3.5 (s∙veh⁻¹).

It can be observed from Figure 3, Figure 4, Figure 5 and Figure 6 that when the speed is in the range of 0–40 (km∙h⁻¹), the relative distances between the two vehicles are similar for all three strategies, and all of them maintain an appropriate following distance. When the speed is in the range of 40–60 (km∙h⁻¹), Method 1 exhibits a greater distance between the vehicles than Methods 2 and 3. This is because, as the speed increases, Method 1 accepts a larger following distance in order to improve energy economy while still ensuring safety. On the other hand, the following distance in Method 2 is too small, indicating relatively aggressive driving behavior, which may not be conducive to driving safety. When the speed exceeds 60 (km∙h⁻¹), Method 1 is capable of maintaining a relatively stable following distance, ensuring driving safety. However, the distance for Method 3 is too large, leading to a decrease in the following effect.

Figure 7 shows the acceleration profiles of the following vehicle under the three methods, where acceleration is an important indicator of comfort. From the three acceleration curves, it can be observed that the acceleration curve of Method 1 is generally lower than those of Method 2 and Method 3, while the acceleration curve of Method 2 is higher than those of the other two methods at each stage. This indicates that the car-following model of Method 1 exhibits higher stability, better passenger comfort, and a relatively gentle driving style. Figure 8 displays the state of charge (SOC) curves under the three methods for the same driving conditions. In this case, the final SOC value of Method 1 is 0.7012, while for Method 2 and Method 3, the final SOC values are 0.6985 and 0.7003, respectively. Method 1 improves energy economy by 2.66% compared to Method 2, and it also shows a slight improvement in energy economy compared to Method 3. In summary, Method 1 outperforms Method 2 and Method 3 in terms of tracking performance, driving comfort, and economy.
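The reported 2.66% figure can be reproduced from the final SOC values with a short calculation. The sketch below rests on one loud assumption: the source does not state the initial SOC, and a value of 0.80 is assumed here only because it yields the reported percentage. The economy gain is taken as the relative reduction in SOC consumed over the cycle, and the function names are hypothetical.

```python
SOC_INITIAL = 0.80  # assumption: not stated in the source

# Final SOC values reported for the FTP72 cycle.
FINAL_SOC = {"Method 1": 0.7012, "Method 2": 0.6985, "Method 3": 0.7003}


def soc_used(method):
    """SOC consumed over the cycle under the assumed initial SOC."""
    return SOC_INITIAL - FINAL_SOC[method]


def saving_percent(method, baseline):
    """Relative reduction in SOC consumption versus a baseline, in percent."""
    return 100.0 * (soc_used(baseline) - soc_used(method)) / soc_used(baseline)


if __name__ == "__main__":
    print(round(saving_percent("Method 1", "Method 2"), 2))
    print(round(saving_percent("Method 1", "Method 3"), 2))
```

Under the same assumption, the gain over Method 3 comes out below 1%, in line with the "slight improvement" described above.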

To evaluate the generalization capability of the proposed DDPG intelligent agent car-following strategy, several representative driving scenarios were selected for testing, including FTP72, WLTC CLASS2, and JC08. Figure 9 shows the economy under the three methods for each of the selected driving scenarios. It can be observed that in the FTP72 and JC08 scenarios, Method 1 consistently outperforms Methods 2 and 3 in terms of economy. However, in the WLTC scenario, Method 1 exhibits slightly lower economy compared to the other two methods. The preceding Figure 3, Figure 4, Figure 5 and Figure 6 represent the vehicle speeds and relative distances between the leading and following vehicles for the three methods under the FTP72 operating condition. Figure 10 illustrates the vehicle speeds and relative distances between the leading and following vehicles for the three methods under the selected additional driving scenarios. For speeds below 40 (km∙h⁻¹), the relative distances are relatively close for all three methods. For speeds between 40–60 (km∙h⁻¹), Method 1 maintains a slightly larger relative distance compared to the other methods, striking a balance between tracking performance and driving safety. For speeds above 60 (km∙h⁻¹), Method 1 maintains a stable following distance, while Method 2 results in a smaller following distance, potentially compromising driving safety, and Method 3 exhibits rapidly increasing following distances, leading to inferior tracking performance. Figure 11 presents the average acceleration and deceleration values for the three methods under the selected driving scenarios. It can be observed that the average acceleration and deceleration values of Method 1 and Method 3 are lower than those of Method 2, indicating that Method 2 involves more aggressive driving behavior, while Method 1 and Method 3 provide better comfort.

In conclusion, Method 1 demonstrates superior overall performance compared to Methods 2 and 3, confirming that the IDM model with the desired time headway determined by the DDPG intelligent agent outperforms the fixed-time headway IDM model in various driving scenarios and exhibits a certain level of generalization capability.

4. Discussions

Unlike the fixed-time-headway IDM model, the proposed IDM model based on the DDPG agent policy integrates the concept of eco-driving. It outperforms the traditional fixed desired time-headway IDM model in terms of economic efficiency, safety, and comfort, as it can adapt to changes in traffic flow. In traffic environments, fixed-time-headway car-following models often fail to adapt to variations in traffic conditions. In contrast, the IDM model based on the DDPG agent policy uses the information of the ego vehicle and the leading vehicle as the state space, with the desired time headway as the action output, allowing it to respond promptly to changes in the traffic environment.

Based on the simulation analysis outlined above, all three strategies maintain good tracking performance throughout the FTP72 driving cycle, as well as the low-speed phase of the JC08 cycle. Nevertheless, during the transition from low to medium speeds, Method 2, characterized by a desired time headway of 2 (s∙veh⁻¹), gives rise to sudden acceleration or deceleration. This leads to a notable deterioration in driver comfort and safety and also increases energy consumption. Furthermore, in medium-speed operation, the model that adopts a desired time headway of 3.5 (s∙veh⁻¹) exhibits suboptimal car-following behavior, primarily due to its inability to respond promptly to changes in the behavior of the leading vehicle.

Regarding the WLTC CLASS2 driving cycle, the proposed strategy exhibits slightly lower economic efficiency compared to the other two methods. This is because the switching and maintenance of different speed ranges are more variable. The intelligent agent policy sacrifices some economic efficiency in exchange for improved driver comfort and safety, thereby meeting the diversified requirements of eco-driving.

5. Conclusions

5.1. Main Conclusions

Existing IDM car-following models often set the time headway as a fixed value, which does not account for the influence of traffic flow. In response to this issue, this paper integrates the concept of eco-driving and proposes an eco-driving multi-objective function that comprehensively considers economy, comfort, and safety. An electric vehicle model is established based on the vehicle motion equations. An eco-driving car-following strategy based on DDPG is proposed, which adjusts the desired time headway in the IDM model using the deep reinforcement learning algorithm. The optimized desired time headway from the DDPG intelligent agent is compared with the fixed-time headway in the traditional calibrated IDM model through simulations.

The results indicate that the proposed approach performs better than the traditional fixed-time-headway intelligent driver model (IDM) in terms of comfort, safety, and economy under the FTP72 driving condition. The proposed approach shows a 2.66% improvement in economy compared to the model with a desired time headway of 2 (s∙veh⁻¹). In regard to comfort, the acceleration profile of the following vehicle in the proposed approach varies more smoothly, which implies a more comfortable driving experience than with the traditional fixed-time-headway IDM model. In terms of safety, the proposed approach maintains a sufficient safety distance and exhibits good tracking performance across different driving conditions, which suggests that it performs well in various low-to-medium speed driving conditions. Furthermore, the study verifies the generalization ability of the proposed approach on three representative driving conditions. In conclusion, the IDM car-following model improved using the DDPG algorithm aligns with the principles of eco-driving, providing new insights for the enhancement of IDM car-following models and serving as a reference for the research and promotion of eco-driving technology.

5.2. Limitations and Future Research

This model is suitable only for medium- and low-speed driving conditions. We believe that car-following at high speeds is unsafe, so we do not consider high-speed scenarios. Additionally, we trained intelligent agents using electric vehicles. The car-following strategy model described in this paper is not applicable to fuel-powered or hybrid vehicles. Meanwhile, in the study of improving the car-following model’s time-headway action space using DDPG, the integration of data-driven and theory-based models was not explored. In future research, the theory-based model can be combined with real-world driving data sets. Additionally, other model parameters could be jointly optimized to meet different driving requirements, further enhancing the integration of the car-following model with eco-driving principles.

Author Contributions

W.Z., N.W., Q.L., C.P. and L.C. contributed to the study’s conception and design. Material preparation, data collection, and analysis were performed by W.Z. and N.W. The first draft of the manuscript was written by N.W. and W.Z. W.Z., Q.L., C.P. and L.C. commented on previous versions of the manuscript. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Acknowledgments

The authors are grateful for the financial support from the National Natural Science Foundation of China (U20A20331, 52272367); Key Research and Development Program of Jiangsu Province (BE2021011-3).

Conflicts of Interest

The authors declare that they have no conflict of interest.

Figures and Tables

Figure 1. Schematic diagram of the reinforcement learning strategy.

Figure 2. The cumulative reward plot of the DDPG agent.

Figure 3. Speed profile of the leading and following vehicles using the IDM model with the DDPG intelligent agent strategy.

Figure 4. Speed profiles of the leading and following vehicles in the IDM model when the desired time headway is set to 2 (s∙veh⁻¹).

Figure 5. Speed profiles of the leading and following vehicles in the IDM model when the desired time headway is set to 3.5 (s∙veh⁻¹).

Figure 6. Relative distance between the leading and following vehicles based on Method 1, 2, and 3.

Figure 7. Acceleration curves based on Method 1, 2, and 3.

Figure 8. SOC output of the desired time headway in the IDM model based on Method 1, 2, and 3.

Figure 9. ΔSOC using Method 1, 2, and 3 under FTP72, WLTC, and JC08.

Figure 10. Speed profiles and relative distances between leading and following vehicles for Method 1, 2, and 3 under WLTC and JC08. (a) Vehicle speed profiles under WLTC CLASS2. (b) Relative distance between leading and following vehicles under WLTC CLASS2. (c) Vehicle speed profiles under JC08. (d) Relative distance between leading and following vehicles under JC08.

Figure 11. Average acceleration/deceleration for Method 1, 2, and 3 under FTP72, WLTC, and JC08. (a) Average acceleration. (b) Average deceleration.

Table 1

Vehicle model parameters.

Parameter	Value	Unit
Vehicle mass $m$	826	kg
Rolling resistance coefficient $f$	0.021	/
Road grade $θ$	0	rad
Wheel radius $r$	0.284	m
Air resistance coefficient $C_{D}$	0.25	/
Air density $ρ$	1.206	kg∙m⁻³
Frontal area $A$	1.87	m²
Transmission ratio $i$	6.515	/
Transmission efficiency $η$	95	%

Table 2

IDM model parameters.

Parameter	Interpretation	Unit
a_n(t)	Ego vehicle acceleration at time t	m·s⁻²
s*	Desired distance between vehicles under current conditions	m
v_n(t)	Ego vehicle speed at time t	m·s⁻¹
Δv_n(t)	Speed difference between ego vehicle and preceding vehicle at time t	m·s⁻¹
Δs_n(t)	Space headway between ego vehicle and preceding vehicle at time t	m
φ	Acceleration exponent, typically set to 4	/
a	Desired maximum acceleration	m·s⁻²
b	Desired maximum deceleration	m·s⁻²
v₀	Desired speed	m·s⁻¹
s₁	Minimum safe distance when stationary	m
τ	Desired time headway	s·veh⁻¹

Table 3

Parameter calibration for the IDM model.

Parameter	v₀	a	b	s₁
Value	30 m∙s⁻¹	3 m∙s⁻²	−3 m∙s⁻²	2 m

References

1. Huang, Y.; Ng, E.C.Y.; Zhou, J.L.; Surawski, N.C.; Chan, E.F.C.; Hong, G. Eco-driving technology for sustainable road transport: A review. Renew. Sustain. Energy Rev.; 2018; 93, pp. 596-609. [DOI: https://dx.doi.org/10.1016/j.rser.2018.05.030]

2. Li, Z.; Khajepour, A.; Song, J. A comprehensive review of the key technologies for pure electric vehicles. Energy; 2019; 182, pp. 824-839. [DOI: https://dx.doi.org/10.1016/j.energy.2019.06.077]

3. Woo, J.; Choi, H.; Ahn, J. Well-to-wheel analysis of greenhouse gas emissions for electric vehicles based on electricity generation mix: A global perspective. Transp. Res. Part D Transp. Environ.; 2017; 51, pp. 340-350. [DOI: https://dx.doi.org/10.1016/j.trd.2017.01.005]

4. Hwang, J.-J.; Kuo, J.-K.; Wu, W.; Chang, W.-R.; Lin, C.-H.; Wang, S.-E. Lifecycle performance assessment of fuel cell/battery electric vehicles. Int. J. Hydrogen Energy; 2013; 38, pp. 3433-3446. [DOI: https://dx.doi.org/10.1016/j.ijhydene.2012.12.148]

5. Pan, C.; Huang, A.; Chen, L.; Cai, Y.; Chen, L.; Liang, J.; Zhou, W. A review of the development trend of adaptive cruise control for ecological driving. Proc. Inst. Mech. Eng. Part D J. Automob. Eng.; 2022; 236, pp. 1931-1948. [DOI: https://dx.doi.org/10.1177/09544070211049068]

6. Tu, R.; Xu, J.; Li, T.; Chen, H. Effective and Acceptable Eco-Driving Guidance for Human-Driving Vehicles: A Review. Int. J. Environ. Res. Public Health; 2022; 19, 7310. [DOI: https://dx.doi.org/10.3390/ijerph19127310]

7. Wang, Z.; Dridi, M.; El Moudni, A. Co-Optimization of Eco-Driving and Energy Management for Connected HEV/PHEVs near Signalized Intersections: A Review. Appl. Sci.; 2023; 13, 5035. [DOI: https://dx.doi.org/10.3390/app13085035]

8. Saerens, B.; Rakha, H.A.; Diehl, M.; Van Den Bulck, E. A methodology for assessing eco-cruise control for passenger vehicles. Transp. Res. Part D Transp. Environ.; 2013; 19, pp. 20-27. [DOI: https://dx.doi.org/10.1016/j.trd.2012.12.001]

9. Chen, C.; Huang, C.; Jing, Q.; Wang, H.; Pan, H.; Li, L.; Zhao, J.; Dai, Y.; Huang, H.; Schipper, L. On-road emission characteristics of heavy-duty diesel vehicles in Shanghai. Atmos. Environ.; 2007; 41, pp. 5334-5344. [DOI: https://dx.doi.org/10.1016/j.atmosenv.2007.02.037]

10. Gallus, J.; Kirchner, U.; Vogt, R.; Benter, T. Impact of driving style and road grade on gaseous exhaust emissions of passenger vehicles measured by a Portable Emission Measurement System (PEMS). Transp. Res. Part D Transp. Environ.; 2017; 52, pp. 215-226. [DOI: https://dx.doi.org/10.1016/j.trd.2017.03.011]

11. Masikos, M.; Demestichas, K.; Adamopoulou, E.; Theologou, M. Energy-efficient routing based on vehicular consumption predictions of a mesoscopic learning model. Appl. Soft Comput.; 2015; 28, pp. 114-124. [DOI: https://dx.doi.org/10.1016/j.asoc.2014.11.054]

12. Nie, Y.; Li, Q. An eco-routing model considering microscopic vehicle operating conditions. Transp. Res. Part B Methodol.; 2013; 55, pp. 154-170. [DOI: https://dx.doi.org/10.1016/j.trb.2013.06.004]

13. Han, J.; Wang, X.; Wang, G. Modeling the Car-Following Behavior with Consideration of Driver, Vehicle, and Environment Factors: A Historical Review. Sustainability; 2022; 14, 8179. [DOI: https://dx.doi.org/10.3390/su14138179]

14. Treiber, M.; Hennecke, A.; Helbing, D. Congested traffic states in empirical observations and microscopic simulations. Phys. Rev. E; 2000; 62, pp. 1805-1824. [DOI: https://dx.doi.org/10.1103/PhysRevE.62.1805] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/11088643]

15. Kesting, A.; Treiber, M. Calibrating Car-Following Models by Using Trajectory Data: Methodological Study. Transp. Res. Rec. J. Transp. Res. Board; 2008; 2088, pp. 148-156. [DOI: https://dx.doi.org/10.3141/2088-16]

16. Treiber, M.; Kesting, A. Microscopic Calibration and Validation of Car-Following Models—A Systematic Approach. Procedia—Soc. Behav. Sci.; 2013; 80, pp. 922-939. [DOI: https://dx.doi.org/10.1016/j.sbspro.2013.05.050]

17. Wang, S.; Yu, P.; Shi, D.; Yu, C.; Yin, C. Research on eco-driving optimization of hybrid electric vehicle queue considering the driving style. J. Clean. Prod.; 2022; 343, 130985. [DOI: https://dx.doi.org/10.1016/j.jclepro.2022.130985]

18. Hu, J.; Luo, S. A Car-Following Driver Model Capable of Retaining Naturalistic Driving Styles. J. Adv. Transp.; 2020; 2020, 6520861. [DOI: https://dx.doi.org/10.1155/2020/6520861]

19. Saifuzzaman, M.; Zheng, Z.; Mazharul Haque, M.; Washington, S. Revisiting the Task–Capability Interface model for incorporating human factors into car-following models. Transp. Res. Part B Methodol.; 2015; 82, pp. 1-19. [DOI: https://dx.doi.org/10.1016/j.trb.2015.09.011]

20. Tang, T.; Wang, Y.; Yang, X.; Wu, Y. A new car-following model accounting for varying road condition. Nonlinear Dyn.; 2012; 70, pp. 1397-1405. [DOI: https://dx.doi.org/10.1007/s11071-012-0542-8]

21. Tang, T.-Q.; Caccetta, L.; Wu, Y.-H.; Huang, H.-J.; Yang, X.-B. A macro model for traffic flow on road networks with varying road conditions. J. Adv. Transp.; 2014; 48, pp. 304-317. [DOI: https://dx.doi.org/10.1002/atr.215]

22. Tang, T.Q.; Li, J.G.; Huang, H.J.; Yang, X.B. A car-following model with real-time road conditions and numerical tests. Measurement; 2014; 48, pp. 63-76. [DOI: https://dx.doi.org/10.1016/j.measurement.2013.10.035]

23. Yang, S.; Deng, C.; Tang, T.; Qian, Y. Electric vehicle’s energy consumption of car-following models. Nonlinear Dyn.; 2013; 71, pp. 323-329. [DOI: https://dx.doi.org/10.1007/s11071-012-0663-0]

24. Zhou, M.; Yu, Y.; Qu, X. Development of an Efficient Driving Strategy for Connected and Automated Vehicles at Signalized Intersections: A Reinforcement Learning Approach. IEEE Trans. Intell. Transp. Syst.; 2020; 21, pp. 433-443. [DOI: https://dx.doi.org/10.1109/TITS.2019.2942014]

25. Qu, X.; Yu, Y.; Zhou, M.; Lin, C.-T.; Wang, X. Jointly dampening traffic oscillations and improving energy consumption with electric, connected and automated vehicles: A reinforcement learning based approach. Appl. Energy; 2020; 257, 114030. [DOI: https://dx.doi.org/10.1016/j.apenergy.2019.114030]

26. Abdar, M.; Pourpanah, F.; Hussain, S.; Rezazadegan, D.; Liu, L.; Ghavamzadeh, M.; Fieguth, P.; Cao, X.; Khosravi, A.; Acharya, U.R. et al. A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Inf. Fusion; 2021; 76, pp. 243-297. [DOI: https://dx.doi.org/10.1016/j.inffus.2021.05.008]

27. Zhu, M.; Wang, X.; Wang, Y. Human-like autonomous car-following model with deep reinforcement learning. Transp. Res. Part C Emerg. Technol.; 2018; 97, pp. 348-368. [DOI: https://dx.doi.org/10.1016/j.trc.2018.10.024]

28. Liao, Y.; Yu, G.; Chen, P.; Zhou, B.; Li, H. Modelling personalised car-following behaviour: A memory-based deep reinforcement learning approach. Transp. A Transp. Sci.; 2022; pp. 1-29. [DOI: https://dx.doi.org/10.1080/23249935.2022.2035846]

29. Yang, X.; Zou, Y.; Zhang, H.; Qu, X.; Chen, L. Improved deep reinforcement learning for car-following decision-making. Phys. A Stat. Mech. Appl.; 2023; 624, 128912. [DOI: https://dx.doi.org/10.1016/j.physa.2023.128912]

30. Ben-Yaacov, A.; Maltz, M.; Shinar, D. Effects of an In-Vehicle Collision Avoidance Warning System on Short- and Long-Term Driving Performance. Hum. Factors J. Hum. Factors Ergon. Soc.; 2002; 44, pp. 335-342. [DOI: https://dx.doi.org/10.1518/0018720024497925]

31. Yu, S.; Shi, Z. An improved car-following model considering headway changes with memory. Phys. A Stat. Mech. Appl.; 2015; 421, pp. 1-14. [DOI: https://dx.doi.org/10.1016/j.physa.2014.11.008]

32. Zhang, R.; Masoud, S.; Masoud, N. Impact of Autonomous Vehicles on the Car-Following Behavior of Human Drivers. J. Transp. Eng. Part A Syst.; 2023; 149, 04022152. [DOI: https://dx.doi.org/10.1061/JTEPBS.TEENG-7385]

33. Yuan, Z.; Wang, T.; Zhang, J.; Li, S. Influences of dynamic safe headway on car-following behavior. Phys. A Stat. Mech. Appl.; 2022; 591, 126697. [DOI: https://dx.doi.org/10.1016/j.physa.2021.126697]

34. Dey, P.P.; Chandra, S. Desired Time Gap and Time Headway in Steady-State Car-Following on Two-Lane Roads. J. Transp. Eng.; 2009; 135, pp. 687-693. [DOI: https://dx.doi.org/10.1061/(ASCE)0733-947X(2009)135:10(687)]

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract

Traditional car-following models usually prioritize minimizing inter-vehicle distance error when tracking the preceding vehicle, often neglecting crucial factors like driving economy and passenger ride comfort. To address this limitation, this paper integrates the concept of eco-driving and formulates a multi-objective function that encompasses economy, comfort, and safety. A novel eco-driving car-following strategy based on the deep deterministic policy gradient (DDPG) is proposed, employing the vehicle’s state, including data from the preceding vehicle and the ego vehicle, as the state space, and the desired time headway from the intelligent driver model (IDM) as the action space. The DDPG agent is trained to dynamically adjust the following vehicle’s speed in real time, striking a balance between driving economy, comfort, and safety. The results reveal that the proposed DDPG-based IDM model significantly enhances comfort, safety, and economy when compared to the fixed-time headway IDM model, achieving an economy improvement of 2.66% along with enhanced comfort. Moreover, the proposed approach maintains a relatively stable following distance under medium-speed conditions, ensuring driving safety. Additionally, the comprehensive performance of the proposed method is analyzed under three typical scenarios, confirming its generalization capability. The DDPG-enhanced IDM car-following model aligns with eco-driving principles, offering novel insights for advancing IDM-based car-following models.

Details

Title

Research on Ecological Driving Following Strategy Based on Deep Reinforcement Learning

Author

Zhou, Weiqi¹; Wu, Nanchi²; Liu, Qingchao¹; Pan, Chaofeng²; Long, Chen²

¹ Automotive Engineering Research Institute, Jiangsu University, Zhenjiang 212013, China; Research Institute of Engineering Technology, Jiangsu University, Zhenjiang 212013, China
² Automotive Engineering Research Institute, Jiangsu University, Zhenjiang 212013, China

First page

13325

Publication year

2023

Publication date

2023

Publisher

MDPI AG

e-ISSN

20711050

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/su151813325

ProQuest document ID

2869679850
