1. Introduction
The rapid growth in vehicle ownership has been accompanied by a sustained rise in traffic accidents, underscoring the enduring importance of road traffic safety. It is worth noting that in recent years, leading companies including Baidu, Tesla, and Google have conducted extensive research in the field of autonomous driving. Autonomous vehicles (AVs) have the potential to reduce such traffic accidents and are poised to become a pivotal factor in transforming road traffic conditions. The design of precise control algorithms for AVs therefore holds both practical and theoretical significance.
Up to this point, a range of traditional and intelligent control algorithms for AVs have been developed. Traditional control methods rely on mathematical models of the controlled object to achieve closed-loop control, rendering them the preferred choice for practical vehicle applications. These methods encompass widely adopted approaches such as proportional–integral–derivative (PID) control, optimal control (OC), and model predictive control (MPC). For instance, Simorgh et al. [1] considered aerodynamics and employed a control algorithm based on PID model reference adaptive control to manage the longitudinal speed of vehicles. HosseinNia et al. [2] proposed an optimized fractional-order control approach and designed two distinct PI controllers based on PID control theory to regulate the throttle and brake systems, enhancing safety performance in traffic congestion scenarios. Schrödel et al. [3] derived OC laws based on dynamic programming principles, utilizing OC to tackle speed control challenges involving multiple objectives. Graffione et al. [4] developed an MPC methodology to oversee the spacing and speed of vehicle platoons, ultimately enhancing road safety. Moon et al. [5] introduced an efficient method for identifying target vehicles within road traffic flow environments, exploring the system’s smoothness and safety when tracking target vehicles through optimal controllers. Mekala et al. [6] proposed an MPC-based longitudinal speed control strategy for AVs, aiming to achieve vehicle stability and safe driving.
In comparison to traditional control approaches, intelligent control methods offer distinct advantages. They operate without the need for fundamental mathematical models, are not reliant on precise control architectures, and possess the capability to autonomously learn from extensive data sources [7]. Intelligent control methods exhibit the ability to adapt to diverse and complex environments, making them invaluable in the realm of autonomous driving. Prominent examples of intelligent control methods include fuzzy control (FC), sliding mode control (SMC), and neural network control methods. For instance, Yang et al. [8] considered the integration of vehicle internal information and devised a fuzzy proportional–integral (PI) controller to govern longitudinal speed, effectively ensuring the safety of vehicle tracking and maintaining stable fleet operation. Jin [9] designed a longitudinal controller using the SMC method, which proves highly effective in tracking vehicle speed. Yuan et al. [10] proposed an end-to-end control methodology based on 3D convolutional neural networks (3DCNNs) and long short-term memory (LSTM) for coupled lateral and longitudinal lane-keeping control. Pérez-Gil et al. [11] explored the use of deep Q-networks (DQN) for learning intelligent vehicle tracking control and tested these methods within the CARLA simulation environment. He et al. [12] introduced an entropy-constrained reinforcement learning scheme for multiobjective longitudinal decision-making of electric AVs. Coppola et al. [13] also demonstrated the effectiveness of adaptive cruise control for autonomous electric vehicles using a Q-learning algorithm.
The longitudinal control system of an AV is a typical nonlinear discrete-time system with time-varying uncertain parameters and multiple disturbances. Therefore, multiple factors must be considered in the actual driving process. Sole reliance on either traditional or intelligent control modes falls short of the precise control requirements for the longitudinal motion of AVs, jeopardizing tracking accuracy. Consequently, numerous studies have integrated control methodologies to attain safer and more efficient longitudinal vehicle control. For instance, Sun et al. [14] formulated an objective function based on tracking performance and driving characteristics during the tracking process; they designed an upper-level optimal controller and created a lower-level low-speed tracking controller employing the fuzzy PID control method to track the desired speed. Yao et al. [15] introduced an intelligent control method based on Q-learning with variable-domain fuzzy PID. Nie et al. [16], in pursuit of enhanced speed tracking accuracy and safety management for AVs, adopted a control approach based on adaptive radial basis function neural networks (RBFNNs) coupled with PID control, employing the vehicle dynamics model to establish the longitudinal speed control model.
In summary, researchers have successively engaged in the study of autonomous driving control technology, yielding numerous research achievements. However, traditional and intelligent longitudinal control methods for AVs differ in their principles and in the scenarios to which they apply, resulting in significant variations in their application outcomes. In previous longitudinal control research, reinforcement learning has been used to compute the desired control inputs directly, but its effectiveness is not significantly better than that of the PID algorithm, and the training and testing processes require considerable time [17]. In addition, the performance of reinforcement learning is sometimes unstable [18], and the training process is easily affected by data bias and variance, so the result is not always optimal [19]. By contrast, real vehicles predominantly employ traditional control algorithms, with PID control being suitable for simpler scenarios. PID is a classic traditional method that requires three key parameters to be determined; for complex systems, these parameters are challenging to tune, requiring manual adjustment and making it difficult to adapt to varying operational conditions.
Therefore, this paper adopts a hierarchical control approach to investigate the longitudinal control of AVs. The upper-level control module combines PID control with the deep deterministic policy gradient (DDPG) algorithm to analyze environmental state information during vehicle operation and generate the desired signals. The lower-level control is established based on PID control with drive-brake switching control logic. A joint simulation model is employed to evaluate the system’s responsiveness and stability across various scenarios.
2. Modeling of Vehicle Longitudinal Dynamics System
Carsim is a specialized software package tailored for dynamic modeling and simulation of medium and small vehicles [20]. In this paper, the C-Class Hatchback model is used to construct a longitudinal dynamics model for AVs. This involves configuring several parameters within Carsim (such as vehicle body parameters) and integrating the engine, transmission, hydraulic torque converter, and other power and braking components into the control system.
To achieve a closer approximation of the actual vehicle control performance, the primary parameters of the vehicle body in Carsim are calibrated according to the Volkswagen Tanyue model, while the remaining parameters are configured utilizing the built-in parameters of the C-Class model. The specific settings for vehicle and environmental parameters are detailed in Table 1.
Table 1
Vehicle parameter settings.
| Parameter | Value | Units |
| --- | --- | --- |
| Vehicle mass | 1616 | kg |
| Tire radius | 324 | mm |
| Final drive ratio | 4.1 | — |
| Air density | 1.293 | g/L |
| Air resistance coefficient | 0.3 | — |
| Rolling resistance coefficient | 0.018 | — |
| Tire-road friction coefficient | 0.85 | — |
| Transmission efficiency | 0.92 | — |
| Windward area | 2.2 | m² |
To validate the effectiveness of the simulation, modeling tests are conducted by combining the Carsim and MATLAB/Simulink platforms. The signals ultimately fed back to the vehicle model by the lower controller are the throttle opening and brake master cylinder pressure. Consequently, the selected input signals comprise the throttle opening and brake master cylinder pressure under the current state, and the output signals include the vehicle’s longitudinal speed.
3. Upper Controller of AV Based on DDPG-PID
3.1. PID Controller Based on DDPG Algorithm
Given the continuous nature of both the input and output spaces in the domain of AV’s motion control, the PID controller emerges as the prevalent choice in industrial applications, owing to its straightforward design and robust performance. Nevertheless, the selection of the three pivotal parameters in the traditional PID controller significantly impacts system control but proves challenging to determine accurately. The conventional tuning process is not only time-consuming and labor-intensive but also lacks adaptability to diverse operational conditions. Deep reinforcement learning (DRL) stands out for its exceptional learning and adaptive capabilities. When employed for parameter adjustment of PID, it facilitates dynamic tuning in response to system variations, thereby enhancing control accuracy. In this study, the DDPG algorithm, integrated within the actor-critic (AC) framework suitable for handling continuous input and output spaces, is utilized to perform real-time PID parameter adjustments. Subsequently, these optimized parameters are input into the PID controller to enable precise target control, as shown in Figure 1.
[figure(s) omitted; refer to PDF]
3.2. PID Control Theory
PID controllers represent the most prevalent control algorithms employed in industrial processes [21]. The outcome of the PID control algorithm is intricately linked to three key parameters and is typically applied to regulate the state of the controlled system within diverse operational scenarios [22]. The adjustment of the PID controller parameters (the proportional, integral, and derivative gains) directly determines the control performance.
This study employs a positional PID control strategy to derive the actual control output, which can be expressed using the following formula:
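For reference, a standard position-form (positional) PID law consistent with this description can be written as follows, where $e(k)$ is the tracking error at step $k$ and $T_s$ is the sampling period; this is a generic reconstruction, and the paper's own notation may differ:

$$
u(k) = K_p\, e(k) + K_i \sum_{j=0}^{k} e(j)\, T_s + K_d\, \frac{e(k) - e(k-1)}{T_s}
$$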
3.3. DDPG Algorithm
DRL represents a convergence of deep learning and reinforcement learning, capable of addressing challenges that arise in complex state spaces. Among the model-free DRL approaches, the DDPG algorithm has the ability to produce continuous actions. Consequently, this study utilizes the DDPG algorithm for optimizing the PID controller [24].
In general, the DRL method typically models Markov decision processes (MDPs) relevant to specific research challenges [25]. This modeling process involves the integration of the current state, the action taken, the resulting reward, and the subsequent state transition.
The reward function developed for this study is explained in detail in Section 3.4. Regarding the network structure, DDPG relies on the AC framework to facilitate the optimization process, which comprises the actor network, the critic network, and their corresponding target networks.
The actor network employs a set of parameters to map the observed state directly to a deterministic action.
The input for the actor network proposed in this paper is the vehicle state defined in Section 3.4, and its output is the corresponding action, namely the three PID gain coefficients.
The critic network utilizes a set of parameters to approximate the action-value function, evaluating the quality of the action selected by the actor in a given state.
The input layer of the critic network in this study comprises two components: the vehicle state and the action produced by the actor network; its output is the estimated action value.
The target strategy network employs a separate set of parameters that are slowly soft-updated from the actor network parameters, which stabilizes the training process.
The structural design of the target actor network in this paper mirrors that of the actor network.
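To make the network structure described above concrete, the following PyTorch sketch shows an actor that maps the five-dimensional vehicle state to three PID gains and a critic that scores a state-gain pair. The layer sizes, activations, and the gain-scaling bounds are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

STATE_DIM = 5   # ego speed, relative speed, ego accel., lead accel., relative distance
ACTION_DIM = 3  # the PID gains Kp, Ki, Kd produced for the upper controller

class Actor(nn.Module):
    """Deterministic policy: maps the observed state to three positive PID gains."""
    def __init__(self, gain_upper=(5.0, 1.0, 1.0)):  # assumed gain bounds
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Sigmoid(),  # squash outputs into (0, 1)
        )
        self.register_buffer("gain_upper", torch.tensor(gain_upper))

    def forward(self, state):
        # Scale the (0, 1) outputs to bounded, positive PID gains.
        return self.net(state) * self.gain_upper

class Critic(nn.Module):
    """Action-value function Q(s, a): scores a (state, PID-gain) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Target networks start as copies of the online networks and are soft-updated.
actor, critic = Actor(), Critic()
target_actor, target_critic = Actor(), Critic()
target_actor.load_state_dict(actor.state_dict())
target_critic.load_state_dict(critic.state_dict())
```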
3.4. MDP Modeling
In this section, the longitudinal upper control problem of AV is expressed as an MDP model [28]. In the system involved in this study, it is assumed that the environment is completely observable, and there is a good communication foundation between the vehicle and the environment. The definition of MDP’s state space, action space, and reward function is as follows.
In a road traffic setting, when contemplating the longitudinal motion of vehicles, it becomes essential to account for the present driving state of the AV and its interaction with the preceding vehicle. If all the gathered information were directly integrated into the design of the state set, the large volume of data would complicate the representation of the vehicle’s driving status, especially given the strong causal relationships among the data. The chosen state space therefore encompasses the following variables: the ego vehicle’s speed, the relative speed, the ego vehicle’s acceleration, the acceleration of the preceding vehicle, and the relative distance, which together form the state vector.
In the longitudinal controller designed for the AV in this study, a DRL algorithm is employed to perform real-time adjustments to the parameters of the PID controller. These adjusted parameters are subsequently fed into the PID controller, which calculates the desired acceleration value as the output. Consequently, the action space in this paper comprises the three PID gain coefficients (proportional, integral, and derivative).
This study takes into account various factors, including driving safety, efficiency, and comfort when designing reward functions. It assesses the significance of each factor by incorporating different weight coefficients. During the training process, AVs receive a notably substantial negative reward in the event of a collision, as illustrated in the following equation:
In the desired state, it is essential to maintain a speed error of 0 between the vehicle and the preceding vehicle while driving. To attain this desired state, the reward function takes into account the corresponding rewards when the speed error between the current vehicle and the preceding vehicle is less than a certain threshold. The reward and penalty functions for speed errors are detailed in the following equations:
When the speed error between the ego vehicle and the front vehicle is exactly 0, the intelligent agent receives the maximum reward value. If the speed error equals 0.5 m/s, the feedback value is set to 0. To prevent excessive penalties, if the speed error exceeds a certain threshold, the reward and penalty values are set to fixed values.
Abrupt changes in the vehicle’s motion state within a short period of time can cause considerable discomfort for both drivers and passengers. The key factors influencing comfort are the magnitude of deceleration and the rate of change of deceleration. This paper therefore computes the rate of change of acceleration (jerk) and penalizes large values in the reward function.
When an AV is on the road, it needs to respond promptly to changes in the motion state of the preceding vehicle, such as sudden acceleration or abrupt deceleration. To assess the vehicle’s responsiveness, a corresponding reward component is designed and included in the overall reward.
In summary, the reward function used in the experiment comprises a total of four components, as presented in the following equation:
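The following Python sketch illustrates one way the four components described above (collision penalty, speed-error reward, comfort/jerk penalty, and responsiveness reward) could be combined into a weighted sum. The weights, thresholds, and clipping values are placeholder assumptions, not the values used in the paper.

```python
def reward(collision, speed_err, jerk, response_err, w=(1.0, 1.0, 0.5, 0.5)):
    """Weighted sum of the four reward components described in the text.

    collision    : True if the ego vehicle collided in this step
    speed_err    : ego speed minus preceding-vehicle speed (m/s)
    jerk         : rate of change of ego acceleration (m/s^3)
    response_err : difference between desired and actual acceleration response
    w            : illustrative weight coefficients (not the paper's values)
    """
    # 1) Safety: a large negative reward when a collision occurs.
    r_safe = -200.0 if collision else 0.0

    # 2) Speed tracking: maximal at zero error, zero at 0.5 m/s error,
    #    and clipped below to avoid excessive penalties (see text).
    r_speed = max(1.0 - abs(speed_err) / 0.5, -2.0)

    # 3) Comfort: penalize rapid changes of acceleration (jerk).
    r_comfort = -abs(jerk)

    # 4) Responsiveness: penalize sluggish reaction to the preceding vehicle.
    r_response = -abs(response_err)

    w1, w2, w3, w4 = w
    return w1 * r_safe + w2 * r_speed + w3 * r_comfort + w4 * r_response
```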
In this study, it is assumed that the vehicle controllers are distributed within each vehicle and that a reliable communication link exists between vehicles. Each controller therefore obtains the environmental state, including the preceding vehicle’s motion, through vehicle-to-vehicle communication and on-board sensors. The desired acceleration during the car-following process is then computed as follows:
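Under the assumption that the upper-level PID acts on the speed error with respect to the preceding vehicle, the desired acceleration takes the general form below, with the DDPG-tuned gains substituted at every step; the paper's own equation may additionally include a spacing-error term:

$$
a_{\mathrm{des}}(k) = K_p\, e_v(k) + K_i \sum_{j=0}^{k} e_v(j)\, T_s + K_d\, \frac{e_v(k) - e_v(k-1)}{T_s}, \qquad e_v(k) = v_{\mathrm{lead}}(k) - v_{\mathrm{ego}}(k)
$$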
The training data for DDPG-PID consist of experiences collected by the agent interacting with the environment, including the current state, action taken, reward received, next state, and a terminal flag indicating whether the episode has ended. These experiences are stored in a replay buffer. During training, mini-batches of experiences are randomly sampled from the replay buffer to update the policy network and value network. This process optimizes the agent’s behavior (parameters of PID), enabling it to perform effectively in continuous action space tasks.
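Building on the network sketch above, the replay-buffer update described in this paragraph could be implemented roughly as follows; the buffer size, batch size, discount factor, learning rates, and soft-update rate are assumed hyperparameters.

```python
import random
from collections import deque
import torch
import torch.nn.functional as F

# Replay buffer of (state, action, reward, next_state, done) tuples;
# `done` is stored as 0.0/1.0. Capacity and hyperparameters are assumptions.
buffer = deque(maxlen=100_000)
GAMMA, TAU, BATCH = 0.99, 0.005, 64

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def train_step():
    """One DDPG update from a random mini-batch of stored experiences."""
    if len(buffer) < BATCH:
        return
    batch = random.sample(buffer, BATCH)
    s, a, r, s2, done = map(
        lambda x: torch.tensor(x, dtype=torch.float32), zip(*batch))

    # Critic update: regress Q(s, a) onto the one-step TD target built
    # from the target actor and target critic.
    with torch.no_grad():
        y = r.unsqueeze(-1) + GAMMA * (1.0 - done.unsqueeze(-1)) * \
            target_critic(s2, target_actor(s2))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: adjust the PID-gain policy to maximize the critic's value.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft update of both target networks toward the online networks.
    for net, tgt in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), tgt.parameters()):
            tp.data.mul_(1.0 - TAU).add_(TAU * p.data)
```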
3.5. Comparison of Simulation Results
This section primarily focuses on the design of the upper controller for longitudinal control of AVs. To assess the actual tracking performance of the upper control algorithm under varying input speeds and acceleration values, it is validated under different operating conditions. To reflect the complexity of real driving, an enhanced speed curve is constructed that includes uniform motion, rapid acceleration, and sudden deceleration. The resulting complex desired speed curve is depicted in Figure 2.
[figure(s) omitted; refer to PDF]
The control performance of the proposed DDPG-PID control algorithm was evaluated under these operating conditions using the complex desired speed curve and compared with a traditional PID controller and an adaptive PID algorithm (in which gain coefficients are added to the PID coefficients [29]). The simulation results are depicted in Figures 3, 4, and 5.
[figure(s) omitted; refer to PDF]
The figures display the longitudinal speed, longitudinal speed tracking error, and longitudinal acceleration tracking outcomes for the desired speed curve test. Based on the simulation results for these operating conditions, the DDPG-PID controller exhibits superior control performance throughout the entire longitudinal tracking process of the AV. In addition, the acceleration plot shows that DDPG-PID responds more quickly to changes in acceleration and exhibits smaller fluctuations than the other models, resulting in better driving stability.
To compare the errors of the models above more intuitively, this paper selects root mean squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²) as evaluation metrics; the comparison results and the numerical characteristics of the tracking errors are summarized in Tables 2, 3, 4, and 5.
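For reference, the speed-tracking metrics reported in Table 2 can be computed as in the following generic sketch (not the authors' evaluation script):

```python
import numpy as np

def tracking_metrics(v_ref, v_actual):
    """RMSE, MAE, and R^2 between the desired and actual speed traces."""
    v_ref, v_actual = np.asarray(v_ref, float), np.asarray(v_actual, float)
    err = v_actual - v_ref
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((v_ref - v_ref.mean()) ** 2)
    return rmse, mae, float(r2)
```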
Table 2
Comparison of speed tracking errors.
| Model | RMSE (m/s) | MAE (m/s) | R² |
| --- | --- | --- | --- |
| PID | 0.47 | 0.38 | 0.995 |
| Adaptive PID | 0.35 | 0.28 | 0.997 |
| DDPG-PID | 0.23 | 0.18 | 0.998 |
Abbreviations: MAE, mean absolute error; RMSE, root mean squared error.
Table 3
Comparison of acceleration tracking errors.
| Model | RMSE (m/s²) | MAE (m/s²) | MAPE |
| --- | --- | --- | --- |
| PID | 0.46 | 0.28 | 0.849 |
| Adaptive PID | 0.25 | 0.14 | 0.955 |
| DDPG-PID | 0.24 | 0.12 | 0.957 |
Abbreviations: DDPG, deep deterministic policy gradient; MAE, mean absolute error; PID, proportional–integral-derivative; RMSE, root mean squared error.
Table 4
Numerical characteristics of speed tracking errors.
| Model | Max (m/s) | Min (m/s) | Average (m/s) |
| --- | --- | --- | --- |
| PID | 0.58 | −1.16 | −0.07 |
| Adaptive PID | 0.60 | −0.90 | −0.08 |
| DDPG-PID | 0.40 | −0.90 | 0.01 |
Abbreviations: DDPG, deep deterministic policy gradient; Max, maximum error; Min, minimum error; PID, proportional–integral-derivative.
Table 5
Numerical characteristics of acceleration tracking errors.
| Model | Max (m/s²) | Min (m/s²) | Average (m/s²) |
| --- | --- | --- | --- |
| PID | 1.30 | −1.20 | −0.046 |
| Adaptive PID | 1.20 | −0.94 | 0.033 |
| DDPG-PID | 1.10 | −0.80 | 0.010 |
Abbreviations: DDPG, deep deterministic policy gradient; Max, maximum error; Min, minimum error; PID, proportional–integral-derivative.
By reviewing the tables above, it is evident that, compared with the other models, DDPG-PID improves RMSE, MAE, and R² for speed tracking by 34.3%, 35.7%, and 0.1%, respectively, relative to the adaptive PID controller, and also yields smaller maximum errors and near-zero average errors in both speed and acceleration tracking.
4. Lower Controller of AV
The lower controller manages the vehicle’s acceleration based on desired acceleration output from the upper controller. It is essential to design an efficient lower controller that converts the desired acceleration input into the necessary accelerator or brake pedal positions for the vehicle. The structure is shown in Figure 6.
[figure(s) omitted; refer to PDF]
For the PID controller, the acceleration error is defined as the difference between the desired acceleration output by the upper controller and the actual acceleration fed back from the vehicle model.
The output acceleration can be calculated as follows:
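A minimal discrete PID of this kind, taking the acceleration error as input and returning the control acceleration passed to the inverse drive and braking models, is sketched below; the gains and the 0.01 s sampling time are placeholders, not the values used in the paper.

```python
class LowerPID:
    """Position-form PID tracking the desired acceleration from the upper layer."""
    def __init__(self, kp=1.0, ki=0.1, kd=0.01, dt=0.01):  # placeholder gains
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.err_sum = 0.0
        self.err_prev = 0.0

    def step(self, a_desired, a_actual):
        err = a_desired - a_actual               # acceleration tracking error
        self.err_sum += err * self.dt            # integral term
        d_err = (err - self.err_prev) / self.dt  # derivative term
        self.err_prev = err
        # Control acceleration handed to the inverse drive/brake models.
        return self.kp * err + self.ki * self.err_sum + self.kd * d_err
```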
4.1. Reverse Drive System Model
During the driving process, while the vehicle is propelled by the driving force generated by the powertrain, it is simultaneously subject to resistances such as rolling resistance, air resistance, and grade resistance.
According to Newton’s laws of motion, the force acting on the car in the x-direction is as follows:
Among them,
Without considering the deformation of the vehicle’s transmission system, the driving force can be calculated as follows:
By using constants
According to the drive/brake switching control logic, when the throttle is used for control and the vehicle is not affected by the braking system, equation (29) for the desired engine torque can be obtained:
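A generic inverse-dynamics form of this relation, assuming the driving force must overcome inertia, air resistance $F_{\mathrm{air}}$, and rolling resistance $F_{\mathrm{roll}}$, and using the tire radius $r$, current gear ratio $i_g$, final drive ratio $i_0$, and transmission efficiency $\eta_T$ listed in Table 1, would be

$$
T_{e,\mathrm{des}} = \frac{\bigl(m\, a_{\mathrm{des}} + F_{\mathrm{air}} + F_{\mathrm{roll}}\bigr)\, r}{i_g\, i_0\, \eta_T},
$$

although the paper's equation (29) may include additional terms such as a rotating-mass correction.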
The relationship between the output torque
The throttle opening can be obtained based on the output torque and speed value of the engine.
4.2. Reverse Braking System Model
When regulating the AV’s braking system, the vehicle is influenced by multiple factors, including air resistance, rolling resistance, slope resistance, braking force, and the counteractive forces generated by conventional systems. Under the collective influence of these resistances, the desired acceleration signal is used to determine the brake master cylinder pressure from the longitudinal dynamics model, thereby achieving vehicle deceleration. Throughout the braking procedure, the throttle remains at zero (fully closed). In accordance with Newton’s laws of motion, the force equilibrium relationship in the longitudinal direction is as follows:
When braking pressure is applied, the wheels, which normally rotate forward, experience a braking force from the road surface acting opposite to the direction of motion. From the torque balance relationship, equation (32) can be obtained:
When the desired acceleration is negative, the required braking force can be determined from the above force balance.
When the required braking intensity does not exceed the road adhesion limit, the relationship between the vehicle braking force and braking pressure is as follows:
Moreover, the relationship between braking torque and brake master cylinder pressure is as follows:
The relationship between desired acceleration and brake master cylinder pressure can be obtained as follows:
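Under the assumptions that the throttle is closed, that braking torque is proportional to master-cylinder pressure through an overall gain $K_b$, and that air and rolling resistances assist the deceleration, this relation takes the general form

$$
P_{\mathrm{des}} = \frac{\bigl(m\,\lvert a_{\mathrm{des}}\rvert - F_{\mathrm{air}} - F_{\mathrm{roll}}\bigr)\, r}{K_b},
$$

where $r$ is the tire radius; the paper's own coefficients and sign conventions may differ.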
By establishing a reverse braking system model in the lower controller, the desired acceleration can be converted into a brake master cylinder pressure, which serves as the output of the lower controller during braking control.
4.3. Drive Brake Switching Logic
Frequent toggling between the throttle and brake pedals can severely damage the vehicle’s transmission system and other components, and can also cause vibrations and performance inconsistencies within the power system, ultimately degrading the driving experience for passengers. To mitigate these issues, a mode-switching buffer zone of fixed width is introduced so that small fluctuations of the desired acceleration do not trigger a change of control mode.
The force acting on the vehicle when neither the drive nor the brake system is applied (i.e., when coasting) is as follows:
When the desired acceleration is greater than the upper boundary of the switching buffer, the drive mode is selected and the reverse drive model converts it into a throttle opening; when it falls below the lower boundary, the brake mode is selected and the reverse braking model converts it into a brake master cylinder pressure; within the buffer, the current control mode is maintained to avoid frequent switching.
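The switching rule can be sketched as follows in Python; `a_coast` denotes the coasting acceleration obtained from the force balance above, and the buffer half-width `delta` is an assumed placeholder rather than the value used in the paper.

```python
def drive_brake_switch(a_desired, a_coast, prev_mode, delta=0.1):
    """Select the control mode with a hysteresis buffer of half-width `delta`
    around the coasting acceleration `a_coast` (the acceleration obtained with
    neither throttle nor brake applied).
    """
    if a_desired > a_coast + delta:
        return "drive"   # throttle control via the reverse drive model (Section 4.1)
    if a_desired < a_coast - delta:
        return "brake"   # pressure control via the reverse braking model (Section 4.2)
    return prev_mode     # inside the buffer: keep the previous mode to avoid toggling
```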
4.4. Model Simulation Verification
To verify the actual tracking ability and response delay of the designed lower controller with respect to the desired acceleration output by the upper controller, a step operating condition was designed and simulation verification was conducted with Carsim and Simulink. The lower-level PID control parameters were set to fixed values.
[figure(s) omitted; refer to PDF]
Analysis of the step response acceleration simulation results reveals that at 10 s, there is a deviation in the actual acceleration. However, this deviation quickly recovers to a normal tracking state within a very short time. Around 15 s, a sudden change occurs in the actual acceleration, attributed to abrupt engine speed variations during a gear shift in the vehicle’s acceleration process. Overall, the controller exhibits relative stability in the tracking of desired acceleration and speed.
5. Joint Simulation Verification of Control System
In this paper, Simulink and Carsim platforms are combined to carry out the overall control algorithm and model verification. The effectiveness and reasonableness of the designed control algorithm are tested on a good road surface with a road adhesion coefficient of 0.85, with a simulation step size of 0.01 s. Typical traffic scenarios in actual road traffic (steady-state following conditions and start-up following conditions) are selected for simulation testing to verify the effectiveness of the algorithm, and the specific design is as follows:
The steady-state following scenario set in this section starts with the ego vehicle cruising at a constant speed of 60 km/h. A front vehicle travels at 50 km/h, 100 m ahead in the same lane, and its speed subsequently varies between 40 km/h and 60 km/h. Throughout this period, the ego vehicle follows the front vehicle.
Figures 9, 10, and 11 show that after the front vehicle changes speed, the ego vehicle also accelerates and decelerates to follow it, and the changes in the ego vehicle's motion state are consistent with the simulated driving scenario. At the beginning, the ego vehicle travels at a constant 60 km/h with zero acceleration, far from the front vehicle. After detecting the front vehicle at 20 s, it begins to decelerate; the speed changes gently, and after a small fluctuation the acceleration immediately returns to its tracking state. At about 40 s, the two vehicles reach the same speed, the acceleration is zero, and the inter-vehicle distance remains unchanged. In the subsequent decelerations and accelerations, although the ego vehicle exhibits a small delay, the amplitude is small and no extreme values occur. Ultimately, it maintains the same speed as the front vehicle, indicating that the controller provides good speed control. The ego vehicle's acceleration is essentially consistent with the desired acceleration, and both positive and negative changes in acceleration correspond to the changes in vehicle speed, with a maximum peak of 0.37 m/s² and a minimum peak of −0.45 m/s², indicating relatively comfortable acceleration and deceleration. Moreover, the inter-vehicle distance is controlled smoothly and remains greater than the safe distance, which indicates that the controller designed in this paper has good following performance.
[figure(s) omitted; refer to PDF]
The driving scenario proposed in this section corresponds to a vehicle starting from rest, for example when restarting after braking on a congested road. It is assumed that, at the initial state, the ego vehicle and the front vehicle are stationary, 6 m apart. At the initial moment, the front vehicle starts and accelerates to 30 km/h within 5 s. The ego vehicle reacts accordingly, starts, and accelerates after the front vehicle, ultimately maintaining a constant speed with a certain distance between the vehicles.
According to the results shown in Figures 12, 13, and 14, at the initial moment, the ego vehicle and the front vehicle are stationary, 6 m apart. The front vehicle suddenly accelerates for 5 s, with its speed rising from 0 to 30 km/h, and then remains at a constant 30 km/h. After detecting the change in the front vehicle's state, the ego vehicle reacts quickly and begins to accelerate within 1-2 s. At this point, its acceleration surges to 1.54 m/s²; although a small fluctuation occurs, the amplitude is small and the response time is short, with the acceleration quickly settling back to 1.44 m/s². Afterward, the acceleration stabilizes and the inter-vehicle distance continues to increase. At 7.2 s, the speeds of the ego vehicle and the front vehicle become equal and then remain unchanged; at this time, the inter-vehicle distance reaches a maximum of 21.63 m, and the acceleration returns to zero. The actual and desired acceleration curves are essentially consistent, and their changes follow the trend of the vehicle speed. The overall switching is relatively smooth, which shows that the controller is effective for start-up car following.
[figure(s) omitted; refer to PDF]
6. Conclusion
This paper established a hierarchical longitudinal control system based on the DDPG and PID control algorithms, aiming to address challenges in the current state of longitudinal control technology for AVs. The conclusions can be drawn as follows:
1. A longitudinal control algorithm was designed under a hierarchical control structure. Carsim is used to model the longitudinal dynamics of the vehicle, and the desired acceleration is generated by the upper-layer DDPG-PID algorithm. The results indicate that, compared with the other models, DDPG-PID improves RMSE, MAE, and R² by 34.3%, 35.7%, and 0.1%, respectively, and performs well in handling sudden changes in the perceived information of the driving scenario.
2. The lower level control adopts a PID-based driving and braking switching control strategy. Through joint simulation verification using Carsim and Simulink, the longitudinal control method has demonstrated robustness and adaptability in various simulation scenarios, affirming its reliability under different road conditions.
3. The designed method effectively controls the speed and following distance of the vehicle, providing a feasible longitudinal control solution for AVs and effective technical support for the safe operation of autonomous driving.
Future research directions include further optimizing algorithm performance and considering more complex traffic scenarios and vehicle behavior models to enhance the robustness of the system in complex environments. In addition, real-world experiments under actual road conditions will help validate the simulation results and improve the reliability of the findings. These efforts will further promote the application of autonomous driving technology.
Funding
This work was supported by the National Natural Science Foundation of China (72371019).
Acknowledgments
This work was supported by the National Natural Science Foundation of China (72371019). We did not use AI in our paper, including text and images generated by artificial intelligence.
[1] A. Simorgh, A. Marashian, A. Razminia, "Adaptive PID Control Design for Longitudinal Velocity Control of Autonomous Vehicles," Proceedings of the 2019 International Conference on Control, Instrumentation and Automation (ICCIA), DOI: 10.1109/iccia49288.2019.9030856.
[2] S. H. HosseinNia, I. Tejado, B. M. Vinagre, V. Milanés, J. Villagrá, "Low Speed Control of an Autonomous Vehicle Using a Hybrid Fractional Order Controller," Proceedings of the 2nd International Conference on Control, Instrumentation and Automation, pp. 116-121, DOI: 10.1109/icciautom.2011.6356641.
[3] F. Schrödel, P. Herrmann, N. Schwarz, "An Improved Multi-Object Adaptive Cruise Control Approach," Proceedings of the 10th IFAC Symposium on Intelligent Autonomous Vehicles.
[4] S. Graffione, C. Bersani, R. Sacile, E. Zero, "Model Predictive Control of a Vehicle Platoon," Proceedings of the 2020 IEEE 15th International Conference of System of Systems Engineering (SoSE), pp. 513-518.
[5] S. Moon, H. J. Kang, K. Yi, "Multi-Vehicle Target Selection for Adaptive Cruise Control," no. 11, pp. 1325-1343, DOI: 10.1080/00423114.2010.499952.
[6] G. K. Mekala, N. R. Sarugari, A. Chavan, "Speed Control in Longitudinal Plane of Autonomous Vehicle Using MPC," Proceedings of the 2020 IEEE International Conference for Innovation in Technology (INOCON), DOI: 10.1109/inocon50539.2020.9298213.
[7] J. Zhao, Z. Yu, X. Yang, Z. Gao, W. Liu, "Short Term Traffic Flow Prediction of Expressway Service Area Based on STL-OMS," Physica A: Statistical Mechanics and Its Applications, vol. 595, DOI: 10.1016/j.physa.2022.126937, 2022.
[8] L. Yang, D. Sun, F. Xie, J. Zhu, "Study of Autonomous Platoon Vehicle Longitudinal Modeling," Proceedings of the IET International Conference on Intelligent and Connected Vehicles (ICV 2016).
[9] M. Jin, "Improvement of Road-Following Intelligent Speed Control Based on Road Curvature," Proceedings of the 2013 Third International Conference on Intelligent System Design and Engineering Applications, pp. 870-873.
[10] W. Yuan, M. Yang, C. Wang, B. Wang, "Longitudinal and Lateral Coupling Model Based End-To-End Learning for Lane Keeping of Self-Driving Car," Proceedings of the 4th International Conference on Cognitive Systems and Information Processing (ICCSIP), pp. 425-436.
[11] Ó. Pérez-Gil, R. Barea, E. López-Guillén, "DQN-Based Deep Reinforcement Learning for Autonomous Driving," Proceedings of the 21st International Workshop of Physical Agents (WAF 2020).
[12] X. He, C. Fei, Y. Liu, K. Yang, X. Ji, "Multi-Objective Longitudinal Decision-Making for Autonomous Electric Vehicle: An Entropy-Constrained Reinforcement Learning Approach," Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, DOI: 10.1109/itsc45102.2020.9294736.
[13] A. Coppola, A. Petrillo, R. Rizzo, S. Santini, "Adaptive Cruise Control for Autonomous Electric Vehicles Based on Q-Learning Algorithm," 2021 AEIT International Annual Conference (AEIT), DOI: 10.23919/aeit53387.2021.9627059.
[14] Z. Sun, R. Wang, Q. Ye, Z. Wei, B. Yan, "Investigation of Intelligent Vehicle Path Tracking Based on Longitudinal and Lateral Coordinated Control," IEEE Access, vol. 8, pp. 105031-105046, DOI: 10.1109/access.2020.2994437, 2020.
[15] Y. Yao, N. Ma, C. Wang, Z. Wu, C. Xu, J. Zhang, "Research and Implementation of Variable-Domain Fuzzy PID Intelligent Control Method Based on Q-Learning for Self-Driving in Complex Scenarios," Mathematical Biosciences and Engineering, vol. 20, no. 3, pp. 6016-6029, DOI: 10.3934/mbe.2023260, 2023.
[16] L. Nie, J. Guan, C. Lu, H. Zheng, Z. Yin, "Longitudinal Speed Control of Autonomous Vehicle Based on a Self-Adaptive PID of Radial Basis Function Neural Network," IET Intelligent Transport Systems, vol. 12, no. 6, pp. 485-494, DOI: 10.1049/iet-its.2016.0293, 2018.
[17] M. Fayyazi, M. Abdoos, D. Phan, "Real-Time Self-Adaptive Q-Learning Controller for Energy Management of Conventional Autonomous Vehicles," Expert Systems with Applications, vol. 222, DOI: 10.1016/j.eswa.2023.119770, 2023.
[18] C. Desjardins, B. Chaib-draa, "Cooperative Adaptive Cruise Control: A Reinforcement Learning Approach," IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 4, pp. 1248-1260, DOI: 10.1109/tits.2011.2157145, 2011.
[19] L. Song, J. Li, Z. Wei, K. Yang, E. Hashemi, H. Wang, "Longitudinal and Lateral Control Methods from Single Vehicle to Autonomous Platoon," Green Energy and Intelligent Transportation, vol. 2, DOI: 10.1016/j.geits.2023.100066, 2023.
[20] L. Guo, Z. Ren, P. Ge, J. Chang, "Advanced Emergency Braking Controller Design for Pedestrian Protection Oriented Automotive Collision Avoidance System," The Scientific World Journal, vol. 2014, 2014.
[21] L. Wang, X. Fang, S. Duan, X. Liao, "PID Controller Based on Memristive CMAC Network," Abstract and Applied Analysis, vol. 2013, DOI: 10.1155/2013/510238, 2013.
[22] Y. Chen, D. Li, H. Zhong, R. Zhao, "The Method for Automatic Adjustment of AGV’s PID Based on Deep Reinforcement Learning," Journal of Physics: Conference Series, vol. 2320, no. 1, DOI: 10.1088/1742-6596/2320/1/012008, 2022.
[23] L. Yu, Y. Chen, S. Chen, Y. Zhang, H. Zhang, C. Liu, "Numerical Analysis of the Performance of a PID-Controlled Air Curtain for Fire-Induced Smoke Confinement in a Tunnel Configuration," Fire Safety Journal, vol. 141, DOI: 10.1016/j.firesaf.2023.103930, 2023.
[24] S. Wang, Y. Hu, Z. Liu, L. Ma, "Research on Adaptive Obstacle Avoidance Algorithm of Robot Based on DDPG-DWA," Computers & Electrical Engineering, vol. 109, DOI: 10.1016/j.compeleceng.2023.108753, 2023.
[25] Y. Qian, S. Feng, W. Hu, W. Wang, "Obstacle Avoidance Planning of Autonomous Vehicles Using Deep Reinforcement Learning," Advances in Mechanical Engineering, vol. 14, no. 12, DOI: 10.1177/16878132221139661, 2022.
[26] P. Kuo, J. Hu, K. Chen, W. Chang, X. Chen, C. Huang, "Sequential Sensor Fusion-Based W-DDPG Gait Controller of Bipedal Robots for Adaptive Slope Walking," Advanced Engineering Informatics, vol. 57, DOI: 10.1016/j.aei.2023.102067, 2023.
[27] D. Li, O. Okhrin, "Modified DDPG Car-Following Model With a Real-World Human Driving Experience With CARLA Simulator," Transportation Research Part C: Emerging Technologies, vol. 147, DOI: 10.1016/j.trc.2022.103987, 2023.
[28] L. Ye, K. Ling, Q. Han, "DIMDP: A Driving Intention-Based MDP Service Migration Model Under MEC/MSCN Architecture," Mobile Information Systems, vol. 2022, DOI: 10.1155/2022/4988266, 2022.
[29] K. J. Åström, T. Hägglund, Advanced PID Control, 2005.
Copyright © 2024 Jialu Ma et al. This work is licensed under the Creative Commons Attribution 4.0 License (http://creativecommons.org/licenses/by/4.0/).
Abstract
Longitudinal control of autonomous vehicles (AVs) has long been a prominent subject and challenge. A hierarchical longitudinal control system that integrates deep deterministic policy gradient (DDPG) and proportional–integral–derivative (PID) control algorithms was proposed in this paper to ensure safe and efficient vehicle operation. First, a hierarchical control structure was employed to devise the longitudinal control algorithm, utilizing a Carsim-based model of the vehicle’s longitudinal dynamics. Subsequently, an upper controller algorithm was developed, combining DDPG and PID, wherein perceptual information such as leading vehicle speed and distance served as input state for the DDPG algorithm to determine PID parameters and output the desired acceleration of the vehicle. Following this, a lower controller was designed employing a PID-based driving and braking switching strategy. The disparity between the desired and actual accelerations was fed into the PID, which calculated the control acceleration to enact the driving and braking switching strategy. Finally, the effectiveness of the designed control algorithm was validated through simulation scenarios using Carsim and Simulink. Results demonstrate that the longitudinal control method proposed herein adeptly manages vehicle speed and following distance, thus satisfying the safety requirements of AVs.
Details
Ma, Jialu 1; Zhang, Pingping 2; Li, Yixian 3; Gao, Yuhang 3; Zhao, Jiandong 4
1 School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
2 Component Purchasing Department, Beijing Hyundai Motor Company, Beijing 101300, China
3 School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China
4 School of Systems Science, Beijing Jiaotong University, Beijing 100044, China