H∞ Differential Game of Nonlinear Half-Car Active

Full text

Turn on search term navigation

1. Introduction

The active suspension system is a critical component for intelligent vehicle operation, significantly impacting driving, steering, braking, and obstacle navigation [1]. With advancements in actuators and drive-by-wire technology, the application of active suspensions is becoming increasingly widespread. Particularly during the pilot phase of intelligent driving technology, active suspension control based on AI large models has emerged as a challenging and vital research area [2,3].

In recent years, researchers have proposed numerous actuator and controller design methods for active suspension systems, including air suspension, electromagnetic suspension, and a magneto-rheological damper [4,5,6]. Reference [7] investigated finite-time neural control of electromagnetic suspension, taking partial actuator failures into account. Reference [8] designed a static output feedback controller for active suspension, which reduces implementation costs. With the advancement of sensing technology, road preview-based active suspension control techniques have also been developed, such as wheelbase preview control [9,10] and comfort-oriented longitudinal velocity planning [11].

In the design of active suspension control, it is crucial to consider the system’s nonlinearity and fault tolerance. References [12,13] developed a nonlinear model predictive controller and a robust fault-tolerant controller with saturation for a quarter-car active suspension, achieving favorable practical results. Additionally, various robust control methods have been extensively researched, including finite-time H_∞ control [14], fuzzy sampled H_∞ control [15], and finite-frequency H_∞ control [16], all of which address system robustness. Reference [17] explored model predictive control for active suspension, achieving optimal control performance while satisfying constraints. However, most methods assume known model parameters or only consider vertical dynamics, with limited focus on nonlinear coupling models in complex and variable environments. Some researchers have investigated model-free control methods, such as neuro-fuzzy H₂/H_∞ control [18], adaptive optimal control based on neural networks [19], and approximation-free preset-time control [20]. Given that parameter calibration is time-consuming and parameters often vary in complex and dynamic environments, model-free control methods are more practical for real-world applications.

As a technical approach within AI large models, data-driven reinforcement learning (RL) methods have garnered considerable attention. References [21,22,23] proposed data-driven optimal control methods for active suspensions, utilizing driving data to train and optimize control policies. Data-driven RL algorithms are highly effective for nonlinear H_∞ differential games, nonlinear optimal control, static output-feedback, and event-triggered control [24,25,26]. While some researchers have applied data-driven RL methods to active suspensions, most have only considered vertical dynamics, neglecting the nonlinear coupling characteristics of front and rear wheels and pitch stability. In previous studies on H_∞ control [21,23,27,28], the system is typically assumed to be linear, and only simplified quarter-car active suspension models are considered, which deviates from real-world scenarios. Additionally, most methods require system model parameters, which inevitably increase control costs and complexity. This is the primary motivation for our research. In summary, research on nonlinear H_∞ differential games for active suspensions remains insufficient. This paper establishes a more realistic half-car active suspension model and transforms the nonlinear H_∞ control problem into a differential game between two players. An off-policy RL algorithm, which does not require model parameters, is designed to approximate the solution to the Hamilton–Jacobi–Isaacs (HJI) equation by collecting a portion of vehicle vibration data. Finally, the effectiveness of control optimization and implementation is verified through hardware-in-the-loop simulation, and the effective vibration reduction range of the control policy is analyzed in detail. The main contributions of this paper are as follows:

To enhance the vibration control performance of active suspension systems, a more realistic half-car suspension dynamics model is established, and a nonlinear H_∞ differential game method is proposed;
A neural network-based approach is utilized to derive an off-policy RL algorithm for solving the HJI equation, providing an optimal solution without requiring any model parameters;
A hardware-in-the-loop simulation platform is developed, validating the effectiveness and feasibility of the proposed method through numerical simulations.

The remainder of the paper is structured as follows. Section 2 introduces the nonlinear half-car active suspension control model. Section 3 describes the methodology in detail. Section 4 presents the numerical simulation results. Section 5 concludes with a summary of the findings.

2. Mathematical Model

To represent the nonlinear coupling relationship between the front and rear suspensions, a widely used half-car active suspension model is established, as shown in Figure 1, with the symbols defined in Table 1. According to the second kind of Lagrange’s equations, the dynamic equations for the half-car active suspension are as follows [9,10,29,30]:

(1) $\begin{matrix} M {\ddot{z}}_{c} = f_{1} + f_{2} \\ J \ddot{θ} = a f_{1} - b f_{2} \\ m_{t 1} {\ddot{η}}_{1} = - k_{t 1} (η_{1} - μ_{1}) - f_{1} \\ m_{t 2} {\ddot{η}}_{2} = - k_{t 2} (η_{2} - μ_{2}) - f_{2} \end{matrix}$

where

$\begin{array}{l} f_{1} = f_{s 1} + f_{d 1} + u_{1} \\ f_{2} = f_{s 2} + f_{d 2} + u_{2} \\ f_{s 1} = k_{s 1} (η_{1} - z_{1}) + k_{s n 1} {(η_{1} - z_{1})}^{2} + k_{s n 2} {(η_{1} - z_{1})}^{3} \\ f_{d 1} = b_{s 1} ({\dot{η}}_{2} - {\dot{z}}_{2}) + b_{s n 1} {({\dot{η}}_{2} - {\dot{z}}_{2})}^{2} \\ f_{s 2} = k_{s 2} (η_{2} - z_{2}) + k_{s n 3} {(η_{1} - z_{1})}^{2} + k_{s n 4} {(η_{1} - z_{1})}^{3} \\ f_{d 2} = b_{s 2} ({\dot{η}}_{2} - {\dot{z}}_{2}) + b_{s n 2} {({\dot{η}}_{2} - {\dot{z}}_{2})}^{2} \end{array}$

In Equation (1), the quadratic and cubic nonlinearities of the spring and damper are considered, along with the nonlinear coupling of the front and rear suspensions. For larger road excitations, these nonlinear characteristics should not be ignored.

Define

(2) $\begin{array}{l} z_{1} = z_{c} + a \sin θ \\ z_{2} = z_{c} - b \sin θ \end{array}$

Choose the state variables of the system as

(3) $x = {[\begin{matrix} x_{1} & x_{2} & x_{3} & x_{4} & x_{5} & x_{6} & x_{7} & x_{8} \end{matrix}]}^{T}$

where

$\begin{matrix} x_{1} = z_{c} + a \sin θ - η_{1}, x_{2} = z_{c} - b \sin θ - η_{2}, x_{3} = η_{1} - μ_{1}, x_{4} = η_{2} - μ_{2}, \\ x_{5} = {\dot{z}}_{c} + a \cos θ \cdot \dot{θ}, x_{6} = {\dot{z}}_{c} - b \cos θ \cdot \dot{θ}, x_{7} = {\dot{η}}_{1}, x_{8} = {\dot{η}}_{2} \end{matrix}$

Define

(4) $a_{1} = \frac{1}{M} + \frac{a^{2} \cos θ}{J}, a_{2} = \frac{1}{M} - \frac{a b \cos θ}{J}, a_{3} = \frac{1}{M} + \frac{b^{2} \cos θ}{J}$

combining Equations (1) to (4), the nonlinear state-space equations of the half-car active suspension are obtained:

(5) $\begin{matrix} \dot{x} (t) = A (x) + B (x) u + C ω \\ y (t) = x (t) \end{matrix}$

where

$\begin{matrix} A (x) = [\begin{matrix} x_{5} - x_{7} \\ x_{6} - x_{8} \\ x_{7} \\ x_{8} \\ - a_{1} k_{s 1} x_{1} - a_{2} k_{s 2} x_{2} - b_{s 1} a_{1} x_{5} - b_{s 2} a_{2} x_{6} + a_{1} b_{s 1} x_{7} + a_{2} b_{s 2} x_{8} - a \sin θ \cdot {\dot{θ}}^{2} + a_{1} f_{n 1} + a_{2} f_{n 2} \\ - a_{2} k_{s 1} x_{1} - a_{3} k_{s 2} x_{2} - b_{s 1} a_{2} x_{5} - b_{s 2} a_{3} x_{6} + a_{2} b_{s 1} x_{7} + a_{3} b_{s 2} x_{8} + b \sin θ \cdot {\dot{θ}}^{2} + a_{2} f_{n 1} + a_{3} f_{n 2} \\ - \frac{k_{t 1}}{m_{t 1}} x_{3} + \frac{k_{s 1}}{m_{t 1}} x_{1} - \frac{b_{s 1}}{m_{t 1}} x_{7} + \frac{b_{s 1}}{m_{t 1}} x_{5} - \frac{1}{m_{t 1}} f_{n 1} \\ - \frac{k_{t 2}}{m_{t 2}} x_{4} + \frac{k_{s 2}}{m_{t 2}} x_{2} - \frac{b_{s 2}}{m_{t 2}} x_{8} + \frac{b_{s 2}}{m_{t 2}} x_{6} - \frac{1}{m_{t 2}} f_{n 2} \end{matrix}], \\ B (x) = [\begin{matrix} 0_{4 \times 1} & 0_{4 \times 1} \\ a_{1} & a_{2} \\ a_{2} & a_{3} \\ - \frac{1}{m_{t 1}} & 0 \\ 0 & - \frac{1}{m_{t 2}} \end{matrix}], C = [\begin{matrix} 0 & 0 \\ 0 & 0 \\ - 1 & 0 \\ 0 & - 1 \\ 0_{4 \times 1} & 0_{4 \times 1} \end{matrix}], u = [\begin{matrix} u_{1} \\ u_{2} \end{matrix}], ω = [\begin{matrix} ω_{1} \\ ω_{2} \end{matrix}] = [\begin{matrix} {\dot{μ}}_{1} \\ {\dot{μ}}_{2} \end{matrix}] \\ \begin{array}{l} f_{n 1} = k_{s n 1} {(η_{1} - z_{1})}^{2} + k_{s n 2} {(η_{1} - z_{1})}^{3} + b_{s n 1} {({\dot{η}}_{2} - {\dot{z}}_{2})}^{2} \\ f_{n 2} = k_{s n 3} {(η_{1} - z_{1})}^{2} + k_{s n 4} {(η_{1} - z_{1})}^{3} + b_{s n 2} {({\dot{η}}_{2} - {\dot{z}}_{2})}^{2} \end{array} \end{matrix}$

In the Equation (5), the dimensions of all matrices are $A (x) \in ℜ^{8 \times 1}$ , $B (x) \in ℜ^{8 \times 2}$ , and $C (x) \in ℜ^{8 \times 2}$ .

Define the evaluation metrics for control performance as follows:

(6) $\begin{matrix} Λ_{1} = {[\begin{matrix} {\ddot{z}}_{c} & \ddot{θ} \end{matrix}]}^{T} \\ Λ_{2} = {[\begin{matrix} \frac{z_{1} - η_{1}}{z_{1 \max}} & \frac{z_{2} - η_{2}}{z_{2 \max}} & \frac{k_{t 1} (η_{1} - μ_{1})}{9.8 (\frac{b M}{a + b} + m_{t 1})} & \frac{k_{t 2} (η_{2} - μ_{2})}{9.8 (\frac{a M}{a + b} + m_{t 2})} & \frac{u_{1}}{u_{1 \max}} & \frac{u_{2}}{u_{2 \max}} \end{matrix}]}^{T} \end{matrix}$

Here, $Λ_{1}$ represents the variables that need to be minimized, namely vertical body acceleration and pitch acceleration. $Λ_{2}$ signifies that suspension dynamic travel, tire dynamic load, and control forces must all be below their physical limits, i.e., $Λ_{2} (i) < 1$ , $i = 1, 2, \dots 6$ .

For vehicles traveling at a certain speed, instantaneous impacts caused by road irregularities typically have a significant effect on ride comfort and stability. Therefore, the following road excitation model is considered [31]

(7) $μ (t) = \frac{ℏ}{2} (1 - \cos (\frac{2 π υ}{ƛ} t))$

where

ℏ

is the height of the road irregularity,

ƛ

represents the width of the road irregularity, and

υ

denotes the vehicle speed. The excitation frequency of the impact model is related to the vehicle speed, allowing the evaluation of different excitation frequencies by adjusting the vehicle speed.

3. Methodology

3.1. H_∞ Differential Game

In this section, we design a nonlinear H_∞ controller based on off-policy RL for the established nonlinear active suspension state-space Equation (5). The road and the controller are considered as two independent players; thus, the design of the H_∞ controller can be transformed into an H_∞ differential game.

First, define the cost function as follows:

(8) $\begin{array}{l} J (x (0), u, ω) \\ = \int_{0}^{\infty} (x^{T} Q x + u^{T} R u - γ^{2} {‖ω‖}^{2}) d t \end{array}$

where

Q

and

R

are positive definite matrices, and

γ

is a positive constant.

Q

and

R

are also the weight matrices of the cost function and are closely related to the performance metrics (6). Under zero initial conditions, if

J (x (0), u, ω) \leq 0

, then the system has an H_∞ performance, and

γ

is called the disturbance attenuation level.

Define the value function of the H_∞ differential game as

(9) $\begin{array}{l} V (x (t)) \\ = \int_{t}^{\infty} (x^{T} Q x + u^{T} R u - γ^{2} {‖ω‖}^{2}) d τ \end{array}$

Combining Equations (8) and (9), the H_∞ differential game can be expressed as

(10) $V^{*} (x (t)) = \min_{u} \max_{d} J (x (0), u, ω)$

where

V^{*} (x (t))

is the optimal value function.

In Equation (10), the control policy $u$ needs to minimize the cost function, while the disturbance policy $ω$ needs to maximize the cost function. Therefore, there exists the following Nash equilibrium point:

(11) $\begin{array}{l} u^{*} = \underset{u}{\arg \min} J (x (0), u, ω) \\ ω^{*} = \underset{ω}{\arg \max} J (x (0), u, ω) \end{array}$

Equation (11) is also referred to as the optimal game policy.

Definition 1.

If inequality (12) holds, the policy pair $\{u^{*}, ω^{*}\}$ is a Nash equilibrium point for the H_∞ differential game.

(12) $J (x (0), u^{*}, ω) \leq J (x (0), u^{*}, ω^{*}) \leq J (x (0), u, ω^{*})$

To find the Nash equilibrium point of the H_∞ differential game, first establish the Hamilton–Jacobi–Isaacs (HJI) equation for the active suspension. By differentiating the value function (9), we get

(13) $\begin{array}{l} x^{T} Q x + u^{T} R u - γ^{2} {‖ω‖}^{2} \\ + \nabla V^{T} (A (x) + B (x) u + C ω) = 0 \end{array}$

where

\nabla V = \partial V / \partial x

. Equation (13) is called the Bellman equation, and solving this partial differential equation yields the value function

V (x)

Define the Hamiltonian function of Equation (13) as

(14) $\begin{array}{l} H (x, \nabla V, u, ω) \\ = x^{T} Q x + u^{T} R u - γ^{2} {‖ω‖}^{2} \\ + \nabla V^{T} (A (x) + B (x) u + C ω) \end{array}$

Based on the static condition of (14),

\frac{\partial H}{\partial u} = 0

and

\frac{\partial H}{\partial ω} = 0

, the Nash equilibrium point of the H_∞ differential game can be obtained as [1–1]

(15) $\begin{array}{l} u^{*} = u^{*} (V (x)) = - \frac{1}{2} R^{- 1} B {(x)}^{T} \nabla V \\ ω^{*} = ω^{*} (V (x)) = \frac{1}{2 γ^{2}} C^{T} \nabla V \end{array}$

Equation (15) is the saddle point of the Hamiltonian function, and substituting it into Equation (13) yields the HJI equation

(16) $\begin{array}{l} x^{T} Q x + \frac{1}{4} \nabla V^{T} B (x) R^{- 1} B {(x)}^{T} \nabla V - \frac{1}{4 γ^{2}} \nabla V^{T} C C^{T} \nabla V \\ + \nabla V^{T} (A (x) - \frac{1}{2} B (x) R^{- 1} B {(x)}^{T} \nabla V + \frac{1}{2 γ^{2}} C C^{T} \nabla V) = 0 \end{array}$

The analytical solution of (16) is the optimal value function

V^{*} (x (t))

, and

V^{*} (x (t))

satisfies the positive semi-definite condition.

From the above analysis, it is evident that solving the HJI equation can yield the optimal value function and the optimal game policies. Substituting the analytical solution back into Equation (14) gives $H (x, \nabla V^{*}, u^{*}, ω^{*}) = 0$ . However, directly solving the HJI equation is extremely difficult. This paper will design a method based on off-policy RL for approximate solutions.

Theorem 1.

If a positive semi-definite solution $V^{*} (x (t))$ exists for the HJI Equation (16), then $\{u^{*}, ω^{*}\}$ satisfies the following conditions: (1) For all $ω \in L_{2} [0, \infty]$ , the half-car active suspension closed-loop system (5) under zero initial conditions meets H_∞ performance and is asymptotically stable in the absence of disturbances; (2) $\{u^{*}, ω^{*}\}$ is the Nash equilibrium of the H_∞ differential game.

Proof.

Since $V^{*} (x (t)) \geq 0$ is a solution of the HJI equation and $V^{*} (x) = 0$ with $x = 0$ , we choose $V (x (t))$ as the Lyapunov function. Differentiating it, we obtain

(17) $\begin{array}{l} \dot{V} (x) \\ = \nabla V^{T} (A (x) + B (x) u + C ω) \\ = - (x^{T} Q x + u^{T} R u - γ^{2} {‖ω‖}^{2}) \end{array}$

Clearly, combining Equation (17) with Equation (15), we obtain

(18) $\begin{array}{l} \nabla V^{T} (A (x) + B (x) u + C ω) \\ + (x^{T} Q x + u^{T} R u - γ^{2} {‖ω‖}^{2}) \\ = H (x, \nabla V, u, ω) \\ = H (x, \nabla V, u^{*}, ω^{*}) + {(u - u^{*})}^{T} R (u - u^{*}) \\ - γ^{2} {(ω - ω^{*})}^{T} (ω - ω^{*}) \end{array}$

If $u = u^{*}$ and $V = V^{*}$ , then Equation (18) satisfies the condition

(19) $H (x, \nabla V^{*}, u^{*}, ω) = - γ^{2} {(ω - ω^{*})}^{T} (ω - ω^{*}) \leq 0$

Further integrating Equation (17) gives

(20) $\begin{array}{l} V^{*} (x (T)) - V^{*} (x (0)) \\ \leq - \int_{0}^{T} (x^{T} Q x + {u^{*}}^{T} R u^{*} - γ^{2} {‖ω‖}^{2}) d τ \end{array}$

Since

V^{*} (x (0)) = 0

and

V^{*} (x (T)) \geq 0

, the closed-loop system satisfies H_∞ performance under zero initial conditions as

(21) $\int_{0}^{T} (x^{T} Q x + {u^{*}}^{T} R u^{*}) d τ \leq \int_{0}^{T} γ^{2} {‖ω‖}^{2} d τ, \forall ω \in L_{2} [0, \infty]$

Note that when

ω = 0

, Equation (17) satisfies

{\dot{V}}^{*} (x) \leq 0

, indicating asymptotic stability of the closed-loop system in the absence of disturbances.

To prove that $\{u^{*}, ω^{*}\}$ is a Nash equilibrium point of the H_∞ differential game, rewrite the cost function (8) as

(22) $\begin{array}{l} J (x (0), u, ω) \\ = \int_{0}^{\infty} (x^{T} Q x + u^{T} R u - γ^{2} {‖ω‖}^{2}) d t \\ + \int_{0}^{\infty} \dot{V} d t - V (x (\infty)) + V (x (0)) \\ = \int_{0}^{\infty} H (x, \nabla V, u, ω) d t - V (x (\infty)) + V (x (0)) \end{array}$

Considering

t \to \infty

and

V (x (\infty)) \to 0

, we have

(23) $\begin{array}{l} J (x (0), u, ω) \\ = \int_{0}^{\infty} H (x, \nabla V, u^{*}, ω^{*}) d t + V (x (0)) \\ + \int_{0}^{\infty} [{(u - u^{*})}^{T} R (u - u^{*}) - γ^{2} {(ω - ω^{*})}^{T} (ω - ω^{*})] d t \end{array}$

Let

V (x) = V^{*} (x)

, and from (23) we obtain inequality (12). According to Definition 1,

\{u^{*}, ω^{*}\}

is indeed a Nash equilibrium point of the H_∞ differential game. Thus, the proof is complete. □

3.2. Off-Policy RL Algorithm

To solve the HJI Equation (16) numerically, this section designs an off-policy RL algorithm. Compared to on-policy RL algorithms, off-policy methods do not require real-time updates, making them safer as they do not affect the actual physical system. Additionally, this method does not require any model parameter information, which can reduce design costs. The algorithm structure is illustrated in Figure 2. In Figure 2, the state data of the active suspension during the vehicle’s operation is first collected as input. The actor and critic neural networks are then employed to learn the solution to the integral Bellman equation online, with the neural network (NN) weights being continuously updated until convergence is achieved. A portion of the active suspension vibration data is extracted and used to update the value function, control policy, and disturbance policy in real time within an actor–critic RL framework.

Firstly, rewrite the nonlinear state-space equations of the active suspension as

(24) $\begin{array}{l} \dot{x} = A (x) + B (x) u_{k} + C ω_{k} \\ + B (x) (u - u_{k}) + C (ω - ω_{k}) \end{array}$

where

u

and

ω

are arbitrary but reasonable control inputs and road disturbances, and

u_{k}

and

ω_{k}

are the control and disturbance policies to be updated.

Combining (15) and (24), differentiating the value function (9) yields

(25) $\begin{array}{l} {\dot{V}}_{k} (x) \\ = \nabla {V_{k}}^{T} (A (x) + B (x) u_{k} + C ω_{k}) \\ + \nabla {V_{k}}^{T} (B (x) (u - u_{k}) + C (ω - ω_{k})) \\ = - (x^{T} Q x + {u_{k}}^{T} R u_{k} - γ^{2} {‖ω_{k}‖}^{2}) \\ - 2 {u_{k + 1}}^{T} R (u - u_{k}) + \nabla {V_{k}}^{T} C (ω - ω_{k}) \end{array}$

Integrating (25) yields

(26) $\begin{array}{l} {V_{k}|}_{x (t)}^{x (t + T)} = \\ - \int_{t}^{t + T} (x^{T} Q x + {u_{k}}^{T} R u_{k} - γ^{2} {‖ω_{k}‖}^{2}) d τ \\ - \int_{t}^{t + T} 2 {u_{k + 1}}^{T} R (u - u_{k}) d τ \\ + \int_{t}^{t + T} \nabla {V_{k}}^{T} C (ω - ω_{k}) d τ \end{array}$

In Equation (26), to simultaneously solve for $V_{k}$ and $u_{k + 1}$ , define the actor NN and critic NN as follows:

(27) $\begin{array}{l} u_{k + 1} (x) = Θ_{k + 1}^{T} σ (x) \\ V_{k} (x) = W_{k}^{T} ϕ (x) \end{array}$

where

σ (x) \in ℜ^{α}

and

ϕ (x) \in ℜ^{β}

are the basis functions of the NNs, and

Θ_{k + 1}^{T}

and

W_{k}^{T}

are appropriate weight coefficients.

Substituting (27) into (26) gives

(28) $\begin{array}{l} {W_{k}^{T} ϕ (x)|}_{x (t)}^{x (t + T)} = \\ - \int_{t}^{t + T} (x^{T} Q x + {u_{k}}^{T} R u_{k} - γ^{2} {‖ω_{k}‖}^{2}) d τ \\ - \int_{t}^{t + T} 2 σ {(x)}^{T} Θ_{k + 1} R (u - u_{k}) d τ \\ + \int_{t}^{t + T} W_{k}^{T} [\begin{matrix} \frac{\partial ϕ (x)}{\partial x_{1}} & \frac{\partial ϕ (x)}{\partial x_{2}} & \dots & \frac{\partial ϕ (x)}{\partial x_{8}} \end{matrix}] C (ω - ω_{k}) d τ \end{array}$

From Equation (28), it can be seen that the NN weight coefficients can be solved offline without the need for model parameters. To ensure the uniqueness of solution in Equation (28), define

(29) ${ξ (t) = ϕ {(x)}^{T}|}_{x (t)}^{x (t + T)} - \int_{t}^{t + T} {(ω - ω_{k})}^{T} C^{T} {[\begin{matrix} \frac{\partial ϕ (x)}{\partial x_{1}} & \frac{\partial ϕ (x)}{\partial x_{2}} & \dots & \frac{\partial ϕ (x)}{\partial x_{8}} \end{matrix}]}^{T} d τ$

(30) $ζ (t) = \int_{t}^{t + T} 2 {(u - u_{k})}^{T} R^{T} \otimes σ {(x)}^{T} d τ$

(31) $ς (t) = - \int_{t}^{t + T} (x^{T} Q x + {u_{k}}^{T} R u_{k} - γ^{2} {‖ω_{k}‖}^{2}) d τ$

where

$\begin{array}{l} t = t_{k, 1}, t_{k, 2}, \dots t_{k, 𝓁} \\ 0 \leq t_{k, i} + T \leq t_{k, i + 1} \\ t_{k, i} + T \leq t_{k + 1, 1}, k = 0, 1, \dots \infty, i = 1, 2, \dots 𝓁 \end{array}$

Combining (29)–(31), Equation (28) can be rewritten compactly as the following equation:

(32) $Φ_{k} \cdot Ξ_{k} = Ω_{k}$

where

$Φ_{k} = [\begin{matrix} ξ (t_{k, 1}) & ζ (t_{k, 1}) \\ ξ (t_{k, 2}) & ζ (t_{k, 2}) \\ ⋮ & ⋮ \\ ξ (t_{k, 𝓁}) & ζ (t_{k, 𝓁}) \end{matrix}], Ξ_{k} = [\begin{matrix} v e c (W_{k}) \\ v e c (Θ_{k + 1}) \end{matrix}], Ω_{k} = [\begin{matrix} ς (t_{k, 1}) \\ ς (t_{k, 2}) \\ ⋮ \\ ς (t_{k, 𝓁}) \end{matrix}]$

To ensure Equation (32) has a unique solution, it is necessary to satisfy

(33) $r a n k (Φ_{k}) \geq α + β$

where

𝓁 \geq α + β

is the dimension of the data, and

α

and

β

are the dimensions of the basis functions of the actor NN and critic NN, respectively. Thus, the off-policy RL algorithm (Algorithm 1) can be summarized as follows:

Algorithm 1. Off-policy RL algorithm for nonlinear active suspension H_∞ differential game.

Step 1: set

k = 1

, initialize parameters

Θ_{1}

and

W_{0}

, apply any reachable

u

and

ω

, and collect data

x_{1} \dots x_{𝓁}

;

Step 2: Solve the Equation (32) to obtain NN weight coefficients

Θ_{k + 1}^{T}

and

W_{k}^{T}

if the Equation (33) holds;

Step 3: Update the actor

u_{k + 1}

ω_{k + 1}

and critic

V_{k}

using (15) and (27);

Step 4: Set

k = k + 1

, repeat steps 2–3 until

‖Ξ_{k} - Ξ_{k - 1}‖ \leq ε

(

ε

is a small positive number).

4. Numerical Simulation

4.1. Implementation of Algorithm 1

To validate the feasibility of the off-policy RL algorithm for the nonlinear active suspension differential game, this section applies Algorithm 1 for policy optimization. The simulation setup uses a hardware-in-the-loop platform, as depicted in Figure 3. The nonlinear active suspension model runs on the MicroAutoBox II, while Algorithm 1 operates on the Speedgoat Controller. The MicroAutoBox II was purchased from the German company dSPACE. The Speedgoat Controller was purchased from Speedgoat in Bern, Switzerland. The MicroAutoBox II is equipped with an IBM PPC 750GL processor (900 MHz), and the Speedgoat Controller features an Intel Celeron 2 GHz CPU with four cores. Real-time data are displayed on the host computer, and the Simulink program runs concurrently.

During the simulation process, the parameters of the vehicle active suspension are listed in Table 2. The simulation parameters for Algorithm 1 are set as follows: $T = 0.01 s$ , $𝓁 = 132$ , $ε = 1$ , $Q = diag ([10, 10, 10, 10, 1000, 1000, 10, 10])$ , $R = 0.00001 \times eye (2)$ , $γ = 35$ . All other parameters are set to zero. The NN basis functions are defined as

$\begin{array}{l} ϕ (x) = {[\begin{array}{l} x_{1}^{2}, 2 x_{1} x_{2}, 2 x_{1} x_{3}, 2 x_{1} x_{4}, 2 x_{1} x_{5}, 2 x_{1} x_{6}, 2 x_{1} x_{7}, 2 x_{1} x_{8}, x_{2}^{2}, 2 x_{2} x_{3}, 2 x_{2} x_{4}, 2 x_{2} x_{5}, 2 x_{2} x_{6}, 2 x_{2} x_{7}, 2 x_{2} x_{8}, \\ x_{3}^{2}, 2 x_{3} x_{4}, 2 x_{3} x_{5}, 2 x_{3} x_{6}, 2 x_{3} x_{7}, 2 x_{3} x_{8}, x_{4}^{2}, 2 x_{4} x_{5}, 2 x_{4} x_{6}, 2 x_{4} x_{7}, 2 x_{4} x_{8}, x_{5}^{2}, 2 x_{5} x_{6}, 2 x_{5} x_{7}, 2 x_{5} x_{8}, \\ x_{6}^{2}, 2 x_{6} x_{7}, 2 x_{6} x_{8}, x_{7}^{2}, 2 x_{7} x_{8}, x_{8}^{2}, x_{1}^{3}, x_{2}^{3}, x_{3}^{3}, x_{4}^{3}, x_{5}^{3}, x_{6}^{3}, x_{7}^{3}, x_{8}^{3} \end{array}]}^{T} \\ σ (x) = {[\begin{array}{l} x_{1}, x_{2}, x_{3}, x_{4}, x_{5}, x_{6}, x_{7}, x_{8}, x_{1}^{2}, 2 x_{1} x_{2}, 2 x_{1} x_{3}, 2 x_{1} x_{4}, 2 x_{1} x_{5}, 2 x_{1} x_{6}, 2 x_{1} x_{7}, 2 x_{1} x_{8}, x_{2}^{2}, \\ 2 x_{2} x_{3}, 2 x_{2} x_{4}, 2 x_{2} x_{5}, 2 x_{2} x_{6}, 2 x_{2} x_{7}, 2 x_{2} x_{8}, x_{3}^{2}, 2 x_{3} x_{4}, 2 x_{3} x_{5}, 2 x_{3} x_{6}, 2 x_{3} x_{7}, 2 x_{3} x_{8}, x_{4}^{2}, \\ 2 x_{4} x_{5}, 2 x_{4} x_{6}, 2 x_{4} x_{7}, 2 x_{4} x_{8}, x_{5}^{2}, 2 x_{5} x_{6}, 2 x_{5} x_{7}, 2 x_{5} x_{8}, x_{6}^{2}, 2 x_{6} x_{7}, 2 x_{6} x_{8}, x_{7}^{2}, 2 x_{7} x_{8}, x_{8}^{2} \end{array}]}^{T} \end{array}$

The simulation results are shown in Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8. Figure 4 displays the training results of the critic NN weight coefficients, Figure 5 shows the training results of the actor NN weight coefficients, and Figure 6 illustrates the number of policy updates. It can be observed from the figures that the convergence condition is met after 10 updates. Given the sampling time of 0.01 s, the total update time amounts to 13.2 s. Notably, the control and disturbance inputs required for data collection are random white noise signals, which do not necessitate a specific functional form, thereby offering greater applicability. Figure 7 and Figure 8 illustrate the control and disturbance inputs during the data collection process, demonstrating that small inputs are sufficient to meet the requirements. This method ensures the safety of real vehicle data collection.

4.2. Vibration Control Performance Analysis

To further verify the effectiveness of the trained control policies, simulations are conducted using a road impact model (7). The vehicle speed is set to $υ = 0.36 ~ 108 km / h$ , $L = 5 m$ , corresponding to a frequency range from 0.1 Hz to 8 Hz, covering typical road conditions. The simulation results are depicted in Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13. The classic linear H_∞ algorithm [23,27,28] is compared.

Figure 9 shows the road excitation profile, Figure 10 displays the pitch acceleration of the vehicle, Figure 11 illustrates the vertical acceleration of the vehicle body, and Figure 12 presents the response of the suspension performance constraints. From Figure 10 and Figure 11, it can be observed that the trained control policies effectively attenuate vibrations in the frequency range of 0.1–6 Hz, corresponding to vehicle speeds below $80 km / h$ . For intelligent driving vehicles equipped with visual sensors, vehicle speed can be controlled and planned according to road conditions [1]. Figure 10 and Figure 11 show that, in terms of ride comfort, the off-policy RL solution outperforms traditional linear H_∞ algorithm and passive suspension system. This advantage arises from the method’s ability to account for system nonlinearities, resulting in more effective control performance. Notably, this method requires no model information. Figure 12 indicates that the suspension travel index of the linear H_∞ algorithm exceeds the maximum limit, whereas the proposed method consistently remains within a reasonable range. Additionally, Figure 12 demonstrates that all constraint indicators of the proposed method are less than 1, thereby satisfying the physical constraints during actual driving conditions.

To quantitatively evaluate the simulation results, Table 3 presents the root mean square (RMS) values for $Λ_{1}$ and $u$ . As shown in Table 3, the off-policy RL solution achieves smaller RMS values, whereas traditional linear H_∞ algorithms, which do not account for nonlinearities, exhibit deviations in control performance.

Table 4 provides the peak values for $Λ_{2}$ . As evident from Table 4, the off-policy RL solution meets the constraint condition, i.e., $Λ_{2} (i) < 1$ . In contrast, the linear H_∞ algorithm’s suspension travel index exceeds the constraint value, resulting in a loss of constraint effectiveness. This occurs because the linear H_∞ method does not account for system nonlinearities, leading to certain indicators being uncontrolled.

To further illustrate the H_∞ performance of the closed-loop system, define the function

(34) $r (t) = \frac{\int_{0}^{t} (x^{T} Q x + u^{T} R u) d τ}{\int_{0}^{t} {‖ω‖}^{2} d τ}$

Figure 13 shows the response of

r (t)

, indicating that its value remains consistently below the maximum value

γ^{2}

, thereby satisfying the H_∞ performance requirement.

5. Conclusions

This paper investigates the H_∞ differential game problem for nonlinear active suspensions, accounting for both geometric nonlinearity and higher-order nonlinearity of the damping elements in the design scheme. An off-policy RL method utilizing an actor–critic structure is employed to approximate the solution to the HJI equation without requiring any model parameters. The simulation results demonstrate the method’s effectiveness. By extracting a portion of the vehicle driving data, the policies can be optimized and converge to the optimal solution after several iterations. Frequency sweep excitation tests reveal that the control policy is effective within the low-frequency range of 0–6 Hz, significantly reducing body vibrations at the first mode while maintaining the physical constraints of the vehicle suspension. Compared to passive suspension and traditional methods, the proposed method reduces vertical acceleration by 20% and 10%, respectively, and pitch acceleration by 10% and 5%. Additionally, the peak control force of the proposed method is 10% smaller than that of traditional methods. All other metrics of the proposed method are below 1, satisfying the time-domain constraints. Future research will explore multi-player collaborative game mechanisms, control saturation constraints, and finite-frequency constraints to further enhance the nonlinear vibration control performance of active suspensions.

Author Contributions

Methodology, G.W.; writing—original draft preparation, J.D.; writing—review and editing, T.Z.; supervision, S.L. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

Data are contained within the article. The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

Figure 1. Half-car suspension model.

Figure 2. Actor–critic structure for off-policy RL.

Figure 3. Hardware-in-the-loop simulation.

Figure 4. Critic NN weight coefficients.

Figure 5. Actor NN weight coefficients.

Figure 6. Number of updates.

Figure 7. Control actions for data collection.

Figure 8. Disturbance actions for data collection.

Figure 9. Road excitation profile for test.

Figure 10. The evolution of the pitch acceleration.

Figure 11. The evolution of the vertical acceleration.

Figure 12. The evolution of the suspension performance [Forumla omitted. See PDF.].

Figure 13. Evolution of [Forumla omitted. See PDF.].

Table 1

Symbol definitions.

Symbol	Meaning
$M$	Sprung mass
$J$	Pitch moment of inertia
$m_{t 1}$ , $m_{t 2}$	Unsprung mass
$a$	Distance from front axle to center of mass
$b$	Distance from rear axle to center of mass
$k_{t 1}$ , $k_{t 2}$	Tire stiffness
$k_{s 1}$ , $k_{s 2}$	Suspension spring linear stiffness
$k_{s n 1}$ , $k_{s n 2}$ , $k_{s n 3}$ , $k_{s n 4}$	Suspension spring nonlinear stiffness
$b_{s 1}$ , $b_{s 2}$	Suspension hydraulic linear damping
$b_{s n 1}$ , $b_{s n 2}$	Suspension hydraulic nonlinear damping
$u_{1}$ , $u_{2}$	Active control force
$z_{c}$	Vertical displacement of the center of mass
$θ$	Pitch angle
$η_{1}$ , $η_{2}$	Vertical displacement of unsprung mass
$μ_{1}$ , $μ_{2}$	Road disturbance

Table 2

Model parameters.

Parameter	Value	Parameter	Value
$M (kg)$	500	$a (m)$	1.25
$J (kg \cdot m^{2})$	910	$b (m)$	1.45
$m_{t 1} (kg)$	30	$k_{s 1} (N / m)$	10,000
$m_{t 2} (kg)$	40	$k_{s 2} (N / m)$	10,000
$k_{s n 1} (N / m^{2})$	1000	$k_{s n 3} (N / m^{2})$	1000
$k_{s n 2} (N / m^{3})$	20,000	$k_{s n 4} (N / m^{3})$	20,000
$b_{s 1} (Ns / m)$	1000	$b_{s 2} (Ns / m)$	1000
$k_{t 1} (N / m)$	100,000	$u_{1 \max} (N)$	2000
$k_{t 2} (N / m)$	100,000	$u_{2 \max} (N)$	2000
$z_{1 \max} (m)$	0.1	$b_{s n 1} ({Ns}^{2} / m^{2})$	200
$z_{2 \max} (m)$	0.1	$b_{s n 2} ({Ns}^{2} / m^{2})$	200

Table 3

RMS evaluation of simulation results.

Method	${\ddot{z}}_{c}$ (m/s²)	$\ddot{θ}$ (rad/s²)	$u_{1}$ (N)	$u_{2}$ (N)
Passive suspension	1.371	1.081	——	——
Linear H_∞ algorithm	1.228	1.004	253.6	255.1
Off-Policy RL solution	1.131	0.964	247.2	249.2

Table 4

Peak evaluation of simulation results.

Method	$Λ_{2}$ (1)	$Λ_{2}$ (2)	$Λ_{2}$ (3)	$Λ_{2}$ (4)	$Λ_{2}$ (5)	$Λ_{2}$ (6)
Passive suspension	0.9301	0.8131	0.5832	0.9135	——	——
Linear H_∞ algorithm	1.076	1.082	0.6153	0.899	0.5484	0.5513
Off-Policy RL solution	0.9725	0.9735	0.6514	0.9894	0.4921	0.4928

References

1. Yu, M.; Evangelou, S.A.; Dini, D. Advances in Active Suspension Systems for Road Vehicles. Engineering; 2023; 33, pp. 160-177. [DOI: https://dx.doi.org/10.1016/j.eng.2023.06.014]

2. Pan, H.; Zhang, C.; Sun, W. Fault-tolerant multiplayer tracking control for autonomous vehicle via model-free adaptive dynamic programming. IEEE Trans. Reliab.; 2022; 72, pp. 1395-1406. [DOI: https://dx.doi.org/10.1109/TR.2022.3208467]

3. Li, Q.; Chen, Z.; Song, H.; Dong, Y. Model predictive control for speed-dependent active suspension system with road preview information. Sensors; 2024; 24, 2255. [DOI: https://dx.doi.org/10.3390/s24072255]

4. Zhang, J.; Yang, Y.; Hu, C. An adaptive controller design for nonlinear active air suspension systems with uncertainties. Mathematics; 2023; 11, 2626. [DOI: https://dx.doi.org/10.3390/math11122626]

5. Su, X.; Yang, X.; Shi, P.; Wu, L. Fuzzy control of nonlinear electromagnetic suspension systems. Mechatronics; 2014; 24, pp. 328-335. [DOI: https://dx.doi.org/10.1016/j.mechatronics.2013.08.002]

6. Humaidi, A.J.; Sadiq, M.E.; Abdulkareem, A.I.; Ibraheem, I.K.; Azar, A.T. Adaptive backstepping sliding mode control design for vibration suppression of earth-quaked building supported by magneto-rheological damper. J. Low Freq. Noise Vib. Act. Control; 2022; 41, pp. 768-783. [DOI: https://dx.doi.org/10.1177/14613484211064659]

7. Liu, L.; Sun, M.; Wang, R.; Zhu, C.; Zeng, Q. Finite-Time Neural Control of Stochastic Active Electromagnetic Suspension System with Actuator Failure. IEEE Trans. Intell. Veh.; 2024; pp. 1-12. [DOI: https://dx.doi.org/10.1109/TIV.2024.3386693]

8. Kim, J.; Yim, S. Design of Static Output Feedback Suspension Controllers for Ride Comfort Improvement and Motion Sickness Reduction. Processes; 2024; 12, 968. [DOI: https://dx.doi.org/10.3390/pr12050968]

9. Li, P.; Lam, J.; Cheung, K.C. Multi-objective control for active vehicle suspension with wheelbase preview. J. Sound Vib.; 2014; 333, pp. 5269-5282. [DOI: https://dx.doi.org/10.1016/j.jsv.2014.06.017]

10. Pang, H.; Wang, Y.; Zhang, X.; Xu, Z. Robust state-feedback control design for active suspension system with time-varying input delay and wheelbase preview information. J. Frankl. Inst.; 2019; 356, pp. 1899-1923. [DOI: https://dx.doi.org/10.1016/j.jfranklin.2019.01.011]

11. Liu, Z.; Si, Y.; Sun, W. Ride comfort oriented integrated design of preview active suspension control and longitudinal velocity planning. Mech. Syst. Signal Process.; 2024; 208, 110992. [DOI: https://dx.doi.org/10.1016/j.ymssp.2023.110992]

12. Rodriguez-Guevara, D.; Favela-Contreras, A.; Beltran-Carbajal, F.; Sotelo, C.; Sotelo, D. A Differential Flatness-Based Model Predictive Control Strategy for a Nonlinear Quarter-Car Active Suspension System. Mathematics; 2023; 11, 1067. [DOI: https://dx.doi.org/10.3390/math11041067]

13. Guo, X.; Zhang, J.; Sun, W. Robust saturated fault-tolerant control for active suspension system via partial measurement information. Mech. Syst. Signal Process.; 2023; 191, 110116. [DOI: https://dx.doi.org/10.1016/j.ymssp.2023.110116]

14. Xue, W.; Li, K.; Chen, Q.; Liu, G. Mixed FTS/H_∞ control of vehicle active suspensions with shock road disturbance. Veh. Syst. Dyn.; 2019; 57, pp. 841-854. [DOI: https://dx.doi.org/10.1080/00423114.2018.1490023]

15. Li, H.; Jing, X.; Lam, H.K.; Shi, P. Fuzzy sampled-data control for uncertain vehicle suspension systems. IEEE Trans. Cybern.; 2013; 44, pp. 1111-1126.

16. Sun, W.; Gao, H.; Kaynak, O. Finite frequency H_∞ control for vehicle active suspension systems. IEEE Trans. Control Syst. Technol.; 2010; 19, pp. 416-422. [DOI: https://dx.doi.org/10.1109/TCST.2010.2042296]

17. Dogruer, C.U. Constrained model predictive control of a vehicle suspension using Laguerre polynomials. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci.; 2020; 234, pp. 1253-1268. [DOI: https://dx.doi.org/10.1177/0954406219889078]

18. Esmaeili, J.S.; Akbari, A.; Farnam, A.; Azad, N.L.; Crevecoeur, G. Adaptive Neuro-Fuzzy Control of Active Vehicle Suspension Based on H₂ and H_∞ Synthesis. Machines; 2023; 11, 1022. [DOI: https://dx.doi.org/10.3390/machines11111022]

19. Han, X.; Zhao, X.; Karimi, H.R.; Wang, D.; Zong, G. Adaptive optimal control for unknown constrained nonlinear systems with a novel quasi-model network. IEEE Trans. Neural Netw. Learn. Syst.; 2021; 33, pp. 2867-2878. [DOI: https://dx.doi.org/10.1109/TNNLS.2020.3046614]

20. Huang, T.; Wang, J.; Pan, H. Approximation-free prespecified time bionic reliable control for vehicle suspension. IEEE Trans. Autom. Sci. Eng.; 2023; pp. 1-11. [DOI: https://dx.doi.org/10.1109/TASE.2023.3310335]

21. Qin, Z.C.; Xin, Y. Data-driven H_∞ vibration control design and verification for an active suspension system with unknown pseudo-drift dynamics. Commun. Nonlinear Sci. Numer. Simul.; 2023; 125, 107397. [DOI: https://dx.doi.org/10.1016/j.cnsns.2023.107397]

22. Mazouchi, M.; Yang, Y.; Modares, H. Data-driven dynamic multiobjective optimal control: An aspiration-satisfying reinforcement learning approach. IEEE Trans. Neural Netw. Learn. Syst.; 2021; 33, pp. 6183-6193. [DOI: https://dx.doi.org/10.1109/TNNLS.2021.3072571]

23. Wang, G.; Li, K.; Liu, S.; Jing, H. Model-Free H_∞ Output Feedback Control of Road Sensing in Vehicle Active Suspension Based on Reinforcement Learning. J. Dyn. Syst. Meas. Control; 2023; 145, 061003. [DOI: https://dx.doi.org/10.1115/1.4062342]

24. Wang, A.; Liao, X.; Dong, T. Event-driven optimal control for uncertain nonlinear systems with external disturbance via adaptive dynamic programming. Neurocomputing; 2018; 281, pp. 188-195. [DOI: https://dx.doi.org/10.1016/j.neucom.2017.12.010]

25. Wu, H.N.; Luo, B. Neural Network Based Online Simultaneous Policy Update Algorithm for Solving the HJI Equation in Nonlinear H_∞ Control. IEEE Trans. Neural Netw. Learn. Syst.; 2012; 23, pp. 1884-1895. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/24808144]

26. Luo, B.; Wu, H.N.; Huang, T. Off-policy reinforcement learning for H_∞ control design. IEEE Trans. Cybern.; 2014; 45, pp. 65-76. [DOI: https://dx.doi.org/10.1109/TCYB.2014.2319577]

27. Kiumarsi, B.; Lewis, F.L.; Jiang, Z.P. H_∞ control of linear discrete-time systems: Off-policy reinforcement learning. Automatica; 2017; 78, pp. 144-152. [DOI: https://dx.doi.org/10.1016/j.automatica.2016.12.009]

28. Wu, H.N.; Luo, B. Simultaneous policy update algorithms for learning the solution of linear continuous-time H_∞ state feedback control. Inf. Sci.; 2013; 222, pp. 472-485. [DOI: https://dx.doi.org/10.1016/j.ins.2012.08.012]

29. Valadbeigi, A.P.; Sedigh, A.K.; Lewis, F.L. H_∞ Static Output-Feedback Control Design for Discrete-Time Systems Using Reinforcement Learning. IEEE Trans. Neural Netw. Learn. Syst.; 2019; 31, pp. 396-406. [DOI: https://dx.doi.org/10.1109/TNNLS.2019.2901889]

30. Sun, W.; Zhao, Z.; Gao, H. Saturated adaptive robust control for active suspension systems. IEEE Trans. Ind. Electron.; 2012; 60, pp. 3889-3896. [DOI: https://dx.doi.org/10.1109/TIE.2012.2206340]

31. Li, W.; Du, H.; Feng, Z.; Ning, D.; Li, W.; Sun, S.; Tu, L.; Wei, J. Singular system-based approach for active vibration control of vehicle seat suspension. J. Dyn. Syst. Meas. Control; 2020; 142, 091003. [DOI: https://dx.doi.org/10.1115/1.4047011]

Word count: 4347

Show less

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Translate

This paper investigates a parameter-free H_∞ differential game approach for nonlinear active vehicle suspensions. The study accounts for the geometric nonlinearity of the half-car active suspension and the cubic nonlinearity of the damping elements. The nonlinear H_∞ control problem is reformulated as a zero-sum game between two players, leading to the establishment of the Hamilton–Jacobi–Isaacs (HJI) equation with a Nash equilibrium solution. To minimize reliance on model parameters during the solution process, an actor–critic framework employing neural networks is utilized to approximate the control policy and value function. An off-policy reinforcement learning method is implemented to iteratively solve the HJI equation. In this approach, the disturbance policy is derived directly from the value function, requiring only a limited amount of driving data to approximate the HJI equation’s solution. The primary innovation of this method lies in its capacity to effectively address system nonlinearities without the need for model parameters, making it particularly advantageous for practical engineering applications. Numerical simulations confirm the method’s effectiveness and applicable range. The off-policy reinforcement learning approach ensures the safety of the design process. For low-frequency road disturbances, the designed H_∞ control policy enhances both ride comfort and stability.

Details

Title

H_∞ Differential Game of Nonlinear Half-Car Active Suspension via Off-Policy Reinforcement Learning

Author

Wang, Gang

; Deng, Jiafan; Zhou, Tingting; Liu, Suqi

First page

2665

Publication year

2024

Publication date

2024

Publisher

MDPI AG

e-ISSN

22277390

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/math12172665

ProQuest document ID

3104059182

H_∞ Differential Game of Nonlinear Half-Car Active Suspension via Off-Policy Reinforcement Learning

Jump to:

Full text

1. Introduction

2. Mathematical Model

3. Methodology

3.1. H_∞ Differential Game

3.2. Off-Policy RL Algorithm

4. Numerical Simulation

4.1. Implementation of Algorithm 1

4.2. Vibration Control Performance Analysis

5. Conclusions

Abstract

Details

Suggested sources

H∞ Differential Game of Nonlinear Half-Car Active Suspension via Off-Policy Reinforcement Learning

Jump to:

Full text

1. Introduction

2. Mathematical Model

3. Methodology

3.1. H∞ Differential Game

3.2. Off-Policy RL Algorithm

4. Numerical Simulation

4.1. Implementation of Algorithm 1

4.2. Vibration Control Performance Analysis

5. Conclusions

Abstract

Details

H_∞ Differential Game of Nonlinear Half-Car Active Suspension via Off-Policy Reinforcement Learning

3.1. H_∞ Differential Game