1. Introduction
Recent research on quadcopters has established them as a significant platform for the development of unmanned aerial vehicles (UAVs) owing to their simple construction and maintenance. Quadcopters demonstrate exceptional maneuverability and are capable of hovering, taking off, flying, and landing in confined spaces thanks to their vertical takeoff and landing (VTOL) capability [1]. A quadcopter comprises four rotors positioned at the corners of a cross-shaped frame, and it is controlled by adjusting the rotational speeds of these rotors [2,3]. Quadcopters are employed in applications such as surveillance, search-and-rescue missions, and hazardous operations in congested and Global Positioning System (GPS)-denied settings, such as dense forests, crowded offices, corridors, and warehouses. To perform these tasks effectively, quadcopters must have robust navigation abilities and a thorough understanding of their operational environment. Substantial research has examined various aspects of quadcopter dynamics and control, motion planning, and trajectory generation in unstructured environments [4,5,6,7,8].
Sampling-based optimal control methods have attracted considerable interest in path planning and trajectory tracking applications. These control techniques function by generating multiple trajectories at each control step through input sampling (referred to as rollouts) and subsequently determining a sub-optimal sequence of control inputs based on the sampled trajectories. This process involves executing a finite time-horizon optimization, applying the optimal input for the initial time step, and then iteratively repeating the optimization for subsequent time steps. The path integral optimal control framework [9,10,11] provides a mathematical foundation for developing optimal control algorithms through stochastic trajectory sampling. The main principle is the transformation of the value function of the optimal control problem, via the Feynman–Kac lemma [12,13], into an expectation over all possible trajectories, referred to as a path integral. This transformation enables stochastic optimal control problems to be solved by a Monte Carlo approximation employing forward sampling of stochastic diffusion processes. Path integral control theory has accordingly gained prominence in recent research.
Model Predictive Control (MPC) has been used on various robotic systems, such as self-driving cars [14] and quadcopters [15], and it is widely employed in industrial applications [16]. The path integral control framework can be integrated with the Model Predictive Control paradigm. In this scenario, an open-loop control sequence is continuously optimized in the background, while the system executes the current best estimate of the optimal control in parallel. A key challenge of this approach is that it requires sampling a large number of trajectories simultaneously, which is difficult when the systems under consideration have complex dynamics. A possible solution to this issue involves simplifying the system model through a hierarchical scheme [17]. This scheme employs path integral control to create trajectories for a simplified point mass model that a low-level controller can then follow. Model Predictive Path Integral (MPPI) control was introduced in [18], demonstrating its capability in aggressive autonomous driving. It explored information-theoretic dualities between free energy and relative entropy, which led to the formulation of the Model Predictive Path Integral control algorithm. Williams developed the iterative path integral method, known as the Model Predictive Path Integral control framework [16]. That method iteratively modified the control sequence to find optimal solutions based on importance sampling of trajectories. In [19], an alternative iterative method was proposed that removed the control and noise affine constraints inherent in the original Model Predictive Path Integral framework, and could therefore accommodate non-affine dynamics. The central idea behind that extension was to use the information-theoretic interpretation of optimal control through the Kullback–Leibler (KL) divergence and free energy. It differed from the original method, which used a linearization of the Hamilton–Jacobi–Bellman (HJB) equation and applied the Feynman–Kac lemma.
Despite these methods coming from different derivations, they were practically equivalent and theoretically linked. A learning model to generate informed sampling distributions was presented in [20], which was an extension of Williams’s information-theoretic-based method. In [15], the generation of time-optimal trajectories through multiple waypoints for quadrotors was investigated.
The MPPI controller has been mainly utilized in aggressive driving and UAV navigation in congested conditions. However, UAV navigation in natural conditions often involves incomplete state information. In environments such as forests and urban canyons, the navigation system cannot directly measure all state variables due to sensor limitations or occlusion problems [1,6]. Although most control schemes depend on the accuracy of the underlying models, certain adaptive control, sliding mode control, and robust control methods can remain effective despite model inaccuracies [21,22,23]. A comprehensive analysis of advanced aerodynamic effects governing quadrotor flight, such as blade flapping and airflow dynamics, is conducted in [24]. These effects are fundamentally complicated to model, posing considerable challenges in controller design. The standard Model Predictive Path Integral approach may necessitate substantial computational resources or manual parameter adjustment to maintain control performance during dynamic environmental variations or severe system state fluctuations. Therefore, relying exclusively on the Model Predictive Path Integral control mechanism is inadequate for modern complicated missions. Incorporating an adaptive mechanism into the Model Predictive Path Integral framework allows for the dynamic modification of essential parameters according to system states and control performance, enhancing the robustness and efficiency of the control strategy. Moreover, to overcome modeling problems, data-driven and learning-based control strategies have been proposed [25]. A promising approach employs neural networks (NNs) to model system dynamics. Neural networks are acknowledged as universal function approximators, capable of modeling highly nonlinear functions and latent states directly from observed data, which may otherwise be challenging to model explicitly [26].
The study in [27] demonstrates that simple feedforward networks can proficiently learn and generalize system dynamics with high accuracy.
This paper proposes an adaptive control method that integrates an enhanced Model Predictive Path Integral (MPPI) method with a Multilayer Perceptron (MLP) neural network, dynamically modifying essential parameters according to system states and control performance. In experiments, the method achieves a root-mean-square error (RMSE) in trajectory tracking that is more than 10% lower than alternative methods, surpassing MPPI control without learned dynamics and MPPI employing a two-layer neural network. The paper is organized as follows: Section 2 explains the dynamic model of the quadcopter; Section 3 presents the training and validation of the MLP; and Section 4 proposes the adaptive Model Predictive Path Integral approach, which adjusts its sampling based on trajectory error. Section 5 compares Model Predictive Path Integral control without learning and Model Predictive Path Integral control with a two-hidden-layer neural network (NN) against the adaptive Model Predictive Path Integral control employing a Multilayer Perceptron. Experimental results show that the adaptive Model Predictive Path Integral method utilizing a Multilayer Perceptron outperforms the alternative methodologies.
2. Material
In this section, we first present the nonlinear dynamic model of the quadcopter to be used and the associated model uncertainties and external disturbances.
A general discrete-time stochastic nonlinear system is of the form [1,9]:

$x_{t+1} = f(x_t, u_t + \delta u_t) + w_t$  (1)

where $x_t \in \mathbb{R}^n$ is the state, and $u_t \in \mathbb{R}^m$ is the control input. The term $\delta u_t \sim \mathcal{N}(0, \Sigma)$ represents a normally distributed noise disturbance in the control signals generated by the lower-level controllers of real-world robot systems. The term $w_t$ represents the combined errors due to inaccurate model representations and purely external disturbances, such as wind in the case of quadcopters.

Quadcopter Model
Figure 1 shows the quadcopter model, with an inertial reference frame defined as $\mathcal{I} = \{x_I, y_I, z_I\}$ and a body-fixed frame defined as $\mathcal{B} = \{x_B, y_B, z_B\}$, with the origin defined at the center of the quadcopter. The Euler angles describing the quadrotor rotations about the body-fixed axes $x_B$, $y_B$, and $z_B$ are roll $\phi$, pitch $\theta$, and yaw $\psi$, respectively.
The rotation from $\mathcal{B}$ to $\mathcal{I}$ with a ZYX transformation is expressed as:

$R = \begin{bmatrix} c_\psi c_\theta & c_\psi s_\theta s_\phi - s_\psi c_\phi & c_\psi s_\theta c_\phi + s_\psi s_\phi \\ s_\psi c_\theta & s_\psi s_\theta s_\phi + c_\psi c_\phi & s_\psi s_\theta c_\phi - c_\psi s_\phi \\ -s_\theta & c_\theta s_\phi & c_\theta c_\phi \end{bmatrix}$  (2)

where $c_\alpha = \cos\alpha$ and $s_\alpha = \sin\alpha$ for $\alpha \in \{\phi, \theta, \psi\}$. The body angular velocity $\omega$ represents the angular velocity vector relative to the body frame and is defined as:
$\omega = [p, q, r]^T$  (3)

where $p$ is the rotation rate around the $x_B$ axis; $q$ is the rotation rate around the $y_B$ axis; $r$ is the rotation rate around the $z_B$ axis. The Euler angles $\eta = [\phi, \theta, \psi]^T$ define the orientation of the quadrotor by rotating from the inertial frame to the body frame. The relationship between the body angular velocity and the Euler angle derivatives is given by:
$\omega = W(\eta)\,\dot{\eta}$  (4)

where $\dot{\eta} = [\dot{\phi}, \dot{\theta}, \dot{\psi}]^T$. To derive the transformation matrix $W(\eta)$, we decompose the rotation in the inertial frame. After applying the sequence of rotations, we can express $\omega$ in terms of $\dot{\eta}$ as:
$\omega = \begin{bmatrix} 1 & 0 & -s_\theta \\ 0 & c_\phi & s_\phi c_\theta \\ 0 & -s_\phi & c_\phi c_\theta \end{bmatrix} \dot{\eta}$  (5)
Taking the inverse, we can express the Euler angle rates as:

$\dot{\eta} = W^{-1}(\eta)\,\omega$  (6)

where $W^{-1}(\eta)$ is the required transformation matrix:

$W^{-1}(\eta) = \begin{bmatrix} 1 & s_\phi t_\theta & c_\phi t_\theta \\ 0 & c_\phi & -s_\phi \\ 0 & s_\phi / c_\theta & c_\phi / c_\theta \end{bmatrix}$  (7)

where $t_\theta = \tan\theta$.
The linear motion equations describe the translational motion in space. Using Newton's second law, we have:

$m\ddot{p} = F$  (8)

where $m$ is the mass of the quadrotor; $p \in \mathbb{R}^3$ is the position vector of the quadrotor; $F$ is the total force acting on the quadrotor. The total force consists of two components: the gravitational force $-mg\,e_3$, where $e_3 = [0, 0, 1]^T$ is the unit vector pointing upwards, and the thrust force $T R e_3$, where the thrust $T$ is always directed along the $z_B$-axis of the body frame. The rotation matrix $R$ converts that thrust into the inertial frame. The dynamics of the quadcopter are given by:
$\dot{p} = v$  (9)

$m\dot{v} = -mg\,e_3 + T R e_3 + \Delta f$  (10)

$\dot{\eta} = W^{-1}(\eta)\,\omega$  (11)

$J\dot{\omega} = -\omega \times (J\omega) + \tau + \Delta\tau$  (12)
where $m$ is the mass of the quadcopter, $J$ is the inertia matrix expressed in $\mathcal{B}$, $p$ is the position vector, $v$ is the velocity vector, $\eta$ is the Euler angle vector, $\omega$ is the body angular rate vector, $T$ is the body thrust input generated by the motors, $\tau = [\tau_x, \tau_y, \tau_z]^T$ is the torque input to the vehicle in $\mathcal{B}$, and $l$ is the arm length of the quadrotor. The disturbances $\Delta f$ and $\Delta\tau$ are explicitly included in the quadrotor dynamics equations, affecting both thrust and torque. $\Delta f$ represents the external force disturbance vector affecting the thrust $T$. $\Delta\tau$ represents the external torque disturbance vector affecting the torque $\tau$. The state of the system is defined as $x = [p^T, v^T, \eta^T, \omega^T]^T \in \mathbb{R}^{12}$. Each rotor produces a vertical force $F_i = k_f \omega_i^2$ and moment $M_i = k_m \omega_i^2$, where $\omega_i$ is the angular velocity of the $i$th rotor. We map the rotor speeds $\omega_i$ to the system inputs $T$ and $\tau$ as

$\begin{bmatrix} T \\ \tau_x \\ \tau_y \\ \tau_z \end{bmatrix} = \begin{bmatrix} k_f(\omega_1^2 + \omega_2^2 + \omega_3^2 + \omega_4^2) \\ l\,k_f(\omega_4^2 - \omega_2^2) \\ l\,k_f(\omega_1^2 - \omega_3^2) \\ k_m(\omega_1^2 - \omega_2^2 + \omega_3^2 - \omega_4^2) \end{bmatrix}$  (13)
where $k_f$ is the thrust coefficient, $k_m$ is the moment (torque) coefficient, and $l$ is the distance from the rotor to the center of mass. We deal with the actuator dynamics by introducing a first-order lag in the control inputs. This accounts for the physical limitations and response times of the quadrotor's actuators, ensuring that rapid changes in control commands do not lead to unrealistic actuator behaviors. Specifically, the actuator dynamics are represented as follows.
The actuator is modeled as a first-order linear system characterized by its time constant $\tau_a$:

$\tau_a \dot{u} + u = u_c$  (14)

where $\tau_a$ is the actuator time constant, $u$ is the actual control input, and $u_c$ is the commanded control input. Rearranging the equation, we obtain:

$\dot{u} = \dfrac{u_c - u}{\tau_a}$  (15)
This first-order model captures the delayed and smoothed response of the actuators to control commands, reflecting real-world actuator behavior.
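During data generation, the first-order lag of Equation (14) can be discretized with a forward-Euler step; the time constant and step size below are assumed values for illustration, not the paper's exact settings:

```python
def actuator_step(u, u_cmd, tau_a=0.05, dt=0.01):
    """Forward-Euler update of tau_a * du/dt = u_cmd - u
    (Equations (14)-(15)), applied element-wise."""
    return [ui + dt * (uc - ui) / tau_a for ui, uc in zip(u, u_cmd)]

# Repeated application drives the actual input toward the command,
# reproducing the smoothed, delayed actuator response.
u = [0.0]
for _ in range(100):
    u = actuator_step(u, [1.0])
```

Each step moves the actual input a fraction dt/tau_a of the way toward the command, so a step command is tracked exponentially rather than instantaneously.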
By simulating the actuator dynamics, the dataset more accurately reflects real-world quadcopter behavior, including the effects of rotor inertia and drag. In subsequent sections, we simplify the dynamics during control implementation because the neural network has already learned the influence of actuator dynamics. The inclusion of Equation (14) in the data generation process ensures that the learned model accurately captures the effects of actuators on the quadcopter’s overall dynamics.
3. Methods
3.1. Neural Network
This section describes the neural network architecture employed to mimic the quadrotor dynamics. The improvement in computational capacity enables the execution of large-scale batch predictions within minimal timeframes. Adequately trained neural networks can effectively generalize the system's dynamic equations using supervised learning techniques. However, employing the dynamic equations directly in iterative control schemes has disadvantages, as these methods frequently neglect model uncertainties and external disturbances. These constraints can be mitigated by replacing traditional dynamic models with neural networks, which can be continuously updated online as new data are acquired during quadrotor flights. This adaptability enables the neural networks to track changes in the model, hence improving control performance. This section provides a detailed explanation of the data collection process, neural network model design, training procedures, and the resulting training outcomes.
3.1.1. Data Generation
First, we created a dataset of input–output pairs, where the inputs were the current states and control inputs, and the outputs were the corresponding state derivatives. The input to the neural network was 17-dimensional, and the output was the 12 state derivatives. Multiple input combinations were sampled from uniform distributions within the operational range defined for the quadrotor. The state variables and their ranges are specified in Table 1.
The sampled variables were not used as inputs to the NN directly for training. For a quadcopter, the thrust produced by each rotor can be represented as:
$F_i = k_f \omega_i^2$  (16)

where $k_f$ is the thrust coefficient, which represents the efficiency of thrust generation by the rotor, and $\omega_i$ is the angular velocity of the $i$th rotor. The total thrust is the sum of the thrusts produced by all four rotors:
$T = \sum_{i=1}^{4} F_i = k_f \sum_{i=1}^{4} \omega_i^2$  (17)
In addition to thrust, each rotor generates roll, pitch, and yaw moments. The moments consist of two parts: (1) roll and pitch moments and (2) yaw moments. These are moments caused by the eccentric position of the rotors. For roll and pitch, the moments are:
$\tau_x = l\,k_f(\omega_4^2 - \omega_2^2), \qquad \tau_y = l\,k_f(\omega_1^2 - \omega_3^2)$  (18)

where $l$ is the distance from the rotor to the center of mass of the quadcopter. The sign of each moment depends on the arrangement and rotation direction of the rotors.
The yaw moment: this moment is generated by the reaction torques of the rotors and can be expressed as:
$\tau_z = k_m(\omega_1^2 - \omega_2^2 + \omega_3^2 - \omega_4^2)$  (19)

where $k_m$ is the coefficient representing the efficiency of moment generation.

Gyroscopic precession effect: differences in the speed of the rotors also lead to a net gyroscopic effect, which affects the system's dynamic response:

$\Omega = \omega_1 - \omega_2 + \omega_3 - \omega_4$  (20)

where $\Omega$ is the net effect due to differences in motor speeds, contributing to gyroscopic precession. We used these sampled variables to calculate the following additional inputs to the NN as

$u_{NN} = [T, \tau_x, \tau_y, \tau_z, \Omega]^T$  (21)

where $T$ is the sum of thrusts produced by all four rotors, $\tau_x$ is the roll torque causing a rotation around the X-axis, $\tau_y$ is the pitch torque causing a rotation around the Y-axis, $\tau_z$ is the yaw torque causing a rotation around the Z-axis, and $\Omega$ is the net effect due to differences in motor speeds, contributing to gyroscopic precession. These inputs were passed through the quadrotor dynamics defined in Section 2. The training output for the NN was the state derivatives obtained for each input.
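The computation of the additional NN inputs from sampled rotor speeds can be sketched as follows; the coefficient values and the plus-configuration sign convention are illustrative assumptions, not the paper's exact parameters:

```python
def rotor_to_nn_inputs(w, k_f=3.16e-6, k_m=7.94e-8, l=0.2):
    """Compute [T, tau_x, tau_y, tau_z, Omega] from rotor angular
    speeds w = [w1, w2, w3, w4], following Equations (16)-(21)."""
    sq = [wi ** 2 for wi in w]
    T = k_f * sum(sq)                              # Eq. (17): total thrust
    tau_x = l * k_f * (sq[3] - sq[1])              # Eq. (18): roll torque
    tau_y = l * k_f * (sq[0] - sq[2])              # Eq. (18): pitch torque
    tau_z = k_m * (sq[0] - sq[1] + sq[2] - sq[3])  # Eq. (19): yaw torque
    omega_net = w[0] - w[1] + w[2] - w[3]          # Eq. (20): gyroscopic term
    return [T, tau_x, tau_y, tau_z, omega_net]
```

With equal rotor speeds, all torques and the gyroscopic term cancel and only thrust remains, which corresponds to hover.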
The use of neural networks allowed for the implicit approximation of disturbances in the dynamic model. During training, the neural network employed supervised learning, using input data, which included the effects of disturbances, and output data to learn the dynamic relationships of the entire system and establish an approximate dynamic model:

$\dot{x} = \hat{f}_{NN}(x, u_{NN})$

where $\hat{f}_{NN}$ is the dynamic model learned by the neural network. By approximating $\Delta f$ and $\Delta\tau$ through the neural network, the design phase of the controller did not need to explicitly introduce stochastic disturbances. Instead, these disturbances were considered as part of the learned deterministic model.
3.1.2. NN Model and Training
This section describes the selection of a Multilayer Perceptron (MLP) neural network architecture to imitate the quadrotor dynamics [27,28]. The network consisted of six hidden layers containing 256, 256, 128, 128, 64, and 64 neurons, respectively, as shown in Figure 2. Every hidden layer employed the Leaky Rectified Linear Unit (LeakyReLU) activation function to incorporate nonlinearity and mitigate the vanishing gradient problem. The training process utilized a dataset consisting of 15,000 samples, processed over 2000 epochs with a batch size of 64. A Graphics Processing Unit (GPU) was employed to accelerate training, facilitating fast computation and supporting a gradual decrease in the loss function through multiple iterations, thus improving the model’s predictive performance. The optimization utilized the Adam optimizer in conjunction with a custom learning rate scheduler to enhance accuracy. The learning rate followed an exponential decay strategy, starting with an initial rate of 0.001 and reducing to 0.0001 over 1000 epochs. That method ensured stable convergence and prevented overshooting during training. The neural network model was developed with TensorFlow 2 and trained on a system comprising an Intel i9-13900 HX processor, 16 GB of RAM, and an NVIDIA GeForce RTX 3060 Ti GPU with 8 GB of memory. The model achieved a Mean Squared Error (MSE) of 0.013 and a Mean Absolute Error (MAE) of 0.047 on the test dataset on completion of training, which indicated high predictive accuracy and robustness.
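The exponential decay schedule described above (an initial rate of 0.001 decaying to 0.0001 over 1000 epochs) can be sketched as a simple function; this is an interpretation of the stated schedule, not the authors' exact implementation:

```python
def lr_schedule(epoch, lr0=1e-3, lr_end=1e-4, decay_epochs=1000):
    """Exponential decay from lr0 to lr_end over decay_epochs,
    held constant at lr_end afterwards."""
    if epoch >= decay_epochs:
        return lr_end
    # Geometric interpolation: lr(t) = lr0 * (lr_end/lr0)^(t/decay_epochs)
    return lr0 * (lr_end / lr0) ** (epoch / decay_epochs)
```

A smoothly decaying rate of this kind supports stable convergence: large early steps explore the loss surface, while small late steps prevent overshooting near a minimum.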
The LeakyReLU activation function is as follows:
$f(x) = \begin{cases} x, & x \ge 0 \\ \alpha x, & x < 0 \end{cases}$  (22)

where $\alpha$ is a small positive constant, typically set to 0.01 or another small value. Specifically, when the input is positive, LeakyReLU outputs $x$; when the input is negative, it outputs $\alpha x$, retaining a slight negative gradient. The Mean Squared Error was used for the loss function; it measures the average squared difference between the predicted and actual state derivatives:

$\text{MSE} = \dfrac{1}{N}\sum_{i=1}^{N} \left\| \hat{\dot{x}}_i - \dot{x}_i \right\|^2$  (23)

where $\hat{\dot{x}}_i$ is the predicted state derivative from the MLP, and $\dot{x}_i$ is the ground-truth state derivative. On completion of training, the network approximated the function $\hat{f}_{NN}(x, u_{NN}) \approx \dot{x}$, essentially functioning as a surrogate model for the quadrotor dynamics.
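Equations (22) and (23) translate directly into code; this sketch operates on plain Python scalars and lists for clarity:

```python
def leaky_relu(x, alpha=0.01):
    """Equation (22): identity for x >= 0, alpha * x otherwise."""
    return x if x >= 0 else alpha * x

def mse(pred, true):
    """Equation (23): mean squared error between predicted and
    ground-truth state-derivative vectors of equal length."""
    n = len(pred)
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / n
```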
3.1.3. Validation
A distinct validation set was used to assess the model's performance on unseen data after training, thereby verifying that the model had not overfitted or merely memorized the training data. The MSE and MAE on the validation set indicated the model's generalization capability. Figure 3 shows the desired and actual values on the validation set. Points situated along the diagonal line y = x in Figure 3 indicate flawless predictions. Figure 4 displays the actual and predicted positions over time on the validation set. Figure 5 shows the actual and predicted angular velocities over time on the validation set.
3.2. Improved Model Predictive Path Integral
For a discrete-time dynamical system with states $x_t$, control inputs $u_t$, error $\delta u_t$ in the control input, $w_t$ as external disturbances, and mapping from the current state and input to the next state given by

$x_{t+1} = F(x_t, u_t + \delta u_t) + w_t$  (24)

a general optimal control problem consists of finding a control policy $\pi$, a map from the current state to the optimal input, $u_t = \pi(x_t)$, such that a cost $J$ is minimized and is given by

$J = \sum_{t=0}^{T-1} q(x_t, u_t)$  (25)

$q(x_t, u_t) = (x_t - x_{ref})^T Q (x_t - x_{ref}) + u_t^T R\, u_t$  (26)

where $x_t$ is the current state, $x_{ref}$ is the reference state, $u_t$ is the control input, and $Q$ and $R$ are weighting matrices, subject to

$x_{t+1} = F(x_t, u_t + \delta u_t) + w_t$  (27)

$x_0 = x(0)$  (28)
The path integral control method for developing optimal control algorithms is based on stochastic trajectory sampling [29,30]. This method's benefit lies in its avoidance of derivatives of the system dynamics or the cost function, thus providing robustness to errors in estimating the system dynamics and formulating the cost function. Given a transition model F, a sample size N, a number of time steps (horizon) T, and an initial control sequence $U_0$, the algorithm calculates N rollouts for T time steps for the inputs and evaluates the resulting candidate trajectories by running them through the dynamic model F. The cost and weight for each of these rollouts are calculated using the provided cost function. The N weights are used to evaluate the contribution of each rollout to the final optimal input sequence, with only the first input being applied. The remainder of the sequence is used as the initial input for the subsequent iteration, and the process continues. The traditional MPPI method may require considerable computational resources or manual parameter adjustments to maintain control performance in response to dynamic environmental changes or significant swings in system states. Consequently, incorporating an adaptive mechanism into the MPPI control method enables the dynamic adjustment of its essential parameters according to system states or control performance, thereby enhancing the control's robustness and efficiency.
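The rollout procedure described above can be sketched as follows; the transition model, cost function, and dimensions are illustrative scalar stand-ins, not the quadcopter dynamics:

```python
import random

def rollout_costs(F, cost, x0, u_seq, n_samples=4, sigma=0.1):
    """Sample n_samples perturbed control sequences, propagate each
    through the transition model F, and return their cumulative
    costs together with the sampled sequences."""
    costs, sequences = [], []
    for _ in range(n_samples):
        u_k = [u + random.gauss(0.0, sigma) for u in u_seq]
        x, s = x0, 0.0
        for u in u_k:
            x = F(x, u)        # propagate the dynamics one step
            s += cost(x, u)    # accumulate the running cost
        costs.append(s)
        sequences.append(u_k)
    return costs, sequences

# Toy scalar stand-ins for the transition model and cost
F_toy = lambda x, u: x + 0.1 * u
cost_toy = lambda x, u: x ** 2 + 0.01 * u ** 2
```

The weights derived from these costs then determine each rollout's contribution to the updated control sequence.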
At each time step t, multiple control sequences were generated by randomly sampling the control inputs. The sampled control input sequences were denoted as follows:
$U_k = \left( u_0^k, u_1^k, \ldots, u_{T-1}^k \right), \quad k = 1, \ldots, N$  (29)
Each control sequence had the length of the prediction horizon T.
The real-time system error, defined as the distance between the current system state $x_t$ and the target state $x_{ref}$, was:

$e_t = x_t - x_{ref}$  (30)

The Euclidean distance was used to calculate the system error as follows:

$\|e_t\| = \sqrt{\sum_{i} \left( x_{t,i} - x_{ref,i} \right)^2}$  (31)
Based on the real-time error $\|e_t\|$, we dynamically adjusted the sampling number N and the prediction horizon T:

$N_t = N_0 + k_N \|e_t\|$  (32)

$T_t = T_0 + k_T \|e_t\|$  (33)

where $N_0$ and $T_0$ are the baseline sampling number and baseline prediction horizon used when the error is small, and $k_N$ and $k_T$ are gain coefficients that control the responsiveness of the sampling number and prediction horizon to the error. When the error increased, the sampling number N was increased to enhance sampling density and control precision. Conversely, when the error decreased, the sampling number was reduced to save computational resources. Similarly, the prediction horizon T was dynamically adjusted based on the error magnitude. An increased error led to a longer prediction horizon, allowing the controller to consider state changes over a more extended period. A decreased error shortened the prediction horizon, improving computational efficiency.
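A sketch of the adaptive update in Equations (32) and (33); the baseline values, gains, and saturation caps are illustrative assumptions, with results rounded to integers since both quantities count discrete samples and time steps:

```python
def adapt_parameters(error_norm, n0=800, t0=25, k_n=100.0, k_t=2.0,
                     n_max=2000, t_max=50):
    """Scale the sampling number N and prediction horizon T with the
    tracking error norm, capped to keep computation bounded."""
    n = min(n_max, int(round(n0 + k_n * error_norm)))
    t = min(t_max, int(round(t0 + k_t * error_norm)))
    return n, t
```

The caps are a practical safeguard (an assumption here, not stated in the text): without them, a transient error spike could demand an unbounded number of rollouts in a single control step.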
The cumulative cost for each trajectory was calculated as follows:

$S_k = \sum_{t=0}^{T-1} q(x_t^k, u_t^k)$  (34)

The weight for each trajectory was calculated as follows:

$w_k = \dfrac{\exp\left( -S_k / \lambda \right)}{\sum_{j=1}^{N} \exp\left( -S_j / \lambda \right)}$  (35)

where $\lambda > 0$ is the temperature parameter. Subsequently, the current control input was updated using a weighted average of all sampled control inputs:

$u_t = \sum_{k=1}^{N} w_k\, u_t^k$  (36)
After applying the updated control input to the system, we moved to the next time step t + 1 and repeated the above steps.
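The weighting and update of Equations (35) and (36) amount to a softmax over negative trajectory costs; the example costs and controls below are illustrative, and the minimum cost is subtracted before exponentiation for numerical stability:

```python
import math

def mppi_update(costs, controls, lam=1.0):
    """Equations (35)-(36): exponentially weight each sampled control
    by its trajectory cost and return the weighted average.

    costs:    list of N cumulative trajectory costs S_k
    controls: list of N sampled control values u_t^k for this step
    """
    s_min = min(costs)  # shift by the minimum for numerical stability
    weights = [math.exp(-(s - s_min) / lam) for s in costs]
    total = sum(weights)
    weights = [w / total for w in weights]
    u = sum(w * u_k for w, u_k in zip(weights, controls))
    return u, weights

u, w = mppi_update([1.0, 5.0], [2.0, 10.0])
```

The shift by the minimum cost leaves the normalized weights unchanged but prevents underflow when costs are large; lower-cost rollouts dominate the weighted average.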
The MLP and improved MPPI control block diagram is shown in Figure 6:
Flight data and dynamic inputs: Provide a realistic dynamic description of the system, serving as the true data source in the simulation. These dynamics describe the motion behavior of the quadrotor UAV under external influences, including position, velocity, angular velocity, noise, and disturbance.
Multilayer Perceptron: The MLP is used to learn the system dynamics, effectively replacing traditional dynamic models to some extent. This approach is particularly effective in complex dynamic environments or under external disturbances. The neural network’s inputs are the current system state and control inputs, and the output is the derivative of the system state.
Improved MPPI controller: Based on the path integral method, the controller generates multiple possible trajectories through stochastic sampling and selects the optimal control input from them. By adaptively adjusting the sampling number and prediction horizon, the improved MPPI controller can better handle dynamic changes in the environment. When the error increases, the sampling number is increased to improve control accuracy; when the error decreases, the sampling number is reduced to save computational resources.
Trajectory output and control adjustment: The MPPI controller dynamically adjusts the control input based on real-time state errors, generating the trajectory output for the UAV. When there is an error between the actual trajectory and the target trajectory, the control adjustment module provides real-time feedback on this error and optimizes the process by dynamically adjusting the sampling number and prediction horizon, thereby enhancing control performance.
3.3. Stability Analysis
3.3.1. Neural Network Convergence Analysis
According to the universal approximation theorem, neural networks with appropriate structures and sufficient training data can approximate any continuous function with arbitrary precision. Therefore, we assumed that the MLP neural network satisfied the following after training:
$\left\| \hat{f}_{NN}(x, u) - f(x, u) \right\| \le \epsilon, \quad \forall (x, u) \in D$  (37)

where $\hat{f}_{NN}$ is the neural network-approximated dynamics model, $f$ is the true dynamics model, $\epsilon$ is a small positive constant, and $D$ is the system's operating space. During the training process, the neural network reduced the approximation error iteratively through learning, ultimately keeping it within a small bounded range. The learning mechanism, together with the validation results in Figure 3, Figure 4 and Figure 5, ensured that the approximation error always remained within a controllable range. The above analysis demonstrates that the neural network reliably converged to an accurate approximation of the quadcopter dynamics.
3.3.2. Stability Analysis of the Closed-Loop System
To prove the stability of the closed-loop system, we adopted a control framework based on passivity, following the method proposed by Ruggiero et al. [31]. The passivity-based control method is suitable for nonlinear systems and effectively handles system uncertainties and external disturbances.
Choose the control law as:
$u = u_{MPPI} + K e$  (38)

where $u_{MPPI}$ is the control input obtained from the Model Predictive Path Integral (MPPI) controller, $K$ is the gain matrix designed to stabilize the tracking error, and $e$ is the error. Define a Lyapunov function $V$ to evaluate the stability of the system. For the quadrotor system, we selected the following candidate Lyapunov function:

$V = e^T P e$  (39)

where $e = x - x_{ref}$ is the tracking error of the system, and $P$ is a symmetric positive definite matrix. Compute the time derivative of the Lyapunov function:

$\dot{V} = \dot{e}^T P e + e^T P \dot{e} = 2 e^T P \dot{e}$  (40)
Based on the system dynamics:
$\dot{x} = f(x, u) + d(t)$  (41)

where $d(t)$ represents all external disturbances and model uncertainties. Based on $\dot{e} = \dot{x} - \dot{x}_{ref}$ and Equation (40), we obtained:

$\dot{V} = 2 e^T P \left( f(x, u) + d(t) - \dot{x}_{ref} \right)$  (42)
The neural network approximation becomes:

$f(x, u) = \hat{f}_{NN}(x, u) + \tilde{\epsilon}(x, u), \quad \|\tilde{\epsilon}(x, u)\| \le \epsilon$  (43)
Thus,

$\dot{V} = 2 e^T P \left( \hat{f}_{NN}(x, u) + \tilde{\epsilon}(x, u) + d(t) - \dot{x}_{ref} \right)$  (44)
Assuming that the MPPI controller ensures, in the absence of errors and disturbances, that:

$\hat{f}_{NN}(x_{ref}, u_{MPPI}) - \dot{x}_{ref} = 0$  (45)
we can linearize the tracking error dynamics around $e = 0$:

$\dot{e} \approx (A + BK) e + \tilde{\epsilon}(x, u) + d(t)$  (46)
Substituting Equation (46) into Equation (40):

$\dot{V} = 2 e^T P \left( (A + BK) e + \tilde{\epsilon}(x, u) + d(t) \right)$  (47)

where $A = \partial \hat{f}_{NN} / \partial x$ and $B = \partial \hat{f}_{NN} / \partial u$ are the Jacobian matrices of the system dynamics with respect to $x$ and $u$, respectively. Under the passivity-based control framework, the control law and Lyapunov function parameters are designed such that the Lyapunov derivative satisfies:
$\dot{V} \le -\alpha \|e\|^2 + \beta \|e\|$  (48)

where $\alpha > 0$ and $\beta \ge 0$ are design parameters. To ensure stability, we require the following:
$\dot{V} < 0, \quad \forall \|e\| > \beta / \alpha$  (49)
To achieve this, the matrix $P$ and the control gain matrix $K$ are selected by solving the Lyapunov equation.

Choose $P$ and $K$ so that they satisfy the following equation:

$(A + BK)^T P + P (A + BK) = -Q_L$  (50)

where $Q_L$ is a positive definite matrix. This condition ensures that the linear part of $\dot{V}$ is negative definite:

$2 e^T P (A + BK) e = -e^T Q_L e \le -\lambda_{\min}(Q_L) \|e\|^2$  (51)
where $\lambda_{\min}(Q_L)$ is the smallest eigenvalue of $Q_L$. Apply the Cauchy–Schwarz inequality to the remaining terms:

$2 e^T P \left( \tilde{\epsilon} + d \right) \le 2 \|P\| \|e\| \left( \epsilon + \bar{d} \right)$  (52)

where $\bar{d}$ is an upper bound on the disturbance norm $\|d(t)\|$.
To maintain $\dot{V} < 0$, we require the following:

$-\lambda_{\min}(Q_L) \|e\|^2 + 2 \|P\| \|e\| \left( \epsilon + \bar{d} \right) < 0$  (53)
To satisfy the above inequality, we can bound the terms involving $\|e\|$ by ensuring that

$\lambda_{\min}(Q_L) \|e\|^2 > 2 \|P\| \|e\| \left( \epsilon + \bar{d} \right)$  (54)
Dividing both sides by $\|e\|$ (assuming $\|e\| \ne 0$):

$\lambda_{\min}(Q_L) \|e\| > 2 \|P\| \left( \epsilon + \bar{d} \right)$  (55)
This condition holds if:

$\|e\| > \dfrac{2 \|P\| \left( \epsilon + \bar{d} \right)}{\lambda_{\min}(Q_L)}$  (56)
When the tracking error satisfies this condition, $\dot{V} < 0$ and the error decreases. For smaller errors, the bounded approximation error and disturbances confine the trajectories, so the tracking error remains within the bound of Equation (56).
From the above analysis, we conclude that the tracking error converges to a bounded region around zero, with the bound determined by the approximation error $\epsilon$ and the disturbance bound $\bar{d}$. The tracking error remains bounded at all times, ensuring robust performance despite disturbances and model uncertainties. The control law satisfies the passivity condition, preventing energy injection that could destabilize the system.
Therefore, the closed-loop quadrotor system under the proposed MPPI controller and MLP neural network approximation achieves uniform ultimate boundedness of the tracking error, ensuring reliable trajectory tracking in the presence of disturbances and model uncertainties.
4. Results
The neural network was trained using TensorFlow 2 on Google Colab, utilizing local system resources. Drone simulations were conducted in MATLAB, leveraging the MATLAB Deep Learning Toolbox, which facilitates the import and utilization of pre-trained TensorFlow models. We implemented four control methods: (1) Model Predictive Path Integral (MPPI) control without learning dynamics, (2) MPPI control with dynamics learned by a simple fully connected neural network, (3) MPC with dynamics learned by a Multilayer Perceptron (MLP), and (4) MPPI control with dynamics learned by a Multilayer Perceptron (MLP). Simulations were performed across two distinct trajectories: (1) waypoint navigation and (2) tracking a figure-eight trajectory. In all simulations, the plant was integrated at a frequency of 100 Hz (every 0.01 s), and the controller operated at 20 Hz (every 0.05 s). The initial position of the quadcopter was set to [0,10] with an initial control input of [620.6108, 620.6108, 620.6108, 620.6108]. For the waypoint navigation task, the desired position was [3,10]. The figure-eight trajectory was defined with a scale factor of 2, and an additional spiral path was utilized to evaluate the algorithm's performance further. All algorithms were evaluated using a comprehensive cost function that penalized deviations in position tracking, velocity magnitudes, angular rotations, deviations of control signals from the nominal hovering control (indirectly reflecting control effort), and changes in control signals. A critical penalty was imposed if the sampled trajectory descended below z = 0 (below the reference ground level). The Root-Mean-Square Error (RMSE) metric was employed to assess the experimental results across the three trajectory tasks:

$\text{RMSE} = \sqrt{\dfrac{1}{N_s} \sum_{i=1}^{N_s} \left\| p_i - p_{ref,i} \right\|^2}$

where $N_s$ is the number of samples, $p_i$ is the actual position, and $p_{ref,i}$ is the reference position.
4.1. MPPI Control Without Learning Dynamics
(1) Waypoint navigation: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 0.12 s to execute. The program execution time was 618.76 s. The final tracking error was 0.1426 m. The results are shown in Figure 7a–f. The RMSE was 0.7124 m.
(2) Figure-eight trajectory tracking: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 0.21 s to execute. The program execution time was 938.49 s. The results are shown in Figure 8a–f. The RMSE was 1.9756 m.
4.2. MPPI Control with Learning Dynamics with Two Hidden Layers Connected to Neural Network
(1) Waypoint navigation: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 0.48 s to execute. The program execution time was 723.15 s. The results are shown in Figure 9a–f. The RMSE was 0.7053 m.
(2) Figure-eight trajectory tracking: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 0.5 s to execute. The program execution time was 986.37 s. The results are shown in Figure 10a–f. The RMSE was 1.3883 m.
4.3. MPC Control with Learning Dynamics by Multilayer Perceptron
(1) Waypoint navigation: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 1.6 s to execute. The program execution time was 650.25 s. The results are shown in Figure 11a–f. The RMSE was 0.9747 m.
(2) Figure-eight trajectory tracking: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 0.53 s to execute. The simulation time was 30 s, and the actual program execution time was 1253.27 s. The results are shown in Figure 12a–f. The RMSE was 1.8973 m.
4.4. MPPI Control with Dynamics Learned by a Multilayer Perceptron
(1) Waypoint navigation: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 1.8 s to execute. The program execution time was 819.68 s. The results are shown in Figure 13a–f. The RMSE was 0.6515 m.
(2) Figure-eight trajectory tracking: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 0.45 s to execute. The simulation time was 30 s, and the actual program execution time was 1142.27 s. The results are shown in Figure 14a–f. The RMSE was 1.1093 m.
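The RMSE values reported in the subsections above can be computed as follows. This is one common definition (root of the mean squared Euclidean position error over the trajectory); the paper does not spell out its exact formula, so treat this as an assumption.

```python
import numpy as np

def trajectory_rmse(actual, desired):
    """RMSE of the Euclidean position error over a trajectory.

    actual, desired: (T, d) arrays of positions at T matched time steps.
    """
    actual, desired = np.asarray(actual, float), np.asarray(desired, float)
    sq_err = np.sum((actual - desired) ** 2, axis=1)   # squared distance per step
    return float(np.sqrt(np.mean(sq_err)))
```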
5. Discussion
Although existing MPPI methods perform well in path planning and trajectory tracking, they typically require high computational resources and manual parameter adjustments to maintain control performance. In this study, we proposed an improved MPPI controller that incorporates an adaptive mechanism based on system errors, allowing a dynamic adjustment of the sampling number and prediction horizon, thereby improving the controller’s robustness and efficiency in complex dynamic environments.
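The adaptive mechanism described above can be sketched as a simple schedule that scales the rollout count and prediction horizon with the current tracking error. The thresholds, bounds, and linear interpolation here are hypothetical choices for illustration; the paper's exact adaptation law is not reproduced.

```python
def adapt_mppi_params(error, n_min=200, n_max=800, h_min=10, h_max=25,
                      e_low=0.1, e_high=1.0):
    """Scale MPPI sampling number and horizon with tracking error (illustrative).

    Small errors use cheap settings; large errors ramp up toward the maxima
    used in the experiments (800 rollouts, horizon 25).
    """
    # Normalize the error into [0, 1] between the low and high thresholds.
    t = min(max((error - e_low) / (e_high - e_low), 0.0), 1.0)
    n_rollouts = int(n_min + t * (n_max - n_min))
    horizon = int(h_min + t * (h_max - h_min))
    return n_rollouts, horizon
```

A fixed-budget controller pays the worst-case cost at every step; this schedule spends the large sampling budget only when the trajectory error actually demands it.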
This paper used a multilayer perceptron (MLP) neural network to learn the dynamics of the quadcopter and combined it with the MPPI controller. Compared to other control methods based on traditional dynamic models, our approach can handle modeling inaccuracies and external disturbances by learning the system’s nonlinear dynamic behavior. Compared to classical model-based approaches, this data-driven control method better adapts to real-world uncertainties.
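The MLP dynamics model pairing can be sketched as a network that maps the current state and control to the next state. The layer width, activation, and residual (state-delta) formulation below are assumptions for illustration; the paper trained its model in TensorFlow 2, while this minimal NumPy version only shows the input/output contract the MPPI controller relies on.

```python
import numpy as np

class DynamicsMLP:
    """Minimal MLP mapping (state, control) -> next state (a sketch).

    state_dim=12 assumes position, velocity, attitude, and angular velocity;
    ctrl_dim=4 is the four rotor speeds.
    """
    def __init__(self, state_dim=12, ctrl_dim=4, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        d_in = state_dim + ctrl_dim
        self.W1 = rng.normal(0.0, 0.1, (d_in, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, state_dim))
        self.b2 = np.zeros(state_dim)

    def predict(self, state, control):
        x = np.concatenate([state, control])
        h = np.tanh(x @ self.W1 + self.b1)      # hidden layer with tanh activation
        return state + h @ self.W2 + self.b2    # residual form: predict the state delta
```

During MPPI rollouts, `predict` replaces the analytic plant model: each sampled control sequence is propagated through the network instead of the physical equations of motion.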
Unlike other studies that use traditional Model Predictive Control (MPC) and robust control methods, our research continuously learns the system dynamics using a neural network and introduces an adaptive adjustment mechanism, enabling the system to maintain good performance even in the face of dynamic environmental changes and partial observability. This characteristic is particularly suitable for UAVs operating in unknown or dynamic environments, such as forests and urban canyons.
6. Conclusions
This paper presented an approach that integrated MPPI control with an MLP. The improved MPPI method adjusted the sampling quantity according to the trajectory error to achieve better tracking. The MLP neural network learned the dynamics of a quadrotor from its state parameters and modeled the quadcopter; the improved MPPI controller then employed this model to guide the quadcopter along the desired trajectory. Three trajectories were employed to test the effectiveness of the proposed method, and the improved MPPI control with MLP was compared against MPPI control without learning, MPPI control with a two-hidden-layer network, and MPC with MLP. The experimental results showed that the improved MPPI control with MLP outperformed MPPI control without learning and MPPI control with a two-hidden-layer network. In the future, reinforcement learning methods could be integrated, allowing the neural network to automatically optimize control parameters based on flight variations, thereby improving adaptability and resilience to external disturbances. The improved MPPI method could also be extended beyond single-objective trajectory tracking to multi-objective path planning and collision avoidance in complex environments, and future work could incorporate graph search algorithms and global path optimization techniques to improve navigation efficiency and safety.
Author Contributions: Conceptualization, Y.L. and Q.Z.; methodology, Y.L.; software, Y.L.; validation, Y.L. and A.E.; formal analysis, Y.L.; investigation, Y.L.; resources, Y.L.; data curation, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, A.E.; visualization, Y.L.; supervision, Q.Z. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement: All data are available within the paper.
Conflicts of Interest: The authors declare no conflicts of interest.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 4. Actual position and predicted position of [Formula omitted. See PDF.] over time on the validation set.
Figure 5. Actual and predicted angular velocity of [Formula omitted. See PDF.] over time on the validation set.
Figure 7. Dynamic simulation of MPPI control without learning on waypoint trajectory (the blue line represents the result from the MPPI control without learning; the red line represents the desired result); RMSE = 0.7124 m.
Figure 8. Dynamic simulation of MPPI control without learning on figure-eight trajectory (blue line represents the result of the MPPI control without learning; the red line represents the desired result); RMSE = 1.9756 m.
Figure 9. Simulation of MPPI control with 2-hidden-layer network on waypoint trajectory (the blue line represents the result of MPPI control with 2 hidden layers; the red line represents the desired result); RMSE = 0.7053 m.
Figure 10. Simulation of MPPI control with 2-hidden-layer network on figure-eight trajectory (the blue line represents the result of MPPI control with 2 hidden layers; the red line represents the desired result); RMSE = 1.3883 m.
Figure 11. Simulation of MPC with MLP on waypoint trajectory (the blue line represents MPC with MLP; the red line represents the desired result); RMSE = 0.9747 m.
Figure 12. Simulation of MPC with MLP on figure-eight trajectory (the blue line represents the result of MPC with MLP; the red line represents the desired result); RMSE = 1.8973 m.
Figure 13. Simulation of MPPI control with MLP on waypoint trajectory (the blue line represents the result of MPPI control with MLP; the red line represents the desired result); RMSE = 0.6515 m.
Figure 14. Simulation of MPPI control with MLP on figure-eight trajectory (the blue line represents the results of MPPI control with MLP; the red line represents the desired result); RMSE = 1.1093 m.
Inputs’ sampling and ranges.

| Input | Range | Description |
|---|---|---|
| | [−10, 10] | X-axis and Y-axis position; simulates variations in horizontal position |
| | [−10, 10] | Z-axis position; considers flight requirements below the reference plane |
| | [0, 10] | Velocity |
| | [− | Roll and pitch; prevents drone instability |
| | [−π, π] | Yaw; covers all possible directions |
| | [−2, 2] | Angular velocity |
| | [0, 900] | Rotor speed |
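Generating training data by sampling uniformly within the table's ranges can be sketched as follows. The dictionary keys are descriptive placeholders (the table's input symbols are omitted in the source), and the roll/pitch bound `ANG_BOUND` is a hypothetical value because that range is truncated in the source.

```python
import numpy as np

ANG_BOUND = 0.5  # hypothetical roll/pitch limit in radians (truncated in the source table)

# Sampling ranges from the table above.
RANGES = {
    "xy_position":      (-10.0, 10.0),
    "z_position":       (-10.0, 10.0),
    "velocity":         (0.0, 10.0),
    "roll_pitch":       (-ANG_BOUND, ANG_BOUND),
    "yaw":              (-np.pi, np.pi),
    "angular_velocity": (-2.0, 2.0),
    "rotor_speed":      (0.0, 900.0),
}

def sample_training_inputs(n, rng=None):
    """Draw n uniform samples per input channel for dynamics-model training data."""
    rng = rng if rng is not None else np.random.default_rng(0)
    return {name: rng.uniform(lo, hi, size=n) for name, (lo, hi) in RANGES.items()}
```

Uniform coverage of each range keeps the learned model from being queried far outside its training distribution during MPPI rollouts.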
References
1. Hoffmann, G.; Huang, H.; Waslander, S.; Tomlin, C. Quadrotor Helicopter Flight Dynamics and Control: Theory and experiment. Proceedings of the AIAA Guidance, Navigation, and Control Conference; Hilton Head, SC, USA, 20–23 August 2007; Volume 2.
2. Brookner, E. Tracking and Kalman Filtering Made Easy; John Wiley and Sons, Inc.: New York, NY, USA, 1998.
3. Wan, E.A.; Van Der Merwe, R. The Unscented Kalman Filter for Nonlinear Estimation. Proceedings of the IEEE Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000; Lake Louise, AB, Canada, 4 October 2000; pp. 153-158.
4. Mellinger, D.; Michael, N.; Kumar, V. Trajectory generation and control for precise aggressive maneuvers with quadrotors. Int. J. Robot. Res.; 2012; 31, pp. 664-674. [DOI: https://dx.doi.org/10.1177/0278364911434236]
5. Gonzalez, D.; Perez, J.; Milanes, V.; Nashashibi, F. A review of motion planning techniques for automated vehicles. IEEE Trans. Intell. Transp. Syst.; 2015; 17, pp. 1135-1145. [DOI: https://dx.doi.org/10.1109/TITS.2015.2498841]
6. Mohta, K.; Watterson, M.; Mulgaonkar, Y.; Liu, S.; Qu, C.; Makineni, A.; Saulnier, K.; Sun, K.; Zhu, A.; Delmerico, J. et al. Fast, autonomous flight in GPS-denied and cluttered environments. J. Field Robot.; 2018; 35, pp. 101-120. [DOI: https://dx.doi.org/10.1002/rob.21774]
7. Baca, T.; Hert, D.; Loianno, G.; Saska, M.; Kumar, V. Model Predictive Trajectory Tracking and Collision Avoidance for Reliable outdoor Deployment of Unmanned Aerial Vehicles. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems; Madrid, Spain, 1–5 October 2018; pp. 6753-6760.
8. Ryll, M.; Ware, J.; Carter, J.; Roy, N. Efficient Trajectory Planning for High Speed Flight in Unknown Environments. Proceedings of the International Conference on Robotics and Automation; Montreal, QC, Canada, 20–24 May 2019; pp. 732-738.
9. Theodorou, E.; Buchli, J.; Schaal, S. A Generalized Path Integral Approach to Reinforcement Learning. J. Mach. Learn. Res.; 2010; 11, pp. 3137-3181.
10. Kappen, H.J. Linear Theory for Control of Nonlinear Stochastic Systems. Phys. Rev. Lett.; 2005; 95, 200201. [DOI: https://dx.doi.org/10.1103/PhysRevLett.95.200201] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/16384034]
11. Theodorou, E.A. Nonlinear stochastic control and information theoretic dualities: Connections interdependencies and thermodynamic interpretations. Entropy; 2015; 17, pp. 3352-3375. [DOI: https://dx.doi.org/10.3390/e17053352]
12. Karatzas, I.; Shreve, S. Brownian Motion and Stochastic Calculus, 2nd ed.; Graduate Texts in Mathematics; Springer: New York, NY, USA, 1991.
13. Friedman, A. Stochastic Differential Equations and Applications; Academic Press: New York, NY, USA, 1975.
14. Williams, G.; Drews, P.; Goldfain, B.; Rehg, J.M.; Theodorou, E.A. Aggressive Driving with Model Predictive Path Integral Control. Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA); Stockholm, Sweden, 16–21 May 2016; IEEE: Piscataway, NJ, USA, 2016.
15. Romero, A.; Sun, S.; Foehn, P.; Scaramuzza, D. Model predictive contouring control for time-optimal quadrotor flight. IEEE Trans. Robot.; 2022; 38, pp. 3340-3356. [DOI: https://dx.doi.org/10.1109/TRO.2022.3173711]
16. Ruchika, N.R. Model predictive control: History and development. Int. J. Eng. Trends Technol. IJETT; 2013; 4, pp. 2600-2602.
17. Gómez, V.; Thijssen, S.; Symington, A.; Hailes, S.; Kappen, H.J. Real-Time Stochastic Optimal Control for Multi-Agent Quadrotor Systems. Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS); London, UK, 12–17 June 2016; pp. 468-476.
18. Mohamed, I.S.; Allibert, G.; Martinet, P. Model Predictive Path Integral Control Framework for Partially Observable navigation: A Quadrotor Case Study. Proceedings of the 2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV); Shenzhen, China, 13–15 December 2020; IEEE: Piscataway, NJ, USA, 2020.
19. Williams, G.; Drews, P.; Goldfain, B.; Rehg, J.M.; Theodorou, E.A. Information-Theoretic Model Predictive Control: Theory and Applications to Autonomous Driving. IEEE Trans. Robot.; 2018; 34, pp. 1603-1622. [DOI: https://dx.doi.org/10.1109/TRO.2018.2865891]
20. Kusumoto, R.; Palmieri, L.; Spies, M.; Csiszar, A.; Arras, K.O. Informed Information Theoretic Model Predictive Control. Proceedings of the International Conference on Robotics and Automation; Montreal, QC, Canada, 20–24 May 2019; pp. 2047-2053.
21. Nicol, C.; Macnab, C.; Ramirez-Serrano, A. Robust adaptive control of a quadrotor helicopter. Mechatronics; 2011; 21, pp. 927-938. [DOI: https://dx.doi.org/10.1016/j.mechatronics.2011.02.007]
22. Raffo, G.V.; Ortega, M.G.; Rubio, F.R. Sliding Mode Control of a Quadrotor Helicopter. Proceedings of the 45th IEEE Conference on Decision and Control; San Diego, CA, USA, 13–15 December 2006; pp. 4957-4962.
23. Raffo, G.V.; Ortega, M.G.; Rubio, F.R. An integral predictive/nonlinear H∞ control structure for a quadrotor helicopter. Automatica; 2010; 46, pp. 29-39. [DOI: https://dx.doi.org/10.1016/j.automatica.2009.10.018]
24. Voos, H. Nonlinear Control of a Quadrotor Micro-UAV Using Feedback-Linearization. Proceedings of the 2009 IEEE International Conference on Mechatronics; Malaga, Spain, 14–17 April 2009; pp. 1-6.
25. Zhaowei, M.; Tianjiang, H.; Lincheng, S.; Weiwei, K.; Boxin, Z.; Kaidi, Y. An Iterative Learning Controller for Quadrotor UAV Path Following at a Constant Altitude. Proceedings of the 2015 34th Chinese Control Conference (CCC); Hangzhou, China, 28–30 July 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 4406-4411.
26. Hehn, M.; D’Andrea, R. An Iterative Learning Scheme for High Performance, Periodic Quadrocopter Trajectories. Proceedings of the European Control Conference (ECC); Zurich, Switzerland, 17–19 July 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 1799-1804.
27. Bansal, S.; Akametalu, A.K.; Jiang, F.J.; Laine, F.; Tomlin, C.J. Learning Quadrotor Dynamics Using Neural Network for Flight Control. Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC); Las Vegas, NV, USA, 12–14 December 2016; IEEE: Piscataway, NJ, USA, 2016.
28. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature; 2015; 521, pp. 436-444. [DOI: https://dx.doi.org/10.1038/nature14539]
29. Kappen, H.J. Path integrals and symmetry breaking for optimal control theory. J. Stat. Mech. Theory Exp.; 2005; 2005, P11011. [DOI: https://dx.doi.org/10.1088/1742-5468/2005/11/P11011]
30. Williams, G.; Aldrich, A.; Theodorou, E.A. Model predictive path integral control: From theory to parallel computation. J. Guid. Control Dyn.; 2017; 40, pp. 344-357. [DOI: https://dx.doi.org/10.2514/1.G001921]
31. Ruggiero, F.; Cacace, J.; Sadeghian, H.; Lippiello, V. Passivity-based control of VToL UAVs with a momentum-based estimator of external wrench and unmodeled dynamics. Robot. Auton. Syst.; 2015; 72, pp. 139-151. [DOI: https://dx.doi.org/10.1016/j.robot.2015.05.006]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
This paper addresses the trajectory tracking problem of quadrotors under complex dynamic environments and significant fluctuations in system states. An adaptive trajectory tracking control method is proposed based on an improved Model Predictive Path Integral (MPPI) controller and a Multilayer Perceptron (MLP) neural network. The technique enhances control accuracy and robustness by adjusting control inputs in real time. The MLP neural network learns the dynamics of a quadrotor from its state parameters and passes the learned model to the MPPI controller, which uses the model to drive the quadcopter along the desired trajectory. Experimental data show that the improved MPPI–MLP method reduces the trajectory tracking error by 23.7%, 34.7%, and 10.3% compared to the traditional MPPI, MPC with MLP, and a two-layer network, respectively. These results demonstrate the potential application of the method in complex environments.