1. Introduction
Recent research on quadcopters has established them as a significant platform for the development of unmanned aerial vehicles (UAVs) owing to their simple construction and maintenance. Quadcopters demonstrate exceptional maneuverability and are capable of hovering, taking off, flying, and landing in confined spaces thanks to their vertical takeoff and landing (VTOL) capability [1]. A quadcopter comprises four rotors positioned at the corners of a cross-shaped frame, and it is controlled by adjusting the rotational speeds of these rotors [2,3]. Quadcopters are employed in applications such as surveillance, search-and-rescue missions, and hazardous operations in congested and Global Positioning System (GPS)-denied settings, such as dense forests, crowded offices, corridors, and warehouses. To perform these tasks effectively, quadcopters must have robust navigation abilities and a thorough understanding of their operational environment. Substantial research has examined various aspects of quadcopter dynamics and control, motion planning, and trajectory generation in unstructured environments [4,5,6,7,8].
Sampling-based optimal control methods have attracted considerable interest in path planning and trajectory tracking applications. These control techniques function by generating multiple trajectories at each control step through input sampling (referred to as rollouts) and subsequently determining a sub-optimal sequence of control inputs based on the sampled trajectories. This process involves executing a finite time-horizon optimization, applying the optimal input for the initial time step, and then iteratively repeating the optimization for subsequent time steps. The path integral optimal control framework [9,10,11] provides a mathematical foundation for developing optimal control algorithms through stochastic trajectory sampling. The main principle is the transformation of the value function of the optimal control problem, via the Feynman–Kac lemma [12,13], into an expectation over all possible trajectories, referred to as a path integral. This transformation enables stochastic optimal control problems to be solved by a Monte Carlo approximation employing forward sampling of stochastic diffusion processes. Path integral control theory has accordingly gained prominence in recent research.
Model Predictive Control (MPC) has been used on various robotic systems, such as self-driving cars [14] and quadcopters [15], and it is widely employed in industrial applications [16]. The path integral control framework can be integrated with the Model Predictive Control paradigm. In this scenario, an open-loop control sequence is continuously optimized in the background, while the system executes the current best estimate of the optimal control in parallel. A key challenge of this approach is that it requires sampling a large number of trajectories simultaneously, which is difficult when the systems under consideration have complex dynamics. A possible solution to this issue involves simplifying the system model through a hierarchical scheme [17]. This scheme employs path integral control to create trajectories for a simplified point mass model that a low-level controller can then follow. Model Predictive Path Integral (MPPI) control was introduced in [18], demonstrating its capability in aggressive autonomous driving. It explored information-theoretic dualities between free energy and relative entropy, which led to the formulation of the Model Predictive Path Integral control algorithm. Williams developed the iterative path integral method, known as the Model Predictive Path Integral control framework [16]. That method iteratively modified the control sequence to find optimal solutions based on importance sampling of trajectories. In [19], an alternative iterative method was proposed that removed the control and noise affine constraints inherent in the original Model Predictive Path Integral framework, and could therefore accommodate non-affine dynamics. The central idea behind that extension was to use the information-theoretic interpretation of optimal control through the Kullback–Leibler (KL) divergence and free energy. It differed from the original method, which used a linearization of the Hamilton–Jacobi–Bellman (HJB) equation and applied the Feynman–Kac lemma.
Despite these methods coming from different derivations, they were practically equivalent and theoretically linked. A learning model to generate informed sampling distributions was presented in [20], which was an extension of Williams’s information-theoretic-based method. In [15], the generation of time-optimal trajectories through multiple waypoints for quadrotors was investigated.
The MPPI controller has been mainly utilized in aggressive driving and UAV navigation in congested conditions. However, UAV navigation in natural conditions often involves incomplete state information. In environments such as forests and urban canyons, the navigation system cannot directly measure all state variables due to sensor limitations or occlusion problems [1,6]. Although most control schemes depend on the accuracy of the underlying models, certain adaptive control, sliding mode control, and robust control methods can remain effective despite model inaccuracies [21,22,23]. A comprehensive analysis of advanced aerodynamic effects governing quadrotor flight, such as blade flapping and airflow dynamics, is conducted in [24]. These effects are fundamentally complicated to model, posing considerable challenges in controller design. The standard Model Predictive Path Integral approach may necessitate substantial computational resources or manual parameter adjustment to maintain control performance during dynamic environmental variations or severe system state fluctuations. Therefore, relying exclusively on the Model Predictive Path Integral control mechanism is inadequate for modern complicated missions. Incorporating an adaptive mechanism into the Model Predictive Path Integral framework allows for the dynamic modification of essential parameters according to system states and control performance, enhancing the robustness and efficiency of the control strategy. Moreover, to overcome modeling problems, data-driven and learning-based control strategies have been proposed [25]. A promising approach employs neural networks (NNs) to model system dynamics. Neural networks are acknowledged as universal function approximators, capable of modeling highly nonlinear functions and latent states directly from observed data, which may otherwise be challenging to model explicitly [26].
The study in [27] demonstrates that simple feedforward networks can proficiently learn and generalize system dynamics with high accuracy.
This paper proposes an adaptive control method that integrates an enhanced Model Predictive Path Integral (MPPI) method with a Multilayer Perceptron (MLP) neural network, dynamically modifying essential parameters according to system states and control performance. In experiments, the method achieves a root-mean-square error (RMSE) in trajectory tracking that is more than 10% lower than alternative methods, surpassing MPPI control without learned dynamics and MPPI employing a two-layer neural network. The paper is organized as follows: Section 2 explains the dynamic model of the quadcopter; Section 3 presents the training and validation of the MLP; and Section 4 proposes the adaptive Model Predictive Path Integral approach, which adjusts its sampling based on trajectory error. Section 5 compares Model Predictive Path Integral control without learning and Model Predictive Path Integral control with a two-hidden-layer neural network (NN) against the adaptive Model Predictive Path Integral control employing a Multilayer Perceptron. Experimental results show that the adaptive Model Predictive Path Integral method utilizing a Multilayer Perceptron outperforms the alternative methodologies.
2. Material
In this section, we first present the nonlinear dynamic model of the quadcopter to be used and the associated model uncertainties and external disturbances.
A general discrete-time stochastic nonlinear system is of the form [1,9]:

$x_{t+1} = f(x_t, u_t + \delta u_t) + w_t$  (1)

where $x_t \in \mathbb{R}^n$ is the state, and $u_t \in \mathbb{R}^m$ is the control input. The term $\delta u_t \sim \mathcal{N}(0, \Sigma)$ represents a normally distributed noise disturbance in the control signals generated by the lower-level controllers of real-world robot systems. The term $w_t$ represents the combined errors due to inaccurate model representations and purely external disturbances, such as wind in the case of quadcopters.

Quadcopter Model
Figure 1 shows the quadcopter model, with an inertial reference frame defined as $\mathcal{I} = \{x_I, y_I, z_I\}$ and a body-fixed frame defined as $\mathcal{B} = \{x_B, y_B, z_B\}$, with the origin defined at the center of the quadcopter. The Euler angles describing the quadrotor rotations about the body-fixed axes $x_B$, $y_B$, and $z_B$ are roll $\phi$, pitch $\theta$, and yaw $\psi$, respectively.
The rotation from $\mathcal{B}$ to $\mathcal{I}$ with a ZYX transformation is expressed as:

$R = \begin{bmatrix} c_\psi c_\theta & c_\psi s_\theta s_\phi - s_\psi c_\phi & c_\psi s_\theta c_\phi + s_\psi s_\phi \\ s_\psi c_\theta & s_\psi s_\theta s_\phi + c_\psi c_\phi & s_\psi s_\theta c_\phi - c_\psi s_\phi \\ -s_\theta & c_\theta s_\phi & c_\theta c_\phi \end{bmatrix}$  (2)

where $c_\alpha = \cos\alpha$ and $s_\alpha = \sin\alpha$ for $\alpha \in \{\phi, \theta, \psi\}$. The body angular velocity $\omega$ represents the angular velocity vector relative to the body frame and is defined as:
$\omega = [p, q, r]^T$  (3)

where $p$ is the rotation rate around the $x_B$ axis; $q$ is the rotation rate around the $y_B$ axis; $r$ is the rotation rate around the $z_B$ axis. The Euler angles $\eta = [\phi, \theta, \psi]^T$ define the orientation of the quadrotor by rotating from the inertial frame to the body frame. The relationship between the body angular velocity and the Euler angle derivatives is given by:
$\omega = W(\eta)\,\dot{\eta}$  (4)

where $\dot{\eta} = [\dot{\phi}, \dot{\theta}, \dot{\psi}]^T$. To derive the transformation matrix $W(\eta)$, we decompose the rotation in the inertial frame. After applying the sequence of rotations, we can express $\omega$ in terms of $\dot{\eta}$ as:
$\omega = \begin{bmatrix} 1 & 0 & -s_\theta \\ 0 & c_\phi & s_\phi c_\theta \\ 0 & -s_\phi & c_\phi c_\theta \end{bmatrix} \dot{\eta}$  (5)
Taking the inverse, we can express the Euler angle rates as:

$\dot{\eta} = W^{-1}(\eta)\,\omega$  (6)

where $W^{-1}(\eta)$ is the required transformation matrix:

$W^{-1}(\eta) = \begin{bmatrix} 1 & s_\phi t_\theta & c_\phi t_\theta \\ 0 & c_\phi & -s_\phi \\ 0 & s_\phi / c_\theta & c_\phi / c_\theta \end{bmatrix}$  (7)

where $t_\theta = \tan\theta$.
The linear motion equations describe the translational motion in space. Using Newton's second law, we have:

$m\ddot{p} = F$  (8)

where $m$ is the mass of the quadrotor; $p \in \mathbb{R}^3$ is the position vector of the quadrotor; $F$ is the total force acting on the quadrotor. The total force consists of two components: the gravitational force $-mg\,e_3$, where $e_3 = [0, 0, 1]^T$ is the unit vector pointing upwards, and the thrust force $T R e_3$, where the thrust $T$ is always directed along the $z_B$-axis of the body frame. The rotation matrix $R$ converts that thrust into the inertial frame. The dynamics of the quadcopter are given by:
$\dot{p} = v$  (9)

$m\dot{v} = -mg\,e_3 + T R e_3 + \Delta f$  (10)

$\dot{\eta} = W^{-1}(\eta)\,\omega$  (11)

$J\dot{\omega} = -\omega \times (J\omega) + \tau + \Delta\tau$  (12)
where $m$ is the mass of the quadcopter, $J$ is the inertia matrix expressed in $\mathcal{B}$, $p$ is the position vector, $v$ is the velocity vector, $\eta$ is the Euler angle vector, $\omega$ is the body angular rate vector, $T$ is the body thrust input generated by the motors, $\tau = [\tau_x, \tau_y, \tau_z]^T$ is the torque input to the vehicle in $\mathcal{B}$, and $l$ is the arm length of the quadrotor. The disturbances $\Delta f$ and $\Delta\tau$ are explicitly included in the quadrotor dynamics equations, affecting both thrust and torque. $\Delta f$ represents the external force disturbance vector affecting the thrust $T$. $\Delta\tau$ represents the external torque disturbance vector affecting the torque $\tau$. The state of the system is defined as $x = [p^T, v^T, \eta^T, \omega^T]^T \in \mathbb{R}^{12}$. Each rotor produces a vertical force $F_i = k_f \omega_i^2$ and moment $M_i = k_m \omega_i^2$, where $\omega_i$ is the angular velocity of the $i$th rotor. We map the rotor speeds $\omega_i$ to the system inputs $T$ and $\tau$ as

$\begin{bmatrix} T \\ \tau_x \\ \tau_y \\ \tau_z \end{bmatrix} = \begin{bmatrix} k_f(\omega_1^2 + \omega_2^2 + \omega_3^2 + \omega_4^2) \\ l\,k_f(\omega_4^2 - \omega_2^2) \\ l\,k_f(\omega_1^2 - \omega_3^2) \\ k_m(\omega_1^2 - \omega_2^2 + \omega_3^2 - \omega_4^2) \end{bmatrix}$  (13)
where $k_f$ is the thrust coefficient, $k_m$ is the moment (torque) coefficient, and $l$ is the distance from the rotor to the center of mass. We deal with the actuator dynamics by introducing a first-order lag in the control inputs. This accounts for the physical limitations and response times of the quadrotor's actuators, ensuring that rapid changes in control commands do not lead to unrealistic actuator behaviors. Specifically, the actuator dynamics are represented as follows.
The actuator is modeled as a first-order linear system characterized by its time constant $\tau_a$:

$\tau_a \dot{u} + u = u_c$  (14)

where $\tau_a$ is the actuator time constant, $u$ is the actual control input, and $u_c$ is the commanded control input. Rearranging the equation, we obtain:

$\dot{u} = \dfrac{u_c - u}{\tau_a}$  (15)
This first-order model captures the delayed and smoothed response of the actuators to control commands, reflecting real-world actuator behavior.
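During data generation, the first-order lag of Equation (14) can be discretized with a forward-Euler step; the time constant and step size below are assumed values for illustration, not the paper's exact settings:

```python
def actuator_step(u, u_cmd, tau_a=0.05, dt=0.01):
    """Forward-Euler update of tau_a * du/dt = u_cmd - u
    (Equations (14)-(15)), applied element-wise."""
    return [ui + dt * (uc - ui) / tau_a for ui, uc in zip(u, u_cmd)]

# Repeated application drives the actual input toward the command,
# reproducing the smoothed, delayed actuator response.
u = [0.0]
for _ in range(100):
    u = actuator_step(u, [1.0])
```

Each step moves the actual input a fraction dt/tau_a of the way toward the command, so a step command is tracked exponentially rather than instantaneously.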
By simulating the actuator dynamics, the dataset more accurately reflects real-world quadcopter behavior, including the effects of rotor inertia and drag. In subsequent sections, we simplify the dynamics during control implementation because the neural network has already learned the influence of actuator dynamics. The inclusion of Equation (14) in the data generation process ensures that the learned model accurately captures the effects of actuators on the quadcopter’s overall dynamics.
3. Methods
3.1. Neural Network
This section describes the neural network architecture employed to mimic the quadrotor dynamics. The improvement in computational capacity enables the execution of large-scale batch predictions within minimal timeframes. Adequately trained neural networks can effectively generalize the system's dynamic equations using supervised learning techniques. However, employing the dynamic equations directly in iterative control schemes has disadvantages, as these methods frequently neglect model uncertainties and external disturbances. These constraints can be mitigated by replacing traditional dynamic models with neural networks, which can be continuously updated online as new data are acquired during quadrotor flights. This adaptability enables the neural networks to track changes in the model, hence improving control performance. This section provides a detailed explanation of the data collection process, neural network model design, training procedures, and the resulting training outcomes.
3.1.1. Data Generation
First, we created a dataset of input–output pairs, where the inputs were the current states and control inputs, and the outputs were the corresponding state derivatives. The input to the neural network was 17-dimensional, and the output was the 12 state derivatives. Multiple input combinations were sampled from uniform distributions within the operational range defined for the quadrotor. The state variables and their ranges are specified in Table 1.
The sampled variables were not used as inputs to the NN directly for training. For a quadcopter, the thrust produced by each rotor can be represented as:
$F_i = k_f \omega_i^2$  (16)

where $k_f$ is the thrust coefficient, which represents the efficiency of thrust generation by the rotor, and $\omega_i$ is the angular velocity of the $i$th rotor. The total thrust is the sum of the thrusts produced by all four rotors:
$T = \sum_{i=1}^{4} F_i = k_f \sum_{i=1}^{4} \omega_i^2$  (17)
In addition to thrust, each rotor generates roll, pitch, and yaw moments. The moments consist of two parts: (1) roll and pitch moments and (2) yaw moments. These are moments caused by the eccentric position of the rotors. For roll and pitch, the moments are:
$\tau_x = l\,k_f(\omega_4^2 - \omega_2^2), \qquad \tau_y = l\,k_f(\omega_1^2 - \omega_3^2)$  (18)

where $l$ is the distance from the rotor to the center of mass of the quadcopter. The sign of each moment depends on the arrangement and rotation direction of the rotors.
The yaw moment: this moment is generated by the reaction torques of the rotors and can be expressed as:
$\tau_z = k_m(\omega_1^2 - \omega_2^2 + \omega_3^2 - \omega_4^2)$  (19)

where $k_m$ is the coefficient representing the efficiency of moment generation.

Gyroscopic precession effect: differences in the speed of the rotors also lead to a net gyroscopic effect, which affects the system's dynamic response:

$\Omega = \omega_1 - \omega_2 + \omega_3 - \omega_4$  (20)

where $\Omega$ is the net effect due to differences in motor speeds, contributing to gyroscopic precession. We used these sampled variables to calculate the following additional inputs to the NN as

$u_{NN} = [T, \tau_x, \tau_y, \tau_z, \Omega]^T$  (21)

where $T$ is the sum of thrusts produced by all four rotors, $\tau_x$ is the roll torque causing a rotation around the X-axis, $\tau_y$ is the pitch torque causing a rotation around the Y-axis, $\tau_z$ is the yaw torque causing a rotation around the Z-axis, and $\Omega$ is the net effect due to differences in motor speeds, contributing to gyroscopic precession. These inputs were passed through the quadrotor dynamics defined in Section 2. The training output for the NN was the state derivatives obtained for each input.
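The computation of the additional NN inputs from sampled rotor speeds can be sketched as follows; the coefficient values and the plus-configuration sign convention are illustrative assumptions, not the paper's exact parameters:

```python
def rotor_to_nn_inputs(w, k_f=3.16e-6, k_m=7.94e-8, l=0.2):
    """Compute [T, tau_x, tau_y, tau_z, Omega] from rotor angular
    speeds w = [w1, w2, w3, w4], following Equations (16)-(21)."""
    sq = [wi ** 2 for wi in w]
    T = k_f * sum(sq)                              # Eq. (17): total thrust
    tau_x = l * k_f * (sq[3] - sq[1])              # Eq. (18): roll torque
    tau_y = l * k_f * (sq[0] - sq[2])              # Eq. (18): pitch torque
    tau_z = k_m * (sq[0] - sq[1] + sq[2] - sq[3])  # Eq. (19): yaw torque
    omega_net = w[0] - w[1] + w[2] - w[3]          # Eq. (20): gyroscopic term
    return [T, tau_x, tau_y, tau_z, omega_net]
```

With equal rotor speeds, all torques and the gyroscopic term cancel and only thrust remains, which corresponds to hover.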
The use of neural networks allowed for the implicit approximation of disturbances in the dynamic model. During training, the neural network employed supervised learning, using input data, which included the effects of disturbances, and output data to learn the dynamic relationships of the entire system and establish an approximate dynamic model:

$\dot{x} = \hat{f}_{NN}(x, u_{NN})$

where $\hat{f}_{NN}$ is the dynamic model learned by the neural network. By approximating $\Delta f$ and $\Delta\tau$ through the neural network, the design phase of the controller did not need to explicitly introduce stochastic disturbances. Instead, these disturbances were considered as part of the learned deterministic model.
3.1.2. NN Model and Training
This section describes the selection of a Multilayer Perceptron (MLP) neural network architecture to imitate the quadrotor dynamics [27,28]. The network consisted of six hidden layers containing 256, 256, 128, 128, 64, and 64 neurons, respectively, as shown in Figure 2. Every hidden layer employed the Leaky Rectified Linear Unit (LeakyReLU) activation function to incorporate nonlinearity and mitigate the vanishing gradient problem. The training process utilized a dataset consisting of 15,000 samples, processed over 2000 epochs with a batch size of 64. A Graphics Processing Unit (GPU) was employed to accelerate training, facilitating fast computation and supporting a gradual decrease in the loss function through multiple iterations, thus improving the model’s predictive performance. The optimization utilized the Adam optimizer in conjunction with a custom learning rate scheduler to enhance accuracy. The learning rate followed an exponential decay strategy, starting with an initial rate of 0.001 and reducing to 0.0001 over 1000 epochs. That method ensured stable convergence and prevented overshooting during training. The neural network model was developed with TensorFlow 2 and trained on a system comprising an Intel i9-13900 HX processor, 16 GB of RAM, and an NVIDIA GeForce RTX 3060 Ti GPU with 8 GB of memory. The model achieved a Mean Squared Error (MSE) of 0.013 and a Mean Absolute Error (MAE) of 0.047 on the test dataset on completion of training, which indicated high predictive accuracy and robustness.
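The exponential decay schedule described above (an initial rate of 0.001 decaying to 0.0001 over 1000 epochs) can be sketched as a simple function; this is an interpretation of the stated schedule, not the authors' exact implementation:

```python
def lr_schedule(epoch, lr0=1e-3, lr_end=1e-4, decay_epochs=1000):
    """Exponential decay from lr0 to lr_end over decay_epochs,
    held constant at lr_end afterwards."""
    if epoch >= decay_epochs:
        return lr_end
    # Geometric interpolation: lr(t) = lr0 * (lr_end/lr0)^(t/decay_epochs)
    return lr0 * (lr_end / lr0) ** (epoch / decay_epochs)
```

A smoothly decaying rate of this kind supports stable convergence: large early steps explore the loss surface, while small late steps prevent overshooting near a minimum.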
The LeakyReLU activation function is as follows:
$f(x) = \begin{cases} x, & x \ge 0 \\ \alpha x, & x < 0 \end{cases}$  (22)

where $\alpha$ is a small positive constant, typically set to 0.01 or another small value. Specifically, when the input is positive, LeakyReLU outputs $x$; when the input is negative, it outputs $\alpha x$, retaining a slight negative gradient. The Mean Squared Error was used for the loss function; it measures the average squared difference between the predicted and actual state derivatives:

$\text{MSE} = \dfrac{1}{N}\sum_{i=1}^{N} \left\| \hat{\dot{x}}_i - \dot{x}_i \right\|^2$  (23)

where $\hat{\dot{x}}_i$ is the predicted state derivative from the MLP, and $\dot{x}_i$ is the ground-truth state derivative. On completion of training, the network approximated the function $\hat{f}_{NN}(x, u_{NN}) \approx \dot{x}$, essentially functioning as a surrogate model for the quadrotor dynamics.
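Equations (22) and (23) translate directly into code; this sketch operates on plain Python scalars and lists for clarity:

```python
def leaky_relu(x, alpha=0.01):
    """Equation (22): identity for x >= 0, alpha * x otherwise."""
    return x if x >= 0 else alpha * x

def mse(pred, true):
    """Equation (23): mean squared error between predicted and
    ground-truth state-derivative vectors of equal length."""
    n = len(pred)
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / n
```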
3.1.3. Validation
A distinct validation set was used to assess the model's performance on unseen data after training, thereby verifying that the model had not overfitted or merely memorized the training data. The MSE and MAE on the validation set indicated the model's generalization capability. Figure 3 shows the desired and actual values on the validation set. Points situated along the diagonal line y = x in Figure 3 indicate flawless predictions. Figure 4 displays the actual and predicted positions over time on the validation set. Figure 5 shows the actual and predicted angular velocities over time on the validation set.
3.2. Improved Model Predictive Path Integral
For a discrete-time dynamical system with states $x_t$, control inputs $u_t$, error $\delta u_t$ in the control input, $w_t$ as external disturbances, and mapping from the current state and input to the next state given by

$x_{t+1} = F(x_t, u_t + \delta u_t) + w_t$  (24)

a general optimal control problem consists of finding a control policy $\pi$, a map from the current state to the optimal input, $u_t = \pi(x_t)$, such that a cost $J$ is minimized and is given by

$J = \sum_{t=0}^{T-1} q(x_t, u_t)$  (25)

$q(x_t, u_t) = (x_t - x_{ref})^T Q (x_t - x_{ref}) + u_t^T R\, u_t$  (26)

where $x_t$ is the current state, $x_{ref}$ is the reference state, $u_t$ is the control input, and $Q$ and $R$ are weighting matrices, subject to

$x_{t+1} = F(x_t, u_t + \delta u_t) + w_t$  (27)

$x_0 = x(0)$  (28)
The path integral control method for developing optimal control algorithms is based on stochastic trajectory sampling [29,30]. This method's benefit lies in its avoidance of derivatives of the system dynamics or the cost function, thus providing robustness to errors in estimating the system dynamics and formulating the cost function. Given a transition model F, a sample size N, a number of time steps (horizon) T, and an initial control sequence $U_0$, the algorithm calculates N rollouts for T time steps for the inputs and evaluates the resulting candidate trajectories by running them through the dynamic model F. The cost and weight for each of these rollouts are calculated using the provided cost function. The N weights are used to evaluate the contribution of each rollout to the final optimal input sequence, with only the first input being applied. The remainder of the sequence is used as the initial input for the subsequent iteration, and the process continues. The traditional MPPI method may require considerable computational resources or manual parameter adjustments to maintain control performance in response to dynamic environmental changes or significant swings in system states. Consequently, incorporating an adaptive mechanism into the MPPI control method enables the dynamic adjustment of its essential parameters according to system states or control performance, thereby enhancing the control's robustness and efficiency.
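The rollout procedure described above can be sketched as follows; the transition model, cost function, and dimensions are illustrative scalar stand-ins, not the quadcopter dynamics:

```python
import random

def rollout_costs(F, cost, x0, u_seq, n_samples=4, sigma=0.1):
    """Sample n_samples perturbed control sequences, propagate each
    through the transition model F, and return their cumulative
    costs together with the sampled sequences."""
    costs, sequences = [], []
    for _ in range(n_samples):
        u_k = [u + random.gauss(0.0, sigma) for u in u_seq]
        x, s = x0, 0.0
        for u in u_k:
            x = F(x, u)        # propagate the dynamics one step
            s += cost(x, u)    # accumulate the running cost
        costs.append(s)
        sequences.append(u_k)
    return costs, sequences

# Toy scalar stand-ins for the transition model and cost
F_toy = lambda x, u: x + 0.1 * u
cost_toy = lambda x, u: x ** 2 + 0.01 * u ** 2
```

The weights derived from these costs then determine each rollout's contribution to the updated control sequence.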
At each time step t, multiple control sequences were generated by randomly sampling the control inputs. The sampled control input sequences were denoted as follows:
$U_k = \left( u_0^k, u_1^k, \ldots, u_{T-1}^k \right), \quad k = 1, \ldots, N$  (29)
Each control sequence had the length of the prediction horizon T.
The real-time system error, defined as the distance between the current system state $x_t$ and the target state $x_{ref}$, was:

$e_t = x_t - x_{ref}$  (30)

The Euclidean distance was used to calculate the system error as follows:

$\|e_t\| = \sqrt{\sum_{i} \left( x_{t,i} - x_{ref,i} \right)^2}$  (31)
Based on the real-time error $\|e_t\|$, we dynamically adjusted the sampling number N and the prediction horizon T:

$N_t = N_0 + k_N \|e_t\|$  (32)

$T_t = T_0 + k_T \|e_t\|$  (33)

where $N_0$ and $T_0$ are the baseline sampling number and baseline prediction horizon used when the error is small, and $k_N$ and $k_T$ are gain coefficients that control the responsiveness of the sampling number and prediction horizon to the error. When the error increased, the sampling number N was increased to enhance sampling density and control precision. Conversely, when the error decreased, the sampling number was reduced to save computational resources. Similarly, the prediction horizon T was dynamically adjusted based on the error magnitude. An increased error led to a longer prediction horizon, allowing the controller to consider state changes over a more extended period. A decreased error shortened the prediction horizon, improving computational efficiency.
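A sketch of the adaptive update in Equations (32) and (33); the baseline values, gains, and saturation caps are illustrative assumptions, with results rounded to integers since both quantities count discrete samples and time steps:

```python
def adapt_parameters(error_norm, n0=800, t0=25, k_n=100.0, k_t=2.0,
                     n_max=2000, t_max=50):
    """Scale the sampling number N and prediction horizon T with the
    tracking error norm, capped to keep computation bounded."""
    n = min(n_max, int(round(n0 + k_n * error_norm)))
    t = min(t_max, int(round(t0 + k_t * error_norm)))
    return n, t
```

The caps are a practical safeguard (an assumption here, not stated in the text): without them, a transient error spike could demand an unbounded number of rollouts in a single control step.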
The cumulative cost for each trajectory was calculated as follows:

$S_k = \sum_{t=0}^{T-1} q(x_t^k, u_t^k)$  (34)

The weight for each trajectory was calculated as follows:

$w_k = \dfrac{\exp\left( -S_k / \lambda \right)}{\sum_{j=1}^{N} \exp\left( -S_j / \lambda \right)}$  (35)

where $\lambda > 0$ is the temperature parameter. Subsequently, the current control input was updated using a weighted average of all sampled control inputs:

$u_t = \sum_{k=1}^{N} w_k\, u_t^k$  (36)
After applying the updated control input to the system, we moved to the next time step t + 1 and repeated the above steps.
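The weighting and update of Equations (35) and (36) amount to a softmax over negative trajectory costs; the example costs and controls below are illustrative, and the minimum cost is subtracted before exponentiation for numerical stability:

```python
import math

def mppi_update(costs, controls, lam=1.0):
    """Equations (35)-(36): exponentially weight each sampled control
    by its trajectory cost and return the weighted average.

    costs:    list of N cumulative trajectory costs S_k
    controls: list of N sampled control values u_t^k for this step
    """
    s_min = min(costs)  # shift by the minimum for numerical stability
    weights = [math.exp(-(s - s_min) / lam) for s in costs]
    total = sum(weights)
    weights = [w / total for w in weights]
    u = sum(w * u_k for w, u_k in zip(weights, controls))
    return u, weights

u, w = mppi_update([1.0, 5.0], [2.0, 10.0])
```

The shift by the minimum cost leaves the normalized weights unchanged but prevents underflow when costs are large; lower-cost rollouts dominate the weighted average.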
The MLP and improved MPPI control block diagram is shown in Figure 6:
Flight data and dynamic inputs: Provide a realistic dynamic description of the system, serving as the true data source in the simulation. These dynamics describe the motion behavior of the quadrotor UAV under external influences, including position, velocity, angular velocity, noise, and disturbance.
Multilayer Perceptron: The MLP is used to learn the system dynamics, effectively replacing traditional dynamic models to some extent. This approach is particularly effective in complex dynamic environments or under external disturbances. The neural network’s inputs are the current system state and control inputs, and the output is the derivative of the system state.
Improved MPPI controller: Based on the path integral method, the controller generates multiple possible trajectories through stochastic sampling and selects the optimal control input from them. By adaptively adjusting the sampling number and prediction horizon, the improved MPPI controller can better handle dynamic changes in the environment. When the error increases, the sampling number is increased to improve control accuracy; when the error decreases, the sampling number is reduced to save computational resources.
Trajectory output and control adjustment: The MPPI controller dynamically adjusts the control input based on real-time state errors, generating the trajectory output for the UAV. When there is an error between the actual trajectory and the target trajectory, the control adjustment module provides real-time feedback on this error and optimizes the process by dynamically adjusting the sampling number and prediction horizon, thereby enhancing control performance.
3.3. Stability Analysis
3.3.1. Neural Network Convergence Analysis
According to the universal approximation theorem, neural networks with appropriate structures and sufficient training data can approximate any continuous function with arbitrary precision. Therefore, we assumed that the MLP neural network satisfied the following after training:
$\left\| \hat{f}_{NN}(x, u) - f(x, u) \right\| \le \epsilon, \quad \forall (x, u) \in D$  (37)

where $\hat{f}_{NN}$ is the neural network-approximated dynamics model, $f$ is the true dynamics model, $\epsilon$ is a small positive constant, and $D$ is the system's operating space. During the training process, the neural network reduced the approximation error iteratively through learning, ultimately keeping it within a small bounded range. The learning mechanism, together with the validation results in Figure 3, Figure 4 and Figure 5, ensured that the approximation error always remained within a controllable range. The above analysis demonstrates that the neural network reliably converged to an accurate approximation of the quadcopter dynamics.
3.3.2. Stability Analysis of the Closed-Loop System
To prove the stability of the closed-loop system, we adopted a control framework based on passivity, following the method proposed by Ruggiero et al. [31]. The passivity-based control method is suitable for nonlinear systems and effectively handles system uncertainties and external disturbances.
Choose the control law as:
$u = u_{MPPI} + K e$  (38)

where $u_{MPPI}$ is the control input obtained from the Model Predictive Path Integral (MPPI) controller, $K$ is the gain matrix designed to stabilize the tracking error, and $e$ is the error. Define a Lyapunov function $V$ to evaluate the stability of the system. For the quadrotor system, we selected the following candidate Lyapunov function:

$V = e^T P e$  (39)

where $e = x - x_{ref}$ is the tracking error of the system, and $P$ is a symmetric positive definite matrix. Compute the time derivative of the Lyapunov function:

$\dot{V} = \dot{e}^T P e + e^T P \dot{e} = 2 e^T P \dot{e}$  (40)
Based on the system dynamics:
$\dot{x} = f(x, u) + d(t)$  (41)

where $d(t)$ represents all external disturbances and model uncertainties. Based on $\dot{e} = \dot{x} - \dot{x}_{ref}$ and Equation (40), we obtained:

$\dot{V} = 2 e^T P \left( f(x, u) + d(t) - \dot{x}_{ref} \right)$  (42)
The neural network approximation becomes:

$f(x, u) = \hat{f}_{NN}(x, u) + \tilde{\epsilon}(x, u), \quad \|\tilde{\epsilon}(x, u)\| \le \epsilon$  (43)
Thus,

$\dot{V} = 2 e^T P \left( \hat{f}_{NN}(x, u) + \tilde{\epsilon}(x, u) + d(t) - \dot{x}_{ref} \right)$  (44)
Assuming that the MPPI controller ensures, in the absence of errors and disturbances, that:

$\hat{f}_{NN}(x_{ref}, u_{MPPI}) - \dot{x}_{ref} = 0$  (45)
we can linearize the tracking error dynamics around $e = 0$:

$\dot{e} \approx (A + BK) e + \tilde{\epsilon}(x, u) + d(t)$  (46)
Substituting Equation (46) into Equation (40):

$\dot{V} = 2 e^T P \left( (A + BK) e + \tilde{\epsilon}(x, u) + d(t) \right)$  (47)

where $A = \partial \hat{f}_{NN} / \partial x$ and $B = \partial \hat{f}_{NN} / \partial u$ are the Jacobian matrices of the system dynamics with respect to $x$ and $u$, respectively. Under the passivity-based control framework, the control law and Lyapunov function parameters are designed such that the Lyapunov derivative satisfies:
$\dot{V} \le -\alpha \|e\|^2 + \beta \|e\|$  (48)

where $\alpha > 0$ and $\beta \ge 0$ are design parameters. To ensure stability, we require the following:
$\dot{V} < 0, \quad \forall \|e\| > \beta / \alpha$  (49)
To achieve this, the matrix $P$ and the control gain matrix $K$ are selected by solving the Lyapunov equation.

Choose $P$ and $K$ so that they satisfy the following equation:

$(A + BK)^T P + P (A + BK) = -Q_L$  (50)

where $Q_L$ is a positive definite matrix. This condition ensures that the linear part of $\dot{V}$ is negative definite:

$2 e^T P (A + BK) e = -e^T Q_L e \le -\lambda_{\min}(Q_L) \|e\|^2$  (51)
where $\lambda_{\min}(Q_L)$ is the smallest eigenvalue of $Q_L$. Apply the Cauchy–Schwarz inequality to the remaining terms:

$2 e^T P \left( \tilde{\epsilon} + d \right) \le 2 \|P\| \|e\| \left( \epsilon + \bar{d} \right)$  (52)

where $\bar{d}$ is an upper bound on the disturbance norm $\|d(t)\|$.
To maintain $\dot{V} < 0$, we require the following:

$-\lambda_{\min}(Q_L) \|e\|^2 + 2 \|P\| \|e\| \left( \epsilon + \bar{d} \right) < 0$  (53)
To satisfy the above inequality, we can bound the terms involving $\|e\|$ by ensuring that

$\lambda_{\min}(Q_L) \|e\|^2 > 2 \|P\| \|e\| \left( \epsilon + \bar{d} \right)$  (54)
Dividing both sides by $\|e\|$ (assuming $\|e\| \ne 0$):

$\lambda_{\min}(Q_L) \|e\| > 2 \|P\| \left( \epsilon + \bar{d} \right)$  (55)
This condition holds if:

$\|e\| > \dfrac{2 \|P\| \left( \epsilon + \bar{d} \right)}{\lambda_{\min}(Q_L)}$  (56)
When the tracking error satisfies this condition, $\dot{V} < 0$ and the error decreases. For smaller errors, the bounded approximation error and disturbances confine the trajectories, so the tracking error remains within the bound of Equation (56).
From the above analysis, we conclude that the tracking error converges to a bounded region around zero, with the bound determined by the approximation error $\epsilon$ and the disturbance bound $\bar{d}$. The tracking error remains bounded at all times, ensuring robust performance despite disturbances and model uncertainties. The control law satisfies the passivity condition, preventing energy injection that could destabilize the system.
Therefore, the closed-loop quadrotor system under the proposed MPPI controller and MLP neural network approximation achieves uniform ultimate boundedness of the tracking error, ensuring reliable trajectory tracking in the presence of disturbances and model uncertainties.
4. Results
The neural network was trained using TensorFlow 2 on Google Colab, utilizing local system resources. Drone simulations were conducted in MATLAB, leveraging the MATLAB Deep Learning Toolbox, which facilitates the import and utilization of pre-trained TensorFlow models. We implemented four control methods: (1) Model Predictive Path Integral (MPPI) control without learning dynamics, (2) MPPI control with dynamics learned by a simple fully connected neural network, (3) MPC with dynamics learned by a Multilayer Perceptron (MLP), and (4) MPPI control with dynamics learned by a Multilayer Perceptron (MLP). Simulations were performed across two distinct trajectories: (1) waypoint navigation and (2) tracking a figure-eight trajectory. In all simulations, the plant was integrated at a frequency of 100 Hz (every 0.01 s), and the controller operated at 20 Hz (every 0.05 s). The initial position of the quadcopter was set to [0,10] with an initial control input of [620.6108, 620.6108, 620.6108, 620.6108]. For the waypoint navigation task, the desired position was [3,10]. The figure-eight trajectory was defined with a scale factor of 2, and an additional spiral path was utilized to evaluate the algorithm's performance further. All algorithms were evaluated using a comprehensive cost function that penalized deviations in position tracking, velocity magnitudes, angular rotations, deviations of control signals from the nominal hovering control (indirectly reflecting control effort), and changes in control signals. A critical penalty was imposed if the sampled trajectory descended below z = 0 (below the reference ground level). The Root-Mean-Square Error (RMSE) metric was employed to assess the experimental results across the three trajectory tasks:

$\text{RMSE} = \sqrt{\dfrac{1}{N_s} \sum_{i=1}^{N_s} \left\| p_i - p_{ref,i} \right\|^2}$

where $N_s$ is the number of samples, $p_i$ is the actual position, and $p_{ref,i}$ is the reference position.
4.1. MPPI Control Without Learning Dynamics
(1) Waypoint navigation: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 0.12 s to execute. The program execution time was 618.76 s. The final tracking error was 0.1426 m. The results are shown in Figure 7a–f. The RMSE was 0.7124 m.
(2) Figure-eight trajectory tracking: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 0.21 s to execute. The program execution time was 938.49 s. The results are shown in Figure 8a–f. The RMSE was 1.9756 m.
4.2. MPPI Control with Learning Dynamics with Two Hidden Layers Connected to Neural Network
(1) Waypoint navigation: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 0.48 s to execute. The program execution time was 723.15 s. The results are shown in Figure 9a–f. The RMSE was 0.7053 m.
(2) Figure-eight trajectory tracking: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 0.5 s to execute. The program execution time was 986.37 s. The results are shown in Figure 10a–f. The RMSE was 1.3883 m.
4.3. MPC Control with Learning Dynamics by Multilayer Perceptron
(1) Waypoint navigation: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 1.6 s to execute. The program execution time was 650.25 s. The results are shown in Figure 11a–f. The RMSE was 0.9747 m.
(2) Figure-eight trajectory tracking: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 0.53 s to execute. The simulation time was 30 s, and the actual program execution time was 1253.27 s. The results are shown in Figure 12a–f. The RMSE was 1.8973 m.
4.4. MPPI Control with Dynamics Learned by a Multilayer Perceptron
(1) Waypoint navigation: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 1.8 s to execute. The program execution time was 819.68 s. The results are shown in Figure 13a–f. The RMSE was 0.6515 m.
(2) Figure-eight trajectory tracking: The number of rollouts per step was 800, with a prediction horizon of 25. At each step, the controller took an average of 0.45 s to execute. The simulation time was 30 s, and the actual program execution time was 1142.27 s. The results are shown in Figure 14a–f. The RMSE was 1.1093 m.
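The RMSE values reported in the subsections above can be computed as follows. This is one common definition (root of the mean squared Euclidean position error over the trajectory); the paper does not spell out its exact formula, so treat this as an assumption.

```python
import numpy as np

def trajectory_rmse(actual, desired):
    """RMSE of the Euclidean position error over a trajectory.

    actual, desired: (T, d) arrays of positions at T matched time steps.
    """
    actual, desired = np.asarray(actual, float), np.asarray(desired, float)
    sq_err = np.sum((actual - desired) ** 2, axis=1)   # squared distance per step
    return float(np.sqrt(np.mean(sq_err)))
```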
5. Discussion
Although existing MPPI methods perform well in path planning and trajectory tracking, they typically require high computational resources and manual parameter adjustments to maintain control performance. In this study, we proposed an improved MPPI controller that incorporates an adaptive mechanism based on system errors, allowing a dynamic adjustment of the sampling number and prediction horizon, thereby improving the controller’s robustness and efficiency in complex dynamic environments.
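The adaptive mechanism described above can be sketched as a simple schedule that scales the rollout count and prediction horizon with the current tracking error. The thresholds, bounds, and linear interpolation here are hypothetical choices for illustration; the paper's exact adaptation law is not reproduced.

```python
def adapt_mppi_params(error, n_min=200, n_max=800, h_min=10, h_max=25,
                      e_low=0.1, e_high=1.0):
    """Scale MPPI sampling number and horizon with tracking error (illustrative).

    Small errors use cheap settings; large errors ramp up toward the maxima
    used in the experiments (800 rollouts, horizon 25).
    """
    # Normalize the error into [0, 1] between the low and high thresholds.
    t = min(max((error - e_low) / (e_high - e_low), 0.0), 1.0)
    n_rollouts = int(n_min + t * (n_max - n_min))
    horizon = int(h_min + t * (h_max - h_min))
    return n_rollouts, horizon
```

A fixed-budget controller pays the worst-case cost at every step; this schedule spends the large sampling budget only when the trajectory error actually demands it.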
This paper used a multilayer perceptron (MLP) neural network to learn the dynamics of the quadcopter and combined it with the MPPI controller. Compared to other control methods based on traditional dynamic models, our approach can handle modeling inaccuracies and external disturbances by learning the system’s nonlinear dynamic behavior. Compared to classical model-based approaches, this data-driven control method better adapts to real-world uncertainties.
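The MLP dynamics model pairing can be sketched as a network that maps the current state and control to the next state. The layer width, activation, and residual (state-delta) formulation below are assumptions for illustration; the paper trained its model in TensorFlow 2, while this minimal NumPy version only shows the input/output contract the MPPI controller relies on.

```python
import numpy as np

class DynamicsMLP:
    """Minimal MLP mapping (state, control) -> next state (a sketch).

    state_dim=12 assumes position, velocity, attitude, and angular velocity;
    ctrl_dim=4 is the four rotor speeds.
    """
    def __init__(self, state_dim=12, ctrl_dim=4, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        d_in = state_dim + ctrl_dim
        self.W1 = rng.normal(0.0, 0.1, (d_in, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, state_dim))
        self.b2 = np.zeros(state_dim)

    def predict(self, state, control):
        x = np.concatenate([state, control])
        h = np.tanh(x @ self.W1 + self.b1)      # hidden layer with tanh activation
        return state + h @ self.W2 + self.b2    # residual form: predict the state delta
```

During MPPI rollouts, `predict` replaces the analytic plant model: each sampled control sequence is propagated through the network instead of the physical equations of motion.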
Unlike other studies that use traditional Model Predictive Control (MPC) and robust control methods, our research continuously learns the system dynamics using a neural network and introduces an adaptive adjustment mechanism, enabling the system to maintain good performance even in the face of dynamic environmental changes and partial observability. This characteristic is particularly suitable for UAVs operating in unknown or dynamic environments, such as forests and urban canyons.
6. Conclusions
This paper presented an approach that integrated MPPI control with an MLP. The improved MPPI method adjusted the sampling quantity according to the trajectory error to achieve better tracking. The MLP neural network learned the dynamics of a quadrotor from its state parameters and modeled the quadcopter; the improved MPPI controller then employed this model to guide the quadcopter along the desired trajectory. Three trajectories were employed to test the effectiveness of the proposed method, and the improved MPPI control with MLP was compared against MPPI control without learning, MPPI control with a two-hidden-layer network, and MPC with MLP. The experimental results showed that the improved MPPI control with MLP outperformed MPPI control without learning and MPPI control with a two-hidden-layer network. In the future, reinforcement learning methods could be integrated, allowing the neural network to automatically optimize control parameters based on flight variations, thereby improving adaptability and resilience to external disturbances. The improved MPPI method could also be extended beyond single-objective trajectory tracking to multi-objective path planning and collision avoidance in complex environments, and future work could incorporate graph search algorithms and global path optimization techniques to improve navigation efficiency and safety.
Author Contributions: Conceptualization, Y.L. and Q.Z.; methodology, Y.L.; software, Y.L.; validation, Y.L. and A.E.; formal analysis, Y.L.; investigation, Y.L.; resources, Y.L.; data curation, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, A.E.; visualization, Y.L.; supervision, Q.Z. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement: All data are available within the paper.
Conflicts of Interest: The authors declare no conflicts of interest.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 4. Actual position and predicted position of [Formula omitted. See PDF.] over time on the validation set.
Figure 5. Actual and predicted angular velocity of [Formula omitted. See PDF.] over time on the validation set.
Figure 7. Dynamic simulation of MPPI control without learning on waypoint trajectory (the blue line represents the result from the MPPI control without learning; the red line represents the desired result); RMSE = 0.7124 m.
Figure 8. Dynamic simulation of MPPI control without learning on figure-eight trajectory (blue line represents the result of the MPPI control without learning; the red line represents the desired result); RMSE = 1.9756 m.
Figure 9. Simulation of MPPI control with 2-hidden-layer network on waypoint trajectory (the blue line represents the result of MPPI control with 2 hidden layers; the red line represents the desired result); RMSE = 0.7053 m.
Figure 10. Simulation of MPPI control with 2-hidden-layer network on figure-eight trajectory (the blue line represents the result of MPPI control with 2 hidden layers; the red line represents the desired result); RMSE = 1.3883 m.
Figure 11. Simulation of MPC with MLP on waypoint trajectory (the blue line represents MPC with MLP; the red line represents the desired result); RMSE = 0.9747 m.
Figure 12. Simulation of MPC with MLP on figure-eight trajectory (the blue line represents the result of MPC with MLP; the red line represents the desired result); RMSE = 1.8973 m.
Figure 13. Simulation of MPPI control with MLP on waypoint trajectory (the blue line represents the result of MPPI control with MLP; the red line represents the desired result); RMSE = 0.6515 m.
Figure 14. Simulation of MPPI control with MLP on figure-eight trajectory (the blue line represents the results of MPPI control with MLP; the red line represents the desired result); RMSE = 1.1093 m.
Inputs’ sampling and ranges.

| Input | Range | Description |
|---|---|---|
| | [−10, 10] | X-axis and Y-axis position; simulates variations in horizontal position |
| | [−10, 10] | Z-axis position; considers flight requirements below the reference plane |
| | [0, 10] | Velocity |
| | [− | Roll and pitch; prevents drone instability |
| | [−π, π] | Yaw; covers all possible directions |
| | [−2, 2] | Angular velocity |
| | [0, 900] | Rotor speed |
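Generating training data by sampling uniformly within the table's ranges can be sketched as follows. The dictionary keys are descriptive placeholders (the table's input symbols are omitted in the source), and the roll/pitch bound `ANG_BOUND` is a hypothetical value because that range is truncated in the source.

```python
import numpy as np

ANG_BOUND = 0.5  # hypothetical roll/pitch limit in radians (truncated in the source table)

# Sampling ranges from the table above.
RANGES = {
    "xy_position":      (-10.0, 10.0),
    "z_position":       (-10.0, 10.0),
    "velocity":         (0.0, 10.0),
    "roll_pitch":       (-ANG_BOUND, ANG_BOUND),
    "yaw":              (-np.pi, np.pi),
    "angular_velocity": (-2.0, 2.0),
    "rotor_speed":      (0.0, 900.0),
}

def sample_training_inputs(n, rng=None):
    """Draw n uniform samples per input channel for dynamics-model training data."""
    rng = rng if rng is not None else np.random.default_rng(0)
    return {name: rng.uniform(lo, hi, size=n) for name, (lo, hi) in RANGES.items()}
```

Uniform coverage of each range keeps the learned model from being queried far outside its training distribution during MPPI rollouts.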
References
1. Hoffmann, G.; Huang, H.; Waslander, S.; Tomlin, C. Quadrotor Helicopter Flight Dynamics and Control: Theory and experiment. Proceedings of the AIAA Guidance, Navigation, and Control Conference; Hilton Head, SC, USA, 20–23 August 2007; Volume 2.
2. Brookner, E. Tracking and Kalman Filtering Made Easy; John Wiley and Sons, Inc.: New York, NY, USA, 1998.
3. Wan, E.A.; Van Der Merwe, R. The Unscented Kalman Filter for Nonlinear Estimation. Proceedings of the IEEE Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000; Lake Louise, AB, Canada, 4 October 2000; pp. 153-158.
4. Mellinger, D.; Michael, N.; Kumar, V. Trajectory generation and control for precise aggressive maneuvers with quadrotors. Int. J. Robot. Res.; 2012; 31, pp. 664-674. [DOI: https://dx.doi.org/10.1177/0278364911434236]
5. Gonzalez, D.; Perez, J.; Milanes, V.; Nashashibi, F. A review of motion planning techniques for automated vehicles. IEEE Trans. Intell. Transp. Syst.; 2015; 17, pp. 1135-1145. [DOI: https://dx.doi.org/10.1109/TITS.2015.2498841]
6. Mohta, K.; Watterson, M.; Mulgaonkar, Y.; Liu, S.; Qu, C.; Makineni, A.; Saulnier, K.; Sun, K.; Zhu, A.; Delmerico, J. et al. Fast, autonomous flight in GPS-denied and cluttered environments. J. Field Robot.; 2018; 35, pp. 101-120. [DOI: https://dx.doi.org/10.1002/rob.21774]
7. Baca, T.; Hert, D.; Loianno, G.; Saska, M.; Kumar, V. Model Predictive Trajectory Tracking and Collision Avoidance for Reliable outdoor Deployment of Unmanned Aerial Vehicles. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems; Madrid, Spain, 1–5 October 2018; pp. 6753-6760.
8. Ryll, M.; Ware, J.; Carter, J.; Roy, N. Efficient Trajectory Planning for High Speed Flight in Unknown Environments. Proceedings of the International Conference on Robotics and Automation; Montreal, QC, Canada, 20–24 May 2019; pp. 732-738.
9. Theodorou, E.; Buchli, J.; Schaal, S. A Generalized Path Integral Approach to Reinforcement Learning. J. Mach. Learn. Res.; 2010; 11, pp. 3137-3181.
10. Kappen, H.J. Linear Theory for Control of Nonlinear Stochastic Systems. Phys. Rev. Lett.; 2005; 95, 200201. [DOI: https://dx.doi.org/10.1103/PhysRevLett.95.200201] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/16384034]
11. Theodorou, E.A. Nonlinear stochastic control and information theoretic dualities: Connections interdependencies and thermodynamic interpretations. Entropy; 2015; 17, pp. 3352-3375. [DOI: https://dx.doi.org/10.3390/e17053352]
12. Karatzas, I.; Shreve, S. Brownian Motion and Stochastic Calculus, 2nd ed.; Graduate Texts in Mathematics; Springer: New York, NY, USA, 1991.
13. Friedman, A. Stochastic Differential Equations and Applications; Academic Press: New York, NY, USA, 1975.
14. Williams, G.; Drews, P.; Goldfain, B.; Rehg, J.M.; Theodorou, E.A. Aggressive Driving with Model Predictive Path Integral Control. Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA); Stockholm, Sweden, 16–21 May 2016; IEEE: Piscataway, NJ, USA, 2016.
15. Romero, A.; Sun, S.; Foehn, P.; Scaramuzza, D. Model predictive contouring control for time-optimal quadrotor flight. IEEE Trans. Robot.; 2022; 38, pp. 3340-3356. [DOI: https://dx.doi.org/10.1109/TRO.2022.3173711]
16. Ruchika, N.R. Model predictive control: History and development. Int. J. Eng. Trends Technol. IJETT; 2013; 4, pp. 2600-2602.
17. Gómez, V.; Thijssen, S.; Symington, A.; Hailes, S.; Kappen, H.J. Real-Time Stochastic Optimal Control for Multi-Agent Quadrotor Systems. Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS); London, UK, 12–17 June 2016; pp. 468-476.
18. Mohamed, I.S.; Allibert, G.; Martinet, P. Model Predictive Path Integral Control Framework for Partially Observable navigation: A Quadrotor Case Study. Proceedings of the 2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV); Shenzhen, China, 13–15 December 2020; IEEE: Piscataway, NJ, USA, 2020.
19. Williams, G.; Drews, P.; Goldfain, B.; Rehg, J.M.; Theodorou, E.A. Information-Theoretic Model Predictive Control: Theory and Applications to Autonomous Driving. IEEE Trans. Robot.; 2018; 34, pp. 1603-1622. [DOI: https://dx.doi.org/10.1109/TRO.2018.2865891]
20. Kusumoto, R.; Palmieri, L.; Spies, M.; Csiszar, A.; Arras, K.O. Informed Information Theoretic Model Predictive Control. Proceedings of the International Conference on Robotics and Automation; Montreal, QC, Canada, 20–24 May 2019; pp. 2047-2053.
21. Nicol, C.; Macnab, C.; Ramirez-Serrano, A. Robust adaptive control of a quadrotor helicopter. Mechatronics; 2011; 21, pp. 927-938. [DOI: https://dx.doi.org/10.1016/j.mechatronics.2011.02.007]
22. Raffo, G.V.; Ortega, M.G.; Rubio, F.R. Sliding Mode Control of a Quadrotor Helicopter. Proceedings of the 45th IEEE Conference on Decision and Control; San Diego, CA, USA, 13–15 December 2006; pp. 4957-4962.
23. Raffo, G.V.; Ortega, M.G.; Rubio, F.R. An integral predictive/nonlinear H∞ control structure for a quadrotor helicopter. Automatica; 2010; 46, pp. 29-39. [DOI: https://dx.doi.org/10.1016/j.automatica.2009.10.018]
24. Voos, H. Nonlinear Control of a Quadrotor Micro-UAV Using Feedback-Linearization. Proceedings of the 2009 IEEE International Conference on Mechatronics; Malaga, Spain, 14–17 April 2009; pp. 1-6.
25. Zhaowei, M.; Tianjiang, H.; Lincheng, S.; Weiwei, K.; Boxin, Z.; Kaidi, Y. An Iterative Learning Controller for Quadrotor UAV Path Following at a Constant Altitude. Proceedings of the 2015 34th Chinese Control Conference (CCC); Hangzhou, China, 28–30 July 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 4406-4411.
26. Hehn, M.; D’Andrea, R. An Iterative Learning Scheme for High Performance, Periodic Quadrocopter Trajectories. Proceedings of the European Control Conference (ECC); Zurich, Switzerland, 17–19 July 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 1799-1804.
27. Bansal, S.; Akametalu, A.K.; Jiang, F.J.; Laine, F.; Tomlin, C.J. Learning Quadrotor Dynamics Using Neural Network for Flight Control. Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC); Las Vegas, NV, USA, 12–14 December 2016; IEEE: Piscataway, NJ, USA, 2016.
28. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature; 2015; 521, pp. 436-444. [DOI: https://dx.doi.org/10.1038/nature14539]
29. Kappen, H.J. Path integrals and symmetry breaking for optimal control theory. J. Stat. Mech. Theory Exp.; 2005; 2005, P11011. [DOI: https://dx.doi.org/10.1088/1742-5468/2005/11/P11011]
30. Williams, G.; Aldrich, A.; Theodorou, E.A. Model predictive path integral control: From theory to parallel computation. J. Guid. Control Dyn.; 2017; 40, pp. 344-357. [DOI: https://dx.doi.org/10.2514/1.G001921]
31. Ruggiero, F.; Cacace, J.; Sadeghian, H.; Lippiello, V. Passivity-based control of VToL UAVs with a momentum-based estimator of external wrench and unmodeled dynamics. Robot. Auton. Syst.; 2015; 72, pp. 139-151. [DOI: https://dx.doi.org/10.1016/j.robot.2015.05.006]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
This paper addresses the trajectory tracking problem of quadrotors under complex dynamic environments and significant fluctuations in system states. An adaptive trajectory tracking control method is proposed based on an improved Model Predictive Path Integral (MPPI) controller and a Multilayer Perceptron (MLP) neural network. The technique enhances control accuracy and robustness by adjusting control inputs in real time. The MLP neural network learns the dynamics of a quadrotor from its state parameters and passes the learned model to the MPPI controller, which uses the model to drive the quadcopter along the desired trajectory. Experimental data show that the improved MPPI–MLP method reduces the trajectory tracking error by 23.7%, 34.7%, and 10.3% compared to the traditional MPPI, MPC with MLP, and a two-layer network, respectively. These results demonstrate the potential application of the method in complex environments.