This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
With the rapid development of China’s Intelligent Electric Power Grid (IEPG) system and the proposal of the smart grid concept, the IEPG inspection mode has attracted extensive attention, and the IEPG inspection robot has been favored as a means of ensuring the safe operation of the power system [1]. In particular, the IEPG inspection robot can make up for the low efficiency of manual inspection; it features strong environmental robustness and low safety risk and improves power grid operation reliability. The IEPG inspection robot realizes inspection by relying on its autonomous navigation and positioning functions [2]. Therefore, accurate positioning, strong robustness, and strong environmental adaptability of the IEPG inspection robot are of great significance to the power system [3], and improving these features has become the focus of relevant research. Simultaneous Localization and Mapping (SLAM) is an important technology for realizing the autonomous navigation and positioning of intelligent robots in unfamiliar environments and is an essential part of environmental perception [4]. A SLAM system uses its vision sensor to collect image sequences and obtain rich external scene information similar to that observed by human eyes. With the continuous innovation of vision sensors, the vision-based SLAM method has become a research hotspot [5].
However, the vision-based SLAM system has some disadvantages: it lacks texture information in some scenes, is greatly affected by external light, cannot work normally in highly dynamic lighting or dark environments, and suffers motion blur and motion-estimation failure under fast motion [6]. Because the visual front end depends on detecting and tracking feature points, the visual SLAM system relies heavily on image quality. A monotonous environment without rich features, or a carrier moving too fast, produces blurred images; sometimes two frames overlap substantially. As a result, feature matching fails and the depth of a pixel cannot be recovered, leading to the lack of absolute scale, scale drift, and other problems [7]. These problems can be alleviated by fusing the vision sensor with other sensors. For example, there is a complementary relationship between the Inertial Measurement Unit (IMU) and the monocular camera: the absolute scale and the gravity direction of the moving carrier can be obtained by fusing IMU data. In particular, the IMU can provide relatively accurate attitude estimation over a short time when the carrier moves rapidly [8]. The IMU can help the monocular camera recover scale information and estimate short-term attitude when the camera is occluded, thereby improving the positioning accuracy of the navigation system [9]. Nevertheless, fusing the IMU with a monocular camera still presents problems, such as over-reliance on hardware design, off-line calibration of sensor parameters with a toolbox, complicated initialization, a large amount of computation, and low precision and robustness on low-cost sensors [10]. Therefore, further optimizing the monocular camera and IMU system is crucial. There has been much research on IEPG inspection robots. Chen et al. (2020) took Daxing International Airport as an example, studied the IEPG inspection robot, and proposed an implementation scheme for installing it in the airport’s switching station [11]. Zhong et al. (2021) used a Genetic Algorithm (GA) to determine the running image of the IEPG inspection robot and calculated the offset angle by analyzing scale characteristics, aiming to optimize job scheduling [12]. Zhang et al. (2022) designed an IEPG inspection system for power line Unmanned Aerial Vehicles (UAVs) using robot trajectory tracking technology [13]. Regarding IMU-based localization, Poulose and Han (2019) proposed a smartphone-camera-based localization system and a hybrid localization system to minimize the localization error of smartphone camera and IMU-based localization systems, and analyzed the systems’ performance [14]. Poulose et al. (2019) proposed a position-estimation algorithm that estimates pedestrian heading by fusing accelerometer, magnetometer, and gyroscope data from IMU sensors; the pitch, roll, and gyroscope values are estimated from the combined sensor data [15]. Regarding SLAM, Jia et al. (2022) studied mapping and positioning methods for inspection robots and further proposed an inspection control method based on iterative learning control [16].
Firstly, the monocular camera projection model is detailed. Secondly, the IMU sensor measurement model and preintegration are described. Finally, the image data preprocessing method is elaborated, and the theory of nonlinear optimization for the SLAM system is introduced. Based on this theory, a nonlinear optimization process for SLAM using the Gauss–Newton Method (GNM) is formed, and the structure of a SLAM algorithm fusing vision and IMU under nonlinear optimization is constructed. The test results of the tight-coupling Vision Inertial Navigation System (VINS) algorithm are studied, the influence of loop detection on positioning accuracy is analyzed, and the attitude-estimation results under rotation conditions are explored in detail. The designed algorithm is compared with other algorithms in terms of Root Mean Square Error (RMSE). The main contribution of this research is to improve and optimize the original SLAM system, establish a GNM-based nonlinear optimization process for solving SLAM, and construct the structure of a SLAM algorithm fusing vision and IMU under nonlinear optimization. The design aims to improve the estimation accuracy, attitude positioning accuracy, computational speed, and rapid deployment capability of the monocular VINS of the power inspection robot and is expected to provide a technical reference for improving the robot’s working performance.
2. Research Theory and Method Design
The overall framework is designed, and its structure is presented in Figure 1.
[figure(s) omitted; refer to PDF]
In Figure 1, on the basis of the monocular camera projection model, the IMU sensor measurement model and preintegration, the image data preprocessing method, and the theory of nonlinear optimization for the SLAM system, the GNM-based nonlinear optimization process of SLAM is constructed, and the SLAM algorithm structure fusing vision and IMU under nonlinear optimization is further formed. Then, experimental tests and data processing are carried out, and a detailed analysis is performed from four aspects: the test results of the tight-coupling VINS algorithm, the influence of loop detection on positioning accuracy, the attitude-estimation test under rotation conditions, and the error analysis of different algorithms.
2.1. The Monocular Camera Projection Model
There are many kinds of vision sensors: monocular, binocular, depth, and panoramic cameras. The latter three each have marked advantages and disadvantages, whereas the monocular camera needs only one camera and offers low cost, small volume, and high compatibility with application scenes [17]. The monocular camera and the IMU are the two most important sensors in the VINS: the IMU measures the inertial information of the carrier and establishes the connection between visual observation and the inertial coordinate system, while the camera samples the environmental information [18]. A monocular camera connects Three-Dimensional (3D) spatial coordinate points with Two-Dimensional (2D) plane coordinate points through projection and pixelation. Monocular camera projection mapping mainly involves the world coordinate system, camera coordinate system, image coordinate system, and pixel coordinate system [19]. The world coordinate system is fixed on the Earth’s surface; the camera coordinate system is fixed on the camera and moves with it. The Pinhole Camera Model (PCM) is widely used to relate these coordinate systems, and in this work the PCM is used to solve the camera’s internal parameters [20]. Figure 2 shows the projection principle of the PCM.
[figure(s) omitted; refer to PDF]
In Figure 2, O denotes the camera’s optical center, which is the pinhole of the PCM. By similar triangles, a spatial point $Q = [X, Y, Z]^{T}$ in the camera coordinate system is projected onto the imaging plane with focal length $f$ as

$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}. \tag{1}$$

Equation (2) describes the relationship between the spatial point Q and its imaging: the projection is converted to the pixel plane by defining the pixel coordinates $[u, v]^{T}$, scaling them on the $u$ and $v$ axes by factors $\alpha$ and $\beta$, and translating the origin by $[c_x, c_y]^{T}$:

$$u = \alpha f\,\frac{X}{Z} + c_x = f_x\,\frac{X}{Z} + c_x, \qquad v = \beta f\,\frac{Y}{Z} + c_y = f_y\,\frac{Y}{Z} + c_y. \tag{2}$$

Here, $f_x$ and $f_y$ are the focal lengths in pixel units, and $(c_x, c_y)$ is the principal point. In homogeneous matrix form,

$$Z \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = KQ. \tag{3}$$

Here, $K$ is the camera intrinsic matrix, whose entries are the internal parameters solved for in this work.
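To make the projection concrete, the following Python sketch applies equation (3) to a camera-frame point; the intrinsic values are illustrative placeholders, not calibrated parameters from this work.

```python
import numpy as np

# Illustrative intrinsics (placeholders, not calibrated values): fx, fy, cx, cy.
K = np.array([[458.0,   0.0, 367.0],
              [  0.0, 457.0, 248.0],
              [  0.0,   0.0,   1.0]])

def project(point_cam: np.ndarray) -> np.ndarray:
    """Pinhole projection: Z * [u, v, 1]^T = K * [X, Y, Z]^T (equation (3))."""
    X, Y, Z = point_cam
    assert Z > 0, "the point must lie in front of the camera"
    uv1 = K @ np.array([X / Z, Y / Z, 1.0])  # divide by depth, then apply K
    return uv1[:2]

print(project(np.array([0.5, -0.2, 4.0])))  # -> [424.25, 225.15]
```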
Lie groups and Lie algebra representations are commonly used in visual navigation to describe the camera pose and map point coordinates. The pose of the robot in 3D space has six degrees of freedom: three of rotation and three of translation. Suppose the robot coordinate system is set to frame $a$ and, after one motion, to frame $b$; the rotation is described by a rotation matrix $R_{ab} \in SO(3)$ and the translation by a vector $t_{ab} \in \mathbb{R}^{3}$. In the same coordinate system, the transformation of the robot from frame $a$ to frame $b$ maps a point $p_b$ expressed in frame $b$ to its coordinates $p_a$ in frame $a$. The transformation process is expressed as follows:

$$p_a = R_{ab}\,p_b + t_{ab}, \qquad \begin{bmatrix} p_a \\ 1 \end{bmatrix} = \begin{bmatrix} R_{ab} & t_{ab} \\ 0^{T} & 1 \end{bmatrix} \begin{bmatrix} p_b \\ 1 \end{bmatrix} = T_{ab} \begin{bmatrix} p_b \\ 1 \end{bmatrix}. \tag{4}$$

Here, $T_{ab} \in SE(3)$ is the homogeneous transformation matrix combining rotation and translation.
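As a minimal sketch of this representation (the rotation angle and translation below are illustrative numbers only), the homogeneous transform $T_{ab}$ of equation (4) can be assembled and applied in a few lines of Python:

```python
import numpy as np

def make_T(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Assemble the 4x4 homogeneous transform T = [[R, t], [0, 1]]."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Illustrative motion: rotate 30 degrees about the z axis, then translate.
theta = np.deg2rad(30.0)
R_ab = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta),  np.cos(theta), 0.0],
                 [0.0,            0.0,           1.0]])
T_ab = make_T(R_ab, np.array([1.0, 0.5, 0.0]))

p_b = np.array([2.0, 0.0, 0.0, 1.0])  # a point in frame b, homogeneous form
p_a = T_ab @ p_b                      # the same point expressed in frame a
print(p_a[:3])                        # -> [2.732..., 1.5, 0.0]
```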
2.2. IMU Sensor Measurement Model and Preintegration
The first task of the visual SLAM front end is to extract and track feature points from image frames based on the camera’s imaging model [21]. The main components of the IMU are accelerometers and gyroscopes, which measure the acceleration and angular velocity of the moving carrier and are used to estimate its pose [22]. Considering the bias and noise in the measurement process and ignoring the influence of the Earth’s rotation, the measurement output of the 3D-space-based M-IMU can be simply modeled as follows:

$$\hat{a}_t = a_t + b_{a_t} + R_w^t\, g^w + n_a, \qquad \hat{\omega}_t = \omega_t + b_{\omega_t} + n_\omega. \tag{5}$$

Here, $\hat{a}_t$ and $\hat{\omega}_t$ are the measured acceleration and angular velocity, $a_t$ and $\omega_t$ are the true values, $b_{a_t}$ and $b_{\omega_t}$ are the accelerometer and gyroscope biases, $R_w^t$ is the rotation from the world frame to the body frame, $g^w$ is the gravity vector, and $n_a$ and $n_\omega$ are the measurement noises. The noise of the accelerometer and gyroscope follows the Gaussian distribution, and the biases are modeled as random walks driven by Gaussian noise. The specific expression is as follows:

$$n_a \sim \mathcal{N}\!\left(0, \sigma_a^{2}\right), \quad n_\omega \sim \mathcal{N}\!\left(0, \sigma_\omega^{2}\right), \quad \dot{b}_{a_t} = n_{b_a} \sim \mathcal{N}\!\left(0, \sigma_{b_a}^{2}\right), \quad \dot{b}_{\omega_t} = n_{b_\omega} \sim \mathcal{N}\!\left(0, \sigma_{b_\omega}^{2}\right). \tag{6}$$
The acquisition frequency of the IMU is 200 Hz, much higher than the frame rate of the images collected by the camera. Therefore, the IMU data must be preprocessed: through IMU preintegration, the IMU measurements between two adjacent keyframes are combined into a single composite term that constrains the two visual keyframes.
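A minimal sketch of this idea is given below. It assumes bias-corrected, gravity-compensated samples and uses simple Euler integration, ignoring the noise propagation and bias-update Jacobians that a full preintegration maintains; it illustrates the composite term rather than reproducing the exact formulation used here.

```python
import numpy as np

def preintegrate(accels, gyros, dt):
    """Euler-integrate IMU samples between two keyframes into one composite
    term: the relative rotation dR, velocity dv, and position dp increments,
    expressed in the frame of the first keyframe.

    accels, gyros: (N, 3) bias-corrected, gravity-compensated samples;
    dt: sample period, e.g. 1/200 s for a 200 Hz IMU.
    """
    dR, dv, dp = np.eye(3), np.zeros(3), np.zeros(3)
    for a, w in zip(accels, gyros):
        dp = dp + dv * dt + 0.5 * (dR @ a) * dt ** 2
        dv = dv + (dR @ a) * dt
        # First-order approximation of the rotation increment Exp(w * dt).
        wx = np.array([[0.0, -w[2], w[1]],
                       [w[2], 0.0, -w[0]],
                       [-w[1], w[0], 0.0]])
        dR = dR @ (np.eye(3) + wx * dt)
    return dR, dv, dp
```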
2.3. Image Data Preprocessing
Facing the large-scale scene of the power grid, the inspection robot tracks and extracts feature points. This is mainly because the camera motion is estimated from the extraction and matching of feature points and the reprojection error is then optimized, an approach that is highly robust to scenes with large changes in illumination, motion, and rotation. The preprocessing of the original image data collected by the camera can be divided into three modules. The first module tracks the image feature points. The second module packages the tracked feature points into point clouds and publishes them at a given frequency. The last module builds the observation map model from the published point clouds [23]. Feature points, also called landmark points, are the more distinctive pixels in a frame image. Figure 3 lays out the tracking process of feature points.
[figure(s) omitted; refer to PDF]
In Figure 3, each newly obtained image is first checked to determine whether it is the first frame. If it is, a certain number of feature points are evenly extracted from the image and numbered. Otherwise, the optical flow method is used for tracking, the points whose tracking is lost are removed, and the system judges whether to publish the remaining points. If they are not published, the tracking process returns to the initial state; if they are published, the outlier points are removed and new feature points are added. A minimal sketch of this loop is given below.
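The sketch uses OpenCV’s pyramidal Lucas–Kanade tracker for one frame pair; the file names and thresholds are placeholders, and the outlier-rejection step is only noted in a comment.

```python
import cv2

# Placeholder frames; in the real pipeline these come from the camera stream.
prev_img = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
curr_img = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# First frame: extract evenly distributed corners (minDistance enforces the
# even spatial spread) and implicitly number them by array index.
pts = cv2.goodFeaturesToTrack(prev_img, maxCorners=150,
                              qualityLevel=0.01, minDistance=30)

# Subsequent frames: track with pyramidal Lucas-Kanade optical flow and
# remove the points whose tracking is lost (status == 0).
nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_img, curr_img, pts, None)
tracked = nxt[status.ravel() == 1]
# Before publishing, outliers would be rejected (e.g. with a RANSAC-based
# geometric check) and new features added to keep the count near the target.
print(f"tracked {len(tracked)} of {len(pts)} feature points")
```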
For the tight-coupling monocular VINS, initialization from IMU acquisition provides the system optimization parameters and initial state values, which is the key step [24]. The variables involved in the initialization process mainly include the gyroscope bias, scale factor, gravity, and velocity vector. Figure 4 showcases the whole process of IMU initialization.
[figure(s) omitted; refer to PDF]
The first step of IMU initialization is gyroscope bias estimation: when the camera has continuously collected N keyframes, the rotation transformation matrices of adjacent image frames can be solved through monocular visual 3D reconstruction and compared against the IMU-integrated rotations. The second step is the approximate estimation of scale and gravitational acceleration, followed by the zero-bias estimation of the accelerometer and the correction of scale and gravitational acceleration. Lastly, the velocity is estimated. A hedged sketch of the first step is given below.
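In this small-angle sketch (the variable names are illustrative; the full estimator linearizes on SO(3) with preintegration Jacobians), a constant gyroscope bias follows in closed form from a least-squares fit over the keyframe intervals:

```python
import numpy as np

def estimate_gyro_bias(gyro_integrals, visual_rotvecs, durations):
    """Constant gyroscope bias from N keyframe intervals, small-angle model.

    gyro_integrals[k] : integral of the raw gyro over interval k (sum of w*dt)
    visual_rotvecs[k] : rotation vector of the visual relative rotation for k
    durations[k]      : length of interval k in seconds
    Model per interval: gyro_integral_k - b * duration_k ~= visual_rotvec_k,
    solved in closed form as a least-squares problem for the bias b.
    """
    d = np.asarray(durations)[:, None]                           # (N, 1)
    r = np.asarray(gyro_integrals) - np.asarray(visual_rotvecs)  # (N, 3)
    return (d * r).sum(axis=0) / (d ** 2).sum()                  # b, shape (3,)
```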
Visual-inertial joint initialization is the prerequisite for realizing the overall attitude control of the robot. The optimal attitude of the monocular VINS is estimated by combining visual constraints and IMU-based ones. Figure 5 demonstrates the process of the tight-coupling monocular VINS.
[figure(s) omitted; refer to PDF]
Figure 5 demonstrates that features are extracted and tracked by the vision sensor, while the inertial information is preintegrated; the two are then fused in the visual-inertial joint initialization. Furthermore, tight-coupling optimization is carried out to obtain the optimized attitude state. The sampling-rate difference between the IMU and the camera can thus be eliminated, the IMU can provide scale information for monocular vision, and the visual pose can correct the drift of the IMU, reducing the probability of large errors.
In addition to the constraints from landmark observations between consecutive image frames, the IMU introduces two further kinds of constraints: the IMU preintegration between consecutive image frames constrains speed, displacement, and rotation, and the random-walk process constrains the IMU zero bias between two adjacent image frames [25]. From these constraints, the factor graph can be constructed. Figure 6 describes the factor graph of the VINS.
[figure(s) omitted; refer to PDF]
Figure 6 illustrates that the factor graph is composed of factors and variables. In Figure 6, the block nodes are the state variables to be estimated, and the circular nodes denote the constraint relationships between variables, including IMU constraints, visual constraints, and loop constraints. Purple nodes represent camera attitudes, and brown nodes signify IMU measurements. Each factor in Figure 6 corresponds to an error term in the nonlinear Least Squares Problem (LSP), as the sketch below illustrates.
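In this minimal sketch, the factors are a toy 1-D prior and an odometry-like constraint, purely hypothetical stand-ins for the actual IMU, visual, and loop factors; each factor contributes one squared error term to the LSP objective:

```python
from dataclasses import dataclass
from typing import Callable, Sequence
import numpy as np

@dataclass
class Factor:
    """One edge of the factor graph = one error term of the nonlinear LSP."""
    var_ids: Sequence[int]            # indices of the states it constrains
    error: Callable[..., np.ndarray]  # residual given those states

def total_cost(factors, states):
    """The objective the back end minimizes: sum of squared residuals."""
    return sum(float(np.sum(f.error(*(states[i] for i in f.var_ids)) ** 2))
               for f in factors)

# Toy example: a prior on state 0 and an odometry-like constraint between
# states 0 and 1 with a measured displacement of 1.0 (illustrative only).
states = [np.array([0.0]), np.array([0.9])]
factors = [Factor((0,), lambda x0: x0 - 0.0),
           Factor((0, 1), lambda x0, x1: (x1 - x0) - 1.0)]
print(total_cost(factors, states))  # -> 0.01
```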
2.4. Theories Related to Nonlinear Optimization of SLAM Systems
The SLAM system is mainly used to solve the positioning, navigation, and mapping problems of mobile robots in complex environments [26]. It estimates the camera’s six-degree-of-freedom motion state by means of multiview geometry and nonlinear optimization, based on a series of image sequences with prior information in an unknown environment [27]. Figure 7 expresses the overall architecture of SLAM.
[figure(s) omitted; refer to PDF]
As exhibited in Figure 7, the SLAM system can be divided into Visual Odometry (VO), back-end optimization, loop closing, and mapping, and it can complete navigation and positioning tasks in various environments. The basic task of monocular visual SLAM is to determine the position and attitude of the camera and the 3D coordinates of feature points in an unknown environment. In monocular visual SLAM, only noisy camera images can be obtained; thus, global nonlinear optimization is needed. Integrating the IMU sensor into SLAM additionally provides carrier acceleration and angular velocity information, which helps estimate the position and attitude of the carrier and the coordinates of landmarks and realizes the positioning of an intelligent machine. However, this information also contains noise and needs nonlinear optimization. Loop detection plays a vital role in the visual SLAM system. Although the odometry can produce the camera attitude map and the position information of feature points, it only considers the correlation between adjacent frames and does not establish a globally consistent optimized trajectory, which results in drift while the SLAM system runs. The loop detection function can minimize this drift and improve the system’s accuracy.
The monocular VINS-based SLAM must be initialized before running [28]. The aim is to bootstrap the system state and generate the initial state information (namely, the initial system parameters) [29]. The subsequent positioning and mapping process is closely related to this initial state information. Figure 8 draws the initialization flow of the nonlinearly optimized SLAM system.
[figure(s) omitted; refer to PDF]
Figure 8 depicts that system initialization is divided into two processes. In the first, the initial state information is estimated by visual initialization, yielding coarse state information; the joint inertial navigation is then initialized by the IMU to improve its accuracy. First, the vision sensor acquires the data, and keyframes are filtered according to certain conditions. The first keyframe serves as the current frame; when the next keyframe arrives, it becomes the current frame and the first keyframe becomes the previous frame. Extracting feature points yields the feature information of the two keyframes, and triangulation is performed to obtain the motion attitude information with position features. Next, the information map is established and continuously optimized, and finally the system initialization information is obtained.
2.5. Solving Nonlinear SLAM Optimization Process Based on GNM
The nonlinear SLAM optimization process is derived using the GNM. In visual SLAM, the sum of squares of the error terms is generally represented by a cost function [30]. The specific expression is as follows:

$$F(x) = \frac{1}{2}\,\left\| e(x) \right\|^{2}. \tag{7}$$

The error is expanded by a first-order Taylor series and linearly approximated to obtain

$$e(x + \Delta x) \approx e(x) + J(x)\,\Delta x, \tag{8}$$

where $J(x)$ is the Jacobian of the error with respect to the state. Substituting (8) into (7), taking the derivative of the cost with respect to the increment $\Delta x$, and setting it to zero gives the normal equation

$$J(x)^{T} J(x)\,\Delta x = -J(x)^{T} e(x). \tag{9}$$

The final equation of the increment $\Delta x$ is therefore $H \Delta x = g$, with $H = J^{T}J$ approximating the Hessian and $g = -J^{T}e$; solving it at each iteration and updating $x \leftarrow x + \Delta x$ yields the GNM solution of the nonlinear SLAM problem.
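A generic sketch of this iteration, applied to a toy curve-fitting problem, is shown below; the model $y = \exp(at + b)$ and all numbers are illustrative, not drawn from the SLAM system.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=20, tol=1e-10):
    """Iterate the normal equation (J^T J) dx = -J^T e and update x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        e, J = residual(x), jacobian(x)
        dx = np.linalg.solve(J.T @ J, -J.T @ e)  # equation (9)
        x = x + dx
        if np.linalg.norm(dx) < tol:             # converged
            break
    return x

# Toy problem: fit y = exp(a*t + b) to noisy samples generated with a=0.8, b=0.2.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
y = np.exp(0.8 * t + 0.2) + 0.01 * rng.standard_normal(t.size)

residual = lambda x: np.exp(x[0] * t + x[1]) - y
jacobian = lambda x: np.stack([t * np.exp(x[0] * t + x[1]),
                               np.exp(x[0] * t + x[1])], axis=1)
print(gauss_newton(residual, jacobian, np.array([0.0, 0.0])))  # ~ [0.8, 0.2]
```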
On this basis, the proposed SLAM system based on the tight-coupling monocular VINS under nonlinear optimization is constructed.
The SLAM system can improve positioning accuracy and robustness by fusing multiple sensors. In particular, fusing a simple, low-cost camera with an IMU sensor kit is the best choice for multisensor fusion in both theory and practical application [33]. Fusing the IMU and monocular vision through nonlinear optimization is the framework with the best accuracy and robustness; it is essentially a tight-coupling monocular VINS. Figure 9 details the proposed SLAM system based on the tight-coupling monocular VINS.
[figure(s) omitted; refer to PDF]
As portrayed in Figure 9, the proposed SLAM system based on the tight-coupling monocular VINS can compensate for the defects of monocular vision, improving tracking speed and robustness while maintaining high accuracy. Following a successful system initialization, the system runs tracking, local map optimization, and closed-loop detection simultaneously. The tracking process is also called odometry; its core is to estimate the transformation relationship between two adjacent frames.
2.6. Experimental Test and Data Processing
The EuRoC dataset is adopted for the IEPG inspection scenario; it can be used for simulation testing of visual and visual-inertial algorithms and is one of the most commonly used datasets for evaluating the positioning accuracy of visual SLAM algorithms. The experiment is divided into two parts. One uses the hardware platform to test the error and positioning accuracy of the algorithm. The other tests the dynamic performance in a moving scene under a pure-rotation condition of the tight-coupling monocular VINS, following related literature [34]. Table 1 shows the test environment of this experiment.
Table 1
Experimental test environment.
Number | Content | Model and parameters
1 | Computer configuration | Intel i7-4510U
2 | Memory size | 8 GB
3 | Operating system | ROS Kinetic
4 | CPU frequency | 2.66 GHz
5 | Camera frame rate | 20 Hz
6 | IMU frequency | 200 Hz
Absolute Trajectory Error (ATE) is usually used to evaluate the positioning accuracy of the algorithm. The specific calculation reads:

$$F_i = Q_i^{-1}\, S\, P_i, \tag{18}$$

$$\mathrm{ATE} = \left( \frac{1}{n} \sum_{i=1}^{n} \left\| \operatorname{trans}\!\left(F_i\right) \right\|^{2} \right)^{1/2}. \tag{19}$$

In (18) and (19), $P_i$ is the estimated pose at timestamp $i$, $Q_i$ is the corresponding ground-truth pose, $S$ is the rigid-body transformation aligning the estimated trajectory with the ground truth, $\operatorname{trans}(\cdot)$ takes the translational component, and $n$ is the number of poses. The RMSE is used to evaluate the improvement effect of loop detection on positioning accuracy. The expression is as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{x}_i - x_i \right)^{2}},$$

where $\hat{x}_i$ and $x_i$ are the estimated and reference values, respectively.
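A compact sketch of equations (18) and (19) for time-synchronized position sequences is shown below, with the alignment $S$ computed by a standard least-squares (Umeyama/Kabsch) fit; this is an illustration of the metric, not the exact evaluation tool used here.

```python
import numpy as np

def ate_rmse(gt_xyz: np.ndarray, est_xyz: np.ndarray) -> float:
    """ATE RMSE between time-aligned (N, 3) trajectories, per eqs. (18)-(19).

    The rigid alignment S (rotation R, translation t) is the least-squares
    Kabsch/Umeyama fit of the estimated trajectory onto the ground truth.
    """
    mu_g, mu_e = gt_xyz.mean(axis=0), est_xyz.mean(axis=0)
    H = (est_xyz - mu_e).T @ (gt_xyz - mu_g)          # 3x3 correlation matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                                # est -> gt rotation
    t = mu_g - R @ mu_e
    err = gt_xyz - (est_xyz @ R.T + t)                # translational residuals
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))
```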
The MH_06_difficult and MH_02_easy dataset sequences contain sufficient rotation and translation and highly representative illumination changes. Therefore, this work quantitatively analyzes the attitude error and position error of the positioning algorithm on these two data sequences. The proposed algorithm is implemented on the Robot Operating System (ROS).
3. Results and Discussion
3.1. Test and Analysis Results of Tight-Coupling VINS Algorithm
Based on the MH_06_difficult dataset sequence, the error of the proposed SLAM system based on the tight-coupling monocular VINS is analyzed by quantifying the attitude error and position error. Figure 10 plots the proposed algorithm’s state errors on the MH_06_difficult dataset sequence.
[figure(s) omitted; refer to PDF]
In Figure 10, regarding the attitude error, the error fluctuation range of the heading angle grows as the distance increases, and the heading angle error is larger than the pitch and roll errors. This is mainly because the pitch and roll angles are observable from the gravity direction and therefore do not drift. Compared with the heading angle, the roll and pitch errors stay within ±2°, while the heading error is larger, within ±3°. Regarding the position error, also shown in Figure 10, the Y-axis position error fluctuates the most, followed by the X axis and then the Z axis. Overall, however, the errors of the X, Y, and Z axes are all within 40 cm, which meets the positioning requirements of the UAV.
Subsequently, the proposed SLAM system based on the tight-coupling monocular VINS is evaluated on the MH_02_easy dataset sequence by quantifying the attitude and position errors. Figure 11 reveals the results.
[figure(s) omitted; refer to PDF]
In Figure 11, for the MH_02_easy sequence, the attitude error of the heading angle is again larger than that of the pitch and roll angles. Compared with the heading angle, the roll and pitch errors are within ±1°, while the heading error is larger, within ±3°, and varies over a wide range. For the position error, the Y-axis error is the most pronounced, followed by the X axis and then the Z axis. On the whole, however, the three-axis errors of X, Y, and Z are all within 300 mm, which satisfies the positioning requirements of the UAV.
3.2. Influence of Loop Detection on Positioning Accuracy
This section evaluates how the loop detection of the proposed SLAM system based on the tight-coupling monocular VINS improves positioning accuracy by calculating the RMSE. The proposed SLAM system is run on different datasets, and the positioning accuracy is compared with and without loop detection. Figure 12 compares the influence of loop detection on the positioning accuracy across the datasets.
[figure(s) omitted; refer to PDF]
Figure 12 demonstrates that the RMSE with loop detection is smaller than that without it, and the gap between the two is large. Loop detection therefore plays a clear role in reducing error; it can be used to suppress long-term error accumulation and ensure the stability of long-term system operation. Hence, loop detection can effectively improve the positioning accuracy of the system.
3.3. Analysis of Attitude Estimation Test Results under Rotating Conditions
Under the pure-rotation condition, the dynamic performance of the proposed SLAM system based on the tight-coupling monocular VINS in a moving scene is tested. The dynamic experiment is conducted on the actual hardware and compared with the output of a high-precision attitude-solution module. The variation of the pitch angle with time is then analyzed, and the estimated value is compared with the true value in terms of the ATE. Figure 13 analyzes the variation of the pitch angle and its error curve.
[figure(s) omitted; refer to PDF]
Apparently, the estimated and true curves of the pitch angle fit well. The coincidence between the true and estimated pitch values under the proposed SLAM system based on the tight-coupling monocular VINS is high, and the ATE stays within ±2° over time.
Furthermore, the variation of the roll angle with time is analyzed by comparing the estimated value with the true value in terms of the ATE. Figure 14 analyzes the variation of the roll angle and its error curve.
[figure(s) omitted; refer to PDF]
Figure 14 denotes that the estimated and true curves of the roll angle fit well. The coincidence between the true and estimated roll values under the proposed SLAM system based on the tight-coupling monocular VINS is high, and the ATE stays within ±1.5° over time.
Afterward, the variation of the yaw angle with time is analyzed by comparing the estimated value with the true value in terms of the ATE. Figure 15 displays the variation of the yaw angle and its error curve.
[figure(s) omitted; refer to PDF]
Evidently, the estimated and true curves of the yaw angle fit well overall. The true and estimated yaw values coincide well under the proposed SLAM system based on the tight-coupling monocular VINS, and the ATE stays within ±2.5° over time.
3.4. Error Analysis of Test Results of Different Algorithms
The designed algorithm is compared with the mainstream visual-inertial positioning algorithm Open Keyframe-based Visual-Inertial SLAM (OKVIS) and with a standalone SLAM algorithm in terms of RMSE. Table 2 presents the error analysis on the dataset sequences under the different algorithms.
Table 2
RMSE of the dataset sequences under different algorithms.
Dataset sequence | Proposed algorithm | SLAM | OKVIS
MH_01_easy | 0.09 | 0.11 | 0.15 |
MH_02_easy | 0.07 | 0.10 | 0.14 |
MH_03_easy | 0.06 | 0.11 | 0.13 |
MH_04_difficult | 0.13 | 0.17 | 0.34 |
MH_05_difficult | 0.19 | 0.21 | 0.36 |
MH_06_difficult | 0.06 | 0.09 | 0.11 |
As exhibited in Table 2, the RMSE of each sequence obtained by the proposed system is lower than that of OKVIS and of the standalone SLAM algorithm, and this holds across all data sequences. Therefore, the accuracy of the designed algorithm is better than that of OKVIS and SLAM, and its RMSE is the smallest.
4. Conclusion
To improve the positioning accuracy of the power inspection robot in different environments and its dynamic detection performance in moving scenes, the shortcomings of the original SLAM system are remedied on the basis of the IMU sensor measurement model, preintegration, and the nonlinear optimization theory of the SLAM system: a GNM-based optimization process for solving nonlinear SLAM is established, and the structure of a SLAM algorithm fusing vision and IMU under nonlinear optimization is constructed. Firstly, the performance is tested on moving scenes, and the results of the tight-coupling VINS algorithm are analyzed. Secondly, the influence of loop detection on positioning accuracy is explored. Finally, the attitude-estimation results under rotation conditions are studied. Testing on the dataset shows that the roll, pitch, and yaw angle errors of the robot attitude are all within the controllable range; for the position error, the X, Y, and Z axes are also within the error range, which meets the positioning requirements of the power inspection robot. The results also show that loop detection can effectively improve the system’s positioning accuracy. In the attitude-estimation test under pure rotation, the true and predicted values of the robot attitude coincide well, and the ATE stays within ±2.5° over time. Compared with other algorithms, the designed algorithm has the smallest RMSE and better accuracy for the positioning and navigation system of the power inspection robot. A shortcoming of the designed method is that only the local pose is estimated. In the future, it will face the challenge of continuous technical innovation; cutting-edge techniques such as neural networks could be introduced into the VINS. The simulation scene setting also needs further updating to bring it closer to actual application scenarios. Moreover, sensors that provide measurements relative to the Earth coordinate system, such as a barometer, can be fused to expand the application scenarios.
Acknowledgments
The study was supported by the Shanghai Key Laboratory of Power Station Automation Technology Project (Grant no. 13DZ2273800).
[1] Y. Yao, C. Jun-hua, G. Yi, F. Zhun, Z. An-Min, X. Biao, L. Ke, "Autonomous control method of rotor UAVs for power inspection with renewable energy based on swarm intelligence," Frontiers in Energy Research, vol. 9, DOI: 10.3389/fenrg.2021.697054, 2021.
[2] V. N. Nguyen, R. Jenssen, D. Roverso, "Intelligent monitoring and inspection of power line components powered by UAVs and deep learning," IEEE Power and Energy Technology Systems Journal, vol. 6, no. 1, pp. 11-21, DOI: 10.1109/jpets.2018.2881429, 2019.
[3] A. T. Zengin, G. Erdemir, T. C. Akinci, S. Seker, "Measurement of power line sagging using sensor data of a power line inspection robot," IEEE Access, vol. 8, pp. 99198-99204, DOI: 10.1109/access.2020.2998154, 2020.
[4] M. Labbé, F. Michaud, "RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation," Journal of Field Robotics, vol. 36, no. 2, pp. 416-446, DOI: 10.1002/rob.21831, 2019.
[5] S. Kim, D. Kim, S. Jeong, J. W. Ham, J. K. Lee, K. Y. Oh, "Fault diagnosis of power transmission lines using a UAV-mounted smart inspection system," IEEE Access, vol. 8, pp. 149999-150009, DOI: 10.1109/access.2020.3016213, 2020.
[6] G. Silano, T. Baca, R. Penicka, D. Liuzza, M. Saska, "Power line inspection tasks with multi-aerial robot systems via signal temporal logic specifications," IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 4169-4176, DOI: 10.1109/lra.2021.3068114, 2021.
[7] T. Zhao, J. Liu, S. Dian, R. Guo, S. Li, "Sliding-mode-control-theory-based adaptive general type-2 fuzzy neural network control for power-line inspection robots," Neurocomputing, vol. 401, pp. 281-294, DOI: 10.1016/j.neucom.2020.03.050, 2020.
[8] D. Rojas-Valverde, C. D. Gómez-Carmona, R. Gutiérrez-Vargas, J. Pino-Ortega, "From big data mining to technical sport reports: the case of inertial measurement units," BMJ Open Sport & Exercise Medicine, vol. 5, no. 1, DOI: 10.1136/bmjsem-2019-000565, 2019.
[9] V. B. Semwal, N. Gaud, P. Lalwani, V. Bijalwan, A. K. Alok, "Pattern identification of different human joints for different human walking styles using inertial measurement unit (IMU) sensor," Artificial Intelligence Review, vol. 55, no. 2, pp. 1149-1169, DOI: 10.1007/s10462-021-09979-x, 2022.
[10] G. Coviello, G. Avitabile, "Multiple synchronized inertial measurement unit sensor boards platform for activity monitoring," IEEE Sensors Journal, vol. 20, no. 15, pp. 8771-8777, DOI: 10.1109/jsen.2020.2982744, 2020.
[11] Y. Chen, T. Zhao, S. Dian, X. Zeng, H. Wang, "Balance adjustment of power-line inspection robot using general type-2 fractional order fuzzy PID controller," Symmetry, vol. 12, no. 3, DOI: 10.3390/sym12030479, 2020.
[12] Z. Zhonglin, F. Bin, L. Liquan, Y. Encheng, "Design and function realization of nuclear power inspection robot system," Robotica, vol. 39, no. 1, pp. 165-180, DOI: 10.1017/s0263574720000740, 2021.
[13] Z. Zhang, F. Xu, Z. Qin, Y. Xie, "Resource allocation in UAV assisted air ground intelligent inspection system," Cognitive Robotics, vol. 2, DOI: 10.1016/j.cogr.2021.12.002, 2022.
[14] A. Poulose, D. S. Han, "Hybrid indoor localization using IMU sensors and smartphone camera," Sensors, vol. 19, no. 23, DOI: 10.3390/s19235084, 2019.
[15] A. Poulose, O. S. Eyobu, D. S. Han, "An indoor position-estimation algorithm using smartphone IMU sensor data," IEEE Access, vol. 7, pp. 11165-11177, DOI: 10.1109/access.2019.2891942, 2019.
[16] G. Jia, X. Li, D. Zhang, W. Xu, H. Lv, Y. Shi, M. Cai, "Visual-SLAM classical framework and key techniques: a review," Sensors, vol. 22, no. 12, DOI: 10.3390/s22124582, 2022.
[17] L. Xiong, Y. Wen, Y. Huang, J. Zhao, W. Tian, "Joint unsupervised learning of depth, pose, ground normal vector and ground segmentation by a monocular camera sensor," Sensors, vol. 20, no. 13, DOI: 10.3390/s20133737, 2020.
[18] L. Xiong, Z. Deng, Y. Huang, W. Du, X. Zhao, C. Lu, W. Tian, "Traffic intersection re-identification using monocular camera sensors," Sensors, vol. 20, no. 22, DOI: 10.3390/s20226515, 2020.
[19] M. He, C. Zhu, Q. Huang, B. Ren, J. Liu, "A review of monocular visual odometry," The Visual Computer, vol. 36, no. 5, pp. 1053-1065, DOI: 10.1007/s00371-019-01714-6, 2020.
[20] R. Miyamoto, M. Adachi, H. Ishida, T. Watanabe, K. Matsutani, H. Komatsuzaki, S. Sakata, R. Yokota, S. Kobayashi, "Visual navigation based on semantic segmentation using only a monocular camera as an external sensor," Journal of Robotics and Mechatronics, vol. 32, no. 6, pp. 1137-1153, DOI: 10.20965/jrm.2020.p1137, 2020.
[21] D. Zou, P. Tan, W. Yu, "Collaborative visual SLAM for multiple agents: a brief survey," Virtual Reality & Intelligent Hardware, vol. 1, no. 5, pp. 461-482, DOI: 10.1016/j.vrih.2019.09.002, 2019.
[22] S. Ashry, T. Ogawa, W. Gomaa, "CHARM-deep: continuous human activity recognition model based on deep neural network using IMU sensors of smartwatch," IEEE Sensors Journal, vol. 20, no. 15, pp. 8757-8770, DOI: 10.1109/jsen.2020.2985374, 2020.
[23] A. D. Richardson, "Tracking seasonal rhythms of plants in diverse ecosystems with digital camera imagery," New Phytologist, vol. 222, no. 4, pp. 1742-1750, DOI: 10.1111/nph.15591, 2019.
[24] M. Zhao, D. Zhou, X. Song, X. Chen, L. Zhang, "DiT-SLAM: real-time dense visual-inertial SLAM with implicit depth representation and tightly-coupled graph optimization," Sensors, vol. 22, no. 9, DOI: 10.3390/s22093389, 2022.
[25] T. H. Nguyen, T. M. Nguyen, L. Xie, "Tightly-coupled ultra-wideband-aided monocular visual SLAM with degenerate anchor configurations," Autonomous Robots, vol. 44, no. 8, pp. 1519-1534, DOI: 10.1007/s10514-020-09944-7, 2020.
[26] F. Yu, J. Shang, Y. Hu, M. Milford, "NeuroSLAM: a brain-inspired SLAM system for 3D environments," Biological Cybernetics, vol. 113, no. 5-6, pp. 515-545, DOI: 10.1007/s00422-019-00806-9, 2019.
[27] N. Liu, S. Qin, "A neurodynamic approach to nonlinear optimization problems with affine equality and convex inequality constraints," Neural Networks, vol. 109, pp. 147-158, DOI: 10.1016/j.neunet.2018.10.010, 2019.
[28] S. P. Li, T. Zhang, X. Gao, D. Wang, Y. Xian, "Semi-direct monocular visual and visual-inertial SLAM with loop closure detection," Robotics and Autonomous Systems, vol. 112, pp. 201-210, DOI: 10.1016/j.robot.2018.11.009, 2019.
[29] Z. Feng, J. Li, L. Zhang, C. Chen, "Online spatial and temporal calibration for monocular direct visual-inertial odometry," Sensors, vol. 19, no. 10, DOI: 10.3390/s19102273, 2019.
[30] C. Cartis, L. Roberts, "A derivative-free Gauss–Newton method," Mathematical Programming Computation, vol. 11, no. 4, pp. 631-674, DOI: 10.1007/s12532-019-00161-7, 2019.
[31] M. Mazzotti, Q. Mao, I. Bartoli, S. Livadiotis, "A multiplicative regularized Gauss–Newton method with trust region sequential quadratic programming for structural model updating," Mechanical Systems and Signal Processing, vol. 131, pp. 417-433, DOI: 10.1016/j.ymssp.2019.05.062, 2019.
[32] Z. Tian, Y. Ren, G. Wang, "An application of backtracking search optimization-based least squares support vector machine for prediction of short-term wind speed," Wind Engineering, vol. 44, no. 3, pp. 266-281, DOI: 10.1177/0309524x19849843, 2020.
[33] M. M. Atia, S. L. Waslander, "Map-aided adaptive GNSS/IMU sensor fusion scheme for robust urban navigation," Measurement, vol. 131, pp. 615-627, DOI: 10.1016/j.measurement.2018.08.050, 2019.
[34] P. Gang, L. Zezao, C. Bocheng, C. Shanliang, H. Dingxin, "Robust tightly coupled pose estimation based on monocular vision, inertia, and wheel speed," arXiv preprint, 2020. https://arxiv.org/abs/2003.01496
Copyright © 2022 Can Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. https://creativecommons.org/licenses/by/4.0/
Abstract
Under China’s Intelligent Electric Power Grid (IEPG), research on the IEPG inspection mode is of great significance. This work aims to improve the positioning and navigation performance of IEPG inspection robots in complex environments. First, it reviews the monocular camera projection model and the Inertial Measurement Unit (IMU) measurement model. It also discusses the tight-coupling monocular Vision Inertial Navigation System (VINS) and the initialization theory of the Simultaneous Localization and Mapping (SLAM) system. Nonlinear optimization for SLAM by the Gauss–Newton Method (GNM) is established. Accordingly, this work proposes a SLAM system based on the tight-coupling monocular VINS. The EuRoC dataset sequences commonly used for visual-inertial algorithm testing in IEPG are used for the simulation tests. The proposed SLAM system’s attitude and position estimation errors are analyzed on different dataset sequences. The results show that the errors of the roll, pitch, and yaw angles are acceptable, and the errors of the X, Y, and Z axes are within 40 cm, meeting the positioning requirements of an Unmanned Aerial Vehicle (UAV). Meanwhile, the Root Mean Square Error (RMSE) is used to evaluate the improvement of positioning accuracy by loop detection; the results show that loop detection reduces the RMSE and improves positioning accuracy. The attitude-estimation test examines the changes of the pitch, roll, and yaw angles with time under a pure rotation condition, and the estimated values of the proposed SLAM algorithm are compared with the true values through the Absolute Trajectory Error (ATE). The results show that the true and estimated attitude values coincide well; thus, the proposed SLAM algorithm is effective for positioning and navigation. The ATE can also be controlled within ±2.5°, satisfying the requirements of navigation and positioning accuracy. The proposed SLAM system based on the tight-coupling monocular VINS presents excellent positioning and navigation accuracy for the IEPG inspection robot, and the findings have significant reference value for later research on IEPG inspection robots.