This study addresses the low recognition accuracy, delayed response, and inefficient multi-UAV collaboration that affect unmanned aerial vehicle (UAV) inspections of transmission lines in extreme environments. It proposes an intelligent operation and inspection framework that integrates multimodal perception, deep reinforcement learning, and dynamic scheduling, organized in three stages. In the first stage, a UAV hardware system integrating Light Detection and Ranging (LiDAR), infrared thermal imagers, and high-resolution visual sensors is proposed to enhance data collection efficiency. In the second stage, a Transformer-based multimodal data fusion algorithm is presented to improve defect recognition accuracy and robustness, and a deep reinforcement learning algorithm performs dynamic path planning to optimize UAV inspection routes, thereby enhancing inspection coverage and energy efficiency. In the third stage, a dynamic task allocation and resource scheduling model combining Mixed Integer Programming (MIP) and heuristic rules is proposed to achieve real-time task allocation and resource optimization for multi-UAV collaborative inspection. Experimental results show that this method achieves an F1-score of 89.8% for defect recognition in extreme environments (improved by 11% compared with TransPathNet), shortens emergency response time to 45 s (improved by 28.6% compared with PPO-MultiDrone (Proximal Policy Optimization-Multi-Drone)), increases inspection coverage to 98.7% (improved by 10.7% compared with PPO-MultiDrone), reduces energy consumption by 28.4%, and achieves a task completion rate and resource utilization rate of 95.6% and 91.5%, respectively (improved by 8.4% and 16.0%, respectively, compared with the best-performing baseline, Genetic Algorithm-Mask Region-based Convolutional Neural Network).
This study provides a reference method for the further development of power Internet of Things defect detection.
Introduction
With the rapid development of modern power enterprises, transmission lines, as a key link in power transmission, have their operation and maintenance efficiency and intelligent management level directly related to the safe and stable operation of the power system [1]. However, traditional inspection methods for transmission lines mainly rely on manual inspection, which is not only inefficient but also easily restricted by complex terrain, severe weather and other conditions, posing great challenges to inspection work [2]. Therefore, how to improve the operation and maintenance efficiency of transmission lines and realize intelligent management has become an important issue that needs to be solved urgently in the power industry.
In recent years, the rapid development of UAV technology has provided a new solution for the inspection of transmission lines. UAVs have the advantages of high efficiency, flexibility, and low cost, and can carry a variety of sensors to carry out multi-dimensional and all-round data collection for transmission lines, significantly improving inspection efficiency [3]. Especially with the continuous progress of sensor technology, such as Light Detection and Ranging (LiDAR), infrared thermal imagers, and high-resolution visual sensors, the application potential of UAV in transmission line inspection has been further explored [4]. These sensors can provide strong technical support for tasks such as 3D spatial modeling of transmission lines, temperature anomaly detection, and surface defect recognition, providing powerful support for the intelligent operation and maintenance of transmission lines [5].
This study responds to the shortcomings identified in the research background above, namely the lack of a collaborative mechanism for multimodal feature fusion in existing methods and the insufficient adaptability of algorithms in complex power scenarios. To meet the practical needs of transmission line operation and maintenance in modern power enterprises, it proposes an efficient UAV operation and maintenance management system integrating multimodal perception, deep reinforcement learning, and dynamic scheduling. By integrating multimodal sensors such as LiDAR, infrared thermal imagers, and high-resolution visual sensors, the system realizes multi-dimensional data collection for transmission lines. A Transformer-based multimodal data fusion algorithm is introduced to enhance the accuracy and robustness of defect recognition. Meanwhile, a deep reinforcement learning algorithm is adopted for dynamic path planning to optimize UAV inspection routes and improve inspection coverage and energy efficiency. Finally, a dynamic task allocation and resource scheduling model combining mixed integer programming with heuristic rules is proposed to achieve real-time task allocation and resource optimization for multi-UAV collaborative inspection. Through this system, the study aims to provide an efficient and intelligent solution for the operation and maintenance of transmission lines in modern power enterprises and to promote the further development of the power Internet of Things (IoT).
Related work
With the continuous innovation of UAV platforms and radar sensing technologies, R&D institutions in the power industry have begun to focus on breakthroughs in this technical field [6]. As a new geographic information acquisition method, UAV oblique photogrammetry realizes intelligent acquisition and processing of multi-angle, high-precision spatial data through a small aircraft platform [7]. Combined with navigation systems and LiDAR sensors, this technology enables high-precision positioning, providing a reliable data foundation for 3D modeling of power facilities [8]. Compared with traditional aerial photography, it has two clear advantages: (1) The image data generated by multi-view image matching algorithms features higher integrity and coverage. A single flight can obtain high-definition images of the target area from five perspectives: front, rear, left, right, and orthographic. (2) Through intelligent image screening, it requires only 1/3 of the high-quality images used in traditional methods to construct a 3D digital model with minimal loss of precision, reducing data storage needs while improving work efficiency. Especially in the inspection of UHV transmission lines above 500 kV, this technology can clearly identify conductor wear and insulator defects with a diameter of more than 2 mm. This innovative mapping method provides an immersive visual maintenance environment for power inspection personnel through virtual reality technology [9]. Inspectors can conduct virtual line inspections in the 3D digital twin system, which automatically marks potential defects and generates maintenance work orders, significantly enhancing operational intuitiveness and accuracy [10]. Previous studies have shown that this technology can effectively improve line inspection efficiency and defect recognition accuracy while reducing the risks of high-altitude operations [11]. Currently, the technology has been scaled up in relevant construction projects in China.
In the future, it will be deeply integrated with AI diagnostic systems to achieve intelligent operation and maintenance of transmission lines [12, 13].
Research on the application of UAVs in transmission line operation and maintenance management is extensive in China and other countries. In terms of flight state recognition and trajectory prediction, recent research has improved system adaptability through multi-sensor fusion. For example, Zhang et al. proposed a trajectory prediction framework based on flight state recognition, which reduced prediction errors in strong wind interference scenarios by establishing a mapping relationship between 12-dimensional state features such as flight attitude and velocity and trajectory offset [14]. Shi et al. further expanded the application dimensions of the framework, constructing a multi-mode differentiated prediction system by dividing flight into six typical modes, including cruise, obstacle avoidance, and return [15]. In the same year, Shi et al. introduced a spatiotemporal alignment algorithm between millimeter wave radar and visual data in multi-sensor state recognition systems, solving the reliability problem of traditional IMU (Inertial Measurement Unit) data under extreme weather conditions and reducing system response delay [16]. Shi et al. also achieved dynamic correction of predicted trajectories through real-time assessment of flight environment risk levels (sudden changes in wind speed, approaching obstacles), further improving the success rate of emergency obstacle avoidance. These studies provide important support for the stable operation of drones in power inspection [17]. In terms of point cloud data processing technology, Ding et al. identified independent points by calculating point cloud normals and their spacing, optimized point normals through bidirectional filtering, and repositioned them to ensure the integrity of 3D point cloud models [18]. Yu et al. developed a density-based point cloud data processing technology, predicting point cloud density using a particle swarm optimization algorithm, filtering noise with K-means clustering, and finally achieving smooth optimization through bilateral filtering [19]. Liu et al. integrated a feature-preserving weight mechanism into the fuzzy C-means clustering method, proposing a new fuzzy C-means clustering algorithm with curvature weights. This algorithm eliminated abnormal data, analyzed statistical indicators of local radius neighboring points, fitted a quadratic paraboloid to estimate data curvature, and calculated eigenvalues [20]. Camuffo et al. used clustering algorithms to obtain weighted cluster centers, showing high discrimination accuracy for different proportions and intensities of noise in point cloud data while maintaining feature information [21]. Poiesi and Boscaini proposed a graph optimization method based on the interaction between geometric morphology and color relationships, specifically for color noise reduction, geometric noise reduction, and geometric-color fusion noise reduction [22]. Mikuni et al. developed a hybrid filtering technology that performed local surface approximation on adjacent areas of point cloud data, maintaining edge features during noise elimination to enhance preprocessing efficiency [23]. Hu et al. constructed a 3D point cloud denoising framework based on statistical information, robustly evaluating normal vectors and curvature indices of point cloud data to complete noise removal and surface flattening [24].
Although existing research has made some progress in UAV oblique photogrammetry and point cloud data processing, recent analysis indicates that limitations remain. Existing point cloud processing methods lack a collaborative mechanism for multimodal feature fusion (references [18] and [21]); most algorithms have insufficient adaptability in complex power scenarios (references [19] and [20]); and current systems generally lack intelligent dynamic decision-making capabilities. These shortcomings constrain the engineering practicality of the technology. To address them, this study proposes an intelligent point cloud processing framework integrating multimodal perception and adaptive decision-making. The study efficiently extracts point cloud information through its hardware configuration, and then combines the dynamic parameter adjustment mechanism of deep reinforcement learning to achieve precise identification and classification of transmission line defects, providing more efficient and intelligent technical support for power inspection.
Construction of transmission line operation, inspection, and management methods integrating deep learning and UAV technology
Multimodal perception and data fusion model
Hardware construction
To achieve efficient autonomous inspection of transmission lines, this study proposes a UAV hardware system equipped with multimodal sensors. The core goal of the system is to integrate three devices: Light Detection and Ranging (LiDAR), InfraRed (IR) thermal imager, and high-resolution visual sensor, so as to complete multi-dimensional data collection for 3D spatial modeling of transmission lines, temperature anomaly detection, and surface defect recognition. This section elaborates on the hardware architecture, sensor selection, and integration optimization strategies in detail.
i. Sensor selection and functional construction: The UAV hardware system consists of three types of core sensors, with their specific technical parameters and functional divisions shown in Table 1:
Table 1. Selection of hardware system for UAV
| Sensor type | Technical parameters | Functional description |
|---|---|---|
| LiDAR | Wavelength: 905 nm; Scanning frequency: 20 Hz; Ranging accuracy: ±2 cm; Detection distance: 0.1–200 m | Generate three-dimensional point cloud data of transmission lines for tower positioning, conductor sag calculation, and vegetation intrusion detection. |
| IR | Resolution: 640 × 480; Temperature measuring range: −20 °C to 650 °C; Thermal sensitivity: 0.05 °C | Detect abnormal temperatures of conductor joints, insulators, and other components, and identify potential overload or poor contact faults. |
| High-resolution camera (HRC) | Resolution: 20 million pixels; Frame rate: 30 fps; Focal length: 24–120 mm (adjustable) | Collect images of the conductor surface and identify visible defects such as broken strands, corrosion, and damaged insulation layers. |
The selection of sensor parameters mentioned above is mainly determined based on existing industrial application research and practical scenario requirements: LiDAR uses a 905 nm wavelength to achieve a balance between rain and fog penetration and human eye safety. The resolution of 640 × 480 infrared thermal imager can meet the accuracy requirements for detecting temperature anomalies in key components of power transmission equipment, while ensuring data transmission efficiency. High-resolution cameras ensure clear identification of small defects on the surface of wires at inspection heights [25].
ii. Hardware integration and layout optimization.
To reduce the weight of the UAV while ensuring that the sensors work normally, a modular design is adopted, as shown in Fig. 1.
[See PDF for image]
Fig. 1
System module of UAV
Figure 1 shows that in the UAV equipment, the LiDAR and infrared thermal imager are installed below the fuselage through a gimbal, supporting the adjustment of pitch and yaw angles. The high-resolution camera is fixed at the front of the fuselage to avoid the propeller blocking the view. Each sensor will be connected to the main control unit through the Controller Area Network (CAN) bus to complete the synchronous collection and real-time transmission of data.
The total weight of the hardware system is controlled within 2.5 kg, meeting the load requirements of small UAVs. The power management module adopts a time-sharing power supply strategy to prioritize the energy consumption requirements of the LiDAR and infrared sensors. The power distribution equation is as follows:
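$$P_i = \frac{w_i}{\sum_{j} w_j}\, P_{\text{total}} \tag{1}$$
$P_i$ is the power allocated to sensor $i$, $P_{\text{total}}$ represents the total power, and $w_i$ represents the priority weight coefficient of each sensor.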
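As an illustration of priority-weighted power sharing, the following minimal sketch splits a power budget in proportion to per-sensor weights. The proportional rule, the function name `allocate_power`, the 60 W budget, and the weight values are all assumptions made for this example and are not taken from the study.

```python
def allocate_power(total_power_w, priority_weights):
    """Split total_power_w across sensors in proportion to their priority weights."""
    weight_sum = sum(priority_weights.values())
    if weight_sum <= 0:
        raise ValueError("priority weights must sum to a positive value")
    return {name: total_power_w * weight / weight_sum
            for name, weight in priority_weights.items()}

# Hypothetical weights: LiDAR and IR are prioritised over the camera.
budget = allocate_power(60.0, {"lidar": 0.4, "ir": 0.4, "camera": 0.2})
```

A time-sharing scheme would additionally decide when each sensor draws its share; that scheduling aspect is not modelled here.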
iii. Anti-interference and reliability design.
Due to the interference of strong electromagnetic environment around transmission lines, the hardware system adopts the following protection measures: (a) The sensor interfaces and cables are wrapped with a copper foil shielding layer to reduce high-frequency electromagnetic interference. (b) The main control unit is equipped with dual Microcontroller Units (MCUs), which can automatically switch to the backup system when the main control fails. (c) The sensor housing adopts an Ingress Protection 67 (IP67) protection level to ensure that the UAV can operate stably in harsh conditions such as rain, snow, and sandstorms.
Data fusion based on the transformer model
To improve the accuracy and robustness of transmission line defect recognition, this section proposes a Transformer-based multimodal data fusion algorithm. The algorithm performs feature fusion and cross-modal alignment on multi-dimensional data collected by LiDAR, infrared thermal imagers, and high-resolution visual sensors, making up for the limitations of single-sensor data dimensions and noise interference. The specific process includes four stages: data preprocessing, cross-modal feature extraction, attention alignment, and fusion decision-making.
i. Data preprocessing and feature extraction.
Standardization and noise reduction processing are performed on the multimodal raw data collected by the UAV. The specific steps are as follows:
Step 1: LiDAR point cloud processing. Redundant data is removed using voxel filtering and ground segmentation methods, preserving the 3D structure information of towers, conductors, and key components.
Step 2: IR temperature field calibration. The ambient temperature compensation model is used to correct the thermal imager data and eliminate ambient radiation interference. The equation is:
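$$T_{\text{corr}} = T_{\text{meas}} + \alpha\,(T_{\text{env}} - T_{\text{ref}}) \tag{2}$$
$T_{\text{corr}}$ and $T_{\text{meas}}$ are the corrected and measured temperatures. $\alpha$ stands for the compensation coefficient, $T_{\text{env}}$ stands for the ambient temperature, and $T_{\text{ref}}$ stands for the reference temperature.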
Step 3: Visual image enhancement. Histogram equalization and edge enhancement algorithms are employed to improve the clarity of wire surface details.
After completing data preprocessing, a multimodal deep learning model is used to extract features from the different sensor data: (a) Geometric structure features of transmission towers, including spatial parameters such as pole inclination angles and wire sag, are extracted from LiDAR data using the PointNet++ point cloud processing network. (b) Anomalous thermal features such as local overheating are identified by analyzing infrared thermal imaging data with a ResNet-18 convolutional neural network, leveraging temperature gradient distributions. (c) Visible defects such as wire strand breaks and insulator surface damage are localized and classified in visible light images using the You Only Look Once version 5 (YOLOv5) object detection algorithm.
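The voxel filtering mentioned in Step 1 can be sketched as follows. This is a minimal pure-Python version that replaces all points in the same cubic voxel with their centroid; the function name `voxel_downsample` and the voxel size are illustrative assumptions, and ground segmentation is not shown.

```python
from collections import defaultdict

def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same cubic voxel by their centroid."""
    buckets = defaultdict(list)
    for x, y, z in points:
        # Integer voxel index along each axis (floor division handles negatives).
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        buckets[key].append((x, y, z))
    centroids = []
    for voxel_points in buckets.values():
        n = len(voxel_points)
        centroids.append(tuple(sum(axis) / n for axis in zip(*voxel_points)))
    return centroids

cloud = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (1.5, 0.1, 0.1)]
reduced = voxel_downsample(cloud, voxel_size=1.0)
```

Larger voxels remove more redundancy but also blur thin structures such as conductors, so the voxel size would need to be chosen against the LiDAR ranging accuracy.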
ii. Cross-modal attention alignment.
To align the space and semantics of multimodal features, a cross-modal Transformer module is proposed, as shown in Fig. 2.
[See PDF for image]
Fig. 2
Cross-modal Transformer module
Figure 2 shows that the implementation process of the cross-modal Transformer module is as follows:
a. Map LiDAR point cloud coordinates and image pixel coordinates to a unified reference system.
b. Calculate the correlation weights between modalities through the interaction of Query, Key, and Value:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \tag{3}$$
$Q$ comes from LiDAR features, and $K$ and $V$ come from IR or visual features. $\sqrt{d_k}$ stands for the dimension scaling factor.
c. Combine the aligned features through residual connection and layer normalization to generate a joint representation.
The specific structure of this cross-modal Transformer module includes 4 layers of encoders, each layer using 8 attention heads. The hidden layer dimension of the module is set to 1024. During training, this study uses an Adam optimizer with an initial learning rate of 3 × 10⁻⁴, and adjusts the learning rate using cosine annealing strategy.
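The Query/Key/Value interaction in step b can be illustrated with a single-head scaled dot-product attention written in plain Python. This is only a sketch: it omits the learned projections, the 8 heads, the residual connection, and layer normalization, and the tiny feature vectors are made up for the example.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    scale = math.sqrt(len(keys[0]))
    output = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        output.append([sum(w * v[c] for w, v in zip(weights, values))
                       for c in range(len(values[0]))])
    return output

# Hypothetical features: one LiDAR query attending over two IR tokens.
lidar_q = [[1.0, 0.0]]
ir_k = [[1.0, 0.0], [0.0, 1.0]]
ir_v = [[10.0], [20.0]]
fused = cross_attention(lidar_q, ir_k, ir_v)
```

Because the query is more similar to the first key, the output is pulled toward the first value, which is the alignment behaviour the module relies on.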
iii. Multi-modal fusion decision.
The aligned features are input to the fusion decision layer, which outputs the final defect classification results using a weighted voting strategy. The fusion weight is dynamically adjusted according to the confidence of each modality:
$$w_i = \frac{\exp(c_i/\tau)}{\sum_{j}\exp(c_j/\tau)} \tag{4}$$
$c_i$ is the confidence score of the $i$th modality, and $\tau$ represents the temperature coefficient.
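A confidence-driven weighted vote can be sketched as below, assuming the fusion weights are a temperature-scaled softmax over per-modality confidence scores. The function names, the class probabilities, and the value `tau=0.1` are hypothetical and chosen only to show the effect.

```python
import math

def fusion_weights(confidences, tau=1.0):
    """Temperature-scaled softmax over per-modality confidence scores."""
    peak = max(confidences)
    exps = [math.exp((c - peak) / tau) for c in confidences]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_vote(class_probs, confidences, tau=1.0):
    """Fuse per-modality class probabilities and return (label, fused probabilities)."""
    weights = fusion_weights(confidences, tau)
    n_classes = len(class_probs[0])
    fused = [sum(w * probs[c] for w, probs in zip(weights, class_probs))
             for c in range(n_classes)]
    return fused.index(max(fused)), fused

# Hypothetical outputs from the LiDAR, IR, and visual branches (2 classes).
probs = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]
label, fused = weighted_vote(probs, confidences=[0.95, 0.5, 0.5], tau=0.1)
```

A small temperature lets the most confident modality dominate the vote, while a large temperature approaches a plain average of the modalities.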
The dynamic path planning based on deep reinforcement learning
To address the challenges of complex environments and dynamic tasks in transmission line inspection, this section proposes a dynamic path planning algorithm based on Deep Reinforcement Learning (DRL). The algorithm integrates environmental states, UAV performance, and task requirements to adaptively optimize paths, overcoming the real-time and robustness limitations of traditional static planning methods. This section unfolds step-by-step from state space modeling, action space design, reward function construction to DRL framework implementation.
Problem modeling and DRL framework
[See PDF for image]
Fig. 3
DRL path planning process based on PPO algorithm
The objective of this section is to use the Proximal Policy Optimization (PPO) algorithm to plan the UAV flight path for a given inspection task, giving priority to high-risk defect areas while improving inspection coverage and reducing energy consumption. The PPO algorithm is adopted because it can handle continuous action spaces and maintain stability under high-dimensional state inputs. Its overall process is shown in Fig. 3.
Figure 3 shows that in the DRL path planning process based on the PPO algorithm, the policy network is used to output action probabilities, the environmental feedback includes new states and rewards, and the driving strategy is responsible for iterative optimization.
State space design
Let the state space be S and the state vector be expressed as:
$$s_t = \left[H,\ W,\ \{P_n\}_{n=1}^{N},\ D,\ E,\ p,\ C\right] \in \mathbb{R}^{d} \tag{5}$$
$H$ represents terrain elevation, that is, the elevation matrix generated from the LiDAR point cloud. $W$ stands for meteorological data, which is collected through multi-sensor modules mounted on the drones (with an update frequency of 10 Hz) and fused and calibrated with data from power system meteorological monitoring stations (one update every 5 min), including parameters such as wind speed, visibility, and precipitation probability. $P_n$ represents the distribution of towers, and $N$ represents the total number of towers. $D$ represents the defect distribution, which is divided into low, medium, and high levels. $E$ represents the remaining power of the UAV, with values in $[0, 1]$. $p$ represents the priority of the current task (1 is an emergency task and 3 is a regular task), and $C$ represents the coverage rate of the patrol area. $d$ stands for the state dimension. In addition, all components are normalized to ensure the same range of values.
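The normalization of state components can be sketched as follows. This is a heavily simplified, scalar-only illustration: the elevation matrix and tower positions are left out, and the value ranges (0 to 3000 m, 0 to 25 m/s), the defect-level encoding, and the function names are assumptions rather than values from the study.

```python
def clamp01(value):
    """Clip a value into the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))

def normalize(value, low, high):
    """Min-max scale value into [0, 1], clipping out-of-range readings."""
    return clamp01((value - low) / (high - low))

DEFECT_LEVELS = {"low": 0.0, "medium": 0.5, "high": 1.0}

def build_state(elevation_m, wind_speed_ms, defect_level, battery, priority, coverage):
    return [
        normalize(elevation_m, 0.0, 3000.0),   # hypothetical elevation range
        normalize(wind_speed_ms, 0.0, 25.0),   # hypothetical wind speed range
        DEFECT_LEVELS[defect_level],
        clamp01(battery),                      # remaining power, already in [0, 1]
        normalize(priority, 3.0, 1.0),         # 1 = emergency -> 1.0, 3 = regular -> 0.0
        clamp01(coverage),
    ]

state = build_state(1500.0, 5.0, "high", 0.8, 1, 0.25)
```

Mapping every component to the same range keeps any single input, such as elevation in metres, from dominating the policy network's first layer.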
Action space design
The action space A is defined as the continuous parameter of the UAV flight control command, and the action vector is defined as:
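$$a_t = \left[\theta,\ v_z,\ v_h\right] \tag{6}$$
$\theta$ represents the horizontal angle, $v_z$ represents the vertical climb rate, and $v_h$ stands for the horizontal speed.
The UAV dynamic model updates the position $(x_t, y_t, z_t)$ based on the following equations:
$$\begin{cases} x_{t+1} = x_t + v_h \cos\theta\,\Delta t \\ y_{t+1} = y_t + v_h \sin\theta\,\Delta t \\ z_{t+1} = z_t + v_z\,\Delta t \end{cases} \tag{7}$$
$\Delta t$ represents the time step.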
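A position update driven by heading, climb rate, and horizontal speed can be sketched with a simple kinematic step. The point-mass model, the function name `step_position`, and the numbers below are assumptions for illustration; wind and attitude dynamics are ignored.

```python
import math

def step_position(position, action, dt):
    """Advance (x, y, z) by one time step for action (heading, climb_rate, horizontal_speed)."""
    x, y, z = position
    heading, climb_rate, horizontal_speed = action
    return (
        x + horizontal_speed * math.cos(heading) * dt,
        y + horizontal_speed * math.sin(heading) * dt,
        z + climb_rate * dt,
    )

# Fly along the x axis at 5 m/s while climbing at 1 m/s for 2 s.
new_position = step_position((0.0, 0.0, 10.0), (0.0, 1.0, 5.0), dt=2.0)
```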
Reward function design
The reward function is designed to balance multiple optimization objectives:
8
In Eq. (8), stands for coverage reward. stands for energy consumption punishment. stands for task priority reward. stands for collision punishment. The calculation method of each variable is as follows:
9
10
11
12
In Eq. (9), represents the new inspection area, and represents the total area to be inspected. In Eq. (10), represents the energy consumption coefficient. In Eq. (12), L represents the distance between the defective circuit and the tower or obstacle.
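To make the interplay of the four reward terms concrete, the sketch below combines a coverage reward, an energy penalty, a priority reward, and a collision penalty. Every functional form and constant here (the area ratio, the linear energy penalty, the `1 / priority` bonus, the 5 m safety distance, and the penalty of 10) is a hypothetical choice for illustration and should not be read as the study's Eqs. (8) to (12).

```python
def step_reward(new_area, total_area, energy_used, priority, distance_to_obstacle,
                energy_coeff=0.1, safe_distance=5.0, collision_penalty=10.0):
    """Combine four illustrative reward terms into one scalar."""
    coverage_reward = new_area / total_area          # share of newly inspected area
    energy_penalty = energy_coeff * energy_used      # linear cost of energy spent
    priority_reward = 1.0 / priority                 # 1 = emergency, 3 = regular
    collision = collision_penalty if distance_to_obstacle < safe_distance else 0.0
    return coverage_reward - energy_penalty + priority_reward - collision

safe = step_reward(10.0, 100.0, 1.0, priority=1, distance_to_obstacle=20.0)
risky = step_reward(10.0, 100.0, 1.0, priority=1, distance_to_obstacle=1.0)
```

The relative scale of the terms determines which objective the policy favours, so in practice the coefficients would be tuned together rather than fixed in advance.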
Algorithm implementation and training
Based on the above design, the algorithm parameter settings are shown in Table 2.
Table 2. Algorithm parameter settings
| Parameter type | Parameter name | Implementation process/parameter setting |
|---|---|---|
| Policy network architecture | Input layer | State vector s (dimension d) |
| Policy network architecture | Hidden layer | 3-layer fully connected network (256→128→64) with ReLU activation |
| Policy network architecture | Output layer | Gaussian distribution parameters for generating continuous actions |
| Training parameters | Discount factor | 0.99 |
| Training parameters | Learning rate | 3 × 10⁻⁴ |
| Training parameters | Batch size | 512 |
| Training parameters | PPO clipping parameter | 0.2 |
The training process of the algorithm is as follows: (a) initialize the policy network and the value network; (b) collect the interaction trajectories of the UAV; (c) calculate the advantage function and the policy loss; (d) update the network parameters until convergence.
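Step (c) of the training loop can be sketched with the standard PPO clipped surrogate loss and generalized advantage estimation (GAE). The discount factor 0.99 and clipping parameter 0.2 follow Table 2; the GAE parameter `lam=0.95` and the function names are assumptions, since the study does not state how advantages are estimated.

```python
import math

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation; values holds one extra bootstrap entry."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative mean of the clipped surrogate objective."""
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)

advantages = gae_advantages(rewards=[1.0], values=[0.0, 0.0])
loss = ppo_clip_loss([1.0], [0.0], [1.0])
```

In the second call the probability ratio is about 2.72, so the clip caps its contribution at 1.2, which is what keeps a single policy update from moving too far.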
Dynamic task allocation and resource scheduling model
To further address challenges such as dynamic task updates and resource constraints in multi-UAV collaborative inspection of transmission lines, this section proposes a dynamic task allocation and resource scheduling model combining Mixed Integer Programming (MIP) with heuristic rules. The model enables real-time optimization of UAV task allocation, path planning, and resource occupation, overcoming the limitations of traditional static scheduling methods in terms of real-time performance and flexibility.
Mathematical modeling
Suppose there is a fleet of $G$ UAVs, and each UAV $i$ has a maximum endurance time $T_i^{\max}$ and load capacity $C_i$. The set of tasks to be patrolled is $J = \{1, 2, \dots, M\}$, and each task $j$ includes location $l_j$, priority $p_j$, required time $t_j$, and resource requirement $r_j$. To maximize the task completion rate and resource utilization rate and minimize the total energy consumption, the objective function is set as follows:
$$\max \ \sum_{j \in J} p_j x_j - \beta \sum_{i=1}^{G} \sum_{k} E_{ik} \tag{13}$$
$x_j \in \{0, 1\}$ is used to measure whether task $j$ is completed. $E_{ik}$ represents the energy consumption of UAV $i$ in the $k$-th time period. $\beta$ represents the penalty coefficient of energy consumption, which is set to 0.05. This value is determined through parameter sensitivity analysis. The experimental results show that too low a $\beta$ value significantly increases energy consumption while yielding only a limited improvement in task completion rate, whereas too high a $\beta$ value saves energy but significantly reduces the task completion rate. When $\beta$ varies within the range of 0.01 to 0.1, $\beta = 0.05$ achieves the best balance between task completion rate and total system energy consumption and is therefore chosen for the model.
The constraint conditions are set as follows:
a. Time window constraints
$$\sum_{j \in J} \sum_{k} t_j\, y_{ijk} \le T_i^{\max}, \quad \forall i \tag{14}$$
$y_{ijk} \in \{0, 1\}$ indicates whether UAV $i$ performs task $j$ during time period $k$.
b. Resource capacity constraint
$$\sum_{j \in J} r_j\, y_{ijk} \le C_i, \quad \forall i, k \tag{15}$$
c. Task uniqueness constraint
$$\sum_{i=1}^{G} \sum_{k} y_{ijk} \le 1, \quad \forall j \tag{16}$$
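For very small instances, a model of this kind can be checked by exhaustive enumeration instead of an MIP solver. The sketch below assumes a simplified objective (priority weight minus β times task energy), ignores time periods, and treats endurance and capacity as per-UAV totals; the dictionary keys, the `weight` field (larger means more important), and all numbers are hypothetical.

```python
from itertools import product

def solve_exhaustively(tasks, uavs, beta=0.05):
    """Try every task-to-UAV assignment and keep the best feasible one."""
    options = [None] + list(range(len(uavs)))   # None means the task is left unassigned
    best_plan, best_value = None, float("-inf")
    for plan in product(options, repeat=len(tasks)):
        time_used = [0.0] * len(uavs)
        load = [0.0] * len(uavs)
        value = 0.0
        for task, uav in zip(tasks, plan):
            if uav is None:
                continue
            time_used[uav] += task["duration"]
            load[uav] += task["demand"]
            value += task["weight"] - beta * task["energy"]
        feasible = all(
            time_used[i] <= uavs[i]["endurance"] and load[i] <= uavs[i]["capacity"]
            for i in range(len(uavs))
        )
        if feasible and value > best_value:
            best_plan, best_value = plan, value
    return best_plan, best_value

tasks = [
    {"weight": 3.0, "duration": 10.0, "demand": 1.0, "energy": 20.0},
    {"weight": 1.0, "duration": 10.0, "demand": 1.0, "energy": 20.0},
]
uavs = [{"endurance": 10.0, "capacity": 2.0}]
plan, value = solve_exhaustively(tasks, uavs)
```

Each task appears once in a plan, so uniqueness holds by construction. The search grows exponentially with the number of tasks, which is why a solver or heuristics are needed at realistic scale.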
Combination of mixed integer programming and heuristic rules
To solve the above Non-deterministic Polynomial (NP)-hard problem, the study proposes a hierarchical solving method, namely dynamic scheduling solving. The method has three key steps: (a) generate an initial solution based on the current task set; (b) if MIP yields no feasible solution, use a greedy algorithm to preferentially allocate high-priority tasks; (c) resolve resource conflicts through time window translation or UAV collaboration. The heuristic rules are as follows: sort tasks in descending order of $p_j$; select the UAV with the minimum current load when allocating a task; and prioritize the allocation of tasks with high path overlap to reduce the total flight distance.
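The first two heuristic rules can be sketched as a short greedy allocator. The tuple layout `(task_id, priority_score, duration)` and the convention that a larger score means a more urgent task are assumptions for this example; the path-overlap rule and conflict resolution are not modelled.

```python
def greedy_allocate(tasks, n_uavs):
    """Assign tasks, most urgent first, to whichever UAV currently has the least load."""
    loads = [0.0] * n_uavs
    plan = {i: [] for i in range(n_uavs)}
    for task_id, score, duration in sorted(tasks, key=lambda t: t[1], reverse=True):
        uav = min(range(n_uavs), key=lambda i: loads[i])
        plan[uav].append(task_id)
        loads[uav] += duration
    return plan, loads

# Hypothetical tasks: (id, priority score, duration in minutes).
plan, loads = greedy_allocate([("a", 3, 10.0), ("b", 2, 5.0), ("c", 1, 5.0)], n_uavs=2)
```

The greedy pass runs in roughly O(M log M + M·G) time for M tasks and G UAVs, which is why it is a practical fallback when the exact model is infeasible or too slow.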
Experimental setup and model effect analysis
Experimental design
Data selection and processing
i. Selection of datasets.
To train and validate the UAV inspection model for transmission lines, this study uses the open-source Transmission Towers and Power Lines Aerial-image Dataset (TTPLA). Technically supported by the Intelligent Code Assistant Artificial Intelligence (InsCode AI) large model developed by China Software Developer Network (CSDN) Company, this dataset focuses on the detection and segmentation of transmission towers and power lines. It contains a large number of aerial images and pixel-level annotations (including key components such as towers, conductors, and insulators). The data is formatted in the standard Common Objects in Context (COCO) format, which can be directly used for deep learning tasks such as object detection and instance segmentation. The dataset has recently updated high-quality image samples and preprocessing scripts, optimized annotation effectiveness, and provided pre-trained weights based on network architectures like ResNet, further improving model development efficiency (https://gitcode.com/gh_mirrors/tt/ttpla_dataset). Additionally, in order to simulate the generalization ability of the model under extreme environmental conditions, this study expands the dataset using three data augmentation strategies: adding Gaussian noise and fogging filters, superimposing random pulse noise, and using random masking simulation. The expanded dataset includes foggy scene samples, strong electromagnetic interference samples, and vegetation occlusion samples. Figure 4 shows a high-definition data example of the TTPLA dataset.
[See PDF for image]
Fig. 4
Example of high-definition data of TTPLA dataset
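The augmentation strategies used to build the extreme-environment samples can be sketched on a grayscale image stored as nested lists with values in [0, 1]. The fog model (blending toward a bright airlight value), the salt-and-pepper form of impulse noise, the rectangular mask, and all parameter values are illustrative assumptions, not the study's exact settings.

```python
import random

def add_fog(image, transmission=0.6, airlight=1.0):
    """Blend each pixel toward a bright airlight value to mimic haze."""
    return [[p * transmission + airlight * (1.0 - transmission) for p in row]
            for row in image]

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip back into [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]

def add_impulse_noise(image, rate, rng):
    """Set a fraction `rate` of pixels to pure black or pure white."""
    noisy = []
    for row in image:
        new_row = []
        for p in row:
            r = rng.random()
            new_row.append(0.0 if r < rate / 2 else 1.0 if r < rate else p)
        noisy.append(new_row)
    return noisy

def add_random_mask(image, mask_h, mask_w, rng):
    """Black out one randomly placed rectangle to mimic occlusion."""
    top = rng.randrange(len(image) - mask_h + 1)
    left = rng.randrange(len(image[0]) - mask_w + 1)
    return [[0.0 if top <= r < top + mask_h and left <= c < left + mask_w else p
             for c, p in enumerate(row)]
            for r, row in enumerate(image)]

rng = random.Random(0)
image = [[0.5] * 4 for _ in range(4)]
foggy = add_gaussian_noise(add_fog(image), sigma=0.05, rng=rng)
```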
To ensure the generalization ability of model training, the dataset is divided into training set, validation set and test set at a ratio of 7:2:1 to ensure the balanced distribution of various defects. The division is shown in Table 3.
Table 3. Dataset division
| Subset | Number of images | Application |
|---|---|---|
| Training set | 3500 | Model parameter optimization |
| Validation set | 1000 | Hyperparameter optimization and early stopping |
| Test set | 500 | Final performance evaluation and generalization test |
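A 7:2:1 split of this kind can be reproduced with a seeded shuffle, as in the minimal sketch below. The function name, the seed, and the use of integer identifiers are assumptions; a stratified split per defect class, which would be needed to guarantee the balanced distribution, is not shown.

```python
import random

def split_dataset(items, ratios=(7, 2, 1), seed=42):
    """Shuffle items reproducibly and cut them into train/validation/test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train, val, test = split_dataset(range(5000))
```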
ii. Data preprocessing
To further improve the quality and accuracy of the dataset, the study uses radius filtering for data denoising. Radius filtering removes points that have fewer than a required number of neighbors within a given neighborhood radius. Its basic principle is shown in Fig. 5.
[See PDF for image]
Fig. 5
Radius filtering
In the figure, A, B, and C are center points. Let the neighbor-count threshold be T and the spatial neighborhood radius be d, so that a point is kept only when at least T other points lie within distance d of it. With T = 1, A, B, and C are all kept as normal points. With T = 2, B and C are kept as normal points, and A is treated as a noise point and removed. With T = 3, only B is kept as a normal point, and A and C are treated as noise points and removed.
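Radius filtering can be sketched in a few lines of Python. This brute-force version compares every pair of points, which is fine for illustration but quadratic in the number of points; the function name and the sample coordinates are made up for the example.

```python
import math

def radius_filter(points, radius, min_neighbors):
    """Keep points that have at least min_neighbors other points within radius."""
    kept = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append(p)
    return kept

# Three clustered points and one isolated outlier.
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (5.0, 5.0, 5.0)]
cleaned = radius_filter(cloud, radius=0.5, min_neighbors=2)
```

For real point clouds, a spatial index such as a k-d tree would replace the inner loop to keep the neighbor search tractable.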
Performance indicators and baseline model setting
i. Performance indicators.
To evaluate the overall performance of the proposed model, this study selects key evaluation indicators covering four aspects: accuracy, efficiency, resource utilization, and robustness. For defect recognition accuracy, the weighted F1-score is selected (weights are assigned according to defect levels: high-risk defect weight = 0.6, medium-risk = 0.3, low-risk = 0.1) to emphasize the recognition of high-risk defects. For dynamic response efficiency, the task completion rate and inspection coverage rate are used to reflect the timeliness and task coverage of the model. For resource utilization efficiency, energy consumption and resource utilization rate are introduced to quantify the rationality of resource scheduling. For environmental robustness, the conflict resolution time and average idle rate are used to evaluate the anti-interference ability of the model in complex environments.
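The weighted F1-score can be computed as in the following sketch, using the risk-level weights stated above (0.6, 0.3, 0.1). The per-level true positive, false positive, and false negative counts are invented purely to show the arithmetic.

```python
RISK_WEIGHTS = {"high": 0.6, "medium": 0.3, "low": 0.1}

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def weighted_f1(counts):
    """counts maps each risk level to a (tp, fp, fn) tuple."""
    return sum(RISK_WEIGHTS[level] * f1_score(*counts[level]) for level in RISK_WEIGHTS)

# Hypothetical confusion counts per risk level.
example = {"high": (90, 10, 10), "medium": (80, 20, 20), "low": (50, 50, 50)}
score = weighted_f1(example)
```

With per-level F1 values of 0.9, 0.8, and 0.5, the weighted score is 0.6 × 0.9 + 0.3 × 0.8 + 0.1 × 0.5 = 0.83, so errors on high-risk defects dominate the metric.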
ii. Baseline model.
To verify the advanced nature of the proposed method, the following baseline models are selected for different comparative experiments, as shown in Table 4.
Table 4. Baseline model
| Baseline model | Core method |
|---|---|
| YOLOv7 + A* [26] | YOLOv7 defect detection + single-target shortest path search |
| GA (Genetic Algorithm) + MaskRCNN (Mask Region-based Convolutional Neural Network) [27] | Mask R-CNN segmentation + genetic algorithm |
| TransPathNet [28] | Transformer multimodal fusion + path planning |
| PPO-MultiDrone (Proximal Policy Optimization-Multi-Drone) [29] | Multi-agent PPO collaborative task allocation |
| ResNet50-GCN (Graph Convolutional Network) [30] | Graph convolutional network resource scheduling + ResNet defect classification |
Performance verification of the model
Verification of defect identification performance of multimodal fusion model based on transformer
[See PDF for image]
Fig. 6
Verification results of defect identification performance of multi-modal fusion model based on Transformer
The defect identification ability of the proposed model is compared with that of the baseline models in three extreme environments: fog (visibility < 50 m), strong electromagnetic interference (> 1000 V/m), and vegetation cover (coverage > 40%). The results are shown in Fig. 6.
Figure 6 shows that under three extreme conditions-foggy weather, strong electromagnetic interference, and vegetation occlusion-the proposed model outperforms baseline models (YOLOv7, MaskRCNN, TransPathNet) in all scenarios. Specifically, it achieves an F1-score of 89.7% for detecting broken conductors in fog (11.4% higher than the best baseline), an insulator damage recognition rate of 87.2% under strong electromagnetic interference, and a joint overheating detection accuracy of 92.4% in vegetation-obstructed scenes. The weighted average F1-score (89.8%) is 11% points higher than TransPathNet, demonstrating that multimodal fusion effectively overcomes the limitations of single-environment interference. This verifies the model’s robustness in complex environments and its engineering practical value.
Efficiency verification of dynamic path planning
A transmission line in a mountainous area (50 towers, elevation drop of 300 m) is simulated, five emergency defect tasks are randomly generated, and the inspection efficiency of the UAVs is compared. The results are shown in Fig. 7.
[See PDF for image]
Fig. 7. Performance comparison of path planning (single machine task)
Figure 7 shows that the proposed model outperforms the baseline algorithms on several key indicators. Specifically, the emergency task response time is 45 s (a 28.6% improvement over PPO-MultiDrone), the inspection coverage rate is 98.7% (a 10.7% improvement), the total flight distance is 6.3 km (a 23.2% reduction), and the energy consumption is 0.68 kWh (a 28.4% reduction). These results indicate that dynamic weight adjustment lets the model satisfy path optimality and real-time requirements at the same time. The high coverage rate verifies its adaptability to complex terrain, while the reduction in flight distance reflects the benefit of the multi-objective collaborative optimization strategy, which provides an efficient solution for emergency power-line inspection in mountainous areas.
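To make the reported percentages concrete, the short sketch below back-calculates the baseline values they imply. It assumes that each "reduction" or "improvement" for time, distance, and energy is a relative reduction with respect to the baseline, which the text does not state explicitly; the function name is hypothetical. The coverage figure is left out because it is unclear whether its 10.7% is relative or in percentage points.

```python
def implied_baseline(value: float, reduction_pct: float) -> float:
    """Baseline b such that value = b * (1 - reduction_pct / 100)."""
    return value / (1.0 - reduction_pct / 100.0)


# (reported value, reported relative reduction in %), taken from the text.
reported = {
    "response_time_s": (45.0, 28.6),
    "flight_distance_km": (6.3, 23.2),
    "energy_kwh": (0.68, 28.4),
}

for name, (value, pct) in reported.items():
    # Under the relative-reduction assumption, this is the baseline value.
    print(f"{name}: implied baseline = {implied_baseline(value, pct):.2f}")
```

Under this assumption the implied baselines are roughly 63 s, 8.2 km, and 0.95 kWh; these are derived figures, not values reported in the study.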
Verification of multi-UAV cooperative scheduling
A sudden-failure scenario is set up (8 high-priority tasks are added within 10 min), and the resource scheduling performance is compared. The results are shown in Fig. 8.
[See PDF for image]
Fig. 8. Performance comparison of dynamic task allocation
Figure 8 shows that the proposed model improves the task completion rate (95.6%) and the resource utilization rate (91.5%) by 8.4% and 16.0%, respectively, compared with the best baseline (GA-MaskRCNN). This indicates that the real-time task decomposition and dynamic priority adjustment mechanism improves the agility of system response. In terms of resource management, the proposed model has an average idle rate of 8.5% and a conflict resolution time of 3.2 s. These results suggest that the model’s topology-aware load balancing strategy can effectively coordinate multi-UAV resource competition, providing a robust scheduling scheme for power emergency inspection.
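The priority-driven allocation idea can be illustrated with a simple greedy heuristic. This is only a sketch and not the MIP-plus-heuristic model evaluated in the study: it assumes that each task has an integer priority and a 2D location, that each UAV has a remaining range, and that the highest-priority task is served first by the nearest UAV that can reach it. All class, field, and function names are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class UAV:
    name: str
    position: Point
    range_km: float  # remaining reachable distance (hypothetical unit)


@dataclass(order=True)
class Task:
    neg_priority: int  # negated so that heapq pops the highest priority first
    name: str = field(compare=False)
    location: Point = field(compare=False)


def distance(a: Point, b: Point) -> float:
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def greedy_assign(uavs: List[UAV], tasks: List[Task]) -> Dict[str, str]:
    """Assign each task, in priority order, to the nearest free UAV in range."""
    heap = list(tasks)
    heapq.heapify(heap)
    free = {u.name: u for u in uavs}
    assignment: Dict[str, str] = {}
    while heap and free:
        task = heapq.heappop(heap)
        feasible = [
            u for u in free.values()
            if distance(u.position, task.location) <= u.range_km
        ]
        if not feasible:
            continue  # no free UAV can reach this task in the current round
        best = min(feasible, key=lambda u: distance(u.position, task.location))
        assignment[task.name] = best.name
        del free[best.name]
    return assignment


uavs = [UAV("uav1", (0.0, 0.0), 5.0), UAV("uav2", (10.0, 0.0), 5.0)]
tasks = [
    Task(-1, "routine_check", (1.0, 0.0)),
    Task(-3, "broken_conductor", (2.0, 0.0)),
    Task(-2, "joint_overheating", (9.0, 0.0)),
]
print(greedy_assign(uavs, tasks))
```

In this example the two higher-priority tasks are assigned and the routine check waits for the next round, which is the behaviour a priority-first rule is meant to produce when high-priority tasks arrive suddenly.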
Conclusion
Unlike existing research, the main contribution of this study is a closed-loop solution for UAV inspection of transmission lines: (1) at the technical level, the proposed UAV system and Transformer fusion algorithm alleviate the problem of insufficient defect detection accuracy in extreme environments; (2) at the decision-making level, deep reinforcement learning for dynamic UAV path planning improves energy efficiency and shortens emergency response time; (3) at the management level, a scheduling model combining mixed integer programming with heuristic rules helps resolve the collaborative efficiency bottleneck caused by multi-UAV resource competition. Experiments show that the multimodal sensor system (LiDAR + IR + HRC) and Transformer algorithm raise the defect recognition F1-score to 89.8% in extreme environments; the PPO-based dynamic path planning model shortens emergency response time to 45 s, increases coverage to 98.7%, and reduces energy consumption by 28.4%; and the MIP-heuristic scheduling model achieves a task completion rate of 95.6% and a resource utilization rate of 91.5%. However, the study has limitations: insufficient scheduling efficiency for large-scale UAV fleets and limited adaptability to extreme weather. Future work will optimize the system through federated learning, multi-energy scenario expansion, and ice disaster-resistant algorithms. The federated learning optimization is planned to adopt a hierarchical collaborative architecture: first, lightweight model shards are trained on each UAV; then local parameters are aggregated through edge servers; finally, global updates are integrated by the central node. This study provides key technical support for the “perception-decision-execution” integration of the power IoT and helps improve the reliability of new power systems.
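The planned hierarchical aggregation (UAV, then edge server, then central node) could look like the following minimal sketch. It assumes sample-count-weighted parameter averaging in the style of federated averaging at both levels, which the text does not specify; model parameters are represented as plain lists of floats, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (model parameters, number of local training samples)
Update = Tuple[List[float], int]


def weighted_average(updates: Sequence[Update]) -> Update:
    """Average parameter vectors, weighting each by its sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    avg = [sum(p[i] * n for p, n in updates) / total for i in range(dim)]
    return avg, total


def hierarchical_aggregate(edge_groups: Sequence[Sequence[Update]]) -> List[float]:
    """Aggregate UAV updates per edge server, then merge at the central node."""
    edge_updates = [weighted_average(group) for group in edge_groups]
    global_params, _ = weighted_average(edge_updates)
    return global_params


# Hypothetical local updates: two UAVs under edge server A, one under B.
edge_a = [([1.0, 2.0], 10), ([3.0, 4.0], 30)]
edge_b = [([5.0, 6.0], 60)]
print(hierarchical_aggregate([edge_a, edge_b]))
```

Because each edge result carries its total sample count, the two-level average equals the flat average over all UAVs, so the hierarchy changes the communication pattern and not the aggregated model.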
Author contributions
Hongzhi Gao contributed to conceptualization, methodology, analysis, investigation, data collection, draft preparation, and manuscript editing; Dekyi and Metok contributed to analysis, investigation, and data collection. All authors reviewed the manuscript.
Funding
The authors received no specific funding for this study.
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Luo, Y; Yu, X; Yang, D; Zhou, B. A survey of intelligent transmission line inspection based on unmanned aerial vehicle. Artif Intell Rev; 2023; 56,
2. Singh, G; Stefenon, SF; Yow, KC. Interpretable visual transmission lines inspections using pseudo-prototypical part network. Mach Vis Appl; 2023; 34,
3. Li, Z; Zhang, Y; Wu, H; Suzuki, S; Namiki, A; Wang, W. Design and application of a UAV autonomous inspection system for high-voltage power transmission lines. Remote Sens; 2023; 15,
4. Li, Z; Wang, Q; Zhang, T; Ju, C; Suzuki, S; Namiki, A. UAV high-voltage power transmission line autonomous correction inspection system based on object detection. IEEE Sens J; 2023; 23,
5. Wang, Q; Wang, W; Li, Z; Namiki, A; Suzuki, S. Close-range transmission line inspection method for low-cost UAV: design and implementation. Remote Sens; 2023; 15,
6. Li, H; Dong, Y; Liu, Y; Ai, J. Design and implementation of UAVs for bird’s nest inspection on transmission lines based on deep learning. Drones; 2022; 6,
7. Guo, Q; Liu, H; Hassan, FM; Bhatt, MW; Buttar, AM. Application of UAV tilt photogrammetry in 3D modeling of ancient buildings. Int J Syst Assur Eng Manage; 2022; 13,
8. Zhu, Z; Wang, J; Zhu, Y; Chen, Q; Liang, X. Systematic evaluation and optimization of unmanned aerial vehicle tilt photogrammetry based on analytic hierarchy process. Appl Sci; 2022; 12,
9. Zhu, Z; Li, Q; Zhao, L; Huang, D; Wu, Q; Zuo, S. Whole-process survey of a landslide based on UAV tilt photography and 3D reconstruction and spatial analysis techniques. Landslides; 2025; 22,
10. Bocullo, V; Martišauskas, L; Pupeikis, D; Gatautis, R; Venčaitis, R; Bakas, R. UAV photogrammetry application for determining the influence of shading on solar photovoltaic array energy efficiency. Energies; 2023; 16,
11. Li, Q; Yao, X; Li, R; Zhou, Z; Yao, C; Ren, K. Quick extraction of joint surface attitudes and slope preliminary stability analysis: a new method using unmanned aerial vehicle 3D photogrammetry and GIS development. Remote Sens; 2024; 16,
12. Zhu, M; Yu, X; Tan, H; Yuan, J. Integrated high-precision monitoring method for surface subsidence in mining areas using D-InSAR, SBAS, and UAV technologies. Sci Rep; 2024; 14,
13. Wang, L; Tang, N; Jiang, H; Deng, H; Yang, Z; Cai, B. Stability evaluation of columnar perilous rock in the Three Gorges reservoir area based on UAV tilt photography. Bull Eng Geol Environ; 2025; 84,
14. Zhang, J; Shi, Z; Zhang, A; Yang, Q; Shi, G; Wu, Y. UAV trajectory prediction based on flight state recognition. IEEE Trans Aerosp Electron Syst; 2023; 60,
15. Shi, Z; Zhang, J; Shi, G; Ji, L; Wang, D; Wu, Y. Design of a UAV trajectory prediction system based on multi-flight modes. Drones; 2024; 8,
16. Shi, Z; Zhang, J; Shi, G; Zhu, M; Ji, L; Wu, Y. Autonomous UAV safety oriented situation monitoring and evaluation system. Drones; 2024; 8,
17. Shi, Z; Shi, G; Zhang, J; Wang, D; Xu, T; Ji, L et al. Design of UAV flight state recognition system for multisensor data fusion. IEEE Sens J; 2024; 24,
18. Ding, Z; Sun, Y; Xu, S; Pan, Y; Peng, Y; Mao, Z. Recent advances and perspectives in deep learning techniques for 3D point cloud data processing. Robotics; 2023; 12,
19. Yu, D; Zhou, X; Pan, Y; Niu, Z; Sun, H. Application of statistical K-means algorithm for university academic evaluation. Entropy; 2022; 24,
20. Liu, Y; Zhou, J; Bian, Y; Wang, T; Xue, H; Liu, L. Estimation of weight and body measurement model for pigs based on back point cloud data. Animals; 2024; 14,
21. Camuffo, E; Mari, D; Milani, S. Recent advancements in learning algorithms for point clouds: an updated overview. Sensors; 2022; 22,
22. Poiesi, F; Boscaini, D. Learning general and distinctive 3D local deep descriptors for point cloud registration. IEEE Trans Pattern Anal Mach Intell; 2022; 45,
23. Mikuni, V; Nachman, B; Pettee, M. Fast point cloud generation with diffusion models in high energy physics. Phys Rev D; 2023; 108,
24. Hu, C; Ru, Y; Fang, S; Zhou, H; Xue, J; Zhang, Y et al. A tree point cloud simplification method based on FPFH information entropy. Forests; 2023; 14,
25. Han, J. A review of UAV applications in the design, construction, and operation & maintenance of overhead transmission lines. Int Core J Eng; 2025; 11,
26. Wang, Q; Zhang, Z; Chen, Q; Zhang, J; Kang, S. Lightweight transmission line fault detection method based on leaner YOLOv7-tiny. Sensors; 2024; 24,
27. Zhou, M; Wang, J; Li, B. Arg-mask RCNN: an infrared insulator fault-detection network based on improved mask RCNN. Sensors; 2022; 22,
28. Mkrtchyan, R; Ghukasyan, E; Petrosyan, K; Khachatrian, H; Raptis, TP. Vision transformers for efficient indoor pathloss radio map prediction. Electronics; 2025; 14,
29. Araújo, AG; Pizzino, CA; Couceiro, MS; Rocha, RP. A multi-drone system proof of concept for forestry applications. Drones; 2025; 9,
30. Talaat, FM; El-Sappagh, S; Alnowaiser, K; Hassan, E. Improved prostate cancer diagnosis using a modified ResNet50-based deep learning architecture. BMC Med Inform Decis Mak; 2024; 24,
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.