Content area
As the global economy continues to expand and energy demand increases, the size of power transmission networks continues to grow, making the safety monitoring of transmission towers increasingly important. To address the accuracy deficiencies of existing technologies in predicting external damage risks to transmission towers, this study proposes a real-time spatial distance measurement method based on monocular vision. The method first uses a Transformer network to optimize the distribution of pseudo point clouds and designs a 3D monocular vision distance measurement method based on LiDAR. Through validation on the KITTI 3D object detection dataset, the method achieved an average detection accuracy increase of 10.71% in easy scenarios and 2.18% to 7.85% in difficult scenarios compared to other methods. In addition, this study introduced a foreground target depth optimization method based on a 2D target detector and geometric constraints, which further improved the accuracy of 3D target detection. The innovation of the study is the optimization of the pseudo point cloud distribution using the transformer network, which effectively captured the global dependencies and improved the global consistency and local detail accuracy of the pseudo point clouds. The method proposed in the study provides a new approach for intelligent detection and recognition of power transmission lines, and provides a positive impetus for the fields of power engineering and computer vision.
1. Introduction
With the development of the global economy and the increase in the demand for energy consumption, the scale of the power transmission network is constantly expanding. The emergence of intelligent and automation technology promotes the development of smart grid, and promotes the efficiency improvement and intelligent detection of power transmission technology [1,2]. Power transmission towers are key components in power transmission lines. It is of positive research significance to carry out real-time monitoring of transmission towers to reduce their hidden safety risks and hidden dangers, and to improve the operational safety of power lines (PLs) [3,4]. 3-dimension (3D) laser point cloud (PC) technology, as an advanced laser measurement technology, captures the spatial information on the surface of an object through the principle of laser range finder. However, the current 3D laser PC technology mostly relies on extracting the suspension points of transmission towers from the point cloud data (PCD) in the external breakage hidden danger monitoring of transmission towers. The distribution of pseudo point cloud (PPC) at the edge of the transmission tower has a large error with the distribution of the internal area to be determined. How to improve the data feature extraction of 3D laser PC technology to improve the efficiency of real-time measurement of transmission towers and other targets has become a current research hot spot.
C. Zong and Z. Wan proposed a unit rail segmentation method with an improved 3D PC segmentation model in an attempt to enhance the installation accuracy of unit rails on container ships. The method solved the errors during container loading due to low loading accuracy, welding shrinkage and expansion deviations by fitting the structure of unit rails on container ships [5]. L. Hui et al. developed an efficient PC learning network in order to solve the problem of excessive computational resource consumption of existing deep learning based methods. The method aggregated the local geometric features of the PC by the proposed lightweight ProxyConv neural network module to obtain a global descriptor of the PC for location identification [6]. A filtering method incorporating an anisotropic point error model and a simplified process was proposed by M. Ozendi et al. to address the problem of low-quality points in PCD acquired by light detection and ranging (LiDAR). The method effectively eliminated redundant data due to different positional quality patterns of the target object over multiple scans by taking into account the main error sources [7]. T. Y. Tang et al. addressed the limitations of using top-view images for LiDAR localization by proposing a method that utilized a public top-view image as a map proxy. The method simulated the acquisition of PCs scanned by LiDAR sensors located near the center of the top view image by converting the top view image into a collection of 2-dimensional (2D) cloud points, thus weakening the large domain discrepancy between the LiDAR data and the top view image [8]. Z. Zhang et al. proposed a perception algorithm inspired by the eagle’s eye. The method achieved adaptive high-dynamic-range stabilization by simulating the physiological structures of “deep concavity” and “shallow concavity” in the eagle’s eye, thus obtaining a 78.6% improvement in LiDAR PC bias in fixed-distance measurements [9]. Aiming at the efficiency of LiDAR-based simultaneous localization and mapping (SLAM) in the field of robot mapping, X. Yue analyzed the application of different types and configurations of LiDAR in SLAM system. It was found that multi-robot collaborative mapping and multi-source fusion SLAM system based on 3D LiDAR, combined with deep learning, could improve the accuracy of robot mapping [10].
With its accurate 3D data method that can capture intricate settings or structures, 3D laser PC technology has found extensive application in a variety of areas, including electric power, agriculture, and medical. The use of airborne or ground-based LiDAR for transmission line inspection can achieve high-precision 3D modeling of the line, and timely and effective access to defects or safety hazards of transmission lines [11,12]. X. Liu et al. proposed a point-by-point multi-layer perceptron (MLP)-based semantic segmentation (SS) network to address the SS of key objects in PL inspection and the low efficiency in dealing with a large amount of PCD and missing PCs of PL. The method effectively solved the point missing problem by designing a local coding module to improve the segmentation ability of the network in the corridor [13]. A framework for automated vegetation monitoring along PL was proposed by M. Gazzea et al. in response to the drawbacks of slower and less effective vegetation management and environmental inspections. The method reduced the cost and time of PL monitoring by utilizing satellite data analysis as an alternative to ground patrols and inspections by helicopters or drones [14]. D. Shokri et al. proposed an intelligent algorithm combining preprocessing, pole extraction and cable extraction to address the low efficiency of monitoring, maintenance and organization of PL corridors. The method detected the search area containing the lines by automatically extracting the poles and cables data from a moving ground LiDAR PC using a Hough transform algorithm. This achieved 100% average correctness and 97% completeness extraction for utility poles [15]. A. Al Najjar et al. proposed a two-stage method for detecting vegetation encroachment on PLs in urban areas, combining laser point cloud technology and point convolutional neural networks. By slicing the map and selecting informative parts, as well as conducting proximity analysis between vegetation and PL voxels, precise and automatic detection of vegetation encroachment on urban PLs was achieved, thereby optimizing vegetation management and active maintenance in urban environments [16]. M. García-Fernández et al. proposed a method for comparing different scanning strategies to address the problem of difficulty in improving the scanning throughput of multi-channel ground penetrating radar synthetic aperture radar systems carried by drones. By considering different values of inter track spacing to generate dense and sparse sampling distributions, detection efficiency could be improved while maintaining image quality [17]. C. Li et al. put forward an accurate parallel laser line scanning system to detect the depth of structural defects on concrete surface. Through the triangulation device composed of digital camera, double-line laser diode and positioning rigid arm, the image processing algorithm for extracting depth information from distorted laser strips was realized. It improved the accuracy and efficiency of evaluating the depth of defects at different distances [18]. M. M. Hosseini et al. addressed the inefficient and time-consuming problem of manual operations in the process of power tower damage assessment by proposing an automated model that used an unmanned aerial vehicle (UAV) to capture images and transmit them in real time to an intelligent damage classification and assessment unit. The model’s integration of four convolutional neural networks facilitated the learning of the tower condition from images, the extraction of image features, and the training of automated intelligent tools to replace manual fault location and damage assessment [19]. Chen et al. proposed a comprehensive risk assessment framework based on transmission tower geometry and topography to address the problem that transmission towers are prone to collapsing and triggering large-scale power outages and electrocution risks when subjected to heavy rainfall and flooding. A series of new point cloud segmentation and fitting algorithms are developed to accurately estimate the tilt angle of transmission towers by utilizing 3D point cloud data acquired from aerial LiDAR [20].
Combined with the above, it can be observed that scholars at home and abroad have conducted various researches on PPC distribution optimization, and have achieved good research results in many areas. However, there is a PPC long-tailed distribution effect in the process of point cloud image acquisition using LiDAR technology for transmission tower external damage risk detection. In the existing transmission tower external damage risk prediction technology, the processing time of the traditional LiDAR technology is about 3-5s, and the data processing and analysis is too time-consuming to meet the real-time demand. In addition, the completeness of the feature extraction of the towers is only about 97%, and there is a 3% error between the distribution of PPC at the edges of the towers and the distribution of the internal area to be determined, which limits the accuracy of the prediction model. And there are still some research gaps on the limitations of stereo vision in long-distance measurement. Therefore, the study proposes a real-time spatial distance measurement method for transmission tower external breakage hidden danger based on monocular vision with a view to improve the efficiency of predicting hidden danger on transmission towers. The distribution of PPC is optimized by using Transformer network as an encoder, and a distance measurement method for monocular vision is further designed. The novelty of the study is that the encoder-decoder structure is used for PC feature distribution enhancement. The study also achieves adequate extraction and pseudo point cloud distribution optimization (PPCDO) of the actual PCD in a way that enhances the feature information of the PCD.
The study proposes a real-time measurement method for the spatial distance of external breakage and hidden dangers of transmission towers based on monocular vision. This method employs the Transformer network to optimize the distribution of PPC and combines the depth optimization of foreground targets to enhance the accuracy and efficiency of 3D target detection. By optimizing the PPC with the Transformer encoder, the CD error is effectively reduced, thereby improving the accuracy of the PCD. Based on the optimized PPC, a 3D monocular visual distance measurement method based on LiDAR is developed in this study, which further improves the measurement accuracy. The experimental validation results confirm the superiority and effectiveness of the proposed method, and demonstrate its practical application value on the inspection data of Guangdong Power Grid Company. The real-time measurement method proposed by the study offers a novel approach for intelligent detection and recognition of power transmission lines. Furthermore, the PPCDO strategy is of considerable significance for enhancing image measurement and recognition.
The overall structure of the study consists of three sections. In the first section, the real-time measurement method of the external breakage hidden danger spatial distance of transmission towers based on monocular vision is studied and designed. In the second section, the proposed method is experimented and analyzed. The third section summarizes the experimental results and indicates the future research direction.
2. Methods and materials
To improve the efficiency of detecting external breakage hidden danger for transmission towers, the study firstly optimizes the design of PPC distribution on the basis of Transformer network. Second, a foreground objective depth optimization method is proposed. On this basis, a real-time spatial distance measurement method based on monocular vision is further designed by combining PPCDO and foreground target depth optimization.
2.1. Transformer-based pseudo point cloud distribution optimization
To adjust the large error between the distribution of PPC at the edge of the target object and the distribution characteristics of a specific region inside the target, the study first defines the target problem of how to make the simulated PC closer to the actual PC acquired by LiDAR as the PC distribution transformation problem. The main factor influencing the discrepancy between the distribution of the real PC and the simulated PC is the blurring of the deep location data of the PC during the acquisition phase. Therefore, the study sets the distribution information of the PC as the position information in the 3D spatial coordinate system, and utilizes the bilinear image processing method to obtain the quantitative information of the PC [21,22]. Fig 1 depicts the unique optimization procedure.
[Figure omitted. See PDF.]
In Fig 1, while performing the feature extraction process for different PCs, the study utilizes PointNet++ to monitor the whole process. The PointNet++ network solves some of the limitations of the original PointNet in dealing with PCD with complex structures, especially in extracting local features and multi-scale features, through an innovative network structure [23,24]. PointNet learns global features by directly processing the entire point cloud. It considers each point in the PC as an independent sample, ignoring the spatial relationship and local structure information between points. PointNet++ introduces a layered architecture that uses different scales to capture the local structure in the PC. This is achieved by sampling and grouping PCs at different scales, allowing the network to learn the characteristics of local areas. In addition, PointNet++ can capture local features of different scales by setting up multiple layers, each with different receptive fields. Due to the lack of local feature capture, PointNet has limited generalization ability, especially when dealing with PCD with changing shapes. PointNet++, on the other hand, by introducing local feature capture, multi-scale feature fusion and hierarchical architecture, is able to improve its ability to handle complex PCDs, enabling it to perform well in a variety of PC-related tasks [25]. Therefore, the integration of global information can be enhanced by using it to encode PC properties. The specific network structure is shown in Fig 2.
[Figure omitted. See PDF.]
In Fig 2, the PointNet++ network has a point set abstraction module that extracts key features and structural information from raw PCD. The module consists of three main layers: sampling, integration, and PointNet. Among them, the sampling layer mainly performs sampling of the whole received PCDset according to the farthest point sampling (FPS). Moreover, one point is randomly selected as the first center point, and iteration is performed to obtain multiple center points. FPS aims to select a uniform set of points from a PC for efficient processing and analysis. The method ensures sampling uniformity by selecting the points that are furthest away from the current set of points. This helps the network capture the global structure of the PCs and reduces the amount of computation required for large-scale data processing. In short, FPS improves the efficiency and effectiveness of PCD processing through uniform sampling. Random selection of the first centroid of the FPS algorithm is essential to ensure unbiased and uniform PC sampling. This selection method circumvents any bias towards particular regions, bolsters the resilience of the algorithm, and facilitates the comprehensive encapsulation of the global structure of the PCs, thereby furnishing a homogeneously dispersed set of points for subsequent feature extraction and analysis.
The objective of the integration layer is to merge the local PC features extracted from the sampling layer and construct a unified feature representation reflecting the global structure of the PCD. This is achieved through the attention mechanism, which provides comprehensive data support for subsequent analyses. This process employs spherical queries to delineate local regions and integrate these local features to capture multi-scale information. The PointNet layer, on the other hand, is mainly responsible for extracting features from the resulting PC collection and using it as input for subsequent tasks.
As one of the core components of PointNet++, the Set Abstraction module is responsible for extracting local features from the original PC. The module first uses FPS to select a set of scattered points as a“set”. For each center point, a local region is defined around it using Ball Query, and all points within this region are extracted from the original PC. PointNet++ realizes multi-scale feature extraction through multiple set abstraction modules, each module can have different radius and sampling density, thus capturing local features of different scales. After multi-scale local feature extraction, PointNet++ aggregates all local features through a global pooling layer to generate global feature vectors (FVs). This global FV contains the geometric and semantic information of the entire PC.
The study performs encoder construction of PC features based on PointNet++. The auto-correlation processing unit of the Transformer network is then utilized to reintegrate the encoder-extracted multi-resolution PC descriptions for auto-correlation feature extraction. In this case, the PC FVs are pre-calculated with tri-linear layer mapping before the attention calculation. The specific calculation formula is shown in Equation (1) [26,27].
(1)
In Equation (1), is the query vector. is the key-in vector used to determine the attention score. denotes the value vector containing the actual data information. denotes the feed-forward neural network. denotes the original FV input to the MLP. is the layer level. is the total point sets obtained from network sampling. After the attention weighting process, the PC FV expression formula is shown in Equation (2).
(2)
In Equation (2), denotes the Softmax function, it is mainly used to normalize the attention scores, ensuring that the weights of each FV sum to 1. denotes the matrix multiplication that measures the degree of match between the query and the key. denotes the scaling factor. denotes the dimension of the FV. Moreover, in Transformer network, the weighted feature expression formula obtained from PointNet++ output is shown in Equation (3).
(3)
In Equation (3), denotes the weighted features after processing through the self-attention layer. denotes the normalized network layer. denotes the attributes or metrics of the data points input to the model for performing learning. Equation (3) is used to compute the weighted FVs after the self-attention layer processing, which is used to further optimize the global consistency of the point cloud features. The final output of Transformer’s FV expression formula is shown in Equation (4).
(4)
Equation (4), denotes the FV of the final output of Transformer. denotes the feed-forward network of Transformer network. Therefore, the flow of the proposed encoder combining PointNet++ feature extraction with Transformer based PPCDO for decoder is shown in Fig 3.
[Figure omitted. See PDF.]
In Fig 3, both the encoder and decoder of PPCDO require multi-head self-attention (MHSA) and MLP implementation. Among them, MHSA ensures that the encoder and decoder process information in parallel at the same time. This improves the network’s ability to understand the real PC feature data and thus improves the generalization ability of the network. MHSA facilitates parallel processing by dividing the self-attention calculation into multi-heads, with each head focusing on a distinct feature subspace of the input data. In the encoder, MHSA parallel processing enhances the efficiency of feature extraction, whereas in the decoder, it enables the model to examine a multitude of potential feature fusion pathways in parallel and to optimize the distribution of PCD. By introducing PCD at different scales and from different perspectives during training, MHSA enables the network to learn more robust feature representations. While MLP mainly performs classification of image PC features during encoding and decoding. In the encoding phase, the MLP employs its multi-layer structure to non-linearly transform the PCD, thereby extracting high-level features that encompass both local and global structural information. In the decoding phase, the MLP inverses the high-level features, returning them to the original space for accurate PC classification. The MLP achieves accurate classification of PCD by learning complex nonlinear mappings that transform the abstract feature space into concrete category labels.
Fig 3(a) shows the overall encoding process of the feature encoder. The semantic information of the corresponding level of features is deepened by repeatedly accumulating the basic constructs of multiple levels and connecting the levels at each network level of PointNet++. In Fig 3(b), the decoder of PPCDO mainly performs inference computation on a set of FVs at multiple levels to optimize the distribution of the PC. It contains the Transformer network with repeated accumulation of multiple levels, and the number of repeated accumulations of the basic Transformer constructs is the same as that of PointNet++. The decoder module of the transformer facilitates the restoration of the FV output by the encoder to the spatial distribution of the PC. The features are then fused at multiple scales through the use of MHSA, which optimizes the detailed information of the PC distribution. Ultimately, the error between the optimized PPC and the actual PC is calculated by the chamfer distance (CD) loss function, and the network parameters are adjusted by back-propagation. In this case, the loss expression formula for the mean and variance between the FVs at the corresponding level between the decoder and the encoder is shown in Equation (5).
(5)
In Equation (5), denotes the mean value calculation on the set of FVs. is the set of FVs of PPC after processing by a certain layer of encoder. denotes the set of FVs of real PC after processing by a certain layer of encoder. denotes the standard deviation computation of the set of FVs. The loss function calculation formula for PPC after final optimization is shown in Equation (6).
(6)
In Equation (6), denotes the CD loss function. denotes CD. denotes the optimized PPCDset. denotes a PC in . denotes real PCDset. denotes a PC in . is the squared Euclidean distance between and . The overall loss calculation formula for PPC and real PC is shown in Equation (7).
(7)
In Equation (7), denotes the full number of feature layers of the PC in the network. The study utilizes adaptive supervision with multiple layers to modulate the designed decoder. To guarantee the convergence speed and effectiveness of the network training process, various supervised signal levels are utilized to enhance performance [28,29]. Meanwhile, high performance end-to-end network is constructed by comprehensively training the crosstalk between encoder and decoder. This increases target identification accuracy and allows PPC feature distribution to be optimized without the need for an actual PC.
2.2. Monocular vision measurement based on pseudo point cloud distribution optimization
Based on the previously proposed PPCDO method, the study further proceeds with the design of the transmission pole tower external breakage hidden danger monocular vision measurement method. Since the process of monocular vision predicting the depth information of the scene is not sensitive to the classification of foreground and background, the study proposes an optimization method focusing on the depth of foreground targets. A 2D target detector is used to replace the 2D instance segmentation and the network is trained using uncertainty learning (UL) [30]. The 2D target detector can effectively detect foreground targets in images with complex background situations and can clearly distinguish between foreground and background. The core idea of this stage is to extract the target region in the image by 2D target detection and estimate its depth using deep learning models [31]. Compared with the traditional method, the 2D instance segmentation method can deal with the target boundary more accurately, thus improving the depth optimisation of the foreground target. The study first uses the CenterMask network as the instance segmentation model for the proposed method in order to improve the joint training effect with UL. The specific framework is shown in Fig 4.
[Figure omitted. See PDF.]
Fig 4 illustrates the structure of CenterMask network, which consists of spatial attention module (SAM) and feature channel attention module (FCAM). First, the input image is extracted from the feature image by the backbone network and fed into the feature pyramid network (FPN) to obtain the multi-scale feature image. Then, the image features are localized by the predictive bounding box module of the full convolutional single-stage target detection (FCOS). Finally, the image segmentation is completed by the spatial attention-guided mask (SAG-Mask). This process not only effectively extracts the target features in the image, but also enhances the ability of the model to recognize the target region through the attention mechanism. After the CenterMask network has completed the instance segmentation, UL trains and learns the model. It primarily trains the model’s judgment and detection skills, taking into account the size, kind, and pixel location of the image detection target, among other details. The particular procedure is depicted in Fig 5.
[Figure omitted. See PDF.]
In Fig 5, the study first generates focus ranges based on each estimated obtained 2D target image. Moreover, the deep image, which is pre-estimated and measured, is integrated with the 2D instance cut image. On this basis, the foreground target region obtained from the depth image with the 2D instance cut is based on the depth image. The study introduces a geometrically constrained geometrically constrained depth refinement (GCDR) module to realize the UL learning of the model [32,33]. GCDR improves the stability of GCDR by refining the depths of close objects and optimizing the depth estimates of distant objects. For long-range targets, the GCDR module is able to optimise the depth estimation of the target and improve the depth accuracy of long-range targets. It is assumed that the 3D height prediction of each object is generated by adding a 3D height regression head to the CentreMask model, and the 3D height has Laplace distribution randomness [34]. By combining the 3D height prediction of the target and the depth regression head, the randomness of the Laplace distribution is exploited for the correction of depth estimation. Therefore, the probability density function of the Laplace random variable is shown in Equation (8).
(8)
In Equation (8), is the distribution scale parameter. denotes the logarithm. Moreover, in order to detect the deep position information of the target at a long distance, the study introduces a distance sensitivity factor (DSF) as a constraint term penalty. The specific calculation formula is shown in Equation (9).
(9)
In Equation (9), denotes the DSF. denotes the average value of depth values in a single space at depth where the target is located. denotes the number of depth spaces. and denote the minimum and maximum spatial depth values, respectively. The formula for calculating the loss function for each 3D physical height obtained from model training is then shown in Equation (10).
(10)
In Equation (10), denotes the 3D height of the target. denotes the randomness of the initial output result. denotes the initial output of the 3D regression head. denotes the physical height of the object labeled with its true value. After obtaining the 3D spatial coordinate information of the target using the depth prediction module of CenterMask, the study further utilizes the spatial transformation function to calculate the depth spatial distribution of the target. Equation (11) displays the particular calculating formula.
(11)
In Equation (11), denotes the focal length of the camera lens. denotes the projection height of the target in the depth image. denotes the Laplace random variable. and denote the parameters of the Laplace distribution. denotes the mean value of the redundant bias term generated by a pre-estimated depth. denotes the average depth. The final predicted depth Laplace distribution of the target is calculated as shown in Equation (12).
(12)
In Equation (12), denotes the Laplace distribution function. denotes the product of UL learning standard deviation and DSF. The final UL of the depth distribution mainly consists of the random metric of the projection and the random metric of the deviation obtained by learning. Therefore, the depth refinement loss function of the GCDR module is updated as shown in Equation (13).
(13)
In Equation (13), denotes the actual depth value of the labeled target. Furthermore, the objective is to enhance the sensitivity to the distance and position information of the target in 3D space with respect to a reference point or reference plane. To this end, the natural exponential function is being investigated as a means of calculating the depth information of the target’s distance in three-dimensional space, which is then converted to the confidence level of the coordinate system. Moreover, the depth information within the target 2D segmentation is calculated. The specific calculation formula is shown in Equation (14).
(14)
In Equation (14), denotes the depth formulation score. denotes the expectation function. denotes the final refined depth estimate. Therefore, combining the above, the flow of the study’s proposed monocular vision-based method for real-time measurement of the spatial distance of transmission pole tower external breakage hidden danger is shown in Fig 6.
[Figure omitted. See PDF.]
Fig 6 illustrates the complete process of the real-time measurement method for the spatial distance to external break hazards of transmission towers based on monocular vision. The method commences with the acquisition of the PCD of the transmission lines, subsequently followed by a series of pre-processing steps that include noise reduction and contrast enhancement. Concurrently, the optimized depth of the foreground targets is integrated with the original images to generate the PPCD. Secondly, based on the previously proposed Transformer, PCD optimization of transmission towers is performed using PPCDO and further PC key features such as wire edges, insulator locations, etc. are extracted. Moreover, the feature information is used for further hidden hazard analysis, such as identifying problems such as broken wires and damaged insulators. By extracting the visual features of the transmission tower from monocular images, candidate target regions are generated on the fused features. Moreover, it is projected into the depth map to further extract the PCs of the optic vertebrae.On this basis, the PCD is bis-classified by using PointNet or PointNet++, the 3D bounding box of the image detection target of the transmission towers is estimated. Furthermore, the candidate regions are used for the external breakage hidden danger of the transmission towers classification and identification. Finally, the transmission pole tower target detection and output are performed by point-voxel region convolutional neural networks (PVRCNN).
3. Results
To validate the effectiveness of the proposed monocular vision based spatial distance measurement method for transmission tower external breakage hidden danger, the study firstly validates the Transformer based PPCDO method based on Transformer is validated. Second, the proposed monocular vision measurement method is validated and analyzed.
3.1. Experimental verification of relevant parameters
The study utilizes Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago (KITTI) for experiments and analysis. The dataset includes a wide variety of sensor data such as high-resolution color and grayscale video, LIDAR scans, localization, and more. This is of great application in developing and evaluating stereo vision, 3D object detection and 3D tracking. Each image in the KITTI dataset is labeled with an accurate 3D bounding box, including information such as the category, location, size, and direction of the object. This labeling information provides accurate supervision signals for training and evaluation of the 3D target detection model. The dataset contains 14999 images. It is split up into three sets for the study: 3712 photos for training, 3769 images for validation, and 7518 images for testing. In addition, the dataset is split into three categories: simple, medium, and challenging, based on the picture recognition complexity, occlusion range, and pixel value. The specific definitions are shown in Fig 7.
[Figure omitted. See PDF.]
In Fig 7, the study categorizes the KITTI dataset into three validation levels of easy, medium, and difficult based on the target image height, visibility, and target truncation, etc. At the same time, the research uses Jaccard index (intersection over union, IoU), CD, average precision (AP), precision rate and recall rate as verification indicators, and verifies and evaluates the performance of the proposed method. All images are normalized prior to input into the model to reduce the influence of illumination changes. Specifically, each pixel value of the image is normalized to the range [0,1]. The PCD obtained by LiDAR scanning is first converted into points in the 3D coordinate system, and then the number of points is reduced by voxelization for subsequent processing. The pre-processing steps are mainly realized by Python language and OpenCV library. The processing of PC depends on PC library.
The model’s PointNet++ uses two Set Abstraction layers, and each layer contains 512 points. Transformer: Six Transformer blocks are used, and each block contains eight attention heads. Convolution layer: 4 convolution layers are used, and each convolution layer has 32 3x3 filters. Fully connected layer: Two fully connected layers are used. Moreover, the first fully connected layer has 128 neurons and the second fully connected layer has 64 neurons. The experimental hardware environment is GeForce RTX 3090 GPU, and the memory size is 48 GB. The CPU is Intel Core i9 and the RAM is 512GB. The operating system is Ubuntu 20.04 LTS and the programming language is Python 3.8.
3.2. Validation of transformer-based ppcdo approach
To verify the validity of PPC distribution on the basis of Transformer network, PointNet++ is firstly utilized as the main basic network for data feature acquisition. Moreover, Transformer is added to the feature extraction unit of each PC collection for image global feature information acquisition. Meanwhile, the study utilizes the validation set of KITTI dataset for experiments. The encoder is trained on the training set of the KITTI data set for 100 epochs. The cosine annealing algorithm is used as the model training strategy. The initial learning rate is set to 0.0001, and the learning rate of every 10 epochs is attenuated to the original 0.1. Adam: The default parameters is used, β1 = 0.9 and β2 = 0.999. Firstly, the training efficiency and loss curve is compared between using Transformer module to optimize PPC distribution and not optimizing it. As shown in Fig 8.
[Figure omitted. See PDF.]
Fig 8(a) shows the change curve of model training loss before and after optimization. It can be seen that as the number of iterations increases, the loss values of both models keep decreasing and gradually converge. When the number of iterations is about 60, the loss curve of the Transforme model converges. The Transformer optimised model proposed in this study does not show overfitting and its loss value is lower than the unoptimised model after 10 iterations. This shows that the optimization strategy is effective and reliable. Concurrently, the convergence velocity of the proposed model is less rapid than that of the non-optimized model. This discrepancy may be attributed to the fact that the non-improved model is trained using pre-trained weights, whereas the proposed model is primarily trained through comprehensive retraining. Comparing the AP values of the two models in Fig 8(b), it can be concluded that the optimized Transformer model has better AP values than the unimproved model. Second, the CD error between PPC and PC is compared between using Transformer module to optimize PPC distribution and not optimizing it, as shown in Table 1.
[Figure omitted. See PDF.]
In Table 1, when the IoU threshold in 3D target detection is 0.7, the AP obtained from the model after optimizing the PPC distribution using the Transformer module is significantly better than the un-optimized results. The output results under bird’s eye view (BEV) also demonstrate the performance of PPCDO based on Transformer. In contrast, comparing the CD errors of the two methods before and after applying Transformer, it can be observed that the CD error between PPC and real PC decreases by 32.23% after optimization using Transformer. This indicates that the optimization of PPC distribution using Transformer network can improve the correlation of different points in the corona and reduce the gap between PPC and real PC. At the same time, the data indicates that the degree of correspondence between the optimized PPC and the actual PC is greater, indicating that each point in the PC is closer to its corresponding point in the target PC. The lower CD error indicates that the PC generation process preserves more detailed information, including edges, corners, and surface textures. Based on this, the study goes on to present the F-PointNet target detector for comparison with the proposed optimization approach. The particular outcomes are displayed in Fig 9.
[Figure omitted. See PDF.]
Fig 9(a) The validation results under IoU3D = 0.7 show that after the optimization of PPC using Transformer, the outputs of both target detectors, F-PointNet and PVRCNN, are significantly better than the un-optimized results. However, the proposed PVRCNN+Transformer method of the study increases the AP values by 1.35%, 1.50%, and 1.33% over F-PointNet+Transformer for the three detection levels, respectively. Fig 9(b) shows the comparison of the validation results under the threshold of IoUBEV = 0.7. The PVRCNN+Transformer method proposed in the study is still superior. This indicates that the optimization of PPC distribution using Transformer network has validity and reliability. On this basis, the study further qualitatively analyzes the proposed method using images from the KITTI dataset. Fig 10 displays the particular outcomes.
[Figure omitted. See PDF.]
In Fig 10, after the Transformer optimization, the shape of the vehicle contour is more obvious and the PC features are more easily captured. This indicates that the study proposes that the optimization of PPC using Transformer network can effectively improve the target PC features, thus improving the detection efficiency of the 3D target detector.
3.3. Validation of monocular vision measurement method based on pseudo point cloud distribution optimization
The efficiency of the suggested Transformer model to optimize the PPC distribution is confirmed by the earlier validation findings. To further confirm the effectiveness of the proposed monocular vision measurement method, the study introduces the popular algorithms in the current KITTI test set for quantitative analysis and comparison. These mainly include depth regression model (DRM), Did-m3d, feature aggregation strategy (FAS) based 3D target detection and stereo multi-granularity 3D (SGM3D) [35–38]. When IoU3D=0.7, the evaluation results obtained from 10 trials of different detection methods in three detection levels are shown in Fig 11.
[Figure omitted. See PDF.]
Fig 11 shows the validation results of the different methods at a single level of the KITTI test set for IoU3D = 0.7. In Fig 11(a), the results of 10 experiments of the proposed method under study are at the highest value. With the increase of the dataset level, the test results of all the five methods in Fig 11(b) and Fig 11(c) moderate and difficult levels are significantly decreased. Meanwhile, the study further compares the results of 10 tests of different methods at IoUBEV = 0.7. This is specifically shown in Fig 12.
[Figure omitted. See PDF.]
Comparing the AP values of 10 measurements of different methods in Fig 12(a), it can be noticed that at IoUBEV = 0.7, the AP value of the study’s proposed measurement method is about 91.03%. Combining the detection results in Fig 12(b) and Fig 12(c), the validity of the study’s proposed method can be further confirmed. Table 2 displays the AP values derived from the five approaches that is tested ten times.
[Figure omitted. See PDF.]
In Table 2, when IoU3D = 0.7, the study of the proposed method increases the AP value under simple setting by 10.71% on average compared to other methods. Whereas, the study of the proposed method under difficult setting has increased by 2.51%, 1.92%, 1.47%, and 3.03% than the other methods, respectively. However, at IoUBEV = 0.7, the study of the proposed method increases by an average of 16.41%, 14.76%, and 7.85% under different difficulty settings than the other methods, respectively. This indicates that after optimizing the depth of the foreground target and then using Transformer for PPCDO, more accurate and effective depth information can be obtained, which leads to more desirable 3D target detection results. It can be inferred that as the complexity of the scene increases, the efficacy of the proposed model diminishes. This phenomenon may be attributed to the fact that the target object in the simple scene is more discernible and the background interference is minimal, thereby enabling the model to extract the target features with greater precision. However, in moderate and difficult scenes, the target object has severe occlusion, illumination change, or complex background, which poses a great challenge to the model in feature extraction and depth estimation. The precision, recall, F1 score and false positive rate (FPR) of DRM, Did-m3d, FAS, SGM3D, and the methods proposed in this study are shown in Table 3.
[Figure omitted. See PDF.]
In Table 3, the accuracy, recall, and F1 score of the proposed method are superior to other methods in different difficulty scenes. Moreover, the detection performance is obviously superior in simple and medium difficulty scenes. This shows that this method has high accuracy and robustness in dealing with 3D target detection tasks of varying difficulty. Comparing the FPR of the five methods, the FPR of DRM is the highest at 15%. This indicates that there is a 15% chance that the DRM method will incorrectly predict all negative samples as positive samples. This may indicate that the DRM method faces challenges in accurately distinguishing between positive and negative samples, especially in complex or challenging scenarios. The DRM method relies primarily on a deep regression model and is sensitive to occlusion, illumination changes, and background clutter in complex scenes. In medium and high complexity scenes, the target object may be partially occluded or appear close to the background depth. This can hinder the model’s ability to accurately distinguish the target from the background, thereby reducing the detection accuracy. The feature fusion method of DID-M3D is simple, and it can’t effectively deal with the deep overlap between the backgrounds of the target area. The FAS method ignores the comprehensive consideration of multi-scale features and local features when dealing with complex scenes. It is difficult to effectively deal with local occlusion and background interference of targets. The FPR of the proposed method is 8.90%, which is the lowest among all methods, indicating that it has the best performance in controlling false positives.
From the above results, it can be seen that the monocular vision measurement method based on the optimisation of PPC allocation shows obvious advantages in different difficulty scenarios. Compared with DRM, Did-m3d, FAS, and SGM3D, the proposed method has higher AP values at all difficulty levels. The greatest improvement in detection performance was achieved in the “difficult” scenario. This indicates that the depth estimation and target detection models optimised for PPC allocation can provide higher accuracy and robustness when dealing with complex backgrounds, target occlusion and illumination changes. Furthermore, the proposed method effectively reduces false predictions and improves detection efficiency, underscoring its strong application potential in real-world monocular vision measurement tasks.
3.4. Wire pole tower external breakage hidden danger example validation
The study uses the inspection data from China Guangdong Power Grid Company as the training set and test set in order to further validate the applicability of the suggested strategy. Its tower categories mainly include V-tower and T-tower. After manually calibrating the positions of the tower PC and PL PC, the collected PCD is expanded using operations such as translation, cropping, mirroring and rotation. Finally, 18840651 experimental data points is obtained. The study categorizes the data into 3 groups of data based on the topography of the area where the data is collected and the type of tower. Experimental data group 1 is mainly dry and T-shaped towers. Its width is 247.15m, length is 6666.89m, and the number of PCs is 15479342. Experimental data group 2 is mainly ram’s horn tower. Its width is 64.38m, length is 1606.63m, and the PCs is 1165449. The experimental data 3 groups are mainly dry and T-shaped towers. Its width is 454.06m, length is 3973.31m, and the number of PCs is 2195.860. At the same time, the training set of the previously mentioned KITTI dataset is utilized to train the proposed method model. The study initially evaluates the effectiveness of the suggested approach for the extraction of transmission towers and PLs for three sets of data using the experimental data conditions mentioned above. The specific results are shown in Fig 13.
[Figure omitted. See PDF.]
The results of the study’s suggested method for extracting PL points for each of the three data groups are compared with the points that is manually calibrated in Fig 13(a). The suggested method’s extraction accuracy in experimental data group 1 is 95.30%. For experimental data groups 2 and 3, the extraction accuracy is 96.77% and 97.81%, respectively. The lower extraction accuracy in experimental data group 1 may be due to the fact that the topography of the region is more complex, with greater undulations and vertical shading. In contrast, the topography of the experimental area in data group 2 and data group 3 is relatively flatter, with less occlusion, higher PC density and less background interference. Therefore, the PCD of the groups 2 and 3 are more complete and the extraction accuracy is higher, which makes the PC extraction method more effective in identifying and extracting targets. In Fig 13(b), the extraction accuracy of the proposed method of the study is 91.95%, 95.83%, and 93.90% in the extraction results of transmission pole tower points for the three sets of test data, respectively. This indicates that the measurement method proposed by the study is able to extract the PCD of transmission lines and transmission towers better, especially for flat terrain. The comparison results of DRM, Did-m3d, FAS, SGM3D and the methods proposed in this study are shown in Table 4.
[Figure omitted. See PDF.]
In Table 4, the accuracy of the proposed method is higher than other methods in all datasets, with an average accuracy of 95.23%. The preprocessing time of the proposed method is 0.48 s/frame, which is the lowest among all the methods. This shows that this method is more efficient in the data preprocessing stage. This may be because the processing flow of PC data is optimized. Meanwhile, it also shows that the proposed method can extract and analyze the PCD of transmission towers more efficiently in practical application. Therefore, the study further conducted a qualitative analysis of examples. The Guangzhou Bureau’s 110kV Helongzhong line is used as an example to demonstrate the efficacy of the suggested technique for determining the spatial separation between transmission towers. It is specifically shown in Fig 14.
[Figure omitted. See PDF.]
Fig 14 shows that the distance measurement result of 110kV Hap long center line by the proposed method of the study is 14.87m, while the actual distance of the transmission tower is 15.36m. The accuracy of the measurement is 96.81%, and the error is 0.49m. This shows that the real-time measurement method proposed by the study can effectively measure the distance of the transmission tower. Moreover, after optimizing the PPC distribution using Transformer, the detection target is more easily captured by the 3D detector.
4. Discussion
The real-time spatial distance measurement method for transmission tower external breakage hazards proposed in this study demonstrated significant improvements over other methods proposed by scholars, such as DRM, Did-m3d, FAS, and SGM3D. On average, it increased by 16.41%, 14.76%, and 7.85% in different difficulty levels, respectively. The precision rates in three scenarios reached as high as 91.15%, 83.26%, and 70.97%. The measurement method proposed in this study also showed superior performance in the validation with real data of transmission tower poles. This method adopted the Transformer network as the core of PPCDO. The Transformer network could effectively capture global dependencies through self-attention mechanisms, which was crucial for improving the global consistency and local detail accuracy of PPC. To address the inaccuracy of depth estimation for foreground targets in monocular vision, this study proposed a depth optimization method based on 2D instance segmentation and geometric constraints. By optimizing the depth of foreground targets, it improved the accuracy of foreground targets in PPC, thereby improving the overall detection efficiency.
It is noteworthy that the study revealed that a low learning rate facilitates the model’s stable convergence during the initial training phase. However, with the advancement of training, an appropriate increase in the learning rate could expedite convergence and enhance the final detection accuracy. A larger batch size could facilitate more stable gradient estimation. However, it also entailed an increase in memory consumption. Through experiments, it was found that moderate batch size could not only ensure the training efficiency of the model, but also did not cause memory overflow. The adjustment of the super-parameters had a crucial influence on the model performance. In the follow-up work, the optimization and adjustment of model super-parameters will be strengthened to improve the model performance.
5. Conclusion
The study aimed to improve the detection efficiency of external damage hazards to transmission towers and the accuracy of 3D target detection. By proposing a real-time spatial distance measurement method based on monocular vision, the research not only achieved efficient monitoring of external hazards to transmission towers, but also significantly improved the feature representation, resolution, and depth information of target PC features. In this study, a Transformer network was used to optimize the distribution of PPC, and a foreground target depth optimization method based on a 2D detector was proposed. The training effect of the model was further improved by UL. The validation on the KITTI dataset showed that the optimized PPC distribution could more accurately capture target point cloud features, thereby improving the detection efficiency of the 3D target detector. Fig 9 reveals the superiority of the proposed method of the study in combination with PointNet++. Compared to F-PointNet+Transformer, the AP values of the proposed method in this study are improved by 1.35%, 1.50% and 1.33% at the three detection levels, respectively. Combined with the performance comparison test of different methods in Table 2, it can be seen that the average AP values of the proposed method of the study are improved by 16.41%, 14.76% and 7.85% compared to the other methods for different difficulty settings of IoUBEV = 0.7, respectively. The above results show that by optimising the depth information of foreground targets, the accuracy of foreground targets in PPC can be improved, thus increasing the overall detection efficiency. In the actual validation with transmission tower data, the method proposed in this study achieved an average accuracy rate of 96.56% and 93.89% in extracting PL points and transmission tower points, respectively, further proving the effectiveness of the method.
Nevertheless, the efficacy of the method presented in this study is contingent upon the accessibility of training data and the necessity for manual annotation. In future work, intelligent algorithms will be considered for the design of semi-supervised or self-supervised detection algorithms, with the aim of enhancing their application value in practical power engineering. Secondly, although the study has yielded promising results on the KITTI dataset and actual transmission tower data, its performance on other types of datasets remains to be validated, which will be addressed in future work.
Supporting information
S1 File. Minimal data set.
https://doi.org/10.1371/journal.pone.0326254.s001
(DOCX)
References
1. 1. Ezeigweneme CA, Nwasike CN, Adefemi A, Adegbite AO, Gidiagba JO. Smart grids in industrial paradigms: a review of progress, benefits, and maintenance implications: analyzing the role of smart grids in predictive maintenance and the integration of renewable energy sources, along with their overall impact on the industri. Eng Sci Technol J. 2024;5(1):1–20.
* View Article
* Google Scholar
2. 2. Charles D. The Lead-Lag Relationship Between International Food Prices, Freight Rates, and Trinidad and Tobago’s Food Inflation: A Support Vector Regression Analysis. GLCE. 2023;1(2):94–103.
* View Article
* Google Scholar
3. 3. Jiang J-A, Chiu H-C, Yang Y-C, Wang J-C, Lee C-H, Chou C-Y. On Real-Time Detection of Line Sags in Overhead Power Grids Using an IoT-Based Monitoring System: Theoretical Basis, System Implementation, and Long-Term Field Verification. IEEE Internet Things J. 2022;9(15):13096–112.
* View Article
* Google Scholar
4. 4. Tarahi H, Haghighat H, Ghandhari N, Adinehpour F. Smart Online Protection System for Power Transmission Towers: An IoT-Aided Design and Implementation. IEEE Internet Things J. 2023;10(9):7480–9.
* View Article
* Google Scholar
5. 5. Zong C, Wan Z. Container ship cell guide accuracy check technology based on improved 3d point cloud instance segmentatioN. brod. 2022;73(1):23–35.
* View Article
* Google Scholar
6. 6. Hui L, Cheng M, Xie J, Yang J, Cheng M-M. Efficient 3D Point Cloud Feature Learning for Large-Scale Place Recognition. IEEE Trans Image Process. 2022;31:1258–70. pmid:34982682
* View Article
* PubMed/NCBI
* Google Scholar
7. 7. Ozendi M, Akca D, Topan H. A point cloud filtering method based on anisotropic error model. The Photogrammetric Record. 2023;38(184):460–97.
* View Article
* Google Scholar
8. 8. Tang TY, De Martini D, Newman P. Point-based metric and topological localisation between lidar and overhead imagery. Auton Robot. 2023;47(5):595–615.
* View Article
* Google Scholar
9. 9. Zhang Z, Chen J, Xu X, Liu C, Han Y. Hawk‐eye‐inspired perception algorithm of stereo vision for obtaining orchard 3D point cloud navigation map. CAAI Trans on Intel Tech. 2022;8(3):987–1001.
* View Article
* Google Scholar
10. 10. Yue X, Zhang Y, Chen J, Chen J, Zhou X, He M. LiDAR-based SLAM for robotic mapping: state of the art and new frontiers. Ind Robot. 2024;51(2):196–205.
* View Article
* Google Scholar
11. 11. Hao J, Li X, Wu H, Yang K, Zeng Y, Wang Y, et al. Extraction and analysis of tree canopy height information in high-voltage transmission-line corridors by using integrated optical remote sensing and LiDAR. Geodesy and Geodynamics. 2023;14(3):292–303.
* View Article
* Google Scholar
12. 12. Mohsan SAH, Othman NQH, Li Y, Alsharif MH, Khan MA. Unmanned aerial vehicles (UAVs): practical aspects, applications, open challenges, security issues, and future trends. Intell Serv Robot. 2023;16(1):109–37. pmid:36687780
* View Article
* PubMed/NCBI
* Google Scholar
13. 13. Liu X, Shuang F, Li Y, Zhang L, Huang X, Qin J. SS-IPLE: Semantic Segmentation of Electric Power Corridor Scene and Individual Power Line Extraction From UAV-Based Lidar Point Cloud. IEEE J Sel Top Appl Earth Observations Remote Sensing. 2023;16:38–50.
* View Article
* Google Scholar
14. 14. Gazzea M, Pacevicius M, Dammann DO, Sapronova A, Lunde TM, Arghandeh R. Automated Power Lines Vegetation Monitoring Using High-Resolution Satellite Imagery. IEEE Trans Power Delivery. 2022;37(1):308–16.
* View Article
* Google Scholar
15. 15. Shokri D, Rastiveis H, Sarasua WA, Shams A, Homayouni S. A Robust and Efficient Method for Power Lines Extraction from Mobile LiDAR Point Clouds. PFG. 2021;89(3):209–32.
* View Article
* Google Scholar
16. 16. Al-Najjar A, Amini M, Rajan S, Green JR. Identifying Areas of High-Risk Vegetation Encroachment on Electrical Powerlines Using Mobile and Airborne Laser Scanned Point Clouds. IEEE Sensors J. 2024;24(14):22129–43.
* View Article
* Google Scholar
17. 17. García-Fernández M, Álvarez-Narciandi G, Heras FL, Álvarez-López Y. Comparison of Scanning Strategies in UAV-Mounted Multichannel GPR-SAR Systems Using Antenna Arrays. IEEE J Sel Top Appl Earth Observations Remote Sensing. 2024;17:3571–86.
* View Article
* Google Scholar
18. 18. Li C, Su RKL, Pan X. Assessment of out‐of‐plane structural defects using parallel laser line scanning system. Computer aided Civil Eng. 2023;39(6):834–51.
* View Article
* Google Scholar
19. 19. Hosseini MM, Umunnakwe A, Parvania M, Tasdizen T. Intelligent Damage Classification and Estimation in Power Distribution Poles Using Unmanned Aerial Vehicles and Convolutional Neural Networks. IEEE Trans Smart Grid. 2020;11(4):3325–33.
* View Article
* Google Scholar
20. 20. Chen M, Chan TO, Wang X, Luo M, Lin Y, Huang H, et al. A risk analysis framework for transmission towers under potential pluvial flood - LiDAR survey and geometric modelling. International Journal of Disaster Risk Reduction. 2020;50:101862.
* View Article
* Google Scholar
21. 21. Chen Y, Huang S, Liu S, Yu B, Jia J. DSGN++: Exploiting Visual-Spatial Relation for Stereo-Based 3D Detectors. IEEE Trans Pattern Anal Mach Intell. 2022;45(4):4416–29. pmid:35939470
* View Article
* PubMed/NCBI
* Google Scholar
22. 22. Li X, Kong D. SRIF-RCNN: Sparsely represented inputs fusion of different sensors for 3D object detection. Appl Intell. 2023.
* View Article
* Google Scholar
23. 23. Liu K, Gao Z, Lin F, Chen BM. FG-Net: A Fast and Accurate Framework for Large-Scale LiDAR Point Cloud Understanding. IEEE Trans Cybern. 2022;53(1):553–64. pmid:35417363
* View Article
* PubMed/NCBI
* Google Scholar
24. 24. Li D, Shi G, Li J, Chen Y, Zhang S, Xiang S, et al. PlantNet: A dual-function point cloud segmentation network for multiple plant species. ISPRS Journal of Photogrammetry and Remote Sensing. 2022;184:243–63.
* View Article
* Google Scholar
25. 25. Chen Y, Li Z, Li Q, Zhang M. Pose estimation algorithm based on point pair features using PointNet ++. Complex Intell Syst. 2024;10(5):6581–95.
* View Article
* Google Scholar
26. 26. Tang K, Chen Y, Peng W, Zhang Y, Fang M, Wang Z, et al. RepPVConv: attentively fusing reparameterized voxel features for efficient 3D point cloud perception. Vis Comput. 2023;39(11):5577–88.
* View Article
* Google Scholar
27. 27. Jin C, Wu T, Zhou J. Multi-grid representation with field regularization for self-supervised surface reconstruction from point clouds. Computers & Graphics. 2023;114:379–86.
* View Article
* Google Scholar
28. 28. Xiao A, Huang J, Guan D, Zhang X, Lu S, Shao L. Unsupervised Point Cloud Representation Learning With Deep Neural Networks: A Survey. IEEE Trans Pattern Anal Mach Intell. 2023;45(9):11321–39. pmid:37030870
* View Article
* PubMed/NCBI
* Google Scholar
29. 29. Hafiz AM, Bhat RUA, Parah SA, Hassaballah M. SE-MD: a single-encoder multiple-decoder deep network for point cloud reconstruction from 2D images. Pattern Anal Applic. 2023;26(3):1291–302.
* View Article
* Google Scholar
30. 30. Woodman RJ, Mangoni AA. A comprehensive review of machine learning algorithms and their application in geriatric medicine: present and future. Aging Clin Exp Res. 2023;35(11):2363–97. pmid:37682491
* View Article
* PubMed/NCBI
* Google Scholar
31. 31. Li X, Bagher‐Ebadian H, Gardner S, Kim J, Elshaikh M, Movsas B, et al. An uncertainty‐aware deep learning architecture with outlier mitigation for prostate gland segmentation in radiotherapy treatment planning. Medical Physics. 2023;50(1):311–22.
* View Article
* Google Scholar
32. 32. Qi X, Liu Z, Liao R, Torr PHS, Urtasun R, Jia J. GeoNet++: Iterative Geometric Neural Network with Edge-Aware Refinement for Joint Depth and Surface Normal Estimation. IEEE Trans Pattern Anal Mach Intell. 2022;44(2):969–84. pmid:32870785
* View Article
* PubMed/NCBI
* Google Scholar
33. 33. Xu Q, Kong W, Tao W, Pollefeys M. Multi-Scale Geometric Consistency Guided and Planar Prior Assisted Multi-View Stereo. IEEE Trans Pattern Anal Mach Intell. 2023;45(4):4945–63. pmid:35984800
* View Article
* PubMed/NCBI
* Google Scholar
34. 34. Ge C, Zhang R, Jiang Y, Li B, He Y. A 3-D Dynamic Non-WSS Cluster Geometrical-Based Stochastic Model for UAV MIMO Channels. IEEE Trans Veh Technol. 2022;71(7):6884–99.
* View Article
* Google Scholar
35. 35. Kim Y, Kim S, Sim S, Choi JW, Kum D. Boosting Monocular 3D Object Detection With Object-Centric Auxiliary Depth Supervision. IEEE Trans Intell Transport Syst. 2022;1–13.
* View Article
* Google Scholar
36. 36. Peng L, Wu X, Yang Z, Liu H, Cai D, Did-m3d: decoupling instance depth for monocular 3d object detection. ECCV. 2022;13661(1):71–88.
* View Article
* Google Scholar
37. 37. Zhou D, Song X, Fang J, Dai Y, Li H, Zhang L. Context-Aware 3D Object Detection From a Single Image in Autonomous Driving. IEEE Trans Intell Transport Syst. 2022;23(10):18568–80.
* View Article
* Google Scholar
38. 38. Zhou Z, Du L, Ye X, Zou Z, Tan X, Zhang L, et al. SGM3D: Stereo Guided Monocular 3D Object Detection. IEEE Robot Autom Lett. 2022;7(4):10478–85.
* View Article
* Google Scholar
Citation: Liao R, Li D, Li C, Sun W, Liu G, Wang C (2025) Real-time measurement of spatial distance to external breakage hazards of transmission pole tower based on monocular vision. PLoS One 20(7): e0326254. https://doi.org/10.1371/journal.pone.0326254
About the Authors:
Ruchao Liao
Roles: Conceptualization, Data curation, Writing – original draft
E-mail: [email protected]
Affiliation: Guangdong Power Grid Co., Ltd., Guangzhou, China
ORICD: https://orcid.org/0009-0004-5583-1730
Duanjiao Li
Roles: Data curation, Investigation, Resources
Affiliation: Guangdong Power Grid Co., Ltd., Guangzhou, China
Changyu Li
Roles: Formal analysis, Project administration, Resources
Affiliation: Guangdong Power Grid Co., Ltd., Guangzhou, China
Wenxing Sun
Roles: Funding acquisition, Methodology, Supervision
Affiliation: Guangdong Power Grid Co., Ltd., Guangzhou, China
Gao Liu
Roles: Funding acquisition, Project administration, Validation
Affiliation: Guangdong Power Grid Co., Ltd., Guangzhou, China
Cong Wang
Roles: Software, Supervision, Validation
Affiliation: Guangdong Power Grid Co., Ltd., Guangzhou, China
1. Ezeigweneme CA, Nwasike CN, Adefemi A, Adegbite AO, Gidiagba JO. Smart grids in industrial paradigms: a review of progress, benefits, and maintenance implications: analyzing the role of smart grids in predictive maintenance and the integration of renewable energy sources, along with their overall impact on the industri. Eng Sci Technol J. 2024;5(1):1–20.
2. Charles D. The Lead-Lag Relationship Between International Food Prices, Freight Rates, and Trinidad and Tobago’s Food Inflation: A Support Vector Regression Analysis. GLCE. 2023;1(2):94–103.
3. Jiang J-A, Chiu H-C, Yang Y-C, Wang J-C, Lee C-H, Chou C-Y. On Real-Time Detection of Line Sags in Overhead Power Grids Using an IoT-Based Monitoring System: Theoretical Basis, System Implementation, and Long-Term Field Verification. IEEE Internet Things J. 2022;9(15):13096–112.
4. Tarahi H, Haghighat H, Ghandhari N, Adinehpour F. Smart Online Protection System for Power Transmission Towers: An IoT-Aided Design and Implementation. IEEE Internet Things J. 2023;10(9):7480–9.
5. Zong C, Wan Z. Container ship cell guide accuracy check technology based on improved 3d point cloud instance segmentatioN. brod. 2022;73(1):23–35.
6. Hui L, Cheng M, Xie J, Yang J, Cheng M-M. Efficient 3D Point Cloud Feature Learning for Large-Scale Place Recognition. IEEE Trans Image Process. 2022;31:1258–70. pmid:34982682
7. Ozendi M, Akca D, Topan H. A point cloud filtering method based on anisotropic error model. The Photogrammetric Record. 2023;38(184):460–97.
8. Tang TY, De Martini D, Newman P. Point-based metric and topological localisation between lidar and overhead imagery. Auton Robot. 2023;47(5):595–615.
9. Zhang Z, Chen J, Xu X, Liu C, Han Y. Hawk‐eye‐inspired perception algorithm of stereo vision for obtaining orchard 3D point cloud navigation map. CAAI Trans on Intel Tech. 2022;8(3):987–1001.
10. Yue X, Zhang Y, Chen J, Chen J, Zhou X, He M. LiDAR-based SLAM for robotic mapping: state of the art and new frontiers. Ind Robot. 2024;51(2):196–205.
11. Hao J, Li X, Wu H, Yang K, Zeng Y, Wang Y, et al. Extraction and analysis of tree canopy height information in high-voltage transmission-line corridors by using integrated optical remote sensing and LiDAR. Geodesy and Geodynamics. 2023;14(3):292–303.
12. Mohsan SAH, Othman NQH, Li Y, Alsharif MH, Khan MA. Unmanned aerial vehicles (UAVs): practical aspects, applications, open challenges, security issues, and future trends. Intell Serv Robot. 2023;16(1):109–37. pmid:36687780
13. Liu X, Shuang F, Li Y, Zhang L, Huang X, Qin J. SS-IPLE: Semantic Segmentation of Electric Power Corridor Scene and Individual Power Line Extraction From UAV-Based Lidar Point Cloud. IEEE J Sel Top Appl Earth Observations Remote Sensing. 2023;16:38–50.
14. Gazzea M, Pacevicius M, Dammann DO, Sapronova A, Lunde TM, Arghandeh R. Automated Power Lines Vegetation Monitoring Using High-Resolution Satellite Imagery. IEEE Trans Power Delivery. 2022;37(1):308–16.
15. Shokri D, Rastiveis H, Sarasua WA, Shams A, Homayouni S. A Robust and Efficient Method for Power Lines Extraction from Mobile LiDAR Point Clouds. PFG. 2021;89(3):209–32.
16. Al-Najjar A, Amini M, Rajan S, Green JR. Identifying Areas of High-Risk Vegetation Encroachment on Electrical Powerlines Using Mobile and Airborne Laser Scanned Point Clouds. IEEE Sensors J. 2024;24(14):22129–43.
17. García-Fernández M, Álvarez-Narciandi G, Heras FL, Álvarez-López Y. Comparison of Scanning Strategies in UAV-Mounted Multichannel GPR-SAR Systems Using Antenna Arrays. IEEE J Sel Top Appl Earth Observations Remote Sensing. 2024;17:3571–86.
18. Li C, Su RKL, Pan X. Assessment of out‐of‐plane structural defects using parallel laser line scanning system. Computer aided Civil Eng. 2023;39(6):834–51.
19. Hosseini MM, Umunnakwe A, Parvania M, Tasdizen T. Intelligent Damage Classification and Estimation in Power Distribution Poles Using Unmanned Aerial Vehicles and Convolutional Neural Networks. IEEE Trans Smart Grid. 2020;11(4):3325–33.
20. Chen M, Chan TO, Wang X, Luo M, Lin Y, Huang H, et al. A risk analysis framework for transmission towers under potential pluvial flood - LiDAR survey and geometric modelling. International Journal of Disaster Risk Reduction. 2020;50:101862.
21. Chen Y, Huang S, Liu S, Yu B, Jia J. DSGN++: Exploiting Visual-Spatial Relation for Stereo-Based 3D Detectors. IEEE Trans Pattern Anal Mach Intell. 2022;45(4):4416–29. pmid:35939470
22. Li X, Kong D. SRIF-RCNN: Sparsely represented inputs fusion of different sensors for 3D object detection. Appl Intell. 2023.
23. Liu K, Gao Z, Lin F, Chen BM. FG-Net: A Fast and Accurate Framework for Large-Scale LiDAR Point Cloud Understanding. IEEE Trans Cybern. 2022;53(1):553–64. pmid:35417363
24. Li D, Shi G, Li J, Chen Y, Zhang S, Xiang S, et al. PlantNet: A dual-function point cloud segmentation network for multiple plant species. ISPRS Journal of Photogrammetry and Remote Sensing. 2022;184:243–63.
25. Chen Y, Li Z, Li Q, Zhang M. Pose estimation algorithm based on point pair features using PointNet ++. Complex Intell Syst. 2024;10(5):6581–95.
26. Tang K, Chen Y, Peng W, Zhang Y, Fang M, Wang Z, et al. RepPVConv: attentively fusing reparameterized voxel features for efficient 3D point cloud perception. Vis Comput. 2023;39(11):5577–88.
27. Jin C, Wu T, Zhou J. Multi-grid representation with field regularization for self-supervised surface reconstruction from point clouds. Computers & Graphics. 2023;114:379–86.
28. Xiao A, Huang J, Guan D, Zhang X, Lu S, Shao L. Unsupervised Point Cloud Representation Learning With Deep Neural Networks: A Survey. IEEE Trans Pattern Anal Mach Intell. 2023;45(9):11321–39. pmid:37030870
29. Hafiz AM, Bhat RUA, Parah SA, Hassaballah M. SE-MD: a single-encoder multiple-decoder deep network for point cloud reconstruction from 2D images. Pattern Anal Applic. 2023;26(3):1291–302.
30. Woodman RJ, Mangoni AA. A comprehensive review of machine learning algorithms and their application in geriatric medicine: present and future. Aging Clin Exp Res. 2023;35(11):2363–97. pmid:37682491
31. Li X, Bagher‐Ebadian H, Gardner S, Kim J, Elshaikh M, Movsas B, et al. An uncertainty‐aware deep learning architecture with outlier mitigation for prostate gland segmentation in radiotherapy treatment planning. Medical Physics. 2023;50(1):311–22.
32. Qi X, Liu Z, Liao R, Torr PHS, Urtasun R, Jia J. GeoNet++: Iterative Geometric Neural Network with Edge-Aware Refinement for Joint Depth and Surface Normal Estimation. IEEE Trans Pattern Anal Mach Intell. 2022;44(2):969–84. pmid:32870785
33. Xu Q, Kong W, Tao W, Pollefeys M. Multi-Scale Geometric Consistency Guided and Planar Prior Assisted Multi-View Stereo. IEEE Trans Pattern Anal Mach Intell. 2023;45(4):4945–63. pmid:35984800
34. Ge C, Zhang R, Jiang Y, Li B, He Y. A 3-D Dynamic Non-WSS Cluster Geometrical-Based Stochastic Model for UAV MIMO Channels. IEEE Trans Veh Technol. 2022;71(7):6884–99.
35. Kim Y, Kim S, Sim S, Choi JW, Kum D. Boosting Monocular 3D Object Detection With Object-Centric Auxiliary Depth Supervision. IEEE Trans Intell Transport Syst. 2022;1–13.
36. Peng L, Wu X, Yang Z, Liu H, Cai D, Did-m3d: decoupling instance depth for monocular 3d object detection. ECCV. 2022;13661(1):71–88.
37. Zhou D, Song X, Fang J, Dai Y, Li H, Zhang L. Context-Aware 3D Object Detection From a Single Image in Autonomous Driving. IEEE Trans Intell Transport Syst. 2022;23(10):18568–80.
38. Zhou Z, Du L, Ye X, Zou Z, Tan X, Zhang L, et al. SGM3D: Stereo Guided Monocular 3D Object Detection. IEEE Robot Autom Lett. 2022;7(4):10478–85.
© 2025 Liao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.