1 Introduction
Several factors influence the production process, including the continuous casting of steel billets, processing techniques, and the production environment. Steel surfaces therefore typically exhibit defects such as crazing, inclusions, patches, pitted surfaces, rolled-in scale, and scratches [1]. These defects not only affect the appearance but also lead to stress concentrations. As a result, the steel’s capacity to tolerate fatigue and impact is reduced, limiting its lifespan. Manufacturing defective steel also wastes a significant amount of raw material, which harms an enterprise’s efficiency. Controlling the rate of defective steel production and improving product quality to meet the requirements of modern industry is therefore an urgent issue, and it is crucial to improve the ability to detect surface defects in steel [2].
Traditional steel surface defect detection methods fall into three categories: manual detection, photoelectric detection, and traditional machine vision-based detection [3–5]. Manual inspection is inefficient and labor-intensive, with issues such as inconsistent inspection standards and missed defects. Despite its increased effectiveness, the photoelectric detection approach is limited by strict environmental conditions and requires significant equipment maintenance expense [6]. Traditional machine vision-based inspection approaches have significantly enhanced the speed and precision of inspection. However, they still require manual segmentation and feature extraction from steel images, the operator must have a high level of technical expertise, and the computer must have significant computing power. In recent years, the rapid advancement of artificial intelligence, particularly deep learning, has provided a more efficient approach for detecting defects on steel surfaces [7,8].
Defect detection approaches based on conventional machine vision are divided into several stages, including image preprocessing, threshold segmentation, feature extraction, and classifier-based classification. Threshold segmentation and feature extraction are difficult procedures that require manual design [9–12]. With the growth of industrial activity, the amount of data generated in actual production keeps increasing, placing greater demands on algorithm performance [13–15]. In 2014, Girshick et al. proposed the R-CNN algorithm based on Convolutional Neural Networks (CNN) [16]. On the VOC2007 dataset it achieved a mean Average Precision (mAP) of 58.5%, a significant gain over the 33.7% of earlier algorithms. Presently, most steel surface defect detection models based on deep learning are developed from the Faster R-CNN model, the YOLO series, and the SSD series [17–22]. Each of the three designs has distinct principles and focuses on specific performance aspects. The Faster R-CNN model is based on the two-stage network architecture of R-CNN. Its design combines region proposal, a CNN, and a Support Vector Machine (SVM): the CNN network extracts features, the region proposal algorithm generates candidate regions and corrects their coordinates, and the final classification and localization results are obtained through fully connected layers and classifiers. Wang M et al. proposed an image detection method based on an improved Faster R-CNN model for wear location and wear mechanism identification [23]. The YOLO series and SSD series networks are single-stage architectures that follow a one-stage design concept. This class of network avoids candidate box refinement and directly outputs the category probability and position of the target. While there is a trade-off in precision, the detection speed is significantly improved. Li Z et al. proposed a lightweight and efficient deep learning model (SSDD-Net) for steel surface defect detection that achieves strong performance with a small number of parameters (3.79M) [24]. Zhao C et al. proposed RDD-YOLO, a model based on YOLOv5 for steel surface defect detection, which achieves a higher level of precision [25].
To address the slow detection speed, low accuracy, and difficult deployment on low-compute devices of existing hot-rolled strip surface defect detection models, the FasterNet-YOLO hot-rolled strip surface defect detection model is proposed. The model enables rapid and accurate automatic identification of surface defects on hot-rolled strip using devices with low computational power.
2 Proposed methods
YOLOv5 is a deep learning-based object detection model. The model takes the entire image as input and directly predicts the bounding boxes and category probabilities of the targets within the image. YOLOv5 has received much attention for its robustness, generalization ability, fast detection speed and high accuracy. In this paper, we propose FasterNet-YOLO, a lightweight surface defect detection model for low-compute devices based on the YOLOv5 model. The network structure of FasterNet-YOLO is shown in Fig 1.
[Figure omitted. See PDF.]
FasterNet-YOLO is mainly composed of three parts: the backbone network, the neck network and the detection head. The backbone network extracts features from images primarily using convolutional operations and Spatial Pyramid Pooling-Fast (SPPF) [26]. The neck network uses a feature pyramid structure to fuse features of different scales obtained from the backbone network; the more comprehensive semantic information obtained in this way improves the model’s detection performance. The detection head is responsible for generating target location, class and confidence information. It consists of three output layers at different scales, each responsible for detecting target objects of a different size, which improves the model’s ability to detect targets of different scales.
FasterNet-YOLO introduces four main improvements over YOLOv5. First, the backbone network of the YOLOv5 model is improved using the lightweight FasterNet network, reducing redundant parameters in the backbone and lowering model complexity while maintaining detection accuracy. Second, the CBS structure in the neck network is replaced with a DBS structure to achieve further lightweighting. Third, the C3 module in the neck network is replaced with a C3STR module that fuses the Swin-Transformer. By introducing the Transformer’s discrete parameters, the window self-attention module enhances the semantic information and feature representation of small targets, addressing the cluttered backgrounds of defect images and the easy confusion of defect categories. Finally, BiFPN is used for feature fusion; by retaining more informative features, the detector’s adaptability to targets of different scales is improved.
2.1 FasterNet-based backbone network improvement
MobileNet, ShuffleNet, GhostNet, and similar lightweight networks are often used to detect surface defects in hot-rolled strips [27–29]. They rely on DWConv (depthwise convolution) or GConv (group convolution) to lighten the detection model [30,31]. However, while these operators reduce the number of parameters, they often suffer from the side effect of increased memory accesses. In addition, such networks are usually accompanied by extra data operations such as concatenation, shuffling and pooling, which also take a non-negligible share of the inference time of lightweight networks. Lightweight networks frequently adopt inverted residuals and linear bottleneck structures to reduce the overall number of parameters. However, the DWConv in this series of lightweight operations increases the network width, resulting in higher memory access, which increases computation and decreases the inference speed of the network model. This is not conducive to real-time detection of surface defects on hot-rolled strips, particularly on devices with low computational power.
FasterNet is a new neural network proposed at CVPR 2023 [32]. It offers superior speed, accuracy, and overall performance compared with networks such as MobileViT across a diverse range of devices, including CPU, GPU, and ARM processors. FasterNet designs a new partial convolution (PConv), which applies a convolution operation to only some of the channels of the input feature map while leaving the other channels unchanged [33]. This addresses the side effect of increased memory access that earlier backbone networks such as MobileNet, ShuffleNet and GhostNet suffer from when extracting spatial features with DWConv and GConv.
YOLOv5 has a complex backbone network structure, and when deployed on a low-computational-power device its detection speed does not meet usage requirements. To improve detection speed, the backbone network of YOLOv5 is lightweighted and improved. The backbone network of YOLOv5 consists mainly of five CBS structures and four C3 structures. Each CBS structure contains one ordinary convolution, and each C3 structure has at least five ordinary convolutions. In ordinary convolution, every output channel requires its own set of convolution kernels spanning all input channels, which leads to a large number of network parameters and high memory usage. The FasterNet network consists mainly of an embedding layer, a merging layer and FasterNet blocks; it has a simple structure and low computational complexity. Improving the backbone network of YOLOv5 by introducing the FasterNet network can therefore effectively reduce the parameters in the backbone network.
PConv uses ordinary convolution for spatial feature extraction on only some of the input channels, keeping the remaining channels unchanged so that the input and output have the same number of channels. The computational complexity of the model is effectively reduced while the spatial information is preserved. Pointwise Convolution (PWConv) applies a 1 × 1 convolution kernel to each pixel of the input and is able to fully utilize the information from all channels.
Assume that $h$ is the height of the feature map, $w$ is the width of the feature map, $k$ is the size of the convolution kernel, $c$ is the number of channels, $c_p$ is the number of channels in PConv on which the operation is performed, and $r = c_p / c$ is the partial ratio. The GFLOPs of PConv at $r = 1/4$ are

$$h \times w \times k^2 \times c_p^2,$$

only 1/16 of an ordinary convolution ($h \times w \times k^2 \times c^2$).
The memory accesses of PConv are as follows:

$$h \times w \times 2c_p + k^2 \times c_p^2 \approx h \times w \times 2c_p,$$

only 1/4 of an ordinary convolution ($h \times w \times 2c$). It is evident that PConv significantly decreases the number of GFLOPs and memory accesses in comparison to ordinary convolution. Introducing PConv can therefore reduce the computational complexity of the model.
The FasterNet block consists of one PConv and two PWConvs. BN and an activation function are used only after the intermediate PWConv layer, achieving lower latency and faster inference while preserving feature diversity. The combined structure of PConv and PWConv has a T-shaped effective receptive field on the input feature map, placing more emphasis on the centre position than ordinary convolution does. Combining PConv and PWConv therefore not only reduces computational complexity, but also improves output quality. The backbone network improvements based on the FasterNet network are shown in Fig 2. First, the FasterNet network contains embedding and merging layers to achieve downsampling and increase the number of channels. Second, FasterNet blocks are introduced for feature extraction. Finally, the SPPF structure of YOLOv5 is retained to enhance the model’s ability to perceive targets at different scales. Compared with YOLOv5, the FasterNet-based backbone network has lower computational complexity, while the information from all channels is fully utilized to maintain a high feature extraction capability.
[Figure omitted. See PDF.]
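To make the structure concrete, the following PyTorch sketch illustrates PConv and the FasterNet block as described above. It is a minimal illustration under stated assumptions, not the authors' or the official FasterNet implementation: the class names, the 1/4 partial ratio, the ReLU activation and the expansion factor of 2 are assumptions made for the example.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution (sketch): convolve only the first c_p = r*c channels
    and pass the remaining channels through unchanged."""
    def __init__(self, channels: int, kernel_size: int = 3, ratio: float = 0.25):
        super().__init__()
        self.cp = int(channels * ratio)            # channels that are convolved
        self.conv = nn.Conv2d(self.cp, self.cp, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x[:, :self.cp], x[:, self.cp:]    # split along the channel dim
        return torch.cat((self.conv(x1), x2), dim=1)

class FasterNetBlock(nn.Module):
    """FasterNet block (sketch): PConv followed by two 1x1 PWConvs, with BN and
    activation only after the intermediate PWConv, plus a residual connection."""
    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        hidden = channels * expansion
        self.pconv = PConv(channels)
        self.pw1 = nn.Conv2d(channels, hidden, 1, bias=False)
        self.bn = nn.BatchNorm2d(hidden)
        self.act = nn.ReLU(inplace=True)
        self.pw2 = nn.Conv2d(hidden, channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.pw2(self.act(self.bn(self.pw1(self.pconv(x)))))
```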
2.2 Improvement of the neck network based on depthwise separable convolution
The CBS structure of the YOLOv5 neck network contains a large number of ordinary convolutional parameters and uses a lot of memory, slowing model detection. To reduce the number of neck network parameters and improve detection speed, the ordinary convolution in the CBS structure is lightweighted and improved. Depthwise Separable Convolution (DSConv) has a small number of parameters and a small memory footprint [34]. Using DSConv to replace the ordinary convolution in the CBS structure can effectively reduce the number of parameters in the neck network and improve the speed of model detection.
DSConv consists of two stages. The first stage is depthwise convolution, which convolves each channel of the input with a separate convolution kernel. The second stage is pointwise convolution, which applies a 1 × 1 convolution to the output of the depthwise convolution to adjust the number of channels.
Assume $D_I$ is the input feature size, $M$ is the number of input channels, $D_F$ is the output feature size, $N$ is the number of output channels, and $D_K$ is the convolution kernel size. The computation of ordinary convolution is:

$$D_K \times D_K \times M \times N \times D_F \times D_F.$$

The computation of DSConv is:

$$D_K \times D_K \times M \times D_F \times D_F + M \times N \times D_F \times D_F.$$

The ratio of DSConv to ordinary convolutional computation is:

$$\frac{D_K \times D_K \times M \times D_F \times D_F + M \times N \times D_F \times D_F}{D_K \times D_K \times M \times N \times D_F \times D_F} = \frac{1}{N} + \frac{1}{D_K^2}.$$
From the above equation, it can be seen that the computational volume of DSConv is much less than that of normal convolution.
The improved CBS structure based on DSConv is shown in Fig 3. Introducing DSConv into the neck network of YOLOv5 reconstructs the ordinary convolution in the CBS structure into a depthwise convolution followed by a pointwise convolution; we define this structure as DBS. By reducing the number of parameters and the memory usage of the YOLOv5 neck network, the speed of model detection is improved.
[Figure omitted. See PDF.]
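As a concrete illustration, the following PyTorch sketch shows one way the DBS block described above could be built from a depthwise convolution, a pointwise convolution, BatchNorm and SiLU. The class name and the choice of SiLU mirror YOLOv5's CBS convention but are assumptions for this example, not the authors' released code.

```python
import torch.nn as nn

class DBS(nn.Module):
    """DBS (sketch): depthwise separable replacement for the CBS block.
    A k x k depthwise conv (groups = in_channels) is followed by a 1x1
    pointwise conv, then BatchNorm and SiLU."""
    def __init__(self, c_in: int, c_out: int, k: int = 3, s: int = 1):
        super().__init__()
        self.dw = nn.Conv2d(c_in, c_in, k, s, k // 2, groups=c_in, bias=False)
        self.pw = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pw(self.dw(x))))
```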
2.3 C3STR module incorporating Swin-Transformer
As the network deepens and convolution operations accumulate, most of the feature information of small targets in hot-rolled strip surface defect images is lost in the high-level feature maps. Therefore, the Swin-Transformer is embedded into the C3 convolutional block in the feature fusion section [35]. The window self-attention module improves the semantic information and feature representation of small targets by introducing the discrete parameters of the Transformer. The improved C3STR structure is shown in Fig 4.
[Figure omitted. See PDF.]
The Swin Transformer Block (STB) mainly consists of a Window multi-head self-attention (W-MSA) module and a shifted window multi-head self-attention (SW-MSA) module. By restricting the computation to windows, these two sub-modules significantly reduce the computational complexity compared with the Transformer self-attention (MSA) sub-module. The computational complexity of the sub-modules is as follows:

$$\Omega(\mathrm{MSA}) = 4hwC^2 + 2(hw)^2C$$

$$\Omega(\mathrm{W\text{-}MSA}) = 4hwC^2 + 2M^2hwC$$

Where: $\Omega$ denotes the computational complexity; $M$ is the window size, a constant (usually set to 7); $h$ and $w$ denote the feature height and width, respectively; and $C$ is the channel dimension. From these equations, the computational complexity of the W-MSA and SW-MSA sub-modules is linearly related to $h$ and $w$, while the computational complexity of the MSA module is quadratically related to $h$ and $w$. Self-attention is computed within each window, and the output is then obtained by a Multi-layer Perceptron (MLP).
In the first part of the STB, the features pass through Layer Normalization (LN) and enter the W-MSA sub-module. In the second part, self-attention is computed on shifted windows by the SW-MSA sub-module. The final prediction is obtained by global average pooling followed by an MLP. The self-attention within each window is calculated as

$$\mathrm{Attention}(Q, K, V) = \mathrm{SoftMax}\!\left(\frac{QK^{T}}{\sqrt{d}} + B\right)V$$

Where: SoftMax is the normalized exponential function; $Q$, $K$ and $V$ are the matrices corresponding to Query, Key, and Value, respectively; Query and Key are the feature vectors used to calculate the attention weights, and Value denotes the vector of input features; $d$ is the vector dimension of $Q$ and $K$; and $B$ is the relative position bias matrix.
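For illustration, the windowed self-attention described by this formula can be sketched as follows in PyTorch. The tensor shapes and the function name are assumptions made for the example; multi-head projection, window partitioning and window shifting are omitted for brevity.

```python
import torch
import torch.nn.functional as F

def window_attention(q, k, v, rel_pos_bias):
    """Self-attention inside one window (sketch).
    q, k, v: (num_windows, tokens, dim); rel_pos_bias: (tokens, tokens)."""
    d = q.shape[-1]
    attn = q @ k.transpose(-2, -1) / d ** 0.5      # scaled dot-product scores
    attn = attn + rel_pos_bias                     # add relative position bias B
    attn = F.softmax(attn, dim=-1)                 # normalize over the keys
    return attn @ v                                # weighted sum of the values
```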
2.4 Using the BiFPN network
Traditional FPNs use a top-down approach to aggregate multi-scale features [36]. However, this unidirectional information transfer loses some detailed information. PANet adds a bottom-up aggregation path to FPN to fuse features at different levels more effectively. The PAFPN structure used in YOLOv5 combines the advantages of FPN and PANet [37]. Although it overcomes the limitation of FPN’s unidirectional information flow and achieves more efficient feature fusion, the number of training parameters increases accordingly.
The Bidirectional Feature Pyramid Network (BiFPN) applies top-down and bottom-up bi-directional multi-scale feature fusion to aggregate features of different resolutions [38]. Learnable weights are also introduced to update the weights of different input features and improve detection accuracy. The original BiFPN structure and the BiFPN structure used in this paper are shown in Fig 5.
[Figure omitted. See PDF.]
$P_6^{td}$ is the intermediate feature of the P6 layer, and $P_6^{in}$ and $P_6^{out}$ are the corresponding input and output features. Taking the P6 layer as an example, the weighted feature fusion formula of BiFPN is

$$P_6^{td} = \mathrm{Conv}\!\left(\frac{\omega_1 \cdot P_6^{in} + \omega_2 \cdot \mathrm{Resize}(P_7^{in})}{\omega_1 + \omega_2 + \varepsilon}\right)$$

$$P_6^{out} = \mathrm{Conv}\!\left(\frac{\omega_1' \cdot P_6^{in} + \omega_2' \cdot P_6^{td} + \omega_3' \cdot \mathrm{Resize}(P_5^{out})}{\omega_1' + \omega_2' + \omega_3' + \varepsilon}\right)$$

Where: Conv is the corresponding convolution operation; Resize is the up-sampling or down-sampling operation; $\omega$ is the weight corresponding to each input, used to distinguish the importance of different features during feature fusion; and $\varepsilon$ is a very small non-zero number.
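As a sketch of the fast normalized fusion used in these equations, the following PyTorch module weights and sums input feature maps that have already been resized to a common resolution. The class name and the ReLU constraint on the weights follow the BiFPN paper's description and are assumptions made for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Fast normalized fusion (sketch): each input feature map gets a learnable
    non-negative weight, and the weighted sum is normalized by the weight total
    plus a small epsilon to keep training stable."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, features):
        # features: list of tensors already resized to a common resolution
        w = F.relu(self.w)                          # keep the weights non-negative
        w = w / (w.sum() + self.eps)                # normalize by the weight total
        return sum(wi * fi for wi, fi in zip(w, features))
```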
BiFPN networks are capable of multi-level feature fusion across different scales while providing bi-directional connectivity. The model removes the P3 and P7 layer feature fusion nodes that contribute little to the network, reducing the computational effort, and adds an extra edge connecting the input and output at the same level. Deep semantic information is obtained and more location information is retained without additional cost. Small-target detection accuracy is improved by fusing shallow feature maps. This reduces the cost of computation and storage while increasing detection accuracy.
Hot-rolled strip surfaces contain many small-target defects, such as pitted surface and rolled-in scale, as well as extreme-aspect-ratio defects, such as scratches and inclusions. Therefore, this paper follows the BiFPN design principle and adds a connection line between the input and the output. The P5 and P3 layer fusion points are also retained to preserve more informative features. Fusing defect features at different scales improves the precision of detecting surface defects on hot-rolled strip steel.
3 Experiments
3.1 Datasets
The experiments in this paper use the publicly available dataset NEU-DET (steel surface defects) produced by Northeastern University to train and test the model [39]. The NEU-DET dataset comprises six types of steel surface defects: Cr (crazing), In (inclusion), Pa (patches), PS (pitted surface), RS (rolled-in scale), and Sc (scratches). There are 1800 images in total, 300 images for each defect. The training set, validation set, and test set are divided in a ratio of 8:1:1. The six defect types are shown in Fig 6.
[Figure omitted. See PDF.]
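A minimal sketch of the 8:1:1 split described above is shown below. It assumes the 1800 NEU-DET images sit in a single directory of JPEG files; the directory layout, file extension and random seed are assumptions, not the authors' preprocessing script.

```python
import random
from pathlib import Path

def split_dataset(image_dir: str, seed: int = 0):
    """Split the defect images into train/val/test subsets at an 8:1:1 ratio."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)             # fixed seed for reproducibility
    n = len(images)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (images[:n_train],                       # e.g. 1440 training images
            images[n_train:n_train + n_val],        # e.g. 180 validation images
            images[n_train + n_val:])               # e.g. 180 test images
```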
3.2 Training details
The specific configuration of the experimental environment is shown in Table 1.
[Figure omitted. See PDF.]
The parameter settings for this experiment are shown in Table 2.
[Figure omitted. See PDF.]
The training process employs transfer learning: pre-trained weights are loaded before training on the hot-rolled strip surface defect dataset. During the first three training epochs, the learning rate gradually increases; after three epochs, the learning rate reaches 0.01. As training proceeds, the learning rate gradually decreases in order to avoid overfitting the model.
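The warm-up-then-decay behaviour described above can be sketched as a simple schedule function. Only the three-epoch warm-up and the 0.01 peak are stated in the text; the total number of epochs, the linear decay shape and the final learning-rate ratio are assumptions made for this illustration.

```python
def learning_rate(epoch: int, total_epochs: int = 300,
                  warmup_epochs: int = 3, base_lr: float = 0.01,
                  final_lr_ratio: float = 0.01) -> float:
    """Warm-up-then-decay learning-rate schedule (sketch)."""
    if epoch < warmup_epochs:
        # linear warm-up over the first three epochs up to base_lr
        return base_lr * (epoch + 1) / warmup_epochs
    # linear decay toward base_lr * final_lr_ratio for the remaining epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return base_lr * ((1 - progress) * (1 - final_lr_ratio) + final_lr_ratio)
```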
To evaluate the overall performance of the model, the experiments use Average Precision (AP) and mean Average Precision (mAP) as indices of detection quality. AP is the average precision of a single category, i.e., the area enclosed by the precision-recall (P-R) curve, where P stands for precision and R for recall. TP, FP and FN represent the number of positive samples predicted as positive, the number of negative samples predicted as positive, and the number of positive samples predicted as negative, respectively.
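Written out with the quantities defined above, the standard forms of these metrics are:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$

$$AP = \int_{0}^{1} P(R)\,dR, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$

where $N$ is the number of defect categories.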
In addition to detection accuracy, another important evaluation criterion for object detection tasks is speed, which plays a crucial role in real-time tasks. The metric that usually judges the speed of object detection is frames per second (FPS), i.e., the number of images that can be processed in each second. It is particularly important to note that the calculation of the FPS metric needs to be completed under the same hardware conditions.
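A simple way to estimate FPS consistent with this definition is sketched below; the function name and the assumption that each item is an already-preprocessed image tensor are illustrative, and the synchronisation call only matters when timing on a GPU.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, images, device="cuda"):
    """Estimate FPS as the number of images processed per second (sketch).
    Results are only comparable when every model is timed on the same hardware."""
    model.eval().to(device)
    if device == "cuda":
        torch.cuda.synchronize()                    # start from an idle GPU
    start = time.time()
    for img in images:                              # img: preprocessed image tensor
        model(img.to(device))
    if device == "cuda":
        torch.cuda.synchronize()                    # wait for queued GPU work
    return len(images) / (time.time() - start)
```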
4 Results and discussions
4.1 Ablation study
The paper carries out ablation experiments on the NEU-DET dataset to validate the effectiveness of the improved model in detecting surface defects on steel. For this purpose, the 10 experiments in Table 3 below are carried out, and each set of experiments is run with the same parameters and network environment. Experiment 1 is the YOLOv5s network without any improvement strategy. Experiments 2, 3, 4, and 5 introduce FasterNet, DBS, C3STR, and BiFPN, respectively. Experiments 6, 7, and 8 investigate the effects of DBS, C3STR, and BiFPN on the model after FasterNet has been introduced. Experiment 9 investigates the combined effect of DBS, C3STR and BiFPN on the model.
[Figure omitted. See PDF.]
The results of the ablation experiments were analysed. Compared with the YOLOv5s model, our model’s parameters decreased by 49.4%, GFLOPs decreased by 57.0%, mAP increased by 6.2%, and FPS increased by 54.1%. Introducing FasterNet yields a more lightweight model and higher FPS: PConv extracts spatial features with ordinary convolution on only some of the input channels while keeping the remaining channels unchanged, so the input and output have the same number of channels and the computational complexity of the model is effectively reduced while spatial information is preserved. Replacing the ordinary convolution in the CBS structure with DSConv effectively decreases the number of parameters in the neck network and improves detection speed. Fusing the Swin-Transformer with the C3 module in the neck causes a small rise in parameters and GFLOPs, while the model’s mAP improves significantly by 5.2%; this effectively enhances the model’s ability to handle the cluttered backgrounds of defect images and the easy confusion of defect categories. Replacing the PANet in the neck network with a BiFPN increases the model’s mAP by 4.3%, because BiFPN performs multi-level feature fusion across scales with bi-directional connectivity, gaining deep semantic information while retaining more positional information. Fusing shallow feature maps improves the detection accuracy of small targets such as surface defects on hot-rolled steel strip.
4.2 Experimental results
Table 4 compares the per-class mAP of our model with that of YOLOv5 for the six defects. The mAP of Cr, In, Pa, PS, RS, and Sc increased by 8.8%, 2.3%, 5.1%, 8.9%, 17.5%, and 0.7%, respectively. Cr, PS, and RS show significant improvement, whereas the improvement in Sc is not evident. We attribute this to the already high baseline accuracy for Sc, which leaves little room for improvement, while the modifications mainly benefit the detection of small-target defects.
[Figure omitted. See PDF.]
Based on the above analysis, the inclusion of FasterNet and DBS has effectively reduced the parameters and GFLOPs of the model. The introduction of Swin-Transformer and BiFPN significantly improves the model’s mAP. Fig 7(a) and 7(b) show some detection results of the YOLOv5s model and our model.
[Figure omitted. See PDF.]
4.3 Comparison of different algorithms
In this paper, we choose SSD, Faster R-CNN, YOLOv3, and YOLOv4 for comparison with our model. Table 5 shows that our model’s mAP surpasses that of SSD, Faster R-CNN, YOLOv3, and YOLOv4 by a significant margin. SSD uses a multi-layer feature fusion approach, but its feature extraction for smaller targets is limited, resulting in poor detection performance. Faster R-CNN is a two-stage detection algorithm: candidate boxes are first generated, and classification and regression are then performed to obtain the location and category of the target, so its computational cost is high and its detection speed struggles to meet the requirements of real-time detection. YOLOv3 is poor at capturing small targets, and YOLOv4’s detection accuracy and speed are both low, making it difficult to deploy. Additionally, the mAP of our model is slightly higher than that of RetinaNet and YOLOv8. This is because most current methods for detecting steel surface defects perform well only on specific defect categories and lack good applicability across multiple categories, whereas we have made targeted improvements for these needs. Moreover, the FPS of our model is much higher than that of the other models due to the introduction of FasterNet, indicating that the model can meet the real-time detection requirements of low-computational-power platforms.
[Figure omitted. See PDF.]
5 Conclusions
In this paper, we propose an algorithm for the detection of steel surface defects, called FasterNet-YOLO, based on YOLOv5. The algorithm combines several current computer vision techniques, including FasterNet, depthwise separable convolution, Swin-Transformer, and BiFPN. To address the problems of low detection speed and a large number of parameters, the FasterNet network is introduced to reconstruct the backbone network, and depthwise separable convolution improves the ordinary convolution in YOLOv5’s neck network. To address the cluttered backgrounds of defect images and the easy confusion of defect categories, the Swin-Transformer is integrated into the neck network’s C3 module to improve the semantic information and feature representation of small targets. To improve the adaptability of the detector to targets at different scales, BiFPN is used for feature fusion to retain more informative features. Testing on the NEU-DET dataset shows that the model’s parameters are reduced by 49.4%, GFLOPs by 57.0%, and mAP is improved by 6.2%. This demonstrates that the algorithm can detect in real time, with a significant increase in detection speed and a modest improvement in detection accuracy. In future research, the model will be trained on richer datasets to enhance its generalisation capability and make it better suited to real-time monitoring in industrial scenarios. We hope that the experience accumulated in these experiments in processing steel surface defect datasets and designing detection algorithms will be helpful to more researchers working on steel surface defects.
References
1. Kim S, Kim W, Noh YK, et al. Transfer learning for automated optical inspection. 2017 International Joint Conference on Neural Networks (IJCNN). IEEE; 2017, p. 2517–24.
2. Demir K, Ay M, Cavas M, Demir F. Automated steel surface defect detection and classification using a new deep learning-based approach. Neural Comput & Applic. 2022;35(11):8389–406.
3. Cheng X, Yu J. RetinaNet With Difference Channel Attention and Adaptively Spatial Feature Fusion for Steel Surface Defect Detection. IEEE Trans Instrum Meas. 2021;70:1–11.
4. Li Z, Wei X, Hassaballah M, Li Y, Jiang X. A deep learning model for steel surface defect detection. Complex Intell Syst. 2023;10(1):885–97.
5. Zou Y, Fan Y. An Infrared Image Defect Detection Method for Steel Based on Regularized YOLO. Sensors (Basel). 2024;24(5):1674. pmid:38475212
6. Jin L, Li S, Qin G, et al. Outer surface defect detection of steel pipes with 3D vision based on multi-line structured lights. Measurement Science and Technology. 2024.
7. Tang B, Chen L, Sun W, Lin Z. Review of surface defect detection of steel products based on machine vision. IET Image Processing. 2022;17(2):303–22.
8. Luo Q, Fang X, Liu L, Yang C, Sun Y. Automated Visual Defect Detection for Flat Steel Surface: A Survey. IEEE Trans Instrum Meas. 2020;69(3):626–44.
9. Zhao C, Fan Y, Tan J, Li Q, Lin Z, Luo S, et al. FCS-YOLO: an efficient algorithm for detecting steel surface defects. Meas Sci Technol. 2024;35(8):086004.
10. Du N, Feng Q, Liu Q, Li H, Guo S. FSN-YOLO: Nearshore Vessel Detection via Fusing Receptive-Field Attention and Lightweight Network. JMSE. 2024;12(6):871.
11. Liu Z, Rasika D. Abeyrathna RM, Mulya Sampurno R, Massaki Nakaguchi V, Ahamed T. Faster-YOLO-AP: A lightweight apple detection algorithm based on improved YOLOv8 with a new efficient PDWConv in orchard. Computers and Electronics in Agriculture. 2024;223:109118.
12. Luo P, Wang B, Wang H, et al. An ultrasmall bolt defect detection method for transmission line inspection. IEEE Transactions on Instrumentation and Measurement. 2023;72:1–12.
13. Chen M, Chen J, Li C, Wang Q, Takamasu K. Defect detection of MicroLED with low distinction based on deep learning. Optics and Lasers in Engineering. 2024;173:107924.
14. Tang H, Liang S, Yao D, Qiao Y. A visual defect detection for optics lens based on the YOLOv5-C3CA-SPPF network model. Opt Express. 2023;31(2):2628–43. pmid:36785272
15. Song K, Sun X, Ma S, Yan Y. Surface Defect Detection of Aeroengine Blades Based on Cross-Layer Semantic Guidance. IEEE Trans Instrum Meas. 2023;72:1–11.
16. Agrawal P, Girshick R, Malik J. Analyzing the performance of multilayer neural networks for object recognition. Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part VII 13. Springer International Publishing. 2014, p. 329–44.
17. Girshick R. Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision. 2015, p. 1440–48.
18. Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, p. 779–88.
19. Bochkovskiy A, Wang CY, Liao HYM. YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934. 2020.
20. Li C, Li L, Jiang H, et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976. 2022.
21. Wang CY, Bochkovskiy A, Liao HYM. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023, p. 7464–75.
22. Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector. Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer International Publishing. 2016, p. 21–37.
23. Wang M, Yang L, Zhao Z, Guo Y. Intelligent prediction of wear location and mechanism using image identification based on improved Faster R-CNN model. Tribology International. 2022;169:107466.
24. Li Z, Wei X, Jiang X. SSDD-Net: A Lightweight and Efficient Deep Learning Model for Steel Surface Defect Detection. Chinese Conference on Pattern Recognition and Computer Vision (PRCV). Singapore: Springer Nature Singapore, 2023, p. 237–48.
25. Zhao C, Shu X, Yan X, Zuo X, Zhu F. RDD-YOLO: A modified YOLO for detection of steel surface defects. Measurement. 2023;214:112776.
26. Zhou Y. A YOLO-NL object detector for real-time detection. Expert Systems with Applications. 2024;238:122256.
27. Howard AG, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861. 2017.
28. Ma N, Zhang X, Zheng HT, et al. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. Proceedings of the European Conference on Computer Vision (ECCV). 2018, p. 116–31.
29. Han K, Wang Y, Tian Q, et al. GhostNet: More features from cheap operations. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, p. 1580–9.
30. Poudel RPK, Bonde U, Liwicki S, et al. ContextNet: Exploring context and detail for semantic segmentation in real-time. arXiv preprint arXiv:1805.04554. 2018.
31. Chen J, Lei B, Song Q, et al. A hierarchical graph network for 3D object detection on point clouds. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, p. 392–401.
32. Chen J, Kao S, He H, et al. Run, don’t walk: Chasing higher FLOPS for faster neural networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023, p. 12021–31.
33. Ma X, Guo FM, Niu W, et al. PCONV: The missing but desirable sparsity in DNN weight pruning for real-time execution on mobile devices. Proceedings of the AAAI Conference on Artificial Intelligence. 2020;34(04):5117–24.
34. Chollet F. Xception: Deep learning with depthwise separable convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, p. 1251–8.
35. Liu Z, Lin Y, Cao Y, et al. Swin Transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021, p. 10012–22.
36. Tan M, Pang R, Le QV. EfficientDet: Scalable and efficient object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, p. 10781–90.
37. Lin TY, Dollár P, Girshick R, et al. Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, p. 2117–25.
38. Liu S, Qi L, Qin H, et al. Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018, p. 8759–68.
39. He Y, Song K, Dong H, Yan Y. Semi-supervised defect classification of steel surface based on multi-training and generative adversarial network. Optics and Lasers in Engineering. 2019;122:294–302.
Citation: Yu S, Liu Z, Zhang L, Zhang X, Wang J (2025) FasterNet-YOLO for real-time detection of steel surface defects algorithm. PLoS One 20(5): e0323248. https://doi.org/10.1371/journal.pone.0323248
About the Authors:
Shiwei Yu
Roles: Conceptualization, Data curation, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing
E-mail: [email protected]
Affiliation: CGN Digital Technology Co., Ltd., Shanghai, China
ORCID: https://orcid.org/0009-0001-7359-9160
Zelin Liu
Roles: Funding acquisition, Project administration, Writing – review & editing
Affiliation: CGN Digital Technology Co., Ltd., Shanghai, China
Liang Zhang
Roles: Funding acquisition, Project administration, Writing – review & editing
Affiliation: CGN Digital Technology Co., Ltd., Shanghai, China
Xiaoqiang Zhang
Roles: Funding acquisition, Project administration, Writing – review & editing
Affiliation: CGN Digital Technology Co., Ltd., Shanghai, China
Jikui Wang
Roles: Funding acquisition, Project administration, Writing – review & editing
Affiliation: CGN Digital Technology Co., Ltd., Shanghai, China
© 2025 Yu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
Steel surface defect detection is an important industrial application of object detection. Balancing detection accuracy and real-time performance in industrial object detection is a challenge. Therefore, this paper proposes an improved FasterNet-YOLO model based on a one-stage detector. The FasterNet network is introduced to reconstruct the YOLOv5 backbone network, achieving a lighter model and a significant improvement in detection speed, at the cost of a slight reduction in accuracy. The ordinary convolution in the YOLOv5 neck network is improved with depthwise separable convolution, further reducing redundant parameters in the neck network while continuing to improve detection speed. To improve model accuracy, the Swin-Transformer is integrated into the C3 module of the neck network, addressing the cluttered backgrounds of defect images and the easy confusion between defect types. Meanwhile, BiFPN is used for feature fusion; by retaining more informative features, the detector’s ability to adapt to targets at different scales is improved. The results indicate that, compared with the original model, FasterNet-YOLO reduces parameters by 49.4% and GFLOPs by 57.0%, while mAP increases by 6.2% and FPS increases by 54.1%. The improved model not only increases detection accuracy but also significantly improves the speed of hot-rolled strip surface defect detection to meet the requirements of real-time detection.