1. Introduction
The benefits and prospects of clean, renewable solar energy are obvious. One of the primary ways solar energy is converted into electricity is through photovoltaic (PV) power systems [1]. Although solar cells (SCs) are the smallest units in such a system, their quality greatly influences its overall performance [2]. Internal and external defects in SC can significantly reduce their photoelectric conversion efficiency; physical imperfections may degrade power conversion performance and, in extreme cases, lead to the deterioration of PV modules [3, 4]. Therefore, manufacturing enterprises must perform defect detection to ensure SC quality [5]. As shown in Figure 1, current mainstream detection research focuses on quality inspection of printing processes and finished SC slices; the drawback is that defects are discovered late, leading to substantial cost wastage. There is limited research on defect detection in SCs across the various manufacturing processes. At present, manual visual inspection serves as the primary detection method; however, this approach is both unreliable and inefficient. In this paper, we design a data-driven, end-to-end learning defect detection algorithm for this task.
[figure(s) omitted; refer to PDF]
Industrial production lines have gradually transitioned from intensive labor toward automation and intelligence. In the realm of defect detection for PV SC, most machine vision-based automated detection algorithms have replaced inefficient manual inspection [6]. The primary technologies utilized for defect detection in PV SC are electroluminescence (EL) and photoluminescence (PL) [7]. EL [8] enables rapid imaging of internal defects in most types of SC, thus meeting the demand for online inspection. However, EL can only be applied to finished cells with closed-loop circuits. PL [9] is employed for online inspection in multiple critical processes, enabling real-time monitoring and screening of defective silicon wafers to prevent their progression to downstream processes, thus averting resource wastage.
The manufacturing process of SC encompasses seven critical steps, including chemical etching and texturing, diffusion, removal of phosphosilicate glass (PSG), silicon nitride ARC, frontside printing and backside printing, firing, and classification and sorting [10]. Due to material fragility and the complexity of manufacturing processes, defects often occur during production [11]. Figure 2 illustrates some image samples of monocrystalline SC (MSC) and polycrystalline SC (PSC) captured at different stages of production. Figure 2(a) depicts the primary phases of the production process. Postdiffusion (as shown in Figure 2(b)), the cells exhibit hollows and edge defects due to chemical reactions between water and carbides at high temperatures. In PSG removal (as shown in Figure 2(c)), uneven liquid rolling and the presence of water vapor affect watermark formation. Additionally, excessive etching can lead to black spots at the edges of cells. In silicon nitride ARC (as shown in Figure 2(d)), unevenness and scratching of the antireflection layer result in shadows and traces. In sorting and classification (as shown in Figures 2(e), 2(f), 2(g)), final inspection is required for finished products. Sciuto et al. [12] categorized defects utilizing atomic force microscopy imagery, given the intricate microscopic morphology and the diverse types of defects present in SC. Defects that may occur during the manufacturing process of SC encompass both surface imperfections and internal flaws [13]. It is essential to promptly detect problematic units. Furthermore, timely feedback on defect information can be utilized to optimize the production process and adjust process parameters. The type, size, and contrast of SC vary in images from different production processes. Therefore, designing a rapid, robust, and accurate method to detect SC defects in various production processes presents a challenge.
[figure(s) omitted; refer to PDF]
Currently, defect detection for SC primarily involves traditional image processing techniques and deep learning technologies.
Traditional image processing techniques can be further categorized into statistical-based algorithms and filter-based algorithms. Statistical-based algorithms focus on recording and analyzing the spatial distribution characteristics of pixel values in images. This category includes various algorithms such as histogram statistics, autocorrelation analysis, co-occurrence matrix, and local binary patterns [14, 15]. Filter-based algorithms aim to suppress target image noise while preserving image detail features as much as possible. For the detection of hidden crack defects within SC, specific filter algorithms have been designed by Anwar and Abdullah [16], Chen et al. [17], and Fu, Ma, and Zhou [18] to enhance the features of hidden crack defects. Reference [17] proposed steerable evidence filtering to enhance the contrast between hidden crack defects and the background, effectively detecting crack defects. For complex detection tasks, some studies have explored solutions in the transform domain, which are less sensitive to noise and intensity changes compared to designing filters in the spatial domain, including techniques based on Fourier transform [19], the Hough transform [13], and wavelet transform [20]. However, traditional image processing techniques lack generality and stability; algorithm design is not only complex but also requires considerable domain knowledge from algorithm designers. Therefore, designing a data-driven, end-to-end learning intelligent defect detection algorithm is highly meaningful.
In the domain of end-to-end real-time detection, the convolutional neural network (CNN) detection algorithm is predominantly employed. While CNNs possess generalization capabilities, the processes of convolution and pooling can result in the loss of critical details within the feature map, which may hinder the detection of targets in complex backgrounds. To address these challenges, several effective enhanced algorithms have been introduced. For instance, Chen, Wang, and Zhang [21] integrated a spatial pyramid pooling (SPP) network with a probabilistic pooling approach to mitigate the adverse effects of irrelevant features from complex backgrounds on recognition accuracy, thereby achieving improved accuracy in aerial image recognition. However, this enhancement also leads to an increase in model parameters, thereby imposing greater demands on training conditions.
In the context of detecting objects against complex backgrounds, contextual information can significantly assist the attention mechanism in differentiating between objects and their backgrounds, thus enhancing the quality of object features. Wu et al. [22] introduced a context-based feature fusion technique utilizing deep CNN, which amalgamates feature maps from various levels to enhance the alignment between region proposals and target features. Additionally, Aktouf, Shivanna, and Dhimish [23] incorporated compact inverted block (CIB) and partial self-attention (PSA) modules within YOLOv10 to bolster feature extraction and classification accuracy.
Despite the advancements in deep learning–based object detection methods for complex backgrounds, certain deficiencies persist. The attention mechanism is capable of effectively filtering out extraneous background information; however, in specific scenarios, it is crucial to judiciously allocate weights to prevent false positives associated with small targets [24]. While contextual information can enhance the model’s comprehension of both background and foreground elements, it is imperative to filter this information appropriately. Consequently, investigating methodologies for detecting targets within complex backgrounds holds significant importance.
In recent years, researchers have conducted extensive studies on defect detection in SC based on deep learning. The focus of these detection networks is on acquiring specific location information and categories of defects [25].
In [26], the authors proposed a multispectral CNN for surface defect detection in SC panels. This algorithm divides each cell image into 7 × 7 image blocks with overlapping regions for network training.
In [27], the authors designed a network based on residual channel attention gates to detect hot spot defects in PV components. The network utilizes residual channel attention gate modules to fuse multiscale features, followed by global average pooling and multilayer perceptrons to refine the fused features dimensionally, and introduces gate mechanisms to generate attention maps, finally adjusting the channel feature weights.
For defect detection in multicrystalline silicon SC panels, the article [28] introduced a complementary attention module embedded into a fast candidate region network. The complementary attention module sequentially connects channel attention networks with spatial attention networks, leveraging the complementary advantages of channel features and spatial position features to adaptively suppress background noise while highlighting defect features.
In [29], the authors proposed a multifeature candidate region fusion algorithm for detecting hidden cracks and grid breaks inside multicrystalline silicon SC panels.
In [30], the authors combined manually crafted features with deep learning networks for detecting hidden crack defects inside multicrystalline silicon SC panels. They utilized Fourier filtering and local binary patterns to process original images, which were then fed into a neural network model. Subsequently, a self-attention mechanism was employed to integrate low-level features and high-level semantic features, thus obtaining more precise geometric information about defects.
In 2023, Lu, Wu, and Chen [31] successfully detected various defects in SC panels by introducing a CA attention mechanism and replacing the decoupling head of YOLOv5 to enhance feature extraction capabilities. Experimental results demonstrated that the optimized model achieved an mAP of 96.1% on the publicly available binary ELPV dataset. It can be concluded that in terms of balancing speed and accuracy, the YOLOv5 model exhibits favorable performance.
The main evaluation criteria for an SC defect detection system are accuracy, efficiency, and robustness, and the goal is to achieve all three. Meeting these goals requires excellent coordination of optical illumination, image acquisition, image processing, and defect detection [32].
The purpose of this paper is to develop an efficient and stable defect detection algorithm for SC produced from original silicon wafers, utilizing PL for defect visualization and based on YOLOv5. Our research enhances the detection performance of network models while maintaining minimal increases in training costs and detection time, a crucial factor for industrial detection applications. Prior methodologies have not adequately addressed this aspect. The primary contributions of this paper can be outlined as follows:
1. A novel dataset of PV cells has been compiled, which, to the best of our knowledge, represents the first dataset derived from original silicon wafers. This dataset consists of 4500 images and encompasses six typical yet challenging defects. To enhance the model's generalization performance and robustness, data augmentation techniques were implemented, tailored to the specific characteristics of SC images.
2. Design of the C2f module, which specifically addresses the degradation of defect features as the convolutional network deepens. This module enhances the fusion of multiscale features, enlarges the receptive field, and reduces information loss.
3. Introduction of soft pooling while keeping the model lightweight: a serially structured spatial pyramid pooling fast (SPPF) module is designed to reduce the parameter count and improve inference speed by integrating the essential pooling functionality.
The subsequent sections of this paper are organized as follows: Section 2 presents the methods for data collection and augmentation, along with an overview of the YOLOv5 model and the optimization techniques utilized. Section 3 details comparative experiments, providing a description and analysis of the experimental results. The concluding section offers a summary and outlines prospects for future research.
2. Materials and Methods
2.1. Image Acquisition
In image acquisition, we utilize PL technology to image the interior of SC. Figure 3 illustrates its imaging principle and online system. In a darkroom, the cell is excited by a laser light source, causing it to emit photons, a portion of which are captured by a high-sensitivity camera. Subsequently, the images are processed by a computer and eventually displayed as a PL image, as depicted in Figure 3. PL enables the visualization and measurement of defects within SC and is applicable to some external surface defects as well [33].
[figure(s) omitted; refer to PDF]
The SC samples collected for this study originate from actual industrial production lines in China. A Dushen 4K black-and-white line-scan camera (GLP4K4E02-H) is employed, with a 200 W red laser used as the light source. Given the diverse types of defects present in actual production and the color variation among silicon wafers from different batches, we carefully screened 15,000 captured images to ensure data balance, enhance training effectiveness, and increase sample diversity. Ultimately, 3000 images were chosen as the original dataset, stored in .tiff format, with a resolution of 1280 px × 1280 px.
In particular, to enhance the model's ability to detect minor defects and reduce training time, each image from the original dataset is evenly divided into four sub-images. From these, 4500 representative images were selected, stored in .tiff format, with a resolution of 640 px × 640 px.
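For illustration, the following is a minimal sketch of this tiling step (the file names and the use of OpenCV are assumptions for the example, not the authors' actual tooling); defect annotations would additionally need to be remapped into the coordinate frame of each tile.

```python
import cv2  # OpenCV, used here only for image I/O

def split_into_quadrants(image_path):
    """Split one 1280 x 1280 px wafer image into four 640 x 640 px tiles."""
    img = cv2.imread(image_path, cv2.IMREAD_UNCHANGED)
    h, w = img.shape[:2]
    cy, cx = h // 2, w // 2
    # Top-left, top-right, bottom-left, bottom-right quadrants.
    return [img[:cy, :cx], img[:cy, cx:], img[cy:, :cx], img[cy:, cx:]]

# Hypothetical file name; any label boxes must be shifted/clipped per tile separately.
for i, tile in enumerate(split_into_quadrants("wafer_0001.tiff")):
    cv2.imwrite(f"wafer_0001_q{i}.tiff", tile)
```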
We annotated the images using the LabelImg software, a tool for labeling image data. In Figure 4, the labeled categories include six types: cross-undulation, cracks, irregular undulation, perforations, edges collapse, and breakage. During annotation, rectangular boxes are drawn to enclose defects on SC, accurately representing their position and type. Annotation files are saved in YOLO's TXT format.
[figure(s) omitted; refer to PDF]
Figure 5 displays the quantity and spatial distribution of the various defect types in the dataset used in this study. The defect quantities are relatively balanced across types: cross-undulation is the most numerous, exceeding 1000 instances, while the least numerous type has around 400. Defects are scattered fairly uniformly across the image area.
[figure(s) omitted; refer to PDF]
2.2. Data Enhancement
Constructing a defect detection model for SC based on deep learning requires a large quantity of high-quality image datasets. Therefore, it was decided to apply the technique of data augmentation [34]. This paper adopts methods such as Mosaics, Mixup, HSV transformation, Gaussian noise, and random rotation, tailored to the types of defects in SC and practical needs. These data augmentation techniques increase the variety of defect samples, enhance the model’s training accuracy, and improve its robustness. The results of applying data augmentation techniques are illustrated in Figure 6.
[figure(s) omitted; refer to PDF]
Mosaic data augmentation, built into the YOLOv5 training pipeline, enriches the training data while reducing GPU memory usage. Because it combines data from four images at once, satisfactory results are obtained even with a small mini-batch size. Notably, Mosaic augmentation is disabled during the final 10 epochs, which effectively improves accuracy.
Mixup is a widely used algorithm for image class mixing augmentation in computer vision. It blends images from different categories to expand the training dataset, effectively addressing the issue of insufficient data for SC images, enhancing the overall performance and generalization ability of the model, and strengthening its resistance to adversarial attacks.
HSV transformation provides better visual effects. As SCs on industrial production lines exhibit grayscale variations across different batches, enhancing color HSV channels increases the diversity of training data, further enhancing the robustness and generalization ability of the YOLOv5 model.
Incorporating Gaussian noise into the training data better adapts the model to real detection scenarios and improves its robustness. In SC images collected from actual production lines, noise does not stem from a single source but from a combination of many different noise sources. If real noise is treated as the sum of numerous additive, independent random variables with different probability distributions, then, by the central limit theorem, their normalized sum tends toward a Gaussian distribution as the number of sources increases. Hence, Gaussian noise is a realistic approximation.
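As a concrete illustration of the augmentations above, the following is a minimal NumPy/OpenCV sketch of Gaussian noise, HSV jitter, and Mixup. All gain and noise parameters are illustrative assumptions rather than the values used in this work, and images are assumed to be 3-channel, 8-bit arrays as typically loaded by the YOLOv5 data pipeline.

```python
import cv2
import numpy as np

def add_gaussian_noise(img, sigma=10.0):
    """Additive Gaussian noise, approximating the aggregate of many independent noise sources."""
    noise = np.random.normal(0.0, sigma, img.shape).astype(np.float32)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def hsv_jitter(img, h_gain=0.015, s_gain=0.7, v_gain=0.4):
    """Random HSV gains, mirroring YOLOv5-style color augmentation."""
    r = np.random.uniform(-1, 1, 3) * [h_gain, s_gain, v_gain] + 1
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 0] = (hsv[..., 0] * r[0]) % 180          # hue wraps around
    hsv[..., 1] = np.clip(hsv[..., 1] * r[1], 0, 255)  # saturation
    hsv[..., 2] = np.clip(hsv[..., 2] * r[2], 0, 255)  # value
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

def mixup(img_a, img_b, alpha=8.0):
    """Blend two images with a Beta-distributed mixing ratio (Mixup)."""
    lam = np.random.beta(alpha, alpha)
    return (lam * img_a + (1 - lam) * img_b).astype(np.uint8)
```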
2.3. Model Introduction
Among various object detection algorithms, the YOLO framework stands out for its remarkable balance between detection speed and accuracy, enabling swift object detection and recognition [35]. YOLOv5's design meticulously balances speed and accuracy across its modules, resulting in outstanding performance in object detection tasks. Compared to its predecessors, YOLOv5 offers significant advantages, making it an excellent choice for model deployment. The YOLOv5 series consists of four models: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. Among them, YOLOv5s is the smallest and is well suited to deployment on local devices; therefore, this paper adopts YOLOv5s.
As shown in Figure 7, the YOLOv5 object detection model is primarily divided into three parts: Backbone, Neck, and Head.
[figure(s) omitted; refer to PDF]
Before the image enters the convolutional network, a slicing operation is performed. Specifically, every other pixel in an image is sampled, akin to nearest neighbor downsampling. This results in four complementary yet similar images, preserving information without loss. Consequently, information from the image’s width (W) and height (H) is concentrated into the channel space. The input channel count is quadrupled, as the concatenated images transform from the original RGB three-channel mode to 12 channels. Finally, after convolutional processing, feature maps are obtained with twice the downsampling without information loss.
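A minimal PyTorch sketch of this slicing (Focus) step is given below; the channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice every other pixel into 4 sub-images, stack along channels, then convolve."""
    def __init__(self, in_ch=3, out_ch=32):
        super().__init__()
        self.conv = nn.Conv2d(in_ch * 4, out_ch, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        # Four complementary pixel grids: (N, C, H, W) -> (N, 4C, H/2, W/2)
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(x)

y = Focus()(torch.randn(1, 3, 640, 640))
print(y.shape)  # torch.Size([1, 32, 320, 320])
```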
2.3.1. Backbone
The main network of the model is responsible for feature extraction from input images by increasing the depth of convolutional layers. It extracts feature maps with different receptive fields (P1–P5), progressively increasing the receptive field. The Backbone mainly consists of four parts: convolutional module (ConvModule), DarknetBottleneck, CSPNet (C3), and SPP.
1. ConvModule: The ConvModule is a fundamental building block commonly used in CNNs. It extracts local spatial information through convolutional operations, normalizes feature value distributions via batch normalization (BN) layers, and introduces nonlinear transformation capabilities through activation functions, thus transforming and extracting input features. This module consists of Conv2d, BatchNorm2d, and an activation function. Conv2d is a basic layer in CNNs used to extract local spatial information from input features. The convolution operation can be seen as a sliding window that moves over the input features and convolves the feature values within the window with the convolution kernel to obtain output features. Convolutional layers typically consist of multiple convolution kernels, with each kernel corresponding to an output channel. The size, stride, padding, and other hyperparameters of the convolutional kernel determine the output size and receptive field size of the convolutional layer. In CNNs, convolutional layers are typically used to construct feature extractors. BatchNorm2d performs data normalization operations along the channel dimension. After convolutional layers, BatchNorm2d is usually added to normalize the data, which helps maintain the stability of the network. An activation function is a nonlinear function used to introduce nonlinear transformation capabilities to neural networks. Commonly used activation functions include sigmoid, ReLU, LeakyReLU, ELU, etc. They exhibit different output behaviors for different ranges of input values and can better adapt to different types of data distributions.
2. C3: In object detection tasks, using C3 as the Backbone can significantly improve performance while reducing computational complexity. C3 effectively enhances the learning ability of CNNs, enabling them to better adapt to the complexity of object detection tasks. C3 module is a key component in the YOLOv5 network, aiming to increase the depth and receptive field of the network to improve feature extraction. The module consists of three Conv blocks, where the first Conv block has a stride of 2, which halves the size of the feature map, while the second and third Conv blocks have a stride of 1. Each Conv block adopts a 3 × 3 convolutional kernel. Between each Conv block, BN layers and SiLU activation functions are added to enhance the stability and generalization performance of the model.
3. SPP: The SPP module is a pooling module commonly used in CNNs to enhance the recognition capability of the network. The main idea is to perform pooling operations at different scales on the input feature map to capture features of different scales. First, the SPP module conducts pooling operations at multiple scales on the input feature map. Then, the resulting feature maps are concatenated together and dimensionally reduced through fully connected layers to obtain a fixed-size feature vector. This approach enables the model to achieve spatial and positional invariance on the input data, thus improving the model’s ability to recognize objects of different scales.
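To make items 1 and 3 above concrete, the following is a minimal PyTorch sketch of a ConvModule (Conv2d + BatchNorm2d + SiLU) and of an SPP block with parallel max-pooling branches. The 5/9/13 kernel sizes and the 1 × 1 reduction convolutions (used here instead of fully connected layers, following common YOLOv5-style implementations) are assumptions of this sketch, not an exact reproduction of the network in this paper.

```python
import torch
import torch.nn as nn

class ConvModule(nn.Module):
    """Conv2d + BatchNorm2d + SiLU (the CBS block)."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SPP(nn.Module):
    """Parallel max pooling at several kernel sizes, then concatenation and 1x1 reduction."""
    def __init__(self, c_in, c_out, kernels=(5, 9, 13)):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = ConvModule(c_in, c_mid, 1)
        self.pools = nn.ModuleList(nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels)
        self.cv2 = ConvModule(c_mid * (len(kernels) + 1), c_out, 1)

    def forward(self, x):
        x = self.cv1(x)
        return self.cv2(torch.cat([x] + [p(x) for p in self.pools], dim=1))
```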
2.3.2. Neck (as Shown in Figure 8)
The Neck module illustrates the structure of feature pyramid network (FPN) [36] and path aggregation network (PAN) [37]. FPN employs multiscale features to detect objects of different sizes, while PAN is a bottom-up feature pyramid structure. By passing strong semantic features top-down through the FPN layers and strong positional features bottom-up through the PAN layers, these two structures combined can aggregate features from different backbone layers to different detection layers. Compared to FPN, the bottom-up feature propagation in PAN is more efficient, achieving better semantic segmentation with fewer computational resources. Moreover, the feature fusion method in PAN is also advantageous, better preserving fine-grained details in low-resolution feature maps, thereby improving segmentation accuracy.
[figure(s) omitted; refer to PDF]
2.3.3. Prediction
The object detection head is the part responsible for detecting objects in the feature pyramid, which includes convolutional layers, pooling layers, and fully connected layers. In the YOLOv5 model, the detection head module is mainly responsible for multiscale object detection on the feature maps extracted by the backbone network.
2.4. Model Optimization
2.4.1. Multiscale Fusion: Replacing the C3 Module With the C2f Module
The C3 module in YOLOv5 is a crucial component that employs a hierarchical structure for feature extraction, connecting low-level feature maps to high-level ones through a series of convolutional layers and bottleneck structures to capture contextual information. However, it tends to overlook the preservation of spatial information, which poses limitations when dealing with small objects requiring fine spatial details. To address this, drawing on the idea of ELAN in YOLOv7 [38], C3 and ELAN were combined to design the C2f module.
The C2f module is a CNN module designed for feature extraction, aiming to enhance detection accuracy while maintaining efficiency. Its design concept involves fusing lower level feature maps with higher level ones to capture richer semantic information. This cross-stage fusion helps detection algorithms better handle objects of different scales and hierarchies.
As shown in Figure 9, the C2f module is an improvement upon the C3 module. First, the input feature map passes through the first convolutional layer (cv1), and its output is split into two parts. These two parts are then processed separately through different convolutional layers before being merged; this helps the model capture more contextual information and thus recognize targets more accurately. The merged feature map then passes through the second convolutional layer (cv2) to produce the final output. The CBS module comprises Conv2d + BatchNorm2d + the SiLU activation function. C2f enhances the multiscale representation capability of features by introducing cross-stage partial fusion between feature maps at different levels.
[figure(s) omitted; refer to PDF]
In the C2f module, the first Conv block has a stride of 1 and, together with the bottleneck structure, increases the network's receptive field while limiting computational complexity. The second Conv block also uses a stride of 1, preserving spatial resolution and further extracting features to deepen the network and expand the receptive field.
Overall, the C2f module enhances feature extraction capabilities by increasing network depth and receptive field, which are crucial for computer vision tasks like object detection requiring accurate object recognition and localization, both of which rely on effective feature extraction.
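The following minimal PyTorch sketch illustrates the C2f structure described above: a split after cv1, bottleneck processing of one branch, and concatenation of all partial feature maps. The number of bottlenecks and the channel split are simplified assumptions for illustration, not the exact configuration used in this paper.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv2d + BatchNorm2d + SiLU."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Bottleneck(nn.Module):
    """Two 3x3 CBS blocks with a residual connection."""
    def __init__(self, c, shortcut=True):
        super().__init__()
        self.cv1 = CBS(c, c, 3)
        self.cv2 = CBS(c, c, 3)
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y

class C2f(nn.Module):
    """Split after cv1, run bottlenecks on one branch, concatenate all partial outputs."""
    def __init__(self, c_in, c_out, n=2):
        super().__init__()
        self.c = c_out // 2
        self.cv1 = CBS(c_in, 2 * self.c, 1)
        self.blocks = nn.ModuleList(Bottleneck(self.c) for _ in range(n))
        self.cv2 = CBS((2 + n) * self.c, c_out, 1)

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))   # two partial feature maps
        for m in self.blocks:
            y.append(m(y[-1]))                   # cross-stage partial fusion
        return self.cv2(torch.cat(y, dim=1))

out = C2f(64, 64)(torch.randn(1, 64, 80, 80))
print(out.shape)  # torch.Size([1, 64, 80, 80])
```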
The C2f module offers the following advantages:
1. Efficient Performance: C2f employs a series of convolutional and pooling layers, coupled with skip connections, to effectively extract image features while maintaining fast computation speed. This enables the model to handle large volumes of data quickly, suitable for real-time applications without compromising accuracy.
2. Multiscale Feature Fusion: By utilizing convolutional kernels and pooling operations of different sizes, C2f extracts features at various scales and merges them. This multiscale feature fusion effectively captures target information at different scales in the image, enhancing detection accuracy and robustness.
3. Powerful Receptive Field: Through multiple layers of convolutional operations, C2f enlarges the receptive field, enabling the network to better comprehend contextual information in the image. This expansion of the receptive field helps the model gain a more comprehensive understanding of the image content, thereby improving detection accuracy and generalization.
4. Reduced Information Loss: Traditional object detection algorithms often suffer from information loss. The introduction of the C2f mitigates information loss to some extent, enhancing the model’s perception of image details. This characteristic of reducing information loss helps the model accurately detect objects in the image, especially for handling small objects or complex scenes, which is of paramount importance.
2.4.2. Serial Structure SPPF Module as a Replacement for the SPP Module
SC images have complex backgrounds, and their contrast varies greatly between processes. One key factor in improving detection accuracy is to further exploit multiscale representations to capture the distribution characteristics of targets. Therefore, we designed an SPPF module with a serial structure for the backbone network. The module adaptively adjusts the size of feature mapping vectors to a fixed value, which helps counteract the distortion caused by scaling in different areas of the image. By avoiding repeated feature extraction, this method not only improves the accuracy of the network but also reduces the computational cost.
To achieve fast computation of SPP, we replaced the parallel structure of pooling layers with a serial structure. The purpose of this structure is to utilize the output of the previous pooling layer as the basis for each subsequent pooling layer, thereby reducing the number of redundant operations and improving network efficiency. By segmenting and then recombining the pooling layers, the network can avoid redundant computations and focus on extracting higher level features from the input.
As shown in Figure 10, compared to the parallel structure, the SPPF with a serial structure reduces the computation time by approximately 50%. In CNNs, pooling layers play a critical role in reducing activation size. They decrease the computational load of the model while achieving spatial invariance and increasing the receptive field of subsequent convolutions [39, 40].
[figure(s) omitted; refer to PDF]
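A minimal PyTorch sketch of this serial pooling structure is shown below. Reusing a single 5 × 5 pooling kernel three times (so that the chained poolings emulate the 5/9/13 parallel kernels of SPP) is an assumption that mirrors the common SPPF design; the MaxPool2d operator used here is the one replaced by the soft pooling introduced next.

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Serial SPP: three chained poolings emulate the 5/9/13 parallel kernels of SPP."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = nn.Sequential(nn.Conv2d(c_in, c_mid, 1, bias=False),
                                 nn.BatchNorm2d(c_mid), nn.SiLU())
        self.pool = nn.MaxPool2d(k, stride=1, padding=k // 2)
        self.cv2 = nn.Sequential(nn.Conv2d(c_mid * 4, c_out, 1, bias=False),
                                 nn.BatchNorm2d(c_out), nn.SiLU())

    def forward(self, x):
        x = self.cv1(x)
        p1 = self.pool(x)    # effective 5x5 pooling
        p2 = self.pool(p1)   # effective ~9x9 pooling
        p3 = self.pool(p2)   # effective ~13x13 pooling
        return self.cv2(torch.cat([x, p1, p2, p3], dim=1))
```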
In deep learning, represented by convolutional networks, average pooling and max pooling are widely used. However, they often lose critical feature information when the differences between features are not significant. Average pooling averages the influence of all activations in the region, while max pooling selects only the single highest activation value in the region. To address this issue and improve the network’s generalization performance, inspired by SoftPool [41], we propose using SoftPool instead of MaxPool.
SoftPool is based on the softmax weighting method, aiming to preserve the fundamental properties of the input while amplifying features with higher intensity. In contrast to the traditional maximum pooling used in conventional SPP, our approach is differentiable. During backpropagation, each input receives gradients, thereby improving the neural connections during training.
SoftPool does not require trainable parameters, making it independent of the training data used. Moreover, it offers higher computational and memory efficiency compared to learning-based methods. The definition diagram of SoftPool is illustrated in Figure 11.
[figure(s) omitted; refer to PDF]
During backpropagation, all activations within the local kernel neighborhood receive a gradient of at least a minimum value, in proportion to their weights. Each weight is the ratio of the natural exponential of an activation to the sum of the natural exponentials of all activations within the region R, and the pooled output is the correspondingly weighted sum of the activations, as shown in the following formula [41]:

w_i = \frac{e^{a_i}}{\sum_{j \in R} e^{a_j}}, \qquad \tilde{a} = \sum_{i \in R} w_i a_i
Through the combination of nonlinear transformation and activation values, higher activation values will dominate. Since pooling in SPPF is performed in a high-dimensional feature space, emphasizing activations with greater effects is a more balanced approach compared to simply selecting the average or maximum value. In the pooling process focusing on feature information, discarding most activation values may risk losing important information, so a more reasonable approach is to consider the overall region’s feature strength.
Regarding gradient computation, during the training update phase of the network, the gradients of all network parameters are updated based on the error derivatives computed by subsequent layers. Based on differentiability, during backpropagation, the minimum nonzero weight is assigned to each positive activation within the kernel region. This allows the calculation of gradients for each nonzero activation in that region, as illustrated in Figure 12.
[figure(s) omitted; refer to PDF]
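A minimal sketch of a 2D SoftPool operator, implemented from the softmax-weighting definition above using two average-pooling calls, is given below. The kernel size and stride are illustrative assumptions; for very large activations, a per-window maximum should be subtracted before exponentiation for numerical stability (omitted here for brevity).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftPool2d(nn.Module):
    """Soft pooling: activations are averaged with softmax (exponential) weights."""
    def __init__(self, kernel_size=2, stride=2):
        super().__init__()
        self.k, self.s = kernel_size, stride

    def forward(self, x):
        w = torch.exp(x)                           # natural exponential of each activation
        num = F.avg_pool2d(x * w, self.k, self.s)  # proportional to sum_i e^{a_i} * a_i
        den = F.avg_pool2d(w, self.k, self.s)      # proportional to sum_i e^{a_i}
        return num / den                           # weighted average; fully differentiable

y = SoftPool2d()(torch.randn(1, 8, 32, 32))
print(y.shape)  # torch.Size([1, 8, 16, 16])
```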
The upgraded YOLOv5 is shown in Figure 13: C2f replaces C3, and SPPF replaces SPP.
[figure(s) omitted; refer to PDF]
3. Results and Discussion
3.1. Experimental Platform Evaluation Indicators
The model was trained on the Windows 10 operating system using the PyTorch framework. The testing device has a 12th Gen Intel(R) Core(TM) i9-12900KF 3.20 GHz CPU, a GeForce RTX 3080 Ti GPU with 12 GB of memory, and 32 GB of RAM; the software environment includes CUDA 11.8, cuDNN 8.9.3.28, Python 3.9, and PyTorch 2.0.1.
We trained the YOLOv5 and upgraded YOLOv5 models separately. The parameter settings were as follows: maximum iterations set to 300, momentum factor set to 0.937, and initial learning rate of 0.001. The YOLOv5 model used the Adam optimizer, with a batch size of 16 to expedite the training process. The input image resolution followed the standard size of 640 × 640.
The assessment criteria for the models encompass Precision, Recall, and mean average precision (mAP). The mathematical representations of these metrics are detailed as follows [42]:

Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}

AP = \int_{0}^{1} P(R)\, dR, \qquad mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively, and N is the number of defect categories.
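As an illustration of these metrics, the following minimal NumPy sketch computes Precision, Recall, and AP as the area under a precision–recall curve; the per-class TP/FP/FN counts and the precision–recall curve are assumed to have been computed beforehand.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp + 1e-16)
    recall = tp / (tp + fn + 1e-16)
    return precision, recall

def average_precision(recall_curve, precision_curve):
    """AP: area under the precision-recall curve (step integration over the envelope)."""
    r = np.concatenate(([0.0], recall_curve, [1.0]))
    p = np.concatenate(([1.0], precision_curve, [0.0]))
    # Enforce a monotonically decreasing precision envelope.
    p = np.flip(np.maximum.accumulate(np.flip(p)))
    return float(np.sum(np.diff(r) * p[1:]))

# mAP is the mean of per-class APs; mAP@0.5 uses an IoU threshold of 0.5 for matching.
```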
3.2. Ablation Experiment
To investigate the impact of data augmentation, the C2f module, and the SPPF module on the YOLOv5 model, we conducted ablation experiments, and the results are shown in Table 1.
Table 1
Statistical results of ablation experiments.
YOLOv5 | Data enhancement | C2f | SPPF | Precision (%) | Recall (%) | mAP@0.5:0.95 (%) | mAP@0.5 (%) | mAP@0.5 increase (%)
√ | | | | 65.7 | 60.4 | 27.6 | 71.2 |
√ | √ | | | 85.1 | 84.2 | 44.9 | 88.7 | 17.5
√ | √ | √ | | 85.7 | 85.2 | 46.1 | 90.5 | 1.8
√ | √ | √ | √ | 86.8 | 86.2 | 45.7 | 91.5 | 1.0
From Table 1, it can be observed that by introducing data augmentation, the model's performance on mAP@0.5 increased by 17.5%, and there was a 17.3% improvement in mAP@0.5:0.95. Furthermore, the precision increased by 20.6%, and the recall increased by 23.8%. When combining data augmentation with the C2f module, it exhibited outstanding performance enhancement, with improvements in mAP@0.5 (1.8%), mAP@0.5:0.95 (2.2%), precision (0.6%), and recall (1.0%). Similarly, when combining data augmentation, the C2f module, and SPPF, the improvement in mAP@0.5 reached 1.0%, precision increased by 1.1%, and recall increased by 1.0%. These results indicate that the optimized model has significantly improved accuracy, demonstrating better detection performance.
3.3. Comparison With the Original Model
Table 2 presents a comparative analysis of the performance between the enhanced network model (which incorporates data augmentation, C2f, and SPPF) and the original YOLOv5 model, specifically examining the mAP at a threshold of 0.5 (mAP@0.5) across various categories. The findings indicate that the enhanced network model exhibits an increase in mAP@0.5 across all categories assessed. Notably, the original model recorded a mAP@0.5 of 43.4% for crack defects, whereas the upgraded model achieved a mAP@0.5 of 93.8%, reflecting a substantial improvement of 50.4%. Figure 14 illustrates the comparative data. The experimental results underscore the efficacy of the upgraded model.
Table 2
Comparison of the original model and the upgraded model on mAP@0.5 (%) for different defect categories.
Model | Cross-undulation (%) | Cracks (%) | Irregular undulation (%) | Perforation (%) | Edge collapse (%) | Breakage (%) | All (%) |
Original model | 86.6 | 62.5 | 82.7 | 86.6 | 66.6 | 43.4 | 71.2 |
Upgraded model | 95.3 | 84.6 | 95.9 | 94.5 | 84.6 | 93.8 | 91.5 |
[figure(s) omitted; refer to PDF]
Furthermore, this study conducts a comparative analysis of the upgraded model against the original YOLOv5 model in terms of mAP@0.5, mAP@0.5:0.95, and precision. The corresponding change curves are presented in Figure 15.
[figure(s) omitted; refer to PDF]
As illustrated in Figure 15, the upgraded network model demonstrates superior accuracy in object detection compared to the original YOLOv5 network model. Furthermore, the upgraded model exhibits reduced fluctuations during the convergence process, with the average precision stabilizing after 200 iterations. Overall, the upgraded model has shown enhanced performance in the detection and recognition of SC when compared to the original network model.
Table 3 presents a comparative analysis of the training time required by both the original model and the upgraded model. The data indicate that after 100 iterations, the original model requires 98 min, whereas the upgraded model completes the same task in 93 min, reflecting a time reduction of 5.10%. Furthermore, at 200 and 300 iterations, the time reductions observed are 5.67% and 5.86%, respectively. These findings suggest that the upgraded model demonstrates superior performance in terms of energy efficiency and contributes to a decrease in computational costs.
Table 3
Model training time at different iterations.
Model | 100 (min) | 200 (min) | 300 (min) |
Original model | 98 | 194 | 290 |
Upgraded model | 93 | 183 | 273 |
3.4. Comparison of the Various Models
This paper presents a comparative analysis of the enhanced YOLOv5 model against six prominent detection models: YOLOv3 [43], YOLOv7 [44], YOLOv9 [45], YOLOv10 [46], Faster R-CNN [47], and SSD [48]. A statistical examination of five key metrics (model parameters, detection frames per second (FPS), precision, recall, and mAP@0.5) is conducted, with the results summarized in Table 4.
Table 4
Performance comparison results between the upgraded YOLOv5 and other algorithms.
Models | Parameters (M) | FPS | P (%) | R (%) | mAP@0.5 (%)
YOLOv3-tiny | 243 | 18.2 | 79.8 | 80.2 | 75.2 |
YOLOv7-tiny | 6.2 | 28.6 | 84.7 | 83.5 | 87.8 |
YOLOv9-s | 7.1 | 22.4 | 85.5 | 85.1 | 88.2 |
YOLOv10-s | 11.2 | 21.0 | 86.7 | 86.3 | 89.8 |
Faster R-CNN | 108 | 3.6 | 87.1 | 83.5 | 83.2 |
SSD | 100 | 20.7 | 79.7 | 81.2 | 73.3 |
Upgraded YOLOv5 | 8.7 | 24.2 | 86.8 | 86.2 | 91.5 |
As illustrated in Table 4, the parameters of the upgraded YOLOv5 model constitute 11.1% of those of the SSD model, while mAP@0.5, precision, and recall have improved by 18.2%, 7.1%, and 5.0%, respectively, thereby surpassing the performance of the SSD model in all evaluated aspects.
Although the two-stage detection model, Faster R-CNN, achieves a higher precision of 87.1%, it has an FPS of only 3.6, which does not satisfy the speed requirements for industrial detection. In contrast, the upgraded YOLOv5 model achieves a detection speed approximately 6.7 times that of the Faster R-CNN model, thereby meeting the necessary industrial inspection speed criteria.
While YOLOv7 exhibits the highest FPS, it does so at the expense of other detection performance metrics. Specifically, when compared to the upgraded YOLOv5, YOLOv7 shows reductions in mAP@0.5, precision, and recall of 3.7%, 2.1%, and 2.7%, respectively.
In comparison with YOLOv10-s, the upgraded YOLOv5 model has approximately 28% fewer parameters, yet it achieves a 0.7% increase in mAP and an increase in FPS from 21.0 to 24.2, representing a 15% improvement. When compared to YOLOv9-s, the parameters of the upgraded YOLOv5 have not significantly increased; however, there is an 8% enhancement in FPS, a 1.3% increase in precision, a 1.1% rise in recall, and a notable 3.3% improvement in mAP@0.5.
The comparative results presented above indicate that the YOLOv5 model developed in this study exhibits substantial advancements across all metrics compared with existing methodologies. The enhancements made to the YOLOv5 model have led to significant improvements in all evaluation indicators without a considerable increase in model parameters. The analysis of the experimental data shows that the improved YOLOv5 model delivers superior overall detection performance, making it highly suitable for the effective identification of defects in SC manufacturing.
3.5. Visualization of Detection for Model
3.5.1. Detection Results and Feature Map Analysis
As shown in Figure 16, we computed the confusion matrix and normalized each element by dividing it by the sum of its corresponding row, yielding a normalized confusion matrix. From the normalized confusion matrix, we can read the prediction accuracy and prediction error rate for each class; the diagonal values represent the percentage of correctly predicted instances for each type of SC defect. Specifically, the accuracy rates for cross-undulation, cracks, irregular undulation, perforation, edge collapse, and breakage are 91%, 82%, 88%, 95%, 95%, and 92%, respectively. The accuracy rates for all defect categories exceed 80%, with an average above 90%. Therefore, this model demonstrates practical value for defect detection in SC.
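A minimal NumPy sketch of this row-wise normalization is shown below; the matrix values are placeholders, not the actual confusion-matrix counts from this study.

```python
import numpy as np

# Rows = true classes, columns = predicted classes (placeholder counts).
cm = np.array([[91,  5,  4],
               [ 8, 82, 10],
               [ 3,  9, 88]], dtype=float)

# Divide every element by the sum of its row to obtain per-class rates.
cm_normalized = cm / cm.sum(axis=1, keepdims=True)
per_class_accuracy = np.diag(cm_normalized)  # diagonal = correctly predicted fraction per class
print(per_class_accuracy)
```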
[figure(s) omitted; refer to PDF]
This study conducted an analysis of feature map visualization within a segment of the feature extraction layer [49]. Figure 17 presents the feature maps and detection outcomes from selected networks based on both the original YOLOv5 and its enhanced version. The term “Stage 2” refers to the comprehensive feature maps encompassing all channels within the initial C3 module (C2f module) subsequent to the data’s passage through the original YOLOv5 model (and the upgraded YOLOv5), along with their 1:1 fusion. The designations “P3, P4, P5” correspond to the feature maps that are transmitted to the detection head from the three respective detection layers. An examination of the feature maps in Stage 2, as depicted in Figure 17, reveals that the upgraded model’s feature maps exhibit greater clarity than those of the original model, thereby demonstrating a superior retention of defect characteristics. This observation suggests that the C2f module is more effective in preserving intricate details.
[figure(s) omitted; refer to PDF]
In the detection process utilizing the original YOLOv5, both Image 1 and Image 2 encountered misclassifications. Specifically, in Image 1, a black spot (indicative of production debris) was incorrectly identified as an edge collapse, while in Image 2, a white water stain was erroneously classified as a crack. Conversely, the detection process employing the upgraded YOLOv5 did not yield any misclassifications, thereby indicating enhanced robustness of the upgraded model. In Image 3, which contains a subtle crack and an irregular undulation, both models successfully identified the irregular undulation defect; however, the original model failed to detect the crack defect, whereas the upgraded model accurately identified it.
The analysis of the feature maps suggests that the original model is less effective in concentrating on defects compared to the upgraded model. This indicates that the upgraded model is more adept at emphasizing relevant features of the target while diminishing the influence of extraneous or irrelevant features, ultimately enhancing the performance of the network model.
To substantiate the efficacy of the optimized model, the detection outcomes are presented in Figure 18. The data illustrated in the figure indicate that the model demonstrates exceptional detection and localization capabilities, exhibiting a high degree of accuracy. Furthermore, it is capable of accurately identifying multiple defects, including minor defects, even in complex backgrounds. The visual results depicted in Figure 18 distinctly highlight the model’s superior detection and positioning abilities, underscored by its high accuracy.
[figure(s) omitted; refer to PDF]
3.5.2. Comparative Analysis With the Original Model
To conduct a visual analysis of the performance disparity between the two models, we present the prediction outcomes of both models on the validation set, as illustrated in Figure 19. This figure delineates the detection results of the original model alongside those of the optimized model. The first row displays the detection outcomes of the original model, while the second row presents the results from the upgraded model. Instances of missed detections are indicated by white marks, whereas incorrect detections are represented by black marks.
[figure(s) omitted; refer to PDF]
In the second column, the original model fails to detect minor defects, whereas the upgraded model successfully identifies them. Similarly, in the third column, the original model overlooks weak defect features, which the upgraded model accurately detects. Furthermore, in the fourth column, the original model exhibits erroneous detections in the presence of a complex background, while the upgraded model effectively mitigates the interference posed by such backgrounds.
The analysis indicates that the original YOLOv5 model frequently encounters difficulties when confronted with complex backgrounds or minor defects. Conversely, the upgraded YOLOv5 model demonstrates considerable advantages in defect detection under challenging conditions, attributable to its improved capacity to capture positional information and its enhanced resilience to interference.
3.6. Sensitivity Test and Field Tests
3.6.1. Sensitivity Test Under Different Lighting Conditions
This research evaluated the robustness of the proposed model across varying lighting conditions by manipulating light intensity during the experimental procedures. New samples of SC were collected, featuring six distinct types of defects distributed in a nearly uniform manner. During the image acquisition phase for the same batch of SC, light intensity was systematically adjusted to create datasets under three distinct conditions: strong light, medium intensity light (the standard intensity for typical image acquisition), and weak light.
The detection results of the model are illustrated in Figure 20. The first row of results demonstrates that the model effectively identifies defect targets across all three lighting conditions. The second row presents detection images of two defect types, revealing confidence levels for crack defects of 0.55 and 0.58 under strong and weak light conditions, respectively, while the confidence level under medium light was 0.59, indicating comparable performance. The third and fourth rows of detection results further confirm that the model can accurately locate defects under varying conditions, showcasing commendable stability. The fifth row indicates that under medium light conditions, the confidence levels for crack defects and breakage defects reached their peak values of 0.76 and 0.54, respectively; under strong light, these confidence levels were 0.75 and 0.50, while under weak light, they were 0.75 and 0.51. These findings suggest that the model maintains good stability and accuracy in response to fluctuations in lighting.
[figure(s) omitted; refer to PDF]
Table 5 summarizes the test results, indicating that the correct identification rates (or recall rates) under strong and weak light conditions were 97.38% and 96.91%, respectively, which are comparable to the 98.09% achieved under normal lighting conditions. Among all detected SC defects under strong light, 0.71% were misclassified as background; in weak light, this proportion increased to 1.43%, while under normal lighting conditions, it was 0.71%, reflecting minimal discrepancies. In the analysis of SC identified under varying lighting conditions, it was observed that 1.90% of defects were falsely identified under strong light, compared to 1.66% under weak light, and 1.18% under normal lighting conditions. These findings suggest that the discrepancies in misclassification rates across different lighting environments are minimal. These results illustrate that the proposed model demonstrates significant robustness to variations in lighting, a crucial attribute for the functionality of industrial detection models in complex environments.
Table 5
Defect results under different lighting conditions.
Illumination | SC count | Correctly identified (amount) | Correctly identified (rate, %) | Falsely identified (amount) | Falsely identified (rate, %) | Missed (amount) | Missed (rate, %)
Strong light | 421 | 410 | 97.38 | 8 | 1.90 | 3 | 0.71 |
Medium light | 421 | 413 | 98.09 | 5 | 1.18 | 3 | 0.71 |
Weak light | 421 | 408 | 96.91 | 7 | 1.66 | 6 | 1.43 |
3.6.2. Field Testing
In the context of industrial production inspections, detection systems are subject to a variety of influencing factors, such as noise, ambient light, vibrations, and hardware configurations. The key performance metrics for evaluating the defect detection system utilized for SCs include accuracy, efficiency, and robustness. To assess the stability of the model, it was implemented on the production line for empirical testing. To maintain the integrity of the experimental conditions, no modifications were made to the existing hardware or environmental factors. Figure 21 shows a schematic diagram of the detection results of the deployment software interface.
[figure(s) omitted; refer to PDF]
Table 6 presents a summary of the test outcomes, revealing that a total of 2948 images were utilized to thoroughly evaluate the model's performance. This dataset comprised roughly 200 images for each defect category, alongside 1608 defect-free samples. The per-category detection accuracy was at least 95.00%, and the overall detection accuracy reached 98.30%. The average detection time per image was 41 ms, which translates to approximately 24.39 FPS, aligning closely with our experimental finding of 24.2 FPS. In summary, the enhanced YOLOv5 model fulfills the application requirements of industrial settings and exhibits significant potential for commercial application.
Table 6
On-site test statistical results.
Defect type | Amount | Correct classification | Error classification | Accuracy (%) | Average detection time (ms) |
Cross-undulation | 243 | 238 | 5 | 97.94 | /
Cracks | 210 | 201 | 9 | 95.71 | / |
Irregular undulation | 220 | 209 | 11 | 95.00 | / |
Perforations | 206 | 201 | 5 | 97.57 | / |
Edges collapse | 263 | 258 | 5 | 98.10 | / |
Breakage | 198 | 192 | 6 | 96.97 | /
No defects | 1608 | 1599 | 9 | 99.44 | /
Total | 2948 | 2898 | 50 | 98.30 | 41 |
4. Conclusions
In this study, we employ PL imaging to visualize defects in SC and investigate a YOLOv5-based methodology for their detection. A comprehensive dataset of SC images was developed, comprising over 4427 images with a resolution of 640 × 640 pixels, which were analyzed and preprocessed. First, a C2f module is developed to enhance the depth and receptive field of the network, thereby augmenting its capacity for feature extraction. Furthermore, to strengthen the convolutional network's ability to capture target features, we proposed a serial SPPF module integrated with soft pooling, which minimizes redundant operations, improves network efficiency, and prioritizes the extraction of higher level features from the input data. To bolster the model's generalization performance, we incorporated a variety of data augmentation techniques, including Mosaic, Mixup, HSV transformation, Gaussian noise, and scale transformation. The results of comparative testing and ablation studies indicated that the mAP@0.5 of the improved detection model reached 91.5%, representing a 20.3% increase over the original model, with certain defect categories exhibiting enhancements of up to 50.4%. The model achieved a processing speed of 24.2 FPS, with precision and recall values of 86.8% and 86.2%, respectively. In conclusion, the improved object detection network, combined with various data augmentation techniques, can effectively and efficiently identify SC defects in complex environments. These positive outcomes indicate that the model possesses significant potential for practical applications.
Future research will focus on further optimizing the network architecture to enhance detection accuracy. Additionally, deep learning methodologies typically necessitate substantial quantities of labeled data for effective training, alongside stringent standards for data quality and diversity. Insufficient labeled data can result in diminished performance. The acquisition of defect data pertaining to SC poses challenges, as obtaining large-scale labeled datasets can be both difficult and costly. Future research endeavors may investigate the feasibility of leveraging unlabeled or weakly labeled data to enhance learning performance and reasoning capabilities.
Funding
No funding was received for this manuscript.
[1] H. Dong, B. Zeng, Y. Wang, Y. Liu, M. Zeng, "China’s Solar Subsidy Policy: Government Funding Yields to Open Markets," IEEE Power and Energy Magazine, vol. 18 no. 3, pp. 49-60, DOI: 10.1109/mpe.2020.2971824, 2020.
[2] C. Schuss, K. Leppänen, K. Remes, "Detecting Defects in Photovoltaic Cells and Panels and Evaluating the Impact on Output Performances," IEEE Transactions on Instrumentation and Measurement, vol. 65 no. 5, pp. 1108-1119, DOI: 10.1109/tim.2015.2508287, 2016.
[3] C. Schuss, K. Remes, K. Leppänen, J. Saarela, T. Fabritius, B. Eichberger, T. Rahkonen, "Detecting Defects in Photovoltaic Panels With the Help of Synchronized Thermography," IEEE Transactions on Instrumentation and Measurement, vol. 67 no. 5, pp. 1178-1186, DOI: 10.1109/tim.2018.2809078, 2018.
[4] G. Lo Sciuto, G. Capizzi, R. Shikler, S. Napoli, "Organic Solar Cells Defects Classification by Using a New Feature Extraction Algorithm and an EBNN With an Innovative Pruning Algorithm," International Journal of Intelligent Systems, vol. 36 no. 6, pp. 2443-2464, DOI: 10.1002/int.22386, 2021.
[5] J. Xu, Y. Liu, H. Xie, F. Luo, "Surface Quality Assurance Method for Lithium-Ion Battery Electrode Using Concentration Compensation and Partiality Decision Rules," IEEE Transactions on Instrumentation and Measurement, vol. 69 no. 6, pp. 3157-3169, DOI: 10.1109/tim.2019.2929670, 2020.
[6] J. Xu, Y. Liu, Y. Wu, "Automatic Defect Inspection for Monocrystalline Solar Cell Interior by Electroluminescence Image Self-Comparison Method," IEEE Transactions on Instrumentation and Measurement, vol. 70,DOI: 10.1109/tim.2021.3096602, 2021.
[7] H. Nesswetter, P. Lugli, A. W. Bett, C. G. Zimmermann, "Electroluminescence and Photoluminescence Characterization of Multijunction Solar Cells," IEEE Journal of Photovoltaics, vol. 3 no. 1, pp. 353-358, DOI: 10.1109/jphotov.2012.2213801, 2013.
[8] U. Rau, "Superposition and Reciprocity in the Electroluminescence and Photoluminescence of Solar Cells," IEEE Journal of Photovoltaics, vol. 2 no. 2, pp. 169-172, DOI: 10.1109/jphotov.2011.2179018, 2012.
[9] M. Demant, H. Höffler, D. Schwaderer, A. Seidl, S. Rein, "Evaluation and Improvement of a Feature-Based Classification Framework to Rate the Quality of Multicrystalline Silicon Wafers," Presented at the 28th European PV Solar Energy Conference and Exhibition, .
[10] Y. Liu, J. Xu, Y. Wu, "A CISG Method for Internal Defect Detection of Solar Cells in Different Production Processes," IEEE Transactions on Industrial Electronics, vol. 69 no. 8, pp. 8452-8462, DOI: 10.1109/tie.2021.3104584, 2022.
[11] M. Köntges, I. Kunze, S. Kajari-Schröder, X. Breitenmoser, B. Bjorneklett, "The Risk of Power Loss in Crystalline Silicon Based Photovoltaic Modules Due to Micro-Cracks," Solar Energy Materials and Solar Cells, vol. 95 no. 4, pp. 1131-1137, DOI: 10.1016/j.solmat.2010.10.034, 2011.
[12] G. Sciuto, C. Napoli, G. Capizzi, R. J. O. Shikler, "Organic Solar Cells Defects Detection by Means of an Elliptical Basis Neural Network and a New Feature Extraction Technique," Optik, vol. 194,DOI: 10.1016/j.ijleo.2019.163038, 2019.
[13] W.-C. Li, D.-M. Tsai, "Automatic Saw-Mark Detection in Multicrystalline Solar Wafer Images," Solar Energy Materials and Solar Cells, vol. 95 no. 8, pp. 2206-2220, DOI: 10.1016/j.solmat.2011.03.025, 2011.
[14] D.-m. Tsai, J. Luo, "Mean Shift-Based Defect Detection in Multicrystalline Solar Wafer Surfaces," IEEE Transactions on Industrial Informatics, vol. 7 no. 1, pp. 125-135, DOI: 10.1109/tii.2010.2092783, 2011.
[15] J. Ko, J. Rheem, "Defect Detection of Polycrystalline Solar Wafers Using Local Binary Mean," The International Journal of Advanced Manufacturing Technology, vol. 82 no. 9-12, pp. 1753-1764, DOI: 10.1007/s00170-015-7498-z, 2016.
[16] S. A. Anwar, M. Z. Abdullah, "Micro-crack Detection of Multicrystalline Solar Cells Featuring an Improved Anisotropic Diffusion Filter and Image Segmentation Technique," EURASIP Journal on Image and Video Processing, vol. 2014 no. 1,DOI: 10.1186/1687-5281-2014-15, 2014.
[17] H. Chen, H. Zhao, D. Han, K. Liu, "Accurate and Robust Crack Detection Using Steerable Evidence Filtering in Electroluminescence Images of Solar Cells," Optics and Lasers in Engineering, vol. 118, pp. 22-33, DOI: 10.1016/j.optlaseng.2019.01.016, 2019.
[18] Y. Fu, X. Ma, H. Zhou, "Automatic Detection of Multi-Crossing Crack Defects in Multi-Crystalline Solar Cells Based on Machine Vision," Machine Vision and Applications, vol. 32 no. 3, DOI: 10.1007/s00138-021-01183-9, 2021.
[19] D. M. Tsai, S. C. Wu, W. C. Li, "Defect Detection of Solar Cells in Electroluminescence Images Using Fourier Image Reconstruction," Solar Energy Materials and Solar Cells, vol. 99, pp. 250-262, DOI: 10.1016/j.solmat.2011.12.007, 2012.
[20] W.-C. Li, D.-M. Tsai, "Wavelet-Based Defect Detection in Solar Wafer Images With Inhomogeneous Texture," Pattern Recognition, vol. 45 no. 2, pp. 742-756, DOI: 10.1016/j.patcog.2011.07.025, 2012.
[21] Z. Chen, M. Wang, J. Zhang, "Object Detection in UAV Images Based on Improved YOLOv5," pp. 267-278.
[22] C. Wu, R. Liang, S. He, H. Wang, "Real-Time Vehicle Detection Method Based on Aerial Image in Complex Background," pp. 508-518.
[23] L. Aktouf, Y. Shivanna, M. Dhimish, "High-Precision Defect Detection in Solar Cells Using YOLOv10 Deep Learning Model," Solar, vol. 4, pp. 639-659, DOI: 10.3390/solar4040030, 2024.
[24] G. Tang, J. Ni, Y. Zhao, Y. Gu, W. Cao, "A Survey of Object Detection for UAVs Based on Deep Learning," Remote Sensing, vol. 16 no. 1, DOI: 10.3390/rs16010149, 2023.
[25] Z. X. Zou, K. Y. Chen, Z. W. Shi, Y. H. Guo, J. P. Ye, "Object Detection in 20 years: A Survey," Proceedings of the IEEE, vol. 111 no. 3, pp. 257-276, DOI: 10.1109/jproc.2023.3238524, 2023.
[26] H. Chen, Y. Pang, Q. Hu, K. Liu, "Solar Cell Surface Defect Inspection Based on Multispectral Convolutional Neural Network," Journal of Intelligent Manufacturing, vol. 31 no. 2, pp. 453-468, DOI: 10.1007/s10845-018-1458-z, 2020.
[27] B. Su, H. Chen, K. Liu, W. Liu, "RCAG-Net: Residual Channel-Wise Attention Gate Network for Hot Spot Defect Detection of Photovoltaic Farms," IEEE Transactions on Instrumentation and Measurement, vol. 99, 2021.
[28] B. Su, H. Y. Chen, P. Chen, G. B. Bian, K. Liu, W. Liu, "Deep Learning-Based Solar-Cell Manufacturing Defect Detection With Complementary Attention Network," IEEE Transactions on Industrial Informatics, vol. 17 no. 6, pp. 4084-4095, DOI: 10.1109/tii.2020.3008021, 2021.
[29] X. Zhang, T. Hou, Y. Hao, H. Shangguan, A. Wang, S. Peng, "Surface Defect Detection of Solar Cells Based on Multiscale Region Proposal Fusion Network," IEEE Access, vol. 9 no. 99, pp. 62093-62101, DOI: 10.1109/access.2021.3074219, 2021.
[30] N. Zhang, S. Shan, H. Wei, K. Zhang, "Micro-Cracks Detection of Polycrystalline Solar Cells With Transfer Learning," Journal of Physics: Conference Series, vol. 1651 no. 1, DOI: 10.1088/1742-6596/1651/1/012118, 2020.
[31] S. Lu, K. X. Wu, J. X. Chen, "Solar Cell Surface Defect Detection Based on Optimized YOLOv5," IEEE Access, vol. 11, pp. 71026-71036, DOI: 10.1109/access.2023.3294344, 2023.
[32] Z. Ren, F. Fang, N. Yan, Y. Wu, "State of the Art in Defect Detection Based on Machine Vision," International Journal of Precision Engineering and Manufacturing-Green Technology, vol. 9 no. 2, pp. 661-691, DOI: 10.1007/s40684-021-00343-6, 2022.
[33] S. H. Lim, J.-J. Li, E. H. Steenbergen, Y.-H. Zhang, "Luminescence Coupling Effects on Multijunction Solar Cell External Quantum Efficiency Measurement," Progress in Photovoltaics: Research and Applications, vol. 21 no. 3, pp. 344-350, DOI: 10.1002/pip.1215, 2013.
[34] M. Xu, S. Yoon, A. Fuentes, D. S. Park, "A Comprehensive Survey of Image Augmentation Techniques for Deep Learning," Pattern Recognition, vol. 137, DOI: 10.1016/j.patcog.2023.109347, 2023.
[35] C. Y. Wang, A. Bochkovskiy, H. Y. M. Liao, "YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors," IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7464-7475, DOI: 10.1109/cvpr52729.2023.00721, 2023.
[36] T. Y. Lin, P. Dollár, R. Girshick, K. M. He, B. Hariharan, S. Belongie, "Feature Pyramid Networks for Object Detection," 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936-944.
[37] S. Liu, L. Qi, H. F. Qin, J. P. Shi, J. Y. Jia, "Path Aggregation Network for Instance Segmentation," 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8759-8768.
[38] C.-Y. Wang, A. Bochkovskiy, H.-Y. M. Liao, "YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors," IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7464-7475, DOI: 10.1109/cvpr52729.2023.00721, 2023.
[39] E. Real, A. Aggarwal, Y. Huang, Q. V. Le, "Regularized Evolution for Image Classifier Architecture Search," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33 no. 01, pp. 4780-4789, DOI: 10.1609/aaai.v33i01.33014780, 2019.
[40] S. Xie, R. Girshick, P. Dollár, Z. Tu, K. He, "Aggregated Residual Transformations for Deep Neural Networks," IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492-1500.
[41] A. Stergiou, R. Poppe, G. Kalliatakis, "Refining Activation Downsampling With SoftPool," pp. 10337-10346.
[42] A. E. Maxwell, T. A. Warner, L. A. Guillén, "Accuracy Assessment in Convolutional Neural Network-Based Deep Learning Remote Sensing Studies—Part 1: Literature Review," Remote Sensing, vol. 13 no. 13, DOI: 10.3390/rs13132450, 2021.
[43] L. S. Fu, Y. L. Feng, J. Z. Wu, "Fast and Accurate Detection of Kiwifruit in Orchard Using Improved YOLOv3-Tiny Model," Precision Agriculture, vol. 22 no. 3, pp. 754-776, DOI: 10.1007/s11119-020-09754-y, 2021.
[44] I. Gallo, A. U. Rehman, R. H. Dehkordi, N. Landro, R. La Grassa, M. Boschetti, "Deep Object Detection of Crop Weeds: Performance of YOLOv7 on a Real Case Dataset from UAV Images," Remote Sensing, vol. 15 no. 2, DOI: 10.3390/rs15020539, 2023.
[45] C.-Y. Wang, I. H. Yeh, H. Y. Mark Liao, "YOLOv9: Learning what You Want to Learn Using Programmable Gradient Information," Lecture Notes in Computer Science, DOI: 10.1007/978-3-031-72751-1_1, 2024.
[46] A. Wang, H. Chen, L. Liu, "YOLOv10: Real-Time End-to-End Object Detection," 2024. https://arxiv.org/abs/2405.14458
[47] S. H. Wan, S. Goudos, "Faster R-CNN for Multi-Class Fruit Detection Using a Robotic Vision System," Computer Networks, vol. 168, DOI: 10.1016/j.comnet.2019.107036, 2020.
[48] L. Huang, C. Chen, J. T. Yun, "Multi-Scale Feature Fusion Convolutional Neural Network for Indoor Small Target Detection," Frontiers in Neurorobotics, vol. 16, DOI: 10.3389/fnbot.2022.881021, 2022.
[49] H.-C. Shin, H. R. Roth, M. Gao, "Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning," IEEE Transactions on Medical Imaging, vol. 35 no. 5, pp. 1285-1298, DOI: 10.1109/tmi.2016.2528162, 2016.
Abstract
Solar cells (SCs) are prone to various defects that reduce energy conversion efficiency and can even cause fatal damage to photovoltaic modules. In this paper, photoluminescence (PL) imaging is used to visualize SC defects, and a detection method based on the YOLOv5 model is developed on this basis. First, five data augmentation methods (Mosaic, Mixup, HSV transformation, Gaussian noise, and rotation) are introduced to improve the representativeness of the dataset and strengthen the model's detection ability. Second, a C2f module is designed to enhance the network's feature-fusion capability. Third, to further improve the convolutional network's ability to capture target features, a series SPPF module combined with soft pooling is proposed, which reduces repeated operations, improves network efficiency, and focuses the model on extracting higher-level features from the input. Experimental results show that the optimized model reaches a mAP of 91.5%, 20.3% higher than the original model; the mAP gain for some defect types reaches 50.4%, and the detection speed reaches 24.2 FPS. The model's defect detection capability for SCs is therefore significantly enhanced while still meeting the speed requirement.
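For readers who want a concrete picture of the architectural change summarized above, the following is a minimal PyTorch sketch of an SPPF-style block in which the usual max pooling is replaced by soft pooling. It is not the authors' released code: the class names SoftPool2d, ConvBNSiLU, and SoftSPPF, the 5×5 kernel, and the channel arithmetic are illustrative assumptions rather than details taken from the paper.

```python
# Sketch: SPPF-style block with soft pooling (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftPool2d(nn.Module):
    """Soft pooling: exponentially weighted average over each pooling window."""

    def __init__(self, kernel_size=5, stride=1, padding=2):
        super().__init__()
        self.k, self.s, self.p = kernel_size, stride, padding

    def forward(self, x):
        # exp(x) serves as the per-element weight; large activations dominate
        # the window average while every element still receives gradient.
        # (No max-subtraction here, so very large activations could overflow.)
        w = torch.exp(x)
        num = F.avg_pool2d(x * w, self.k, self.s, self.p)  # weighted sum / N
        den = F.avg_pool2d(w, self.k, self.s, self.p)      # weight sum / N
        return num / (den + 1e-6)                          # the 1/N factors cancel


class ConvBNSiLU(nn.Module):
    """1x1 (or kxk) convolution + batch norm + SiLU, as used in YOLOv5-style blocks."""

    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class SoftSPPF(nn.Module):
    """SPPF-style block: one pooling layer applied three times in series,
    with all intermediate results concatenated before the output 1x1 conv."""

    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = ConvBNSiLU(c_in, c_hidden, 1)
        self.pool = SoftPool2d(kernel_size=k, stride=1, padding=k // 2)
        self.cv2 = ConvBNSiLU(c_hidden * 4, c_out, 1)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)       # receptive field grows with each serial pooling,
        y2 = self.pool(y1)      # so one small kernel reused three times stands in
        y3 = self.pool(y2)      # for three pooling layers with different kernel sizes
        return self.cv2(torch.cat((x, y1, y2, y3), dim=1))


if __name__ == "__main__":
    feat = torch.randn(1, 256, 20, 20)        # e.g., a backbone feature map
    print(SoftSPPF(256, 256)(feat).shape)     # torch.Size([1, 256, 20, 20])
```

The serial reuse of a single small pooling kernel is what keeps the repeated-operation count low, and the exponential weighting in soft pooling retains more of the window's activation detail than hard max pooling, which is the usual rationale for substituting it in feature-extraction blocks.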