Purpose
This study aims to address the problems of large training sample sizes, low sample quality, the low efficiency of the classical models currently in use, and the high computational complexity and high graphics processing unit (GPU) memory occupancy of existing attention mechanisms in visualization-based software defect prediction. It proposes a software defect prediction method termed recurrent criss-cross attention with a weighted-activation-function SE-ResNet (RCCA-WRSR). First, following code visualization, the activation functions of the SE-ResNet model are replaced with a weighted combination of ReLU and ELU to enhance model convergence, and an SE module is added in front of the model to filter feature information and eliminate low-weight features, yielding an improved residual network model, WRSR. To focus more on contextual information and to establish connections between a pixel and pixels not on the same criss-cross path, the visualized RGB (red, green, blue) images are then input into a model incorporating a fused RCCA module for defect prediction.
Design/methodology/approach
Software defect prediction based on code visualization is a new software defect prediction technology that predicts defects by visualizing code as images and then applying attention mechanisms to extract image features. However, current visualization-based defect prediction faces several challenges: large training sample sizes and low sample quality, inefficient classical models, and existing attention mechanisms with high computational complexity and high GPU memory occupancy.
Findings
Experimental evaluation using ten open-source Java data sets from PROMISE and five existing methods demonstrates that the proposed approach achieves an average F-measure of 0.637 in predicting 16 cross-version project pairs, a 6.1 percentage-point improvement over the best baseline.
Originality/value
RCCA-WRSR is a new visual software defect prediction method based on recurrent criss-cross attention and an improved residual network. It effectively enhances the performance of software defect prediction.
1. Introduction
With the advancement of science and technology, software systems are becoming increasingly complex. The complexity of software systems implies more interactions and components, which can lead to an increase in hidden defects or errors. Defects in software, especially those in production environments, can potentially result in significant economic losses and damage to reputation. Therefore, early prediction and identification of these defects have become crucial.
Currently, in most studies on software defect prediction (SDP), researchers often use deep learning techniques to extract various features from the source code of software, thus improving the accuracy of defect prediction models (Croft et al., 2022; Chen and Wang, 2022; Xie and Jianhua, 2024; Meng et al., 2023a; Bhutamapuram, 2023). Some scholars attempt to incorporate semantic information of programs into defect prediction using deep belief networks (DBN). Additionally, some scholars design long short-term memory networks (LSTM) to parse the structural information of programs and the relationships between code elements. Pornprasit and Tantithamthavorn (2022) proposed a deep learning approach to automatically learn semantic attributes of the surrounding tokens and lines to identify defective files and lines. Yu et al. (2021) also introduced an SDP method with a self-attention mechanism to predict software defects. They transform programs into Abstract Syntax Trees (ASTs) and encode ASTs into token vectors.
However, in these studies, many features are likely to be overlooked, for example, features hidden in the AST that cannot be represented by existing feature sets. Moreover, conveying semantic information through ASTs is indirect: additional tools and methods are required to construct the ASTs, and significant information is lost in the process.
Therefore, in recent years, some scholars have attempted to abandon intermediate tools and use more direct methods to express the semantic information of the code. Chen et al. (2020) introduced a defect prediction method combining software visualization and deep transfer learning (DTLDP). This method converts code into letter-level images through visualization, avoiding information loss that may occur due to manual feature extraction or AST transformation. Qiu et al. (2022) proposed the CNN-global self-attention (GSA) method, which introduces the global attention layer (GAL) to obtain more effective information for classification.
However, global attention methods often exhibit high computational complexity and consume a large amount of graphics processing unit (GPU) memory. Moreover, as research on visual SDP progresses, issues such as growing sample sizes and the low quality of training samples have gradually become apparent.
The various attention methods applied to visualization-based software defect prediction have not performed well and greatly reduce efficiency. This paper therefore introduces the recurrent criss-cross attention method, aiming to reduce computational complexity and GPU memory consumption. Furthermore, because this paper converts code directly into ASCII values and maps them to an image, the contextual information of the code is displayed directly in the image, and criss-cross attention can attend closely to that contextual information, making the analysis of the target code more accurate. The recurrent criss-cross attention adopted in this paper uses two layers of criss-cross attention to establish links between each pixel in the image and the pixels not on the same criss-cross path, avoiding problems such as a poor grasp of the global context.
As visual software defect prediction is an emerging technique, some older classical models have become increasingly unsuitable for current samples. The improved residual network adopted in this paper refilters feature information and automatically removes low-weight features; with more channels, more feature information can be obtained, which benefits further convolution operations and enables more accurate localization and greater efficiency on large data samples. Moreover, the ReLU activation function conventionally used in neural networks is replaced, which improves model convergence.
In general, this paper proposes a software visualization prediction method based on recurrent criss-cross attention and an improved residual network (RCCA-WRSR), aiming to enhance SDP performance. First, after code visualization, this paper replaces the activation function of the SE-ResNet model with a weighted combination of ReLU + ELU and adds a squeeze-and-excitation (SE) module in front of it to generate a new model, WRSR. Then, the visualized RGB images are input into the model, which integrates the recurrent criss-cross attention module, for defect prediction.
Our experiments on the ten open-source Java projects provided by PROMISE demonstrate that our method enhances SDP performance.
The remainder of this paper is organized as follows: Section 2 reviews related work to this study. The RCCA-WRSR method is introduced mainly in Section 3, and the experimental setup is described in Section 4. Section 5 discusses the results and performance of RCCA-WRSR, including ablation experiments. Section 6 concludes and outlines future work.
2. Related work
Software defect prediction based on code visualization
In recent years, visualization techniques have gradually become a hot direction in the field of SDP. Xiong et al. (2022) developed a visual analytics system for exploring the data transformation processes of different scripting languages. The system constructs diverse mappings from data to visual forms within the data transformation space, providing detailed descriptions of the semantics of data transformations and their input-output relationships to help data workers understand the transformation process more deeply. Additionally, Wang et al. (2020) introduced the convolutional neural network (CNN) Explainer, an interactive visualization tool designed specifically for CNNs. It combines visualization techniques such as overview-plus-detail and animation to illustrate the structure of complex models, facilitating smooth transitions between different abstraction levels and enabling users to gain a comprehensive understanding of the complex concepts within CNNs.
However, current code visualization for SDP often relies on specific tools for feature extraction to construct intermediate representations of code, and such methods sometimes lose important information directly related to defects. Therefore, Chen et al. (2020) proposed a visual SDP method that converts code characters into pixels, with colors based on their ASCII decimal values; the pixels are then arranged into matrices to obtain code images. By comparing these images, significant differences between corresponding images can be identified intuitively, and structural and semantic differences can also be recognized visually. Qiu et al. (2022) introduced the CNN-GSA method, which incorporates a GAL to obtain more effective classification information. Fan et al. (2024) proposed a stable learning-based software defect prediction method (SDP-SL), which combines code visualization techniques with residual networks. However, although residual networks perform many operations when training on the data sets, they fail to capture hierarchical structural information in software images (Chen et al., 2021).
In summary, for visual SDP, directly converting code into images for defect prediction has been shown to be more effective than indirect conversion. However, current models still suffer from problems such as low efficiency, and as the number of samples increases, they cannot accurately locate defects. In this study, inspired by Yang et al. (2023), the traditional residual network is improved.
Criss-cross attention mechanisms
In image recognition networks, incorporating contextual information to enhance feature representation is a common approach, and attention mechanisms also play a crucial role in human perception (Meng et al., 2023b; Diao et al., 2023; Wen et al., 2022). In the realm of SDP, Li and Yao (2022) proposed a network detection method based on CBAM-ResNet and self-attention, which combines the convolutional block attention module with residual attention networks to improve model performance. Diao et al. (2023) introduced defect prediction via mixed attention mechanism, a software defect prediction method based on a hybrid attention mechanism. It extracts syntax-semantic sequences from program modules based on ASTs and performs word embedding and positional encoding; it then autonomously learns contextual syntax-semantic information with a multihead attention mechanism and uses a global attention mechanism to extract critical syntax-semantic features for constructing SDP models and identifying potentially defective code modules.
However, these prediction methods share a significant drawback: whether based on global attention or self-attention, these neural network approaches must generate large attention maps to measure the relationship between each pair of pixels, resulting in high time and space complexity. Hence, Huang et al. (2019) proposed the criss-cross attention mechanism, which replaces the usual single densely connected graph with several sparsely connected graphs, significantly reducing GPU memory usage and improving computational efficiency.
While extensive research has been conducted in SDP to improve prediction accuracy, the effectiveness of SDP is mainly affected by several key factors: the selection of classifiers, data quality and the modeling process, especially the version and number of model components, commonly referred to as feature selection (Mustaqeem et al., 2024; Bhandari et al., 2019; El Habib Daho et al., 2021). Feature selection is therefore a top priority in SDP, and the criss-cross attention mechanism can effectively solve the efficiency problem while capturing context more accurately, making it well suited to the visual SDP studied in this paper.
Weighted activation function
In recent years, with deeper research into neural networks, the benefits of activation functions for network performance have become increasingly evident. The six most common activation functions aim to help networks learn complex patterns in data; the most representative saturating activation functions are Sigmoid and Tanh, while the most commonly used nonsaturating activation functions include ReLU, Leaky ReLU, ELU and Softplus.
For many years, bounded activation functions like Sigmoid and Tanh have been the preferred choice for neural networks, with researchers proving their effectiveness, especially in shallow network architectures (Fakhfakh and Chaari, 2024). However, despite their common usage, they are limited in effectiveness when training deep neural networks due to issues such as gradient vanishing.
To alleviate the problem of gradient vanishing, ReLU was introduced, and its success has inspired the development of many new activation functions. However, ReLU is not without drawbacks: its non-differentiability at zero is a major concern, which led to the evolution of activation functions such as Leaky ReLU.
However, each activation function has its specific advantages and inherent limitations. Su and Huang (2023) proposed combining activation functions with weights between 0 and 1 to enhance the effectiveness of SDP. When dealing with a large number of classifications, such a weighted combination can exploit the strengths of different types of activation function to create new models.
In summary, visual SDP faces issues such as a large number of low-quality samples and the lower distinctiveness of RGB image information compared with other images when standard residual networks process software feature images; this motivated the approach inspired by Yang et al. (2023). To smooth the gradient descent process of the model and address problems such as gradient vanishing, this paper proposes the WRSR network, an improved residual network incorporating a weighted activation function, to enhance the accuracy of model training.
Visual prediction often entails higher computational complexity and consumes more GPU memory. Moreover, when programs are converted into RGB images, the coherence of programs means that program information is manifested more intuitively in the image context, which existing attention mechanisms do not consider. To address this, this paper draws inspiration from Huang et al. (2019) and uses a recurrent criss-cross attention mechanism to perform weighted analysis of features and enhance effectiveness.
3. Methodology
The overall framework of the proposed method is illustrated in Figure 1. It mainly consists of two phases:
visualization of the source code; and
establishment of the RCCA-WRSR model for the prediction of software defects.
In the framework based on the WRSR network, this paper introduces an SE module before the SE-ResNet model to suppress unimportant features and enhance significant ones, and replaces the activation function in the original feature extraction network. Additionally, a criss-cross attention module is introduced after feature extraction; it collects contextual information in both horizontal and vertical directions to enhance the pixel representation capability of the image. To construct connections between pixels that are not on the same criss-cross path, this paper uses a recurrent criss-cross attention module. Finally, the output is fed back into the WRSR network for classification and defect prediction.
Code visualization
As shown in Figure 2, each code file is first converted into an image whose size is determined by its width and height; the width varies with file size, as specified in Table 1. The letters and symbols in the code are converted into their corresponding ASCII decimal values (for example, the letter “A” becomes 65) and arranged row by row and column by column into a matrix. The method visualizes these 8-bit values as grayscale values, where 0 represents black and 255 represents white. However, images generated this way may not effectively highlight the key points. Hence, the generated values are grouped into sets of three, representing the values of the three RGB channels; for instance, “int” becomes [105, 110, 116]. The above process is then repeated to generate RGB images. If the generated RGB image exceeds the preset length and width, it is cropped; if not, it is padded with zeros.
In addition, since the code files in the selected data sets are generally between 0 and 100 KB, conversion yields images with between 0 and 34,133 pixels (100 × 1024 / 3). The image size is the width multiplied by the height, where the width is chosen according to file size: files smaller than 10 KB map to a width of 32; files between 10 KB and 30 KB to a width of 64; files between 30 KB and 60 KB to a width of 128; and files between 60 KB and 100 KB to a width of 256.
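To make the mapping concrete, a minimal Python sketch of this conversion is given below, assuming NumPy and Pillow are available. The helper names width_for_size and code_to_rgb_image are illustrative, and the cropping rule for oversized images and any final resizing to the 224 × 224 network input are omitted:

```python
import math

import numpy as np
from PIL import Image


def width_for_size(size_kb: float) -> int:
    """Image width lookup following Table 1."""
    for limit, width in [(10, 32), (30, 64), (60, 128), (100, 256),
                         (200, 384), (500, 512), (1000, 768)]:
        if size_kb < limit:
            return width
    return 1024


def code_to_rgb_image(path: str) -> Image.Image:
    """Read a source file as raw bytes (ASCII values) and pack them
    three at a time into the R, G, B channels of an image."""
    raw = open(path, "rb").read()
    width = width_for_size(len(raw) / 1024)
    height = math.ceil(len(raw) / (3 * width))          # 3 bytes per pixel
    buf = np.zeros(width * height * 3, dtype=np.uint8)  # zero-pad the tail
    buf[: len(raw)] = np.frombuffer(raw, dtype=np.uint8)
    return Image.fromarray(buf.reshape(height, width, 3), mode="RGB")
```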
Criss-cross attention
As shown in Figure 3, given a local feature map $H \in \mathbb{R}^{C \times W \times H}$, the module first applies two convolutional layers with 1 × 1 filters on H to generate two feature maps Q and K, where $\{Q, K\} \in \mathbb{R}^{C' \times W \times H}$ and the channel number C′ is less than C.
After obtaining Q and K, this paper further computes an attention map $A \in \mathbb{R}^{(H+W-1) \times (W \times H)}$. At each position u in the spatial dimension of Q, we obtain a vector $Q_u \in \mathbb{R}^{C'}$. Simultaneously, by extracting the features of K in the same row or column as position u, we obtain the set $\Omega_u \in \mathbb{R}^{(H+W-1) \times C'}$, where $\Omega_{i,u} \in \mathbb{R}^{C'}$ represents the ith element of $\Omega_u$. The affinity between $Q_u$ and $\Omega_{i,u}$ is formulated as follows:

$$d_{i,u} = Q_u \, \Omega_{i,u}^{\top} \quad (1)$$

where $d_{i,u} \in D$ and $D \in \mathbb{R}^{(H+W-1) \times (W \times H)}$ is the affinity map.
Afterward, this paper applies a softmax on D along the channel dimension to compute the attention map A. Another convolutional layer with a 1 × 1 filter is applied on H to generate $V \in \mathbb{R}^{C \times W \times H}$, which facilitates adaptive feature transformation. At each position u in the spatial dimension of V, this paper obtains a vector $V_u \in \mathbb{R}^{C}$ and a set $\Phi_u \in \mathbb{R}^{(H+W-1) \times C}$, where $\Phi_u$ comprises the features of V in the same row and column as position u. Contextual information is then aggregated through the following formula:

$$H'_u = \sum_{i=0}^{H+W-2} A_{i,u} \, \Phi_{i,u} + H_u \quad (2)$$

where $H'_u$ is the feature vector of the output feature map H′ at position u, and $A_{i,u}$ is the scalar attention weight for channel i at position u.
By adding the aggregated contextual information to the local feature H, the module finally obtains the weighted feature map H′, which readily captures contextual information in the horizontal and vertical directions.
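For concreteness, the following is a minimal PyTorch sketch of the criss-cross attention operation described by equations (1) and (2), adapted from the public formulation of Huang et al. (2019); the channel reduction factor of 8 and the learnable residual weight γ follow that work and are assumptions with respect to this paper:

```python
import torch
import torch.nn as nn


def neg_inf_diag(b: int, h: int, w: int, device) -> torch.Tensor:
    # -inf on the diagonal of the vertical energy so that position u itself
    # is not counted twice (it lies on both its row path and its column path).
    return torch.diag(torch.full((h,), float("-inf"), device=device)).unsqueeze(0).repeat(b * w, 1, 1)


class CrissCrossAttention(nn.Module):
    def __init__(self, in_dim: int):
        super().__init__()
        self.query_conv = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)  # Q, reduced channels C'
        self.key_conv = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)    # K, reduced channels C'
        self.value_conv = nn.Conv2d(in_dim, in_dim, kernel_size=1)       # V, full channels C
        self.softmax = nn.Softmax(dim=3)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.size()
        q, k, v = self.query_conv(x), self.key_conv(x), self.value_conv(x)
        # Column-wise (vertical) projections: one (h x C') block per image column.
        q_h = q.permute(0, 3, 1, 2).contiguous().view(b * w, -1, h).permute(0, 2, 1)
        k_h = k.permute(0, 3, 1, 2).contiguous().view(b * w, -1, h)
        v_h = v.permute(0, 3, 1, 2).contiguous().view(b * w, -1, h)
        # Row-wise (horizontal) projections: one (w x C') block per image row.
        q_w = q.permute(0, 2, 1, 3).contiguous().view(b * h, -1, w).permute(0, 2, 1)
        k_w = k.permute(0, 2, 1, 3).contiguous().view(b * h, -1, w)
        v_w = v.permute(0, 2, 1, 3).contiguous().view(b * h, -1, w)
        # Affinities d_{i,u} of equation (1) along the column and row paths.
        energy_h = (torch.bmm(q_h, k_h) + neg_inf_diag(b, h, w, x.device)).view(b, w, h, h).permute(0, 2, 1, 3)
        energy_w = torch.bmm(q_w, k_w).view(b, h, w, w)
        # Softmax over the H + W - 1 positions of the criss-cross path -> attention map A.
        attn = self.softmax(torch.cat([energy_h, energy_w], dim=3))
        attn_h = attn[:, :, :, 0:h].permute(0, 2, 1, 3).contiguous().view(b * w, h, h)
        attn_w = attn[:, :, :, h:h + w].contiguous().view(b * h, w, w)
        # Aggregation of equation (2): weighted sum over the path, plus the input.
        out_h = torch.bmm(v_h, attn_h.permute(0, 2, 1)).view(b, w, -1, h).permute(0, 2, 3, 1)
        out_w = torch.bmm(v_w, attn_w.permute(0, 2, 1)).view(b, h, -1, w).permute(0, 2, 1, 3)
        return self.gamma * (out_h + out_w) + x
```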
Weighted activation function
In this study, six commonly used activation functions were selected, including saturating functions such as Sigmoid and TanH, and nonsaturating functions including ReLU, Leaky ReLU, ELU and Softplus. Table 2 presents the characteristics of each selected activation function in this study.
Inspired by the experiments of Su and Huang (2023), a simple combination of the six selected activation functions was formed by assigning weights. For instance, for the pairwise combination of the Sigmoid and ReLU activation functions, the study adopts the following weighted activation function:

$$f(x) = w_1 \cdot \mathrm{Sigmoid}(x) + w_2 \cdot \mathrm{ReLU}(x), \quad w_1 + w_2 = 1, \; w_1, w_2 \in [0, 1] \quad (3)$$
Similarly, for the combination of Sigmoid, Tanh and Softplus, the study adopts the following weighted activation function:

$$f(x) = w_1 \cdot \mathrm{Sigmoid}(x) + w_2 \cdot \mathrm{Tanh}(x) + w_3 \cdot \mathrm{Softplus}(x), \quad w_1 + w_2 + w_3 = 1 \quad (4)$$
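As an illustration, the ReLU + ELU combination ultimately used in WRSR can be written as a small PyTorch module following the same pattern as equations (3) and (4). The fixed, manually chosen weights (equal by default) are an assumption, since the exact weight values are not stated here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightedActivation(nn.Module):
    """Weighted combination of ReLU and ELU in the style of equation (3)."""

    def __init__(self, w_relu: float = 0.5, w_elu: float = 0.5):
        super().__init__()
        assert abs(w_relu + w_elu - 1.0) < 1e-6, "weights are assumed to sum to 1"
        self.w_relu, self.w_elu = w_relu, w_elu

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_relu * torch.relu(x) + self.w_elu * F.elu(x)
```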
Recurrent criss-cross attention
As illustrated in Figure 4, two attention-based approaches are compared. In Figure 4(a), each module generates dense attention maps with N weights (depicted in blue) for every position (e.g. highlighted in yellow). Conversely, Figure 4(b) uses a two-stage criss-cross attention mechanism in which sparse attention maps with only about H + W − 1 weights per position (e.g. highlighted in yellow) are generated after the dual criss-cross attention modules. These maps ultimately aggregate weighted information from all pixels to produce feature maps incorporating comprehensive pixel-wise information (e.g. highlighted in red).
To link a pixel with those in its surroundings that are not within the same criss-cross path, this paper introduces the recurrent criss-cross attention (RCCA) mechanism. The RCCA module can be expanded into R loops; in this study, we adopt a form with two loops (R = 2). In the first loop, criss-cross attention takes the feature map H extracted from the WRSR model as input and outputs the feature map H′, where H and H′ have the same shape. In the second loop, criss-cross attention takes H′ as input and outputs the feature map H″.
As shown in Figure 5, point α(αx, αy) is not on the criss-cross path of point β(βx, βy). Visualizing the information propagation in Figure 5, position (αx, αy) (highlighted in yellow) first transmits information, via the information transmission function f, to (αx, βy) and (βx, αy) in loop 1 (highlighted in light blue). In loop 2, the same operation is performed, ultimately transmitting the information to (βx, βy) (highlighted in blue). Thus, even though α(αx, αy) and β(βx, βy) are not on the same criss-cross path, information can still be transmitted between them.
The final weighted feature map output is re-input into the WRSR network for classification and defect prediction. The RCCA module in this paper is equipped with two loops (R = 2), allowing it to capture the complete image contextual information of all pixels. This generates new features with dense and rich contextual information, addressing the limitations of criss-cross attention. Compared to criss-cross attention, the RCCA module does not introduce additional parameters and achieves better performance with minimal computational overhead.
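In code, the recurrence simply reapplies the same criss-cross module. The following sketch reuses the CrissCrossAttention class from the sketch above, with weights shared across the R loops, consistent with the statement that no additional parameters are introduced:

```python
import torch.nn as nn


class RCCAModule(nn.Module):
    """Recurrent criss-cross attention: apply criss-cross attention R times."""

    def __init__(self, in_channels: int, recurrence: int = 2):
        super().__init__()
        self.recurrence = recurrence                 # R = 2 in this paper
        self.cca = CrissCrossAttention(in_channels)  # shared weights each loop

    def forward(self, x):
        for _ in range(self.recurrence):
            x = self.cca(x)  # loop 1 produces H', loop 2 produces H''
        return x
```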
Construction of the improved residual network model WRSR
The structural diagram of the SE module used in this paper is shown in Figure 6. First, the input information P is mapped to V through a transformation. Then, global average pooling over the channels compresses the feature map V of size H × W × C, which contains global information, into a feature vector Z of size 1 × 1 × C. The squeezing operation aggregates the spatial dimensions of each channel, alleviating the problem of channel dependency, and is defined as follows:

$$z_c = F_{sq}(v_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} v_c(i, j) \quad (5)$$
Next, a two-layer fully connected gating mechanism takes the embedding Z as input and yields modulation weights $s = \sigma(W_2 \, \delta(W_1 z))$ for the channels, where δ denotes the ReLU activation and σ the Sigmoid function; the dimensionality of S is 1 × 1 × C. Finally, each normalized weight is multiplied element-wise with the two-dimensional matrix of the corresponding channel to obtain the output, as shown in the following formula:

$$\tilde{x}_c = F_{scale}(v_c, s_c) = s_c \cdot v_c \quad (6)$$
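A minimal PyTorch sketch of this squeeze-and-excitation block is shown below; the reduction ratio r = 16 in the two-layer gating follows the original SE-Net design and is an assumption here:

```python
import torch.nn as nn


class SEModule(nn.Module):
    """Squeeze-and-excitation block implementing equations (5)-(6)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # equation (5): global average pooling
        self.excite = nn.Sequential(            # two-layer fully connected gating
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                        # normalized per-channel weights s
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        s = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        return x * s  # equation (6): scale each channel by its weight
```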
As shown in Figure 7, the WRSR model proposed in this paper builds on the SE-ResNet model and incorporates an additional SE module before the model. This SE module, known for its redundancy suppression capability, refilters the feature information, automatically eliminating low-weight features. More channels facilitate the retrieval of additional feature information, thus aiding further convolutional operations.
Additionally, due to the nonsmoothness of the ReLU function at zero and its exclusive generation of positive values, gradient descent in the model may not proceed smoothly and could potentially lead to local optima with information loss. Therefore, the ReLU activation function is replaced by a weighted activation function to enhance the model convergence.
The basic architecture of the WRSR model used in this paper is outlined as follows:
Input layer: accepts images of size 224 × 224 × 3 (RGB images).
Layer 1: convolutional layer with 3 input channels, 64 output channels, a 7 × 7 kernel, a stride of 2 and padding of 3.
Layer 2: max-pooling layer with a 3 × 3 pooling kernel, a stride of 2 and padding of 1.
Layer 3: four basic convolutional blocks, each consisting of two 3 × 3 convolutional layers and a skip connection. Both input and output channels are set to 64; the kernel size is 3 × 3, with a stride of 1 and padding of 1.
Layer 4: four basic convolutional blocks with 64 input channels and 128 output channels; otherwise configured identically to Layer 3.
Layer 5: four basic convolutional blocks with 128 input channels and 256 output channels; otherwise configured identically to Layer 3.
Layer 6: four basic convolutional blocks with 256 input channels and 512 output channels; otherwise configured identically to Layer 3.
Global average pooling layer: reduces the spatial dimensions of the feature map to 1 × 1 while retaining the channel dimension, giving a final output of 1 × 1 × 512.
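Putting the pieces together, one basic block of the WRSR backbone might look as follows. This is a sketch reusing the SEModule and WeightedActivation classes from the earlier sketches; applying the SE module and the weighted activation inside every basic block is an assumption about details not fully specified above:

```python
import torch.nn as nn


class WRSRBasicBlock(nn.Module):
    """One residual block of the WRSR backbone: two 3x3 convolutions,
    the weighted ReLU+ELU activation, and an SE module on the residual branch."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.se = SEModule(out_ch)       # from the sketch above
        self.act = WeightedActivation()  # weighted ReLU + ELU
        # 1x1 projection so the skip connection matches the output shape.
        self.shortcut = (
            nn.Identity() if stride == 1 and in_ch == out_ch else
            nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                          nn.BatchNorm2d(out_ch))
        )

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.se(self.bn2(self.conv2(out)))
        return self.act(out + self.shortcut(x))
```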
4. Data set description
To assess the effectiveness of RCCA-WRSR, experiments were conducted on 10 Java open-source projects obtained from the publicly available software database PROMISE. Table 3 provides detailed information on these data sets, including project names, versions, total number of files, average number of files, average file size and average defect rate. The average number of files across all projects ranges from 150 to 1,046, with average file sizes varying from 2.9 KB to 8.7 KB. These projects have been widely used in recent SDP studies. Two consecutive versions of each project were used for defect prediction, with the older version used for training and the newer version used for testing (e.g. camel-1.2 as the training set and camel-1.4 as the testing set; xalan-2.4 as the training set and xalan-2.5 as the testing set).
Performance index
To evaluate the model’s performance more accurately, this paper adopts the F-measure (Christen et al., 2023). The F-measure is widely used in SDP because it combines precision and recall into a single comprehensive evaluation metric that captures the accuracy of predictors.
In the binary classification task of SDP, the model predicts whether unlabeled instances are defective, yielding four prediction outcomes: instances correctly predicted as defective are true positives (TP); instances predicted as defective that are not defective are false positives (FP); instances predicted as non-defective that are defective are false negatives (FN); and instances correctly predicted as non-defective are true negatives (TN). Based on these outcomes, with precision denoted as P and recall denoted as R, the F-measure is calculated as:

$$P = \frac{TP}{TP + FP} \quad (7)$$

$$R = \frac{TP}{TP + FN} \quad (8)$$

$$F\text{-}measure = \frac{2 \times P \times R}{P + R} \quad (9)$$
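For reference, equations (7)-(9) transcribe directly into code (assuming nonzero denominators):

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Precision (7), recall (8) and F-measure (9) from the confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```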
To compare the performance of RCCA-WRSR with that of the other models in our experiments, we also applied the Scott-Knott effect size difference (ESD) test (Qiu et al., 2024). The Scott-Knott ESD test is a mean-comparison method that uses hierarchical clustering to partition a set of measurements, such as F-measure values, into groups within which there is no statistically significant difference. The test consists of two steps:
finding the partition that maximizes the difference in mean measurements between groups; and
deciding whether to split the measurements into two groups or merge them into one.
Comparison method
To evaluate the performance of RCCA-WRSR, the following two research questions (RQs) are investigated:
RQ1. Which combination of weighted activation functions performs best in the WRSR model?
RQ2. How does RCCA-WRSR perform compared with existing SDP methods?
In this section, experiments are conducted to assess the performance of RCCA-WRSR and compare it with existing deep learning and visualization techniques in the field of SDP. The experiments are conducted on a laptop equipped with an i7-12600K processor and NVIDIA 3060 graphics card. Unless otherwise stated, each experiment is run ten times and the average results are computed.
For RQ1, this study follows the experiment of Su and Huang (2023) comparing ten combinations of the six activation functions. Using the experimental model of Chen et al. (2020), five within-version prediction experiments are conducted to select the combination of activation functions used for RQ2.
Regarding RQ2, this study compares five methods covering three different types of software defect prediction methods: DBN+, CNN, DP-CNN, DTLDP and CNN-GSA. DBN+, CNN and DP-CNN are prediction methods based on feature extraction from ASTs, while DTLDP is a defect prediction method based on feature extraction from code images, and CNN-GSA uses a global self-attention mechanism in CNNs. Sixteen experiments are conducted for comprehensive performance evaluation. The compared methods are as follows:
DBN+: A variant of Deep Belief Network (DBN) combining features generated by DBN with manually crafted features (Tao et al., 2024).
CNN: A defect prediction method using standard convolutional neural networks (CNN) for feature extraction (Tao et al., 2024).
DP-CNN: An improved version of CNN that progressively deepens the network to capture long-range textual dependencies (Tao et al., 2024).
DTLDP: A deep learning model using code visualization techniques for defect prediction (Chen et al., 2020).
CNN-GSA: Incorporating global attention mechanism into the original DTLDP model (Qiu et al., 2022).
In this study, the primary comparative method is RCCA-WRSR. For consistency in comparison, network parameter settings are maintained as per the actual implementation, as shown below:
Epoch = 1000.
Batch size = 64.
Initial learning rate (init_lr) = 0.0003.
Momentum setting = 0.9.
Weight decay = 0.0005.
γ = 0.0003.
Impact factor = 0.75.
The training process uses the stochastic gradient descent algorithm.
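The optimizer settings above translate into the following PyTorch sketch; the placeholder model stands in for an assembled RCCA-WRSR network, and γ and the impact factor, whose roles are not detailed here, are omitted:

```python
import torch
import torch.optim as optim

# Placeholder module standing in for the assembled RCCA-WRSR network.
model = torch.nn.Linear(10, 2)

# SGD configuration matching the parameter list above.
optimizer = optim.SGD(
    model.parameters(),
    lr=0.0003,           # initial learning rate (init_lr)
    momentum=0.9,
    weight_decay=0.0005,
)
```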
5. Results and discussion
Regarding RQ1: as shown in Table 4, this study compares the F-measure values of ten combinations of weighted activation functions based on the WRSR model. Owing to the stochastic nature of deep learning-based methods, each experiment is carried out ten times and the average values are recorded. The highest F-measure values among the five within-version project groups are highlighted in bold in the table. The table shows that the two-function weighted (DW) combinations outperform both single activation functions and the three-function weighted (TW) combinations, and that the ReLU+ELU combination performs better than the other combinations.
Concerning RQ2: As shown in Table 5, this study compares the F-measure values of RCCA-WRSR with those of DBN+, CNN, DP-CNN, DTLDP and CNN-GSA. Due to the stochastic nature of deep learning-based methods, each experiment is carried out ten times and the average values are recorded. The highest F-measure values among the 16 cross-version project pairs are highlighted in bold in the table. From the table, it can be seen that RCCA-WRSR outperforms other algorithms.
The W/T/L row in the penultimate row of the table gives the number of wins, ties and losses of RCCA-WRSR against the method in each column. For example, the W/T/L of DBN+ is 16/0/0, meaning that across the 16 defect prediction pairs, the F-measure of RCCA-WRSR exceeds that of DBN+ 16 times, ties 0 times and loses 0 times. Across all compared methods, every W/T/L result is 16/0/0. These results show that the features extracted by RCCA-WRSR are more helpful for software defect prediction, as discussed below.
Figure 8 displays the Scott-Knott ESD test results for the 16 cross-version project pairs. The blue diamond in the figure represents the mean value. RCCA-WRSR achieves the highest average F-measure, and its median is higher than that of all other methods. Moreover, the gap between the upper and lower edges for the proposed method is smaller than for all other methods, indicating greater stability.
To compare the respective contributions of the RCCA module and the WRSR module, this study conducted ablation experiments. An original AlexNet model was established for classification and then modified by integrating the RCCA mechanism and the WRSR module, respectively. Table 6 compares the results on the 16 cross-version project pairs: adding either the RCCA module or the WRSR module improves the F-measure of predictions, with the WRSR module providing the larger improvement.
6. Conclusion and future work
This paper presents an SDP method, RCCA-WRSR, based on recurrent criss-cross attention and an improved residual network. In this method, after code visualization, the activation functions of the SE-ResNet model are replaced with a weighted combination of ReLU+ELU, and an SE module is added in front to generate the new model, WRSR. The visualized RGB images are then input into the model with the fused recurrent criss-cross attention module for defect prediction. Experimental results demonstrate that the DW combination method outperforms single activation functions and TW combination methods. Moreover, the proposed RCCA-WRSR method achieves an average F-measure of 0.637 on cross-version projects, surpassing related defect prediction models. In the future, we will further explore visualization SDP techniques for small-sample data and the application of more weighted activation function combinations in other SDP techniques.
This work was supported in part by the Capacity Building Project of Local Universities Science and Technology Commission of Shanghai Municipality No.22010504100; the Shanghai Rising-Star Program (Sailing Program), China, Grant No. 22YF1448100; the Development Fund for Young and Middle-aged Scientific and Technological Talents of Shanghai Institute of Technology under Grant No. ZQ2021-19; the Development of Science and Technology of Shanghai Institute of Technology under Grant Nos Kjfz2021-176, Kjfz2021-177.
Figure 1. The RCCA-WRSR model
Figure 2. RGB image conversion
Figure 3. Criss-cross attention module framework
Figure 4. Comparison of the common attention module and the criss-cross attention module
Figure 5. Example of information propagation with a cycle number of two
Figure 6. SE module framework
Figure 7. WRSR module framework
Figure 8. Scott-Knott ESD test results
Table 1.
Image width
| File size range | Image width |
|---|---|
| <10 KB | 32 |
| 10–30 KB | 64 |
| 30–60 KB | 128 |
| 60–100 KB | 256 |
| 100–200 KB | 384 |
| 200–500 KB | 512 |
| 500–1000 KB | 768 |
| >1000 KB | 1024 |
Source: Authors’ own work
Table 2.
The properties of the six activation functions
| Activation function | Saturating | Zero-centered | Range |
|---|---|---|---|
| ELU | No | No | (−1, ∞) |
| Leaky ReLU | No | No | (−∞, ∞) |
| ReLU | No | No | [0, ∞) |
| Sigmoid | Yes | No | (0, 1) |
| Softplus | No | No | (0, ∞) |
| Tanh | Yes | Yes | (−1, 1) |
Source: Authors’ own work
Table 3.
Data set
| Project | Versions | # Files | Avg files | Avg size (KB) | % Defective |
|---|---|---|---|---|---|
| Ant | 1.5, 1.6, 1.7 | 1465 | 488 | 6.2 | 13.4 |
| Camel | 1.2, 1.4, 1.6 | 3140 | 1046 | 2.9 | 18.7 |
| jEdit | 3.2, 4.0, 4.1 | 1935 | 645 | 8.7 | 19.2 |
| log4j | 1.0, 1.1 | 300 | 150 | 3.4 | 49.7 |
| Lucene | 2.0, 2.2, 2.4 | 607 | 402 | 3.8 | 35.8 |
| Xalan | 2.4, 2.5 | 1984 | 992 | 4.6 | 29.6 |
| Xerces | 1.2, 1.3 | 1647 | 549 | 2.9 | 15.7 |
| Ivy | 1.4, 2.0 | 622 | 311 | 4.1 | 20.0 |
| Synapse | 1.0, 1.1, 1.2 | 661 | 220 | 3.8 | 22.7 |
| Poi | 1.5, 2.5, 3.0 | 1248 | 416 | 3.6 | 40.7 |
Source: Authors’ own work
Table 4.
F-measure comparison of ten combinations of weighted activation functions
| Method of combination | ant 1.5 → 1.6 | log4j 1.0 → 1.1 | lucene 2.0 → 2.2 | synapse 1.1 → 1.2 | poi 2.5 → 3.0 |
|---|---|---|---|---|---|
| ReLU | 0.494 | 0.694 | 0.674 | 0.610 | 0.780 |
| ELU | 0.483 | 0.702 | 0.673 | 0.611 | 0.794 |
| ReLU+ELU | 0.563 | 0.747 | 0.742 | 0.618 | 0.832 |
| ReLU+Sigmoid | 0.476 | 0.671 | 0.614 | 0.596 | 0.780 |
| ReLU+Softplus | 0.533 | 0.714 | 0.730 | 0.625 | 0.829 |
| ReLU+Softmax | 0.527 | 0.708 | 0.716 | 0.614 | 0.786 |
| Sigmoid+Tanh | 0.465 | 0.653 | 0.651 | 0.599 | 0.755 |
| ReLU+Leaky ReLU+Softplus | 0.551 | 0.727 | 0.708 | 0.613 | 0.820 |
| Tanh+ReLU+Softplus | 0.548 | 0.711 | 0.731 | 0.614 | 0.817 |
| Sigmoid+Tanh+ELU | 0.522 | 0.686 | 0.710 | 0.620 | 0.806 |
Source: Authors’ own work
Table 5.
Comparison of F-measure among 6 algorithms
| Project | Versions | DBN+ | CNN | DP-CNN | DTLDP | CNN-GSA | RCCA-WRSR |
|---|---|---|---|---|---|---|---|
| Ant | 1.5 → 1.6 | 0.200 | 0.361 | 0.446 | 0.494 | 0.542 | 0.593 |
| | 1.6 → 1.7 | 0.432 | 0.453 | 0.466 | 0.496 | 0.527 | 0.578 |
| Camel | 1.2 → 1.4 | 0.337 | 0.400 | 0.355 | 0.346 | 0.365 | 0.427 |
| | 1.4 → 1.6 | 0.225 | 0.266 | 0.318 | 0.324 | 0.353 | 0.403 |
| jEdit | 3.2 → 4.0 | 0.566 | 0.557 | 0.527 | 0.591 | 0.626 | 0.674 |
| | 4.0 → 4.1 | 0.592 | 0.601 | 0.607 | 0.612 | 0.645 | 0.686 |
| log4j | 1.0 → 1.1 | 0.637 | 0.693 | 0.703 | 0.694 | 0.747 | 0.825 |
| Lucene | 2.0 → 2.2 | 0.597 | 0.602 | 0.599 | 0.676 | 0.701 | 0.788 |
| | 2.2 → 2.4 | 0.631 | 0.669 | 0.660 | 0.731 | 0.767 | 0.806 |
| Xalan | 2.4 → 2.5 | 0.100 | 0.275 | 0.260 | 0.645 | 0.533 | 0.648 |
| Xerces | 1.2 → 1.3 | 0.158 | 0.177 | 0.195 | 0.252 | 0.324 | 0.390 |
| Ivy | 1.4 → 2.0 | 0.174 | 0.212 | 0.233 | 0.358 | 0.401 | 0.508 |
| Synapse | 1.0 → 1.1 | 0.213 | 0.167 | 0.259 | 0.440 | 0.501 | 0.552 |
| | 1.1 → 1.2 | 0.298 | 0.405 | 0.500 | 0.610 | 0.612 | 0.644 |
| Poi | 1.5 → 2.5 | 0.804 | 0.807 | 0.794 | 0.762 | 0.783 | 0.827 |
| | 2.5 → 3.0 | 0.720 | 0.720 | 0.731 | 0.782 | 0.788 | 0.845 |
| W/T/L | | 16/0/0 | 16/0/0 | 16/0/0 | 16/0/0 | 16/0/0 | |
| Average | | 0.418 | 0.460 | 0.478 | 0.551 | 0.576 | 0.637 |
Source: Authors’ own work
Table 6.
Comparison of F-measure
| Project | Versions | Base | +RCCA | +WRSR | RCCA-WRSR |
|---|---|---|---|---|---|
| Ant | 1.5 → 1.6 | 0.498 | 0.567 | 0.575 | 0.593 |
| | 1.6 → 1.7 | 0.501 | 0.563 | 0.561 | 0.578 |
| Camel | 1.2 → 1.4 | 0.358 | 0.399 | 0.421 | 0.427 |
| | 1.4 → 1.6 | 0.343 | 0.360 | 0.373 | 0.403 |
| jEdit | 3.2 → 4.0 | 0.608 | 0.624 | 0.645 | 0.674 |
| | 4.0 → 4.1 | 0.611 | 0.628 | 0.664 | 0.686 |
| log4j | 1.0 → 1.1 | 0.704 | 0.712 | 0.723 | 0.825 |
| Lucene | 2.0 → 2.2 | 0.687 | 0.697 | 0.710 | 0.788 |
| | 2.2 → 2.4 | 0.692 | 0.708 | 0.714 | 0.806 |
| Xalan | 2.4 → 2.5 | 0.581 | 0.524 | 0.533 | 0.648 |
| Xerces | 1.2 → 1.3 | 0.288 | 0.339 | 0.339 | 0.390 |
| Ivy | 1.4 → 2.0 | 0.376 | 0.471 | 0.480 | 0.508 |
| Synapse | 1.0 → 1.1 | 0.507 | 0.511 | 0.526 | 0.552 |
| | 1.1 → 1.2 | 0.596 | 0.607 | 0.621 | 0.644 |
| Poi | 1.5 → 2.5 | 0.768 | 0.814 | 0.825 | 0.827 |
| | 2.5 → 3.0 | 0.782 | 0.811 | 0.825 | 0.845 |
| W/T/L | | 16/0/0 | 16/0/0 | 16/0/0 | |
| Average | | 0.556 | 0.583 | 0.596 | 0.637 |
Source: Authors’ own work
