1. Introduction
Under the von Neumann architecture, software vulnerabilities are inevitable. As software grows in scale and its functional modules become increasingly complex, vulnerabilities continue to emerge. The rise of open-source communities has also made code sharing more common. When copying code, developers may inadvertently introduce vulnerabilities, making code reuse a significant cause of software vulnerabilities [1]. According to the National Vulnerability Database (NVD), over 18,000 security vulnerabilities were disclosed in 2021, with more than 3000 classified as high-risk [2]. Detecting vulnerabilities in source code helps security professionals identify them early and keep systems secure.
Deep learning-based vulnerability detection methods are generally categorized into sequence-based and graph-based approaches [3]. Sequence-based models are adept at learning contextual semantics from code text, but they capture only the surface structure of that text, making it difficult to extract deeper structural and semantic features of the source code [4]. As shown in [5], existing LSTM-based methods often suffer from poor accuracy. Although GNN-based techniques can capture global structures and deep semantic information and are more effective than text-based approaches [6,7], they still have three major issues: (1) Different forms of code representation in these methods retain only partial information (syntax or semantics) [8], making it difficult to integrate the contextual semantic information of the original code [3]. (2) These methods often represent a function as a single graph, where each node corresponds to a statement, neglecting fine-grained information within the statements [4]. (3) Directly feeding raw source code, which contains a large amount of redundant code, into a graph neural network significantly increases the training time [9].
Additionally, while deep learning-based static analysis techniques have proven effective in detecting vulnerabilities in programming languages, especially mainstream ones like C/C++, Java, and PHP, relatively few studies have specifically focused on the unique characteristics of the Go language. Existing Go program detection methods, such as GCatch, GFix [10], and GFUZZ [11], primarily focus on detecting concurrency errors and overlook other types of vulnerabilities in Go programs, such as SQL injection, cross-site scripting (XSS), and insecure file uploads.
To address these challenges, we propose GoVulDect, a hybrid semantic-based Go source code vulnerability detection model that comprehensively captures Go source code information for vulnerability detection. We use graph random walk networks [12] to extract deep semantic and structural information of each concurrent vulnerability in Go source code. Additionally, we introduce lexical analysis on top of taint analysis and leverage a Transformer model with a multi-head attention mechanism to learn the contextual semantic information of various types of vulnerabilities in Go source code. Finally, we concatenate information from both dimensions and use an XGBoost classifier [13] for classification and detection, minimizing feature omission and enabling the detection of various complex types of vulnerabilities in Go programs. To reduce time overhead, we employ program pruning techniques during preprocessing to extract essential code segments and adopt a synchronized feature extraction strategy.
Overall, this paper makes the following contributions:
(1) Go Program Pruning Method for Structural Integrity: We propose a Go program pruning method that ensures structural integrity. It pre-filters code lines closely related to Go vulnerabilities and retains original structural relationships using differential comparison techniques, enabling a more comprehensive extraction of vulnerability context information.
(2) Hybrid Semantic-Based Graph Neural Network Vulnerability Detection Framework: We propose a hybrid semantic-based GNN Go source code vulnerability detection framework. The system employs graph random walk networks and Transformer models with multi-head attention mechanisms to extract both graph-level and token-level features of Go source code. The integration of both dimensions enhances detection effectiveness.
(3) Validation on Real-World Vulnerabilities: To validate the effectiveness of our method, we conducted comprehensive experiments on a real-world dataset. We selected one vulnerability detection tool and three GNN-based detection models (RATS, CSGVD [14], VDoTR [15], and AMPLE [16]) for comparison. Experimental results show that our method achieves an F1-score of 91.35% for Go source code vulnerability detection.
The rest of the paper is organized as follows: Section 2 reviews related work, and Section 3 introduces the preliminaries. Section 4 presents the methodology, and Section 5 discusses the results. Finally, the conclusions and future work directions are summarized in Section 6.
2. Related Work
2.1. Go Program Vulnerability Detection
Memory corruption attacks targeting mainstream languages such as C/C++ have become a major threat to computer systems. The recently developed Go programming language is designed to prevent such attacks through its robust static type system, appropriate compiler optimizations, and runtime boundary checks [17]. In fact, Go is considered one of the best languages for developing secure systems [18] and has been progressively deployed in many popular applications and codebases. While Go prevents memory corruption and includes garbage collection to provide temporal memory safety, it is still prone to vulnerabilities when interacting with other languages. For instance, when Go interacts with C’s memory management, it may introduce use-after-gc errors and more complex double free errors. Additionally, Go is vulnerable to a new type of supply chain attack targeting source code, known as the Trojan Source attack [1].
To address Go program vulnerabilities, researchers have been continuously improving Go source code vulnerability detection methods to enhance program security. Traditional source code vulnerability detection approaches can be categorized into rule-based program analysis [19] and pattern-based machine learning [20]. (1) Rule-based methods are inspired by traditional error detection techniques [21]. (2) Pattern-based methods employ conventional machine learning techniques to automatically learn vulnerability patterns from previously collected training samples [8].
However, both approaches heavily rely on the expertise of developers and security professionals, often resulting in high false-positive and false-negative rates. Additionally, they struggle to detect unknown vulnerabilities and require extensive manual verification [14]. Traditional Go vulnerability detection methods primarily depend on manual review and automated static analysis tools. However, these approaches typically demand substantial human effort and time while being limited to detecting concurrency-related errors, making it difficult to identify complex and subtle vulnerabilities. In fact, an analysis of the dataset we collected shows that about 98.25% of Go vulnerabilities are not related to concurrency, underscoring the need for more comprehensive detection approaches.
Deep learning (DL) has recently been introduced into the field of vulnerability detection due to its ability to process large amounts of software code and vulnerability data [22,23,24,25]. DL models automatically capture the structural representation of programs from training samples and use this information for detection [26,27]. In vulnerability detection, deep learning-based approaches can be divided into two main categories: (1) Sequence-based methods [23,26,27]: These approaches represent source code or its structural features as lexical token sequences [28] and apply natural language processing (NLP) techniques [26] to detect vulnerabilities by learning sequential features. (2) Graph-based methods [5,22,25,29]: These approaches transform source code into heterogeneous graph structures and utilize graph neural networks (GNNs) to capture local structures and dependencies [24].
In recent years, deep learning-based vulnerability detection techniques have made some progress, but relatively few models have been specifically designed for Go programs. We propose representing Go source code as graphs, such as abstract syntax trees (ASTs) and control flow graphs (CFGs), and analyzing them using GNNs to extract Go program features for vulnerability detection. However, using GNN-based models alone has a drawback: they can capture global structures and deep semantic information but often fail to retain contextual information.
2.2. Taint Analysis-Based Vulnerability Detection
Taint analysis is a program analysis technique used to track the flow of sensitive data [30]. It has been widely applied in fields such as vulnerability detection [31], cryptographic key misuse detection [32], and privacy leakage detection [33]. Additionally, taint analysis has been used to analyze various programming languages, including Java [34] and C [35], frameworks such as Android [36] and iOS, and microservices [37]. In vulnerability detection, research on taint analysis is mainly divided into static taint analysis and dynamic taint analysis.
In the field of static taint analysis, PATA [38] introduced a path-aware taint analysis model capable of accurately identifying repeatedly occurring variables based on execution path information. Fluffy [39] proposed a bi-modal taint analysis method that allows machine learning models to predict whether a taint flow is expected or unexpected based on the natural language information embedded within it. LATTE [40] combined large language models with static binary taint analysis, making it more cost-effective for vulnerability detection. However, these approaches have certain limitations. PATA integrates dynamic taint analysis, leading to significant performance overhead, and its complex path constraints hinder the effectiveness of fuzz testing. Fluffy is limited to JavaScript and relies on manually labeled evaluation data, which is prone to human error. LATTE struggles to analyze complex nested or jump-based code fragments, especially when public information about such vulnerabilities is lacking, making it difficult for large language models to analyze them effectively.
In the field of dynamic taint analysis, Spectre [41] applies dynamic taint analysis at the system level to detect vulnerability fragments associated with Spectre-type attacks. Another approach [42] introduced an efficient container tagging scheme based on a simplified ordered binary decision diagram, accelerating container tag execution efficiency in areas such as protocol reverse engineering and fuzz testing. AirTaint [43] integrates basic block-level taint abstraction with assembly-level instrumentation, enabling faster and more efficient high-level dynamic taint analysis. However, these dynamic taint analysis techniques generally come with high performance overhead.
Compared with the above-mentioned approaches, our proposed taint analysis strategy introduces several innovations and distinctions at the application level. Existing methods such as PATA, Fluffy, and LATTE rely on techniques like path constraint modeling, natural language inference, and large language model reasoning. However, these approaches are not well-suited to the structural characteristics of Go programs, suffering from high performance overhead, limited domain adaptability, and heavy reliance on manual effort. In contrast, our method is specifically tailored for Go language programs, integrating program structure pruning and syntax-based analysis. By leveraging Go’s explicit API calls and well-defined static structure, we optimize the taint analysis process for this context.
3. Preliminaries
A well-designed preprocessing step can significantly reduce training overhead, improve detection efficiency, and enhance accuracy. To maximize the retention of vulnerability-related information during slicing and enable comprehensive vulnerability mining, we adopt a Go program pruning method that ensures structural integrity. This method allows us to extract highly relevant vulnerability-related code while preserving the original structural relationships of the code.
3.1. Preprocessing
Go source code contains extensive semantic information, which cannot be fully captured by simply using graph structures. Additionally, large projects often consist of thousands of lines of code, whereas vulnerabilities are usually concentrated within just a few lines. Therefore, we first perform preprocessing operations on Go source code to reduce the amount of code and thus reduce the training overhead.
(1) Filter Go files and remove redundant code: We retain only files with the .go extension from the project since large projects often contain multiple programming languages. Our model focuses solely on detecting vulnerabilities in Go code. We also remove comments, test files, and import statements, as they are unrelated to actual vulnerability detection and only increase the code volume.
(2) Extract each API function and construct an AST: Most security vulnerabilities are related to function calls, so extracting each function is essential for comprehensive vulnerability analysis. The Go standard library provides packages for this purpose, such as go/parser and go/ast, which allow us to parse Go source code into an abstract syntax tree (AST). By traversing the AST, we can accurately locate each function call.
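The two preprocessing steps above can be sketched as follows. This is a minimal illustration that assumes the standard go/parser and go/ast packages; the helper names keepGoFile, listCalls, and calleeName are hypothetical and are not taken from GoVulDect. Comments are dropped simply by not requesting them from the parser, and the removal of import statements is not shown.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// keepGoFile reports whether a path should be analyzed: only .go files,
// and no test files (which end in _test.go by Go convention).
func keepGoFile(path string) bool {
	return strings.HasSuffix(path, ".go") && !strings.HasSuffix(path, "_test.go")
}

// listCalls parses src and returns every call site as "line:callee".
// Comments are dropped because parser.ParseComments is not requested.
func listCalls(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var calls []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		line := fset.Position(call.Pos()).Line
		calls = append(calls, fmt.Sprintf("%d:%s", line, calleeName(call.Fun)))
		return true
	})
	return calls, nil
}

// calleeName renders simple callees such as f or pkg.F; anything else
// (closures, indexed expressions) is reported as "<expr>".
func calleeName(e ast.Expr) string {
	switch f := e.(type) {
	case *ast.Ident:
		return f.Name
	case *ast.SelectorExpr:
		return calleeName(f.X) + "." + f.Sel.Name
	}
	return "<expr>"
}

func main() {
	src := `package demo

import "os"

// readConfig is a toy function with two call sites.
func readConfig(name string) []byte {
	data, _ := os.ReadFile(name)
	return clean(data)
}

func clean(b []byte) []byte { return b }
`
	calls, err := listCalls(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(keepGoFile("cmd/server.go"), keepGoFile("cmd/server_test.go"))
	fmt.Println(calls)
}
```

In a full pipeline, the call sites returned here would be the starting points for the slicing and taint analysis described in the following subsections.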
3.2. Graph-Level Features
To comprehensively extract graph-level features and global structures, we apply the following operations during the preprocessing stage before extracting graph-level features.
(1) Identify potential concurrent functions and generate slicing sequences: Because concurrency is a core feature of Go, concurrency-related vulnerabilities require dedicated handling. We perform precise control flow and data flow analysis on the AST to determine control dependencies and data dependencies, allowing us to identify concurrency patterns in the program and generate the corresponding slicing sequences. These slicing sequences help capture execution paths while significantly reducing code size.
(2) Generate sliced code based on slicing sequences: The Go language provides interfaces and type systems, which facilitate module interaction. However, existing slicing methods often disrupt source code structure, altering semantic dependencies and affecting vulnerability detection accuracy. To address this, we design a Go program pruning method that ensures structural integrity. This method performs differential analysis (diff operation) between the slice and the original source code, supplementing the sliced code structure based on the source code structure. As a result, the pruned code maintains semantic completeness.
(3) Standardize variable naming: To improve the accuracy and consistency of program analysis, we perform standardized renaming of user-defined variables and functions. This step facilitates better comprehension and modeling of source code, significantly reduces the token count, and enhances the efficiency of neural network training, as well as the precision of automated vulnerability detection tools. Specifically, we adopt a one-to-one mapping strategy by replacing user-defined variables with symbolic names (e.g., “VAR1”, “VAR2”) and renaming functions similarly (e.g., “FUN1”, “FUN2”). Variable renaming is performed with respect to lexical scope and lifetime information to ensure that variables with the same name in different scopes are not mistakenly conflated, thereby avoiding semantic ambiguity. Moreover, given that Go supports closures, goroutines, and cross-file function calls, we carefully model inter-procedural variables. We track the propagation and scope transitions of such variables across functions to ensure their original semantic context is maintained throughout the abstraction and renaming process.
(4) Complete sliced code structures to ensure independent execution: For Go programs, we need to restore goroutines, their corresponding channels, and deferred function calls to maintain logical consistency. Algorithm 1 illustrates the process, where the SliceCode S is derived through Steps 1, 2, and 3.
Algorithm 1: Go graph-level feature sliced code completion algorithm.
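The variable and function renaming of step (3) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it relies on the identifier objects that go/parser attaches during parsing (a legacy mechanism; go/types is the more complete alternative), it works on a single file only, and the function name normalizeNames is hypothetical. Because each declaration gets its own object, two variables with the same name in different scopes receive different symbolic names.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
)

// normalizeNames renames user-defined variables to VAR1, VAR2, ... and
// functions to FUN1, FUN2, ... in order of first appearance. The mapping is
// keyed by the parser's per-declaration object, so shadowed variables that
// share a name are kept apart.
func normalizeNames(src string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return "", err
	}
	names := map[*ast.Object]string{}
	vars, funs := 0, 0
	ast.Inspect(file, func(n ast.Node) bool {
		id, ok := n.(*ast.Ident)
		if !ok || id.Obj == nil || id.Name == "_" {
			return true // not an identifier, or not resolved to a declaration
		}
		if _, seen := names[id.Obj]; !seen {
			switch id.Obj.Kind {
			case ast.Var:
				vars++
				names[id.Obj] = fmt.Sprintf("VAR%d", vars)
			case ast.Fun:
				if id.Name == "main" || id.Name == "init" {
					return true // keep entry points recognizable
				}
				funs++
				names[id.Obj] = fmt.Sprintf("FUN%d", funs)
			default:
				return true // types, constants, labels are left untouched
			}
		}
		id.Name = names[id.Obj]
		return true
	})
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	src := `package demo

func scale(x int) int {
	y := x * 2
	if y > 10 {
		y := y - 10
		return y
	}
	return y
}
`
	out, err := normalizeNames(src)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

In the example, the inner y shadows the outer one, so the two are mapped to different symbols. Handling closures, goroutines, and cross-file calls as described above would require whole-package type information, which this sketch does not attempt.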
3.3. Semantic Features
Although we preserve rich semantic information during graph-level feature processing, some contextual information loss is inevitable during slicing. This loss may hinder the detection of certain vulnerabilities with complex trigger conditions. To address this, we extract token sequences to capture the contextual semantic features of vulnerable code. Taint analysis enables us to track the flow of tainted data, so we leverage taint analysis as the preprocessing step for token-level feature extraction.
After constructing the AST, we start from defined taint sources, analyze the AST, and trace taint propagation along the data flow graph. This allows us to identify and extract taint propagation chains, which include all key code lines involved in taint propagation. We maintain a queue to record tainted variables and, finally, arrange tainted code lines in sequence, standardize variable names, and complete sliced code structures.
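The queue-based propagation described above can be sketched as follows. This is a deliberately simplified illustration: propagation is name-based and flow-insensitive, only assignments and expression statements are considered, and the taint source name (readInput) and the function name taintedLines are hypothetical. A production analysis would follow the data flow graph instead of matching variable names.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// stmtInfo summarizes one statement: the variables it defines, the
// identifiers it reads, and whether it calls a taint source.
type stmtInfo struct {
	line      int
	defs      []string
	uses      map[string]bool
	hasSource bool
}

// taintedLines returns the sorted line numbers on the taint propagation
// chain, starting from calls to the functions named in sources.
func taintedLines(src string, sources map[string]bool) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var stmts []stmtInfo
	ast.Inspect(file, func(n ast.Node) bool {
		var defs []ast.Expr
		var reads []ast.Node
		switch s := n.(type) {
		case *ast.AssignStmt:
			defs = s.Lhs
			for _, r := range s.Rhs {
				reads = append(reads, r)
			}
		case *ast.ExprStmt:
			reads = []ast.Node{s.X}
		default:
			return true
		}
		info := stmtInfo{line: fset.Position(n.Pos()).Line, uses: map[string]bool{}}
		for _, d := range defs {
			if id, ok := d.(*ast.Ident); ok && id.Name != "_" {
				info.defs = append(info.defs, id.Name)
			}
		}
		for _, r := range reads {
			ast.Inspect(r, func(m ast.Node) bool {
				switch x := m.(type) {
				case *ast.Ident:
					info.uses[x.Name] = true
				case *ast.CallExpr:
					if id, ok := x.Fun.(*ast.Ident); ok && sources[id.Name] {
						info.hasSource = true
					}
				}
				return true
			})
		}
		stmts = append(stmts, info)
		return true
	})

	tainted := map[string]bool{}
	lines := map[int]bool{}
	var queue []string // worklist of newly tainted variables
	mark := func(s stmtInfo) {
		lines[s.line] = true
		for _, d := range s.defs {
			if !tainted[d] {
				tainted[d] = true
				queue = append(queue, d)
			}
		}
	}
	for _, s := range stmts {
		if s.hasSource {
			mark(s)
		}
	}
	for len(queue) > 0 {
		v := queue[0]
		queue = queue[1:]
		for _, s := range stmts {
			if s.uses[v] {
				mark(s)
			}
		}
	}
	out := make([]int, 0, len(lines))
	for l := range lines {
		out = append(out, l)
	}
	sort.Ints(out)
	return out, nil
}

func main() {
	src := "package demo\n\nfunc handler() {\n\tname := readInput()\n\tmsg := \"hi \" + name\n\tcount := 42\n\trun(msg)\n\tlog(count)\n}\n"
	lines, err := taintedLines(src, map[string]bool{"readInput": true})
	if err != nil {
		panic(err)
	}
	fmt.Println(lines) // lines 4, 5, and 7 carry tainted data
}
```

The returned line numbers are already in source order, which corresponds to arranging the tainted code lines in sequence before renaming and structure completion.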
4. Methodology
4.1. System Framework
Existing detection models and methods for Go source code fail to comprehensively extract lexical, syntactic, and semantic features, limiting their ability to identify various types of vulnerabilities. To address this issue, we propose a hybrid semantic-based graph neural network vulnerability detection method for Go programs, named GoVulDect.
Figure 1 illustrates the detailed design of GoVulDect, which consists of three main modules: (1) Graph-Level Feature Extraction Module: This module represents potentially concurrent Go functions as code property graphs (CPGs) and utilizes GraphSAGE, a graph neural network based on random walks, to extract graph-level features that incorporate multiple types of semantic information. (2) Token-Level Feature Extraction Module: This module extracts token sequences using taint analysis and SpanBERT, a pre-trained model, to embed them into vectors. A Transformer model with multi-head attention is then employed to extract fine-grained token-level features. (3) Detection Module: This module fuses the source code features extracted by the previous two modules. Specifically, the graph-level features obtained via GraphSAGE are concatenated with the token-level features extracted by a Transformer-based model, forming a comprehensive representation of the source code. This fused feature vector is then passed to a pre-trained XGBoost classifier [13] to detect vulnerabilities in Go source code. XGBoost, as a scalable end-to-end tree boosting system, allows us to efficiently handle large-scale imbalanced data.
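The fusion step in the detection module amounts to concatenating the two feature vectors. A minimal sketch follows; the vector values and dimensions are made up for illustration, the function name fuse is hypothetical, and the classifier call is omitted.

```go
package main

import "fmt"

// fuse concatenates a graph-level vector and a token-level vector into one
// feature vector without modifying either input.
func fuse(graphVec, tokenVec []float64) []float64 {
	fused := make([]float64, 0, len(graphVec)+len(tokenVec))
	fused = append(fused, graphVec...)
	return append(fused, tokenVec...)
}

func main() {
	graphVec := []float64{0.12, 0.80, 0.05} // hypothetical graph-level features
	tokenVec := []float64{0.33, 0.41}       // hypothetical token-level features
	fmt.Println(fuse(graphVec, tokenVec))   // one row of classifier input
}
```

Concatenation keeps both views intact and leaves it to the tree-based classifier to decide which dimensions matter.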
In contrast to existing hybrid detection approaches such as HyVulDect, which are primarily designed for traditional programming languages like C/C++, GoVulDect is specifically tailored to the concurrency semantics and structural characteristics of the Go language. During preprocessing, we enhance the static modeling of goroutines and channels to more accurately reconstruct potential concurrent execution paths. For feature extraction, GoVulDect incorporates a pretrained SpanBERT model with a span boundary objective (SBO) to generate semantically rich token representations. In addition, GraphSAGE is employed to capture global contextual features from the structural graphs of source code, while the Transformer further models deep semantic dependencies over long token distances. These design choices significantly improve the model’s capacity to represent and detect vulnerability patterns in Go source code. Detailed explanations of each component are provided in the following sections.
4.2. Graph-Level Feature Extraction
(1) Code Representation. Vulnerabilities often arise from improper function calls and parameter references. Although normal and vulnerable code may differ by only a few lines, control flow and data flow dependencies reveal clear distinctions. To comprehensively extract these dependencies, we transform preprocessed Go source code into a code property graph (CPG). The code property graph (CPG) is a unified graph representation that integrates abstract syntax trees (ASTs), control flow graphs (CFGs), and data flow graphs (DFGs) to comprehensively model both syntactic and semantic aspects of programs. In a CPG, nodes represent key program entities such as functions, variables, operators, and control structures. Each node is enriched with type information, scope, and source code location. Edges capture relationships between these entities, including call edges, data flow dependencies, and control flow dependencies.
First, we use Go’s official standard packages (such as go/parser and go/ast) to parse the source code and generate the corresponding abstract syntax tree (AST). Then, we perform type checking and static semantic analysis on the AST using a type-checking package (such as go/types) to resolve the types of variables, functions, and expressions. Based on the type-annotated AST, we utilize an SSA construction package (such as golang.org/x/tools/go/ssa) to convert the code into static single assignment (SSA) intermediate representation. As the intermediate representation adopted by the Go compiler, SSA simplifies control and data flow analysis, enhances concurrency-related analysis, and significantly improves the compiler’s optimization capabilities. The control flow graphs (CFGs) and data flow graphs (DFGs) constructed based on SSA are further used to model control and data dependencies within the program.
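The parsing and type-checking stages can be sketched with the standard library alone. The example below is an illustration under stated assumptions: it uses go/parser and go/types, handles only import-free code (no Importer is configured), and the function name varTypes is hypothetical. SSA construction requires a package outside the standard library and is not shown.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// varTypes parses and type-checks src and returns the resolved type of
// every variable it defines, keyed by variable name.
func varTypes(src string) (map[string]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	// Defs is filled by the type checker with one entry per defining identifier.
	info := &types.Info{Defs: map[*ast.Ident]types.Object{}}
	conf := types.Config{} // no Importer: this sketch handles import-free code only
	if _, err := conf.Check("demo", fset, []*ast.File{file}, info); err != nil {
		return nil, err
	}
	out := map[string]string{}
	for id, obj := range info.Defs {
		if v, ok := obj.(*types.Var); ok {
			out[id.Name] = v.Type().String()
		}
	}
	return out, nil
}

func main() {
	src := `package demo

func ratio(hits, total int) float64 {
	r := float64(hits) / float64(total)
	done := make(chan string, 1)
	done <- "ok"
	return r
}
`
	kinds, err := varTypes(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(kinds)
}
```

Type information of this kind is what allows node attributes such as channel types to be attached to the graph before control and data dependencies are added.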
Although the initial CPG representation contains static structural information, control flow, and data flow semantics, further optimization is necessary. We remove redundant nodes and prune unnecessary parts that do not affect analysis results. Figure 2 shows an optimized CPG example derived from the Go code in Figure 3.
Finally, to convert Go programs into semantic vector representations suitable for neural network input, we employed a pretrained SpanBERT model to embed nodes in the source code, generating high-dimensional semantic vectors for each code fragment. In this study, we observed that most nodes had effective feature lengths no greater than 20 after vectorizing the nodes and edges in the graph. Therefore, to ensure both representational completeness and consistency in vector dimensions, we set the feature vector length to 20. A detailed introduction to SpanBERT is provided in the next section, while this section focuses on the GraphSAGE model.
(2) Graph-Level Feature Extraction. Treating code solely as text overlooks critical control dependencies and data dependencies. To extract deep semantic information more effectively from Go source code, we construct a detection model based on the GraphSAGE network, which employs random walk sampling on graphs.
GraphSAGE is particularly well-suited for handling large-scale and complex projects because it learns node embeddings by iteratively sampling a fixed number of neighboring nodes rather than requiring access to every node during training. This makes it capable of efficiently learning from graphs of varying sizes and structures. Additionally, when aggregating features, GraphSAGE allows the model to select different aggregation functions based on specific tasks. This approach enhances the understanding of local graph structure features, efficiently captures control and data dependencies, and provides a novel method for obtaining global graph features and contextual information, potentially improving classification performance. The overall framework of GraphSAGE is illustrated in Figure 4.
At each iteration k, we randomly sample a fixed-size neighborhood for each node v. Then, we apply a mean aggregation function to merge the feature vectors of the neighboring nodes and update the representation of node v. The mean aggregation function is defined as follows:
$$h_{\mathcal{N}(v)}^{k} = \mathrm{MEAN}\left(\left\{h_{u}^{k-1}, \forall u \in \mathcal{N}(v)\right\}\right) \tag{1}$$
where $h_{\mathcal{N}(v)}^{k}$ denotes the aggregated feature vector of the sampled neighborhood $\mathcal{N}(v)$ of node v at the k-th layer. Specifically, we average the feature vectors of all sampled neighbors to obtain a representation of the local neighborhood of node v at the current iteration. Subsequently, node v updates its own representation by incorporating the aggregated neighborhood feature vector $h_{\mathcal{N}(v)}^{k}$, using the following update function:
$$h_{v}^{k} = \sigma\left(W^{k} \cdot \mathrm{CONCAT}\left(h_{v}^{k-1}, h_{\mathcal{N}(v)}^{k}\right)\right) \tag{2}$$
where $W^{k}$ is the learnable weight matrix at the k-th layer and $\sigma$ is the nonlinear activation function. The operator $\mathrm{CONCAT}$ denotes concatenation, which combines the current features of node v with the aggregated features from its neighbors. To prevent excessive feature scale growth and maintain consistency, we normalize each node’s feature vector after every update. Specifically, we apply L2 normalization as follows:
$$h_{v}^{k} \leftarrow \frac{h_{v}^{k}}{\left\lVert h_{v}^{k} \right\rVert_{2}} \tag{3}$$
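One layer of the aggregate, update, and normalize procedure described above can be sketched in plain Go. The sketch makes several assumptions that the text does not fix: the neighbor lists are taken as already sampled, the nonlinearity is ReLU, and the weights in the example are hand-picked instead of learned. The function name sageLayer is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// sageLayer performs one GraphSAGE-style update for every node: mean over the
// (already sampled) neighbors, concatenation with the node's own features, a
// linear map w, ReLU, and L2 normalization.
// h[v] is the feature vector of node v, nbrs[v] lists its sampled neighbors,
// and w has shape outDim x (2*inDim).
func sageLayer(h [][]float64, nbrs [][]int, w [][]float64) [][]float64 {
	inDim := len(h[0])
	out := make([][]float64, len(h))
	for v := range h {
		// Mean aggregation over the sampled neighborhood.
		agg := make([]float64, inDim)
		for _, u := range nbrs[v] {
			for i, x := range h[u] {
				agg[i] += x / float64(len(nbrs[v]))
			}
		}
		// Concatenate own features with the aggregated neighborhood.
		cat := append(append([]float64{}, h[v]...), agg...)
		// Linear map followed by ReLU; accumulate the squared norm as we go.
		next := make([]float64, len(w))
		norm := 0.0
		for r, row := range w {
			sum := 0.0
			for c, wc := range row {
				sum += wc * cat[c]
			}
			next[r] = math.Max(0, sum)
			norm += next[r] * next[r]
		}
		// L2 normalization (skipped for an all-zero vector).
		if norm > 0 {
			for r := range next {
				next[r] /= math.Sqrt(norm)
			}
		}
		out[v] = next
	}
	return out
}

func main() {
	h := [][]float64{{3}, {4}}       // two nodes with 1-dimensional features
	nbrs := [][]int{{1}, {0}}        // each node's sampled neighbor
	w := [][]float64{{1, 0}, {0, 1}} // identity weights, for illustration only
	fmt.Println(sageLayer(h, nbrs, w))
}
```

With identity weights, node 0 concatenates its own feature 3 with the neighbor mean 4 and normalizes the result to a unit vector, which makes the effect of each stage easy to check by hand.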
The final graph representation integrates the entire graph’s structural and feature information without relying on expensive matrix operations or requiring storage of the complete graph structure. Therefore, in practical applications, GraphSAGE efficiently and accurately learns to distinguish the graph patterns of vulnerable and benign Go code, enabling effective detection of potential security vulnerabilities in Go source code.
4.3. Semantic Feature Extraction
(1) Code Representation. To extract contextual semantic features, we first apply lexical analysis to convert the preprocessed Go source code into a sequence of tokens, providing a more fine-grained representation of the source code, as illustrated in Figure 5. We then utilize SpanBERT to embed the tokens extracted from the code slices.
SpanBERT is an extension and optimization of the BERT model specifically designed to enhance the modeling of span-level semantic information in text, making it particularly suitable for extracting semantic features from complex program code. The model learns relationships between different words in the text and maps each word to a high-dimensional vector that captures both its semantic meaning and contextual information. Words with similar meanings are located closer together in the embedding space. As a self-supervised pre-trained model, SpanBERT introduces a novel span boundary objective (SBO), which strengthens the model’s ability to represent span boundaries. This is especially beneficial for identifying structural elements in code, such as function calls and control blocks, which are critical for understanding code semantics. The inclusion of SBO also enables more efficient access to span-level information during fine-tuning, allowing for more comprehensive extraction of both local and global structural features in Go source code. Figure 6 illustrates how SpanBERT extracts and represents features from Go code.
Formally, a line of code is decomposed into a token sequence $X = \{x_1, x_2, \ldots, x_n\}$, where each token $x_i$ represents a lexical unit. For each token $x_i$, we add the position embedding $P_i$ and the segment embedding $S_i$ to its token embedding $T(x_i)$ to obtain the final input representation $E_i$ of Go code. The position embedding P helps the model understand structural elements such as loops and conditional statements, while the segment embedding S enhances the model’s ability to distinguish different code blocks and maintain strong contextual awareness in complex programs:
$$E_i = T(x_i) + P_i + S_i \tag{4}$$
Next, the SpanBERT model learns code representations through two pre-training tasks: masked language modeling (MLM) and the span boundary objective (SBO). After pre-training, the model produces deep bidirectional representations. During forward propagation, the model generates a series of hidden layer states $H^{1}, H^{2}, \ldots, H^{L}$. Notably, the Transformer model integrates a self-attention mechanism, allowing it to capture long-range dependencies between tokens.
$$H^{l} = \mathrm{TransformerLayer}\left(H^{l-1}\right), \quad l = 1, \ldots, L \tag{5}$$
Here $H^{0}$ is the input representation from Equation (4). Ultimately, the final layer output representation of the code line is denoted as $H^{L}$. After passing through L layers of Transformer networks, it aggregates the contextual semantic information of the entire Go code slice. The same process is applied to other preprocessed code slices, as described in Section 3.3.
(2) Semantic Feature Extraction. While various neural network architectures are available for natural language processing (NLP), many suffer from the vanishing gradient (VG) problem, which can lead to ineffective training. To better capture the contextual semantic information in taint propagation chains, we take the final representation of the token sequence X embedded by SpanBERT. We use the resulting token vectors as input and employ a Transformer model with a multi-head self-attention mechanism to extract contextual semantic features from the code slice.
The Transformer model’s self-attention mechanism addresses the challenge of long-range dependencies in code while mitigating the vanishing gradient and exploding gradient problems. By attending to different parts of the token sequence simultaneously, the model can effectively learn complex relationships between different code components, improving vulnerability detection accuracy.
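For reference, the standard formulation of scaled dot-product attention and its multi-head extension is restated below. The text does not specify the number of heads or the projection sizes, so $h$, $d_k$, and the projection matrices $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, $W^{O}$ are generic symbols from the usual Transformer definition and not values taken from GoVulDect.

```latex
\begin{aligned}
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V, \\
\mathrm{head}_i &= \mathrm{Attention}\!\left(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V}\right), \\
\mathrm{MultiHead}(Q, K, V) &= \mathrm{CONCAT}\!\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right) W^{O}.
\end{aligned}
```

Every token attends to every other token in a single step, so the path between two distant tokens does not grow with their distance, which is why long-range dependencies are easier to learn than in recurrent models.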
5. Experiments and Results
5.1. Dataset
We collect a real-world Go program vulnerability dataset [44] from two sources: the GitHub Security Advisory Database and open-source projects on GitHub. The GitHub Security Advisory Database contains source code vulnerabilities associated with CWE identifiers. For open-source projects, we focus on high-star repositories, from which we extract CVE vulnerability files, patch files, and diff files based on commit information.
In the raw vulnerability dataset, we collected a total of 630 CWE-labeled vulnerabilities and, after preprocessing, segmented them into 129,978 Go code snippets, with an equal number of positive and negative samples (64,989 each). During data splitting, we strictly followed a project-level division strategy to ensure that code from the same GitHub project does not appear in both the training and test sets. The final dataset was divided into training, validation, and test sets in an 8:1:1 ratio. Table 1 shows examples of selected CWE vulnerability samples in our dataset.
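A project-level split of this kind can be sketched as follows. The greedy assignment rule used here (largest project first, placed in the split that is furthest below its target share) is an assumption made for illustration; the text does not describe how projects were assigned. The type sample and the function splitByProject are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one code snippet together with the project it came from.
type sample struct {
	Project string
	ID      int
}

// splitByProject assigns whole projects to train/validation/test so that no
// project is shared between splits, aiming for an 8:1:1 sample ratio.
// Projects are visited from largest to smallest, and each one goes to the
// split that is currently furthest below its target share.
func splitByProject(samples []sample) (train, val, test []sample) {
	groups := map[string][]sample{}
	for _, s := range samples {
		groups[s.Project] = append(groups[s.Project], s)
	}
	names := make([]string, 0, len(groups))
	for name := range groups {
		names = append(names, name)
	}
	// Deterministic order: by size (descending), then by name.
	sort.Slice(names, func(i, j int) bool {
		if len(groups[names[i]]) != len(groups[names[j]]) {
			return len(groups[names[i]]) > len(groups[names[j]])
		}
		return names[i] < names[j]
	})
	targets := []float64{0.8, 0.1, 0.1}
	splits := make([][]sample, 3)
	for _, name := range names {
		best, bestGap := 0, -1.0
		for i, t := range targets {
			gap := t - float64(len(splits[i]))/float64(len(samples))
			if gap > bestGap {
				best, bestGap = i, gap
			}
		}
		splits[best] = append(splits[best], groups[name]...)
	}
	return splits[0], splits[1], splits[2]
}

func main() {
	var data []sample
	for i := 0; i < 8; i++ {
		data = append(data, sample{Project: "proj-a", ID: i})
	}
	data = append(data, sample{Project: "proj-b", ID: 8}, sample{Project: "proj-c", ID: 9})
	train, val, test := splitByProject(data)
	fmt.Println(len(train), len(val), len(test))
}
```

Because entire projects are moved as a unit, the achieved ratio only approximates 8:1:1 when project sizes are uneven; the benefit is that near-duplicate code from one repository cannot leak from training into testing.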
5.2. Experimental Setup and Evaluation Metrics
(1) Experimental Setup. We conducted experiments on a machine equipped with an NVIDIA RTX 2080 Ti GPU, an Intel(R) i9 CPU, and 128 GB of RAM. The complete experimental environment is detailed in Table 2.
(2) Model Hyperparameter Settings. For neural networks, hyperparameter selection is critical as it directly impacts model performance, training speed, and generalization ability. Since hyperparameters are set before training and cannot be adjusted automatically, careful selection and tuning are necessary. Table 3 lists the hyperparameters used in our GraphSAGE model. To determine appropriate values, we performed a random search over the hyperparameter space and selected the configuration that yielded the best validation performance across multiple trials. In addition, XGBoost was used with its default parameters: 100 trees, a maximum depth of 6, a learning rate of 0.3, no early stopping, and a
(3) Evaluation Metrics. We use four widely adopted evaluation metrics to comprehensively measure the vulnerability detection capability of our model: Accuracy, Precision, Recall, and F1-score.
These metrics are calculated using true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).
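For reference, the usual definitions of accuracy, precision, recall, and F1-score in terms of the confusion-matrix counts are restated below, with TP, FP, TN, and FN denoting true positives, false positives, true negatives, and false negatives. These are the standard textbook forms and are assumed to match the ones used in the evaluation.

```latex
\begin{aligned}
\mathrm{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\mathrm{Precision} &= \frac{TP}{TP + FP}, \\
\mathrm{Recall} &= \frac{TP}{TP + FN}, \\
\mathrm{F1} &= \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.
\end{aligned}
```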
To further assess classification performance, we also employ ROC curves and AUC values. Since AUC is insensitive to class distribution, it serves as a robust metric for evaluating model performance. An AUC score closer to 1 indicates better classification performance.
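AUC has a direct probabilistic reading: it is the probability that a randomly chosen vulnerable sample receives a higher score than a randomly chosen benign one. The sketch below computes it by pairwise comparison; it is quadratic in the number of samples and meant only to make the definition concrete. The function name auc and the example scores are hypothetical.

```go
package main

import "fmt"

// auc computes the area under the ROC curve as the fraction of
// (positive, negative) pairs in which the positive sample scores higher,
// counting ties as one half. Labels are 1 (positive) or 0 (negative).
func auc(scores []float64, labels []int) float64 {
	var wins, pairs float64
	for i, si := range scores {
		if labels[i] != 1 {
			continue
		}
		for j, sj := range scores {
			if labels[j] != 0 {
				continue
			}
			pairs++
			switch {
			case si > sj:
				wins++
			case si == sj:
				wins += 0.5
			}
		}
	}
	if pairs == 0 {
		return 0 // AUC is undefined without both classes; 0 is a placeholder
	}
	return wins / pairs
}

func main() {
	scores := []float64{0.9, 0.8, 0.3, 0.1}
	labels := []int{1, 0, 1, 0}
	fmt.Println(auc(scores, labels)) // 3 of 4 pairs are ranked correctly: 0.75
}
```

Because the value depends only on how positives rank against negatives, changing the proportion of the two classes does not by itself change the score.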
5.3. Results and Analysis
(1) Model Classification Performance. To comprehensively evaluate the classification performance and generalization ability of the GoVulDect model, we analyzed the trends of training loss and validation accuracy during the training and validation phases, as shown in Figure 7. The training loss continuously decreased and stabilized around 0.1 after approximately 500 steps, indicating convergence. Meanwhile, the validation accuracy steadily increased and eventually plateaued at around 95%, without any noticeable fluctuation or decline. The steady decrease in training loss, together with the stable validation accuracy, suggests that the model converged without signs of overfitting, demonstrating strong stability and generalization performance.
To further assess the binary classification capability of the model, we tested it across various sample categories. As presented in Table 4, the GoVulDect model achieves precision, recall, and F1-scores above 94% for both benign and vulnerable samples, with an overall accuracy of 95%. These results indicate that the model demonstrates strong classification performance and is capable of effectively distinguishing between vulnerable and non-vulnerable code. Moreover, the close values of the “Macro AVG” and “Weighted AVG” metrics suggest that the model performs consistently across classes, without exhibiting significant bias toward any particular category.
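The difference between the two averages can be checked directly from the per-class values reported in Table 4. The sketch below recomputes them; the helper names are illustrative, and the inputs are the rounded table values, so the results are only accurate to about two decimals.

```python
# Per-class values as reported in Table 4 (rounded to two decimals).
report = {
    "benign": {"precision": 0.96, "recall": 0.94, "f1": 0.95, "support": 1646},
    "vulnerable": {"precision": 0.94, "recall": 0.96, "f1": 0.95, "support": 1604},
}


def macro_avg(report, key):
    """Unweighted mean over classes: every class counts equally."""
    return sum(row[key] for row in report.values()) / len(report)


def weighted_avg(report, key):
    """Mean over classes weighted by the number of samples per class."""
    total = sum(row["support"] for row in report.values())
    return sum(row[key] * row["support"] for row in report.values()) / total


macro_precision = macro_avg(report, "precision")
weighted_precision = weighted_avg(report, "precision")
```

Because the two classes have nearly equal sample sizes (1646 and 1604), the macro and weighted averages almost coincide. A larger gap would appear only with a more imbalanced test set or more uneven per-class scores.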
To verify whether the model can accurately identify different types of vulnerabilities, we generate a multi-class confusion matrix, as shown in Figure 8. Notably, the model achieves a detection accuracy of over 81% for CWE-79 and CWE-200, demonstrating that GoVulDect exhibits strong classification capability and high accuracy when handling various types of vulnerabilities.
(2) Effectiveness of Feature Fusion. To validate the effectiveness of our feature fusion approach, we visualized the distribution of token-level features (bottom), graph-level features (middle), and their fused features (top) in both 2D (left) and 3D (right) spaces, as illustrated in Figure 9. The results clearly show that the fused features (top) exhibit the best separability, indicating that feature fusion significantly improves the model’s performance.
To further confirm the effectiveness of feature fusion, we conducted an ablation study, comparing models using only graph-level features (GoVulDect-Graph), only token-level features (GoVulDect-Tokens), and our full GoVulDect model. Table 5 presents the results.
The results indicate that GoVulDect-Graph achieves a precision of 90.58%, slightly outperforming GoVulDect-Tokens (88.78%). However, the full GoVulDect model, which integrates both graph and token-level features, achieves a significantly higher precision of 94.77%. This demonstrates that feature fusion effectively improves vulnerability detection, as it captures both structural information and contextual semantics of vulnerable code.
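Assuming the fusion step is a plain per-sample concatenation of the graph-level and token-level vectors, the three ablation variants differ only in which part of the vector reaches the classifier. The sketch below illustrates this with toy data; the function names and vector sizes are hypothetical.

```python
def fuse_features(graph_vec, token_vec):
    """Concatenate one graph-level vector with one token-level vector."""
    return list(graph_vec) + list(token_vec)


def build_inputs(graph_feats, token_feats, variant="fused"):
    """Return classifier inputs for one ablation variant.

    variant: "graph" (graph-level only), "tokens" (token-level only),
    or "fused" (both views concatenated).
    """
    if len(graph_feats) != len(token_feats):
        raise ValueError("both views must describe the same samples")
    if variant == "graph":
        return [list(g) for g in graph_feats]
    if variant == "tokens":
        return [list(t) for t in token_feats]
    if variant == "fused":
        return [fuse_features(g, t) for g, t in zip(graph_feats, token_feats)]
    raise ValueError(f"unknown variant: {variant}")


# Toy data: 2 samples, 3 graph-level and 2 token-level dimensions.
graph_feats = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
token_feats = [[1.0, 2.0], [3.0, 4.0]]
fused = build_inputs(graph_feats, token_feats, "fused")
```

Each row of `fused` can then be passed to any downstream classifier, which keeps the comparison between variants limited to the input representation.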
(3) Comparative Experiments. The code property graph (CPG) integrates multiple semantic representations, making it a powerful method for representing source code. To verify the advantage of CPG in vulnerability detection, we compared the ROC curves of the GraphSAGE model when applied to different code representations, as shown in Figure 10.
The results indicate that the CPG-based model achieves the highest AUC score of 0.984, demonstrating that CPG captures a more comprehensive and rich set of graph structural features. Additionally, we observe that the AUC scores of the control flow graph (CFG) and control dependence graph (CDG) are lower than those of the data dependence graph (DDG). This suggests that data flow plays a more significant role in vulnerability detection compared to control flow.
To achieve optimal classification performance, we compared several widely used classification models, including traditional machine learning algorithms (e.g., k-nearest neighbors (KNN), support vector machine (SVM), random forest (RF)), neural network-based models (e.g., multi-layer perceptron (MLP), bidirectional LSTM (BiLSTM)), and ensemble learning methods such as XGBoost. These classifiers were selected to represent a diverse set of learning paradigms and are commonly used in software vulnerability detection tasks as baseline models. Although some classifiers such as KNN and SVM may be considered less advanced in modern deep learning research, we include them to ensure a complete and comprehensive evaluation of the effectiveness of our proposed graph-based feature representation. It is worth noting that SVM, especially when using the RBF kernel, suffers from scalability issues due to its quadratic complexity with respect to the number of training samples [45]. In our study, SVM is used solely as a baseline model for comparison purposes. For practical deployment and large-scale performance, more scalable classifiers such as XGBoost and BiLSTM are preferred.
As shown in Table 6, XGBoost achieves the highest performance across all evaluation metrics, with an accuracy of 0.94 and a precision and F1-score of 0.95. This demonstrates that XGBoost provides the most effective classification results in our vulnerability detection task. Furthermore, every classifier reaches at least 0.88 on all four metrics, which supports the robustness and general applicability of the learned graph features.
To comprehensively evaluate GoVulDect’s vulnerability detection performance, we compared it with one commonly used detection tool and three state-of-the-art detection models. The results are summarized in Table 7 and Table 8.
We observe the following key findings: (1) RATS, a multi-language static vulnerability analysis tool, achieves the lowest performance across all metrics. This is because RATS relies solely on expert-defined vulnerability patterns, which leads to high false negatives due to the lack of adaptability to novel vulnerabilities. (2) CSGVD, a code semantic graph-based vulnerability detection model, outperforms RATS, but its performance remains the lowest among the four learning-based models. This is because CSGVD primarily captures shallow sequential local semantic features, limiting its effectiveness. (3) VDoTR introduces circular gated graph neural networks (CircleGGNNs) to embed node feature vectors and employs 1D convolutional layers for vulnerability classification. By capturing richer structural information, VDoTR outperforms CSGVD across all evaluation metrics. (4) AMPLE incorporates edge-aware graph convolutional networks to aggregate heterogeneous edge information into node representations. To handle long-range dependencies among distant nodes, AMPLE employs kernel-scaled representation techniques, significantly enhancing its ability to analyze complex code structures. As a result, AMPLE outperforms VDoTR in all aspects. (5) GoVulDect achieves over 91% in all detection metrics, surpassing all existing methods and tools. This demonstrates the effectiveness of our model in Go vulnerability detection and highlights the significance of our contributions.
To provide a more comprehensive evaluation, we also compare the training and detection time of GoVulDect with the models listed above. The results are summarized in Table 9.
The key observations are as follows: (1) GoVulDect achieves the shortest training and detection times among all learning-based models. This performance gain is mainly attributed to three key aspects. First, during preprocessing, we effectively eliminate redundant and non-essential code. Second, we apply program slicing techniques to accurately extract code segments that are highly relevant to vulnerability detection. Together, these two steps significantly reduce the volume of code that needs to be processed, thereby lowering computational overhead. Finally, we design a parallel architecture that enables the synchronized extraction of graph-level and token-level features, further improving the overall efficiency of both training and inference. (2) RATS, as a static vulnerability analysis tool, does not require training and achieves the shortest detection time. However, its accuracy is significantly lower than learning-based models. (3) CSGVD does not preprocess raw source code, and its training process requires separate training of the PE-BL module before node embedding, resulting in higher time consumption. (4) VDoTR requires the longest training time because it captures global and complex graph structures during training. (5) AMPLE applies graph simplification techniques, reducing the number of graph nodes, which significantly decreases both training and detection time.
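The parallel extraction of the two feature views can be sketched as two independent tasks whose results are joined once both finish. The example below is only an illustration of that structure: both extractor functions are hypothetical stand-ins for the real graph-level and token-level branches, and a thread pool is an assumption rather than the mechanism used in GoVulDect.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_graph_features(slice_src):
    # Hypothetical stand-in for the CPG + GraphSAGE branch.
    return [float(len(slice_src)), float(slice_src.count("("))]


def extract_token_features(slice_src):
    # Hypothetical stand-in for the taint-slice + Transformer branch.
    tokens = slice_src.split()
    return [float(len(tokens)), float(len(set(tokens)))]


def extract_both(slice_src):
    """Run both extractors concurrently and concatenate their outputs."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        graph_future = pool.submit(extract_graph_features, slice_src)
        token_future = pool.submit(extract_token_features, slice_src)
        # result() blocks until each branch is done, so the two views
        # are synchronized before fusion.
        return graph_future.result() + token_future.result()


sample = 'db.Query("SELECT * FROM users WHERE id = " + id)'
features = extract_both(sample)
```

In CPython, threads mainly help when the branches wait on I/O or on libraries that release the interpreter lock. For CPU-bound extractors, a process pool or separate GPU streams would be the more likely choice.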
6. Conclusions
At present, relatively few deep learning-based detection models have been specifically designed for Go language vulnerability detection. Most existing Go vulnerability detection tools focus only on concurrency errors and do not address other types of vulnerabilities. Therefore, we propose GoVulDect, a fine-grained, hybrid semantic-based graph neural network system for Go source code vulnerability detection.
First, we extract each function and generate slices, then represent them as code property graphs (CPGs) and use GraphSAGE to extract graph-level structural features. Although we strive to retain as much graph structure information as possible, some local and context-based semantic information loss is inevitable. To address this, we apply taint analysis to extract vulnerability slices and capture fine-grained token-level features. Finally, we use XGBoost to classify the fused features, enabling vulnerability detection. The fused features not only incorporate global control dependencies, data dependencies, and other semantic information of the source code but also preserve contextual semantics and local details of vulnerable code. Experiments on CVE vulnerability datasets demonstrate that GoVulDect achieves an F1-score of over 91%, significantly outperforming all existing vulnerability detection tools and models.
For future work, we plan to address several important limitations and explore further improvements. (1) Enhancing semantic feature representation: The current token-level semantic extraction module has limited capacity in representing complex semantics. With the rapid advancement of large language models (LLMs), we aim to adopt more powerful pretrained models to improve semantic understanding and representation. (2) Improving feature fusion between graph and semantic views: At present, graph and token-level features are fused via simple concatenation, which may underutilize their complementary nature. We plan to explore more integrated fusion strategies, such as co-training frameworks inspired by GraphCodeBERT, to enable joint learning and deeper feature interaction. (3) Handling multi-label vulnerability attribution: In real-world scenarios, a single code segment may correspond to multiple CWE types. For example, a buffer overflow vulnerability may involve both missing boundary checks (CWE-119) and use-after-free issues (CWE-416). Our current model assumes single-label classification, which limits its applicability. We plan to incorporate multi-label classification techniques to better capture such complex cases.
Author Contributions: Conceptualization, L.Y.; methodology, L.Y.; software, L.Y.; validation, L.Y.; formal analysis, Y.F.; investigation, Q.Z.; resources, L.Y.; data curation, Y.F.; writing (original draft preparation), L.Y.; writing (review and editing), Y.F. and Y.X.; visualization, Q.Z.; supervision, Z.L.; project administration, Q.Z.; funding acquisition, Z.L. and Y.X. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement: The data used in this paper are collected through our own experiments and are not yet publicly available. However, data may be obtained from the authors upon reasonable request.
Conflicts of Interest: The authors declare no conflicts of interest.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 1 GoVulDect framework. (Module 1: Graph-Level Feature Extraction Module, which extracts graph-level features of potentially concurrent Go function slices. Module 2: Token-Level Feature Extraction Module, which extracts token-level features of slices using taint analysis. Module 3: Detection Module, which classifies and detects vulnerabilities in Go source code.)
Figure 2 Go program code property graph after preprocessing.
Figure 3 Sample Go code.
Figure 4 Sampling and aggregation process of graph random walk network.
Figure 5 Go code lexical analysis.
Figure 6 Examples of SpanBERT pre-trained model extraction and representation.
Figure 7 Loss and accuracy curves during training and validation. (The green curve represents the training loss of the model as the number of training epochs increases, while the red curve represents the model’s validation accuracy.)
Figure 8 The confusion matrix of the model’s classification results for multiple types of data. (The darker the color, the higher the model’s accuracy in distinguishing vulnerabilities).
Figure 9 Scatter plot comparison of fused features. (Yellow and purple represent different vulnerability features, respectively.)
Figure 10 Different code representation comparison experiment. ((left) displays the ROC curves of GraphSAGE under several different code representations, while in the (right), the blue bar represents the CFG, the orange bar represents the CPG, and the green bar represents the DFG.)
Table 1 Top 10 CWE types by frequency in the dataset.
CWE-ID | Description | Number |
---|---|---|
CWE-79 | Improper neutralization of input during web page generation. | 67 |
CWE-22 | Improper limitation of a pathname to a restricted directory. | 45 |
CWE-400 | Uncontrolled resource consumption. | 39 |
CWE-20 | Improper input validation. | 35 |
CWE-287 | Improper authentication. | 24 |
CWE-284 | Improper access control. | 21 |
CWE-200 | Exposure of sensitive information to an unauthorized user. | 19 |
CWE-601 | URL redirection to an untrusted site (“Open Redirect”). | 17 |
CWE-863 | Incorrect authorization. | 17 |
CWE-352 | Cross-site request forgery (CSRF). | 14 |
Table 2 Experimental environment configuration details.
Component | Configuration |
---|---|
Operating System | Ubuntu 18.04.1 |
Programming Language | Python |
Major Python Libraries | Pyg-lib == 0.3.1 + pt21cu121 |
 | Torch == 2.0.1 + cu118 |
 | Scikit-learn == 1.3.2 |
 | Transformers == 4.35.2 |
Table 3 GraphSAGE model hyperparameters.
Parameter | Description | Value |
---|---|---|
num_samples | Neighbor sampling size | 25 |
aggregator_type | Aggregation function | mean |
embedding_size | Node embedding size | 64 |
num_layers | Number of graph convolution layers | 2 |
l2_reg | L2 regularization strength | 0.0001 |
learning_rate | Learning rate | 0.01 |
dropout | Dropout rate | 0.3 |
epochs | Training epochs | 500 |
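To make the `num_samples` and `aggregator_type` settings concrete, the sketch below shows one GraphSAGE-style step with neighbor sampling and a mean aggregator on a toy graph. It is a simplified illustration in plain Python: the learned weight matrix and nonlinearity that normally follow the concatenation are omitted, and the graph, features, and function names are hypothetical.

```python
import random


def sample_neighbors(adj, node, num_samples, rng):
    """Return at most num_samples neighbors of node, sampled without replacement."""
    neighbors = adj.get(node, [])
    if len(neighbors) <= num_samples:
        return list(neighbors)
    return rng.sample(neighbors, num_samples)


def mean_aggregate(features, adj, node, num_samples, rng):
    """Average the sampled neighbors' features and append them to the node's own."""
    sampled = sample_neighbors(adj, node, num_samples, rng)
    dim = len(features[node])
    if sampled:
        mean = [sum(features[n][i] for n in sampled) / len(sampled)
                for i in range(dim)]
    else:
        mean = [0.0] * dim  # isolated node: no neighborhood signal
    # Concatenate self features with the aggregated neighborhood vector.
    return list(features[node]) + mean


# Toy graph: node 0 is connected to nodes 1 and 2.
adj = {0: [1, 2], 1: [0], 2: [0]}
features = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [4.0, 2.0]}
rng = random.Random(0)
h0 = mean_aggregate(features, adj, 0, num_samples=25, rng=rng)
```

With `num_samples` set to 25, nodes with fewer than 25 neighbors simply use all of them, so sampling only affects densely connected statements. Stacking two such steps corresponds to the `num_layers` value of 2, which lets each node see its two-hop neighborhood.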
Table 4 Classification report of GoVulDect.
Class | Precision | Recall | F1-Score | Sample Size |
---|---|---|---|---|
Benign Samples | 0.96 | 0.94 | 0.95 | 1646 |
Vulnerable Samples | 0.94 | 0.96 | 0.95 | 1604 |
Accuracy | / | / | 0.95 | 3250 |
Macro AVG | 0.95 | 0.95 | 0.95 | 3250 |
Weighted AVG | 0.95 | 0.95 | 0.95 | 3250 |
Table 5 Comparison of ablation experiment results.
Model | Precision | Recall | F1-Score |
---|---|---|---|
GoVulDect-Graph | 0.9058 | 0.9039 | 0.9043 |
GoVulDect-Tokens | 0.8878 | 0.8857 | 0.8859 |
GoVulDect | 0.9477 | 0.9513 | 0.9489 |
Table 6 Comparison of the results of different classifiers.
Model | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
MLP | 0.91 | 0.90 | 0.90 | 0.90 |
RF | 0.91 | 0.91 | 0.90 | 0.90 |
KNN | 0.89 | 0.88 | 0.88 | 0.88 |
SVM | 0.90 | 0.89 | 0.89 | 0.89 |
BiLSTM | 0.93 | 0.92 | 0.92 | 0.92 |
XGBoost | 0.94 | 0.95 | 0.94 | 0.95 |
Table 7 Comparison of existing methods and tools.
Model/Tool | Architecture | Target Language | Supported |
---|---|---|---|
RATS | Rule-based static | C, C++, Perl, PHP, | Buffer overflow, |
CSGVD | PE-BL + Residual | C/C++ | / |
VDoTR | Third-order tensor | C/C++ | CWE-120, CWE-119, |
AMPLE | Graph simplification + | C/C++ | / |
GoVulDect | SpanBERT + | Go | Concurrency |
“/” indicates that the original paper did not provide detailed information.
Table 8 Comparison of existing methods and tools.
Model/Tool | Dataset Size | Precision * | Recall * | F1-Score * |
---|---|---|---|---|
RATS | — | 0.5291 | 0.5400 | 0.5288 |
CSGVD | 27,318 | 0.7205 | 0.7442 | 0.6733 |
VDoTR | 93,539 | 0.7947 | 0.7800 | 0.7762 |
AMPLE | 219,829 | 0.8872 | 0.8860 | 0.8859 |
GoVulDect | 129,978 | 0.9164 | 0.9137 | 0.9135 |
* indicates that Precision, Recall, and F1-Score are evaluated on the dataset used in this study.
Table 9 Performance comparison results of different models.
Model/Tool | Training Time (s) | Detection Time (s) |
---|---|---|
RATS | / | 0.012 |
CSGVD | 7589.810 | 62.063 |
VDoTR | 17,822.211 | 79.180 |
AMPLE | 5125.166 | 17.121 |
GoVulDect | 4221.945 | 3.116 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
With the widespread application of the Go language, the demand for vulnerability detection in Go programs is increasing. Existing detection models and methods have deficiencies in extracting source code features of Go programs and mainly focus on detecting concurrency vulnerabilities. In response to these issues, we propose a Go program vulnerability detection method based on a graph neural network (GNN). The core of this approach is to utilize GraphSAGE to extract the global structure and deep semantic information of each concurrent function, maximizing the learning of concurrency vulnerability features. To capture contextual information of fine-grained code fragments in source code, we employ taint analysis to extract taint propagation chains and use a Transformer model with a multi-head attention mechanism, based on lexical analysis, to extract fine-grained vulnerability features. We integrate graph-level and token-level features to maximize the detection of various complex types of vulnerabilities in Go source code. Experimental results on a real-world vulnerability dataset demonstrate that our model outperforms existing detection methods and tools, achieving an F1-score of 91.35%. Furthermore, ablation experiments confirm that the proposed feature fusion method effectively extracts deep vulnerability features.