Log levels are crucial for distinguishing the severity of logs and directly reflect the urgency of transactions in software systems. Automatically and efficiently determining log levels is a crucial and challenging task in log management. Current automatic log-level prediction approaches that use Abstract Syntax Tree-based representation graphs do not consider fine-grained semantics, e.g., the effects of subtle syntactic differences among similar programs and the semantics of different edge types, which leads to poor accuracy in log-level prediction. To address these issues, we perform data augmentation by changing the shape of the abstract syntax tree through code transformations that do not change the semantics of the code. Meanwhile, we integrate data flow and call relationships into a code representation graph and define eight types of edges in the graph. Then, we design a multi-relational graph neural network that learns the impact of different types of edges on the log-level prediction task and learns the corresponding weights of these edges based on their types. To verify the effectiveness of our proposed approach, we conduct experiments on widely used open-source systems. Experimental results show that our proposed approach has prominent advantages over state-of-the-art methods in predicting log levels.
Introduction
Logs are vital in software development, system operations, maintenance, and data analysis (Wang et al. 2017; He et al. 2017; Fu et al. 2014). They provide a detailed record of system operational status, application behavior, security events, user activities, and other important information essential for understanding and maintaining complex systems (Nagappan et al. 2009; Chen et al. 2017). A log statement usually consists of a log level, a text message, and associated variables. For example, LOG.error("Address is not in the configuration: " + m.addr) is a log statement in a piece of code, where "Address is not in the configuration:" is the log message, m.addr is the relevant variable, and the log level is set to error, which indicates that the current state of the system is incorrect and may cause the system to crash. Log levels signify the severity and importance of log messages, which helps professionals filter and view logs according to different scenarios and needs.
[See PDF for image]
Fig. 1
Code samples with different syntactic structures for the same semantics
In today's world of rapidly expanding artificial intelligence, internet services, and software systems, deciding log levels correctly and quickly for the large volume of logs in such fields is crucial. However, manually deciding log levels correctly and quickly remains a challenge for software developers (Li et al. 2017). Recent research is devoted to assisting software developers in deciding log levels. For instance, LANCE is the first approach to use deep learning to automatically generate and inject complete log statements, using a pre-trained model and associated pre-training tasks to generate log statements from the code context (Mastropaolo et al. 2022). Ding et al. (2023b) and Li et al. (2021a) explore deeper relationships between logs and code: the former investigates the temporal relationship between log generation and code execution, while the latter analyzes the connection between duplicated log statements and code clones.
To automatically decide log levels, numerous approaches have been proposed. For example, DeepLV finds that the level of a log is closely related to the location of the log statement, and therefore uses the Abstract Syntax Tree (AST) to represent the syntax-related contextual features of log statements (Li et al. 2021b). However, this approach only considers syntax and contextual location relationships and fails to fully capture the semantic information of the code. To remedy this deficiency, TeLL considers syntax and contextual location relationships and introduces a Control Flow Graph (CFG) on top of the AST, thus adding stronger semantic relationships to the code representation (Liu et al. 2022). Although TeLL combines syntactic and semantic relations of code, the granularity of its code representation still requires further refinement. Specifically, first, TeLL's embedding of code representation graphs using graph neural networks does not fully consider the impact of different types of edges in the graph (e.g., syntactic edges, control flow edges) on the log-level prediction task. Second, TeLL does not consider data flow, while GraphCodeBERT demonstrates that enhancing the code representation by introducing a Data Flow Graph (DFG) on top of the AST achieves significant results (Guo et al. 2020). This inspires us to further incorporate the DFG into the code representation graph. Doing so enables the model to capture the dynamic behavior of the code and the data transfer paths more comprehensively, thus improving the accuracy of log-level prediction.
However, the AST used in previous work is an ordered tree structure that effectively represents the syntactic information of the code. In an AST, the order of the nodes is crucial to accurately represent the syntactic structure of the program (Zhang et al. 2019). This means that even if two sibling nodes have the same parent, there exists a definite relative order between them that reflects the actual order of occurrence in the source code. However, in some scenarios, different code syntaxes (different AST structures) may express the same semantics (Xian et al. 2024; Li et al. 2023). For example, as shown in Fig. 1, after transforming the for loop of a method in Zookeeper into a while loop, the AST of the method changes, yet the semantics it expresses is still a loop that executes count times. This situation may mislead log-level prediction based on graph neural networks, as the model may over-rely on syntactic structures and ignore semantic consistency.
To address the above-mentioned problems, we propose a novel model LLP-DAMG (log-level prediction based on code representation of data augment multi-relational graph) that learns graph-based code representations through a customized neural network architecture to enable end-to-end log-level prediction. We integrate multiple code representation methods to construct a comprehensive multi-relational graph. This graph combines the Abstract Syntax Trees (ASTs), Control Flow Graphs (CFGs), Data Flow Graphs (DFGs), and Call Relationships, with its edges labeled into eight distinct types. To cope with the problem of syntactically different but semantically identical ASTs, we introduce data augmentation to further enhance the semantic learning capability of the graph-based code representation. Furthermore, we design a specialized neural network to extract features from the multi-relational graphs for log-level suggestions. Then, we obtain a finer-grained code representation. To evaluate the performance of the proposed model, we compare LLP-DAMG with the state-of-the-art methods based on nine large open-source systems. The experimental results show: 1) The accuracy of log-level prediction is improved from 71.0% to 76.2%. 2) Transformational code data augmentation and multi-relational graph neural network (MRGNN) help the log-level prediction. In summary, we have made the following contributions:
To address the issue of different syntax tree structures with the same semantics, we introduce a transformation method that changes only the syntactic structure of the code without affecting its original semantics. Our approach adapts the structure of the AST to change the morphology of the code representation graph.
We integrate data flow information and call relationships into the code representation graph, automatically learning the relative importance of different edge types. These relationships are labelled as eight distinct types to enhance the fine-grained semantic representation of the code graph.
We conduct experiments on nine large open-source systems and the results validate the effectiveness of the LLP-DAMG for predicting log levels, outperforming state-of-the-art methods.
Related work
The task of logging is to construct log statements using appropriate descriptions and necessary program variables, and to insert these log statements into the correct locations in the source code to ensure the completeness and readability of the log information, as well as to facilitate the subsequent debugging, monitoring and troubleshooting. To improve the efficiency and accuracy of logging further, the automated generation of log statements has become an important research direction (Li et al. 2024b). The automated generation of log statements mainly concerns: where the logs should be recorded in the source code and what should be logged.
Where the logs should be recorded
In terms of where to log, Zhao et al. (2017) collected path frequency data through static and dynamic analyses, calculated entropy values using information theory, and found the optimal placement of log printing statements within a specified performance overhead threshold using a dynamic programming algorithm. Li et al. (2018) investigated log statement placement decisions in software logging and, for the first time, used topic models to analyze the relationship between the topics of code snippets and their logging likelihood. The results showed that code snippets with certain topics were more likely to contain log statements and that these topics had similar logging tendencies across systems. In addition, topic modeling significantly improved the accuracy of predicting whether a code snippet contained log statements.
What should be logged
In terms of what needs to be logged in a log, some previous work has investigated log-level recommendations using various machine learning and deep learning techniques (Li et al. 2017, 2021b; Liu et al. 2022; Ding et al. 2023a; Ouatiti et al. 2022). Li et al. (2021b) found that log messages and log statement locations were useful, and used recurrent neural networks to analyze ASTs to extract the syntactic context and message features of the log statements and make log-level recommendations accordingly. Liu et al. (2022) applied graph neural networks to encode intra- and inter-block features into code block representations for log-level recommendation. Li et al. (2017) utilized ordered regression machine learning models to recommend appropriate log levels for log statements. Ouatiti et al. (2022) focused on log-level prediction in a multi-component system, revealing the special challenges of this environment and the solution strategies. Ding et al. (2023a) utilized templates to generate messages for log statements, which simply meant that the code preceding the log statement served as the input, the source code was parsed into an AST, and the code that was similar to the log served as the context.
Several studies have explored generating complete log statements. Mastropaolo et al. (2022) were the first to implement a model for generating complete log statements using a pre-trained model, taking the T5 pre-trained model as the basis and fine-tuning it for the log statement generation task. Li et al. (2021a) focused on characterizing duplicate log statements in code clones and the problems associated with them, such as insufficient information in the catch block and inconsistent error diagnosis information. Ding et al. (2023b) studied the temporal relationship between log generation and code execution, and derived rules to detect logical and semantic temporal relationships between log records and code, as well as rules to detect temporal inconsistencies between them.
Logging with Large Language Models (LLMs)
Some approaches use LLMs. For example, Li et al. (2024a) suggested using a self-refinement approach based on Chain-of-Thought (CoT) prompts derived from static analysis to merge static context into code prompts. Xu et al. (2024) proposed UniLog, which used Codex (Chen et al. 2021), a fine-tuned GPT language model, to generate log statements, examined in-context learning and fine-tuning methods, and used several models from the GPT-3 family for comparison. Li et al. (2024b) further evaluated LLMs for automated log generation and showed that pre-trained language models can effectively understand complex code structures and semantics and automatically generate high-quality log statements.
Code representation
Ben-Nun et al. (2018) constructed contextual flow graphs using the programming-language-independent LLVM IR and represented code semantics using the inst2vec embedding method. Alon et al. (2019) captured code semantics using AST paths and combined this approach with neural network models for training. Zhang et al. (2019) proposed the ASTNN model, which decomposed large ASTs into small statement trees and processed them in their natural order with a bidirectional RNN to generate vector representations of code segments. Feng et al. (2020) proposed CodeBERT, a bimodal pre-trained model based on the Transformer architecture for supporting multiple natural language-programming language applications, with the introduction of a replaced token detection task. Guo et al. (2020) proposed GraphCodeBERT, which introduced data flow as a semantic-level structure and combined structure-aware pre-training tasks with an efficient graph-guided attention mechanism. Guo et al. (2022) introduced mask attention matrices and prefix adapters to control model behavior, used cross-modal content (e.g., ASTs and code comments) to enhance the semantic representation of code, and proposed a one-to-one mapping from ASTs to sequence structures. Li et al. (2022) presented a large-scale pre-training model and a multi-task learning framework to automate code review activities.
Table 1 summarizes the key aspects considered in the study of log-level prediction tasks and code representation learning. Most existing research relies on AST-based code representation graphs or pre-training on code tokens. However, the AST-based code representation graph approach has several limitations: it does not simultaneously incorporate the AST, CFG, DFG, and call relationships, nor does it account for the impact of different types of edges on log-level prediction tasks. Additionally, this method suffers from graph structure changes caused by syntax variations, even when the underlying code semantics remain unchanged. To address these research gaps, this paper proposes LLP-DAMG.
Table 1. Different aspects considered in various studies
Source | Ordinal Regression | Tokens of Code | AST | CFG | DFG | Call Relationships | Data Augmentation | Type of Edge | Log Message | Graph Neural Network |
|---|---|---|---|---|---|---|---|---|---|---|
Li et al. (2017) | ||||||||||
Li et al. (2021b) | ||||||||||
Liu et al. (2022) | ||||||||||
Mastropaolo et al. (2022) | ||||||||||
Ding et al. (2023a) | ||||||||||
Li et al. (2024a) | ||||||||||
Xu et al. (2024) | ||||||||||
Zhang et al. (2019) | ||||||||||
Feng et al. (2020) | ||||||||||
Guo et al. (2020) | ||||||||||
Guo et al. (2022) | ||||||||||
Li et al. (2022) | ||||||||||
LLP-DAMG |
[See PDF for image]
Fig. 2
Overview of the LLP-DAMG
Methodology
Overview of the methodology
To help professionals decide log levels quickly and accurately, we propose an end-to-end log-level prediction approach LLP-DAMG. Figure 2 summarizes our approach, which is divided into three phases: 1) data augmentation of the transformed code, 2) multi-relational graph construction, and 3) prediction of the log levels on the multi-relational graph.
In the data augmentation of the transformed code phase, for a given code file, we use the tree-sitter toolset1 to parse the code file into an AST. Without changing the semantics of the code itself, we transform the AST structure (i.e., change the syntactic structure of the code) into six new ASTs, considering the six syntax transformation patterns detailed in Section 3.2. Then, the transformed code files are generated from the newly generated ASTs, which together with the original code files form the complete dataset after data augmentation.
In the multi-relational graph construction phase, we flatten the original code files and the transformed code files in the complete dataset into a series of code files, and then use tree-sitter to parse all the flattened code files into ASTs. During parsing, we eliminate files with syntax errors from the complete dataset. After this, the control flows, the data flows, and the call relationships are extracted based on the ASTs, which are then combined into a code representation graph with multi-type edge fusion. In the graph, there are eight types of edges, where each type represents a syntactic or semantic relationship.
In the prediction of the log levels phase, we construct the MRGNN to predict log levels using the multi-relational graph that fuses multiple types of edges. To learn the syntactic and semantic features of the code, our approach utilizes the information transformation and aggregation of the graph neural network while considering the effects of different types of edges on the syntax and semantics of the code. After obtaining the syntactic and semantic feature representation of the code, the LLP-DAMG predicts the appropriate log level for the labeled multi-relational graph nodes, helping professionals to understand the system better.
Data augmentation via code transformation
For a given code file, we parse it into an AST, a tree structure used to represent the syntactic information of the code. Then, as illustrated in the Transform Code step in Fig. 2, we transform the AST to generate six new ASTs that differ in AST structure (i.e., in code syntax) but have the same semantics as the source code, thereby realizing data augmentation. Our code data augmentation is similar to previous work (Xian et al. 2024) and includes the following six patterns:
Changing the Order of Declarations: Change the order of declarations of local variables within a method without introducing syntax errors and without changing the semantics of the code, as shown in the following formula,

Statement_1; Statement_2 <=> Statement_2; Statement_1 (1)

where <=> indicates interconversion between the two code fragments. Changing the order of declarations exchanges two local variable declarations, which switches the position of the left and right subtrees of a node in the abstract syntax tree.

Loop Conversion: Convert while statements to for statements and for statements to while statements (a code sketch of this conversion is given after this list),

for (Init; Cond; Update) { Body } <=> Init; while (Cond) { Body; Update; } (2)

Loop conversion converts a for loop statement into an equivalent while loop statement (and vice versa), which changes the structure of a subtree in the abstract syntax tree.

Adding try/catch: Add a try/catch statement to a non-log statement without changing the semantics,

Statement_1 => Statement_1; try { } catch (Exception e) { } (3)

Adding try/catch adds a pair of try/catch statements after Statement_1 to increase the number of children of a node in the AST.

Adding Irrelevant Statements: Add statements that have no relation to the current method and do not affect its semantics at random locations in the method to change the structure of the AST,

Statement_1; Statement_2 => Statement_1; Statement_3; Statement_2 (4)

Adding irrelevant statements inserts a Statement_3 between Statement_1 and Statement_2 that has nothing to do with the context and does not affect the semantics; such a Statement_3 can be, for example, a print statement that prints meaningless information.

Terms of Exchange: Swap the condition terms in an if statement or a loop statement,

if (a < b) <=> if (b > a) (5)

The condition of an if statement is exchanged to change the syntactic structure (i.e., the structure of the abstract syntax tree) without changing the semantics; for example, a condition such as a < b is rewritten as the equivalent b > a, which changes the corresponding symbol node in the AST.

Transformation of Operator: Transform operators into an abbreviated form,

i = i + 1 <=> i += 1 (6)

For example, i = i + 1 is transformed into the abbreviated i += 1 form, which changes the depth and structure of the corresponding subtree in the AST.
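To make the loop-conversion pattern concrete, the following sketch rewrites each for loop in a Java snippet as an equivalent while loop at the AST level. It is a minimal illustration assuming the py-tree-sitter and tree_sitter_java Python packages (the paper only states that tree-sitter is used); the function and variable names are ours, and the sketch assumes the common case where all three for-clauses are present.

```python
# Minimal sketch of the loop-conversion pattern (for -> while), assuming the
# py-tree-sitter and tree_sitter_java packages; the newer py-tree-sitter API is shown,
# older releases instead use Language(lib_path, "java") and parser.set_language(...).
import tree_sitter_java
from tree_sitter import Language, Parser

JAVA = Language(tree_sitter_java.language())
parser = Parser(JAVA)

def node_text(src: bytes, node) -> str:
    """Source text covered by an AST node."""
    return src[node.start_byte:node.end_byte].decode("utf8")

def for_to_while(src: bytes, node) -> str:
    """Rewrite one for_statement as an equivalent while loop (field names follow the
    tree-sitter-java grammar; assumes init/condition/update are all present and the
    init clause is a local variable declaration, which carries its own ';')."""
    init = node_text(src, node.child_by_field_name("init"))
    cond = node_text(src, node.child_by_field_name("condition"))
    update = node_text(src, node.child_by_field_name("update"))
    body = node_text(src, node.child_by_field_name("body")).strip()
    inner = body[1:-1] if body.startswith("{") else body  # drop braces of a block body
    return f"{init} while ({cond}) {{ {inner} {update}; }}"

def augment(source: str) -> list[str]:
    """Return one semantically equivalent variant per for loop found in a Java snippet."""
    src = source.encode("utf8")
    stack = [parser.parse(src).root_node]
    variants = []
    while stack:
        node = stack.pop()
        if node.type == "for_statement":
            rewritten = for_to_while(src, node).encode("utf8")
            variants.append((src[:node.start_byte] + rewritten + src[node.end_byte:]).decode("utf8"))
        stack.extend(node.children)
    return variants

code = "class C { void m(int count) { for (int i = 0; i < count; i++) { work(i); } } }"
print(augment(code))
```

Each variant is written back to a new code file and re-parsed, so the resulting AST differs in shape while the loop semantics are unchanged.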
Build multi-relational graph
In the AST, leaf nodes usually contain lexical information, such as variable names, identifiers, keywords, and string literals. These directly correspond to the actual text in the source code, constitute the basic units of the program logic, and are key to understanding the program logic. The non-leaf nodes in the AST represent the syntactic structures of the code, such as expressions, statements, and declarations, which help to capture the syntactic features of the code.
[See PDF for image]
Fig. 3
Multi-relationship code representation
We build a multi-relationship graph initially based on an AST, and gradually add control flow and data flow relationships to this graph by traversing and analyzing the AST. We use the tree-sitter tool to obtain ASTs from the original codes and the transformed codes. By traversing the entire structure of an AST, e.g., using breadth-first search, we identify control-flow-related statements (e.g., if statements, for statements). When two interrelated control flow statements are found, we create an edge labelled "CFG" to connect their corresponding AST nodes. If an edge already exists between these two nodes, the edge is labeled as an edge shared by the AST and the CFG. In the same AST traversal, we also extract variable nodes with the same variable name and pair them, as these pairs are potential data flow paths. After completing the AST traversal, we filter out pairs that are unlikely to be data flows, and then create edges labelled "DFG" to add the remaining potential data flow pairs to the graph. If such an edge already exists in the graph, we add the "DFG" label to it. After this step, there are a total of seven types of edges in the multi-relationship code graph: "AST", "CFG", "DFG", "AST & CFG", "AST & DFG", "CFG & DFG", and "AST & CFG & DFG".
[See PDF for image]
Fig. 4
Illustration of Multi-relational graph neural network
In addition, we add the call relationships of the same file into the multi-relationship code graph. The process of call relationships is similar to that of the data flow, which will not be repeated here. Finally, the multi-relationship code graph contains eight types of edges, which enriches the variety of relationships compared with the original AST. As shown in Fig. 3, this multi-relationship code graph integrates multiple types of relations, where the edges and nodes of the AST capture the syntactic information of the code, and the control flow edges, data flow edges, and call relationships capture the semantic information of the code.
In this way, we utilize graph neural networks to obtain an embedded representation of the code on this combinatorial graph that contains complete syntactic and semantic information.
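The edge-merging step described above can be sketched as follows: candidate edges are first collected under their base relations (AST, CFG, DFG, Call), and edges that connect the same node pair under several relations are collapsed into a single edge carrying a combined label such as "AST & CFG". The extractor inputs are placeholders standing in for the AST traversal and the control-flow, data-flow, and call analyses; this is an illustrative simplification, not the TAILOR-based implementation used in LLP-DAMG.

```python
from collections import defaultdict

# Base relations extracted from the parsed code files. The edge lists passed in below
# are placeholders standing in for the AST traversal and the control-flow, data-flow,
# and call analyses described in this section.
BASE_RELATIONS = ("AST", "CFG", "DFG", "Call")

def build_multi_relational_graph(ast_edges, cfg_edges, dfg_edges, call_edges):
    """Merge per-relation edge lists into one graph with typed edges.

    Each argument is an iterable of (src_node_id, dst_node_id) pairs.
    Returns a dict mapping (src, dst) -> combined edge-type label, e.g. "AST & CFG".
    """
    relations_per_edge = defaultdict(set)
    for relation, edges in zip(BASE_RELATIONS, (ast_edges, cfg_edges, dfg_edges, call_edges)):
        for src, dst in edges:
            relations_per_edge[(src, dst)].add(relation)

    # An edge that occurs under several base relations is collapsed into a single
    # edge whose label concatenates the co-occurring relations.
    return {
        pair: " & ".join(sorted(rels, key=BASE_RELATIONS.index))
        for pair, rels in relations_per_edge.items()
    }

# Toy example: integer ids stand for AST nodes of one parsed method.
graph = build_multi_relational_graph(
    ast_edges=[(0, 1), (1, 2), (2, 3)],
    cfg_edges=[(1, 2)],    # two interrelated control-flow statements
    dfg_edges=[(2, 3)],    # the same variable appears in both statements
    call_edges=[(0, 3)],   # an intra-file method call
)
for (src, dst), edge_type in sorted(graph.items()):
    print(f"{src} -> {dst}: {edge_type}")
```

The resulting edge-type labels are later mapped to integer relation indices so that the graph neural network can weight each type separately.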
Predicting log levels using MRGNN
In this paper, the MRGNN utilizes the heterogeneity of edges in the code representation graph to make predictions about log levels. As shown in Fig. 4, the network consists of three main components: 1) constructing the feature matrix of the multi-relational graph, 2) starting from the node where the log-level prediction needs to be performed, and updating the embedding of the current node by propagating the embeddings of the neighboring graph nodes at multiple levels, 3) predicting the log level and generating an appropriate level for the log-level node based on its embedding.
We will describe these in detail in the following sections.
Constructing the feature matrix of a code representation graph
As our code representation graph is built on AST, to better learn the predictive features of the code representation, we take the same initialization embedding approach as TAILOR (Liu et al. 2023) which is also based on AST. We first traverse the multi-relational graph along the edges of the AST to obtain a node type/token symbol training corpus for training Word2Vec (Mikolov et al. 2013). We train the Word2Vec model to generate feature vectors of node types/token symbols in the AST. Since AST nodes contain types and tokens, we fuse the feature vectors of types and symbols. The fused vector will be used to represent the features of nodes in the graph. It is shown below:
x_v = W2V(t_v) || W2V(s_v) (7)

where W2V(·) denotes the initial feature vector generated using Word2Vec, t_v and s_v are the one-hot encodings of the type and the token of node v respectively, || is the concatenation operator, and x_v is the fused feature vector of node v.

Code representation of information aggregation between graph nodes
First of all, since the multi-relational graph parsed from the transformed code files is very large, we use a neighbor sampler to sample the neighbor nodes of the target node. The sampling process extracts the corresponding connected edges in the graph, where each edge is represented as a triple (u, r, v): u is the source node, v is the destination node, and r is the type of relationship between them. Neighbor nodes in the multi-relational graph represent syntactic and semantic information of the code context.
To achieve comprehensive modeling of the code context, LLP-DAMG constructs a graph structure that integrates multi-dimensional code relationships. The graph includes four base relationships, AST, CFG, DFG, and Call, as well as combinations such as AST & CFG, AST & DFG, CFG & DFG, and AST & CFG & DFG. Inspired by R-GCN (Chen et al. 2019) and BERT (Koroteev 2021), we propose the Multi-Relational Graph Neural Network (MRGNN), as illustrated in Fig. 4.
First, each layer propagates the neighbor information of the current layer through an R-GCN layer. The formula is as follows,
z_v^(l+1) = σ( Σ_{r∈R} Σ_{u∈N_r(v)} (1/|N_r(v)|) W_r^(l) h_u^(l) + W_0^(l) h_v^(l) ) (8)

where v is the target node and u is an l-th order neighbor node of v; N_r(v) is the set of neighbor nodes of v under relation r, and R is the set of all relationship types among the neighbor edges of v; h_u^(l) and h_v^(l) are the layer-l representations of u and v; W_r^(l) is the parameter matrix of the l-th layer used to linearly transform the features of neighboring nodes connected by edge type r, and W_0^(l) is the parameter matrix of the linear transformation of the source node; σ is the LeakyReLU activation function.

Then, we perform layer normalization on the output results of each R-GCN layer:
LN(z_v^(l)) = γ ⊙ (z_v^(l) − E[z_v^(l)]) / sqrt(Var[z_v^(l)] + ε) + β (9)

where z_v^(l) is the output after message aggregation at layer l, E denotes the expected value, Var denotes the variance, ε is a very small constant (usually set to 1e-5 or a similar value) to avoid a zero denominator, γ is a learnable scaling parameter, and β is a learnable offset parameter.

Afterward, a Dropout step is applied to regularize training:
h_v^(l) = Dropout(LN(z_v^(l))) (10)

If the l-th layer is not the last layer, then h_v^(l) is the input to the next layer; otherwise, h_v^(l) is the output of that layer of the MRGNN. After four iterations of information propagation, a series of representations for node v is established, denoted as {h_v^(1), h_v^(2), h_v^(3), h_v^(4)}. These representations are combined using the concatenation operator:
h_v = h_v^(1) || h_v^(2) || h_v^(3) || h_v^(4) (11)
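A minimal PyTorch Geometric sketch of this message-passing backbone is shown below: four RGCNConv layers, each followed by LeakyReLU, layer normalization, and dropout as in Eqs. (8)-(10), with the per-layer outputs concatenated as in Eq. (11). The layer width (64), the dropout rate (0.4), and the eight relation types follow the settings reported in the experiment section; the class itself is our illustrative reimplementation rather than the authors' released code.

```python
import torch
from torch import nn
from torch_geometric.nn import RGCNConv

class MRGNN(nn.Module):
    """Four RGCNConv layers over the typed code graph; per-layer outputs are concatenated."""

    def __init__(self, in_dim: int, hidden_dim: int = 64, num_relations: int = 8,
                 num_layers: int = 4, dropout: float = 0.4):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * num_layers
        self.convs = nn.ModuleList(
            [RGCNConv(dims[i], dims[i + 1], num_relations) for i in range(num_layers)])
        self.norms = nn.ModuleList([nn.LayerNorm(hidden_dim) for _ in range(num_layers)])
        self.act = nn.LeakyReLU()
        self.dropout = nn.Dropout(dropout)
        self.out_dim = hidden_dim * num_layers

    def forward(self, x, edge_index, edge_type):
        """x: [num_nodes, in_dim]; edge_index: [2, num_edges]; edge_type: [num_edges]."""
        layer_outputs = []
        h = x
        for conv, norm in zip(self.convs, self.norms):
            h = conv(h, edge_index, edge_type)   # relation-specific message passing, Eq. (8)
            h = self.dropout(norm(self.act(h)))  # LeakyReLU, LayerNorm, Dropout, Eqs. (8)-(10)
            layer_outputs.append(h)
        return torch.cat(layer_outputs, dim=-1)  # concatenate the four layers, Eq. (11)

# Toy usage: 5 nodes with 16-dim features and 4 typed edges.
model = MRGNN(in_dim=16)
x = torch.randn(5, 16)
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 4]])
edge_type = torch.tensor([0, 1, 2, 7])            # indices into the eight edge types
print(model(x, edge_index, edge_type).shape)      # torch.Size([5, 256])
```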
Log-level prediction
After the construction of the graph neural network with multi-type edge information, we follow the practice of DeepLV (Li et al. 2021b) and TeLL (Liu et al. 2022) of using log messages as auxiliary information to refine the code representation. Specifically, the log message embedding q_m is computed as follows:

q_m = AvgPool(e_1, e_2, ..., e_N) (12)

where e_1, ..., e_N are the Word2Vec-initialized embeddings of the tokens in the log message, N is the number of unique tokens, and AvgPool is the average pooling function that combines the token embeddings into the log message representation.

Then the appropriate log level needs to be predicted. In this paper, log levels are categorized into five classes: Trace, Debug, Info, Warn, and Error. We merge the embedding of the target node in the multi-relational graph with the embedding of the log message as the features for classification. Technically, the classifier is a fully connected layer that maps the graph features and the log message features to the five log levels:
p = softmax(W_g^T h_v + W_m^T q_m + b) (13)

where p is the log-level probability distribution, h_v is the result of the information aggregation over the multi-relational graph by the graph neural network, q_m is the log message embedding, W_g and W_m are parameter matrices, and b is the bias parameter.

Experiment and evaluation
In this section, we first describe the implementation of our approach and the experimental parameter settings. Then, we evaluate the effectiveness of the LLP-DAMG in terms of log-level prediction. Specifically, we examine the following research questions:
RQ1: How does the LLP-DAMG perform on log-level prediction tasks compared to state-of-the-art techniques?
RQ2: To what extent does the design of the different components in the LLP-DAMG affect the performance of log-level prediction tasks?
RQ3: How well does the LLP-DAMG generalize?
RQ4: What are the overlaps and differences in prediction results between LLP-DAMG and baseline approaches?
Implementation
We use tree-sitter to extract ASTs from Java files. Data augmentation and the construction of multi-relational graphs are implemented in Python. The implementation of data augmentation is based on TransformCode (Xian et al. 2024), and the construction of multi-relational graphs is based on the code representation graph construction in TAILOR (Liu et al. 2023). We implement the MRGNN model using PyTorch (Paszke et al. 2019). The model uses the CrossEntropyLoss loss function, with parameters optimized by the Adam optimizer (Kingma and Ba 2015). We use Word2Vec with the skip-gram algorithm (Mikolov et al. 2013) to obtain the embeddings of types/tokens in the AST and of tokens in log messages. All experiments are performed on a single server with an AMD EPYC 7402 24-core CPU, 32 GB of RAM, and an NVIDIA RTX 3090 GPU with 24 GB of memory.
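To illustrate how the prediction head of Eqs. (12) and (13) can be wired to this setup, the sketch below average-pools Word2Vec token vectors into a log-message embedding, concatenates it with the graph embedding of the target log node, and trains a fully connected classifier over the five levels with CrossEntropyLoss and Adam as stated above. The tensor shapes and helper names are hypothetical placeholders, and the graph embedding here is a random stand-in for the MRGNN output.

```python
import torch
from torch import nn

LOG_LEVELS = ["trace", "debug", "info", "warn", "error"]

class LogLevelClassifier(nn.Module):
    """Fully connected head over concatenated graph and log-message features, cf. Eq. (13)."""

    def __init__(self, graph_dim: int, msg_dim: int, num_levels: int = len(LOG_LEVELS)):
        super().__init__()
        self.fc = nn.Linear(graph_dim + msg_dim, num_levels)

    def forward(self, node_emb, msg_emb):
        return self.fc(torch.cat([node_emb, msg_emb], dim=-1))

def message_embedding(tokens, word2vec, dim=100):
    """Average-pool Word2Vec token vectors into one log-message vector, cf. Eq. (12)."""
    vectors = [torch.as_tensor(word2vec[t]) for t in tokens if t in word2vec]
    return torch.stack(vectors).mean(dim=0) if vectors else torch.zeros(dim)

# Hypothetical wiring: node_emb stands in for the MRGNN embedding of the target log node,
# msg_emb for the pooled Word2Vec embedding of its log message.
head = LogLevelClassifier(graph_dim=256, msg_dim=100)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)  # Adam, as in the implementation
criterion = nn.CrossEntropyLoss()                         # CrossEntropyLoss, as stated above

node_emb = torch.randn(1, 256)
msg_emb = torch.zeros(1, 100)
label = torch.tensor([LOG_LEVELS.index("warn")])

loss = criterion(head(node_emb, msg_emb), label)
optimizer.zero_grad(); loss.backward(); optimizer.step()
print(f"training loss: {loss.item():.3f}")
```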
Table 2. Log level distribution of software systems
Systems | Version | Trace | Debug | Info | Warn | Error | Totals |
|---|---|---|---|---|---|---|---|
Cassandra (Apache 2024a) | 4.1.7 | 446 | 408 | 668 | 396 | 335 | 2253 |
Elasticsearch (Elastic 2024) | 8.17.0 | 1223 | 1916 | 3986 | 1039 | 906 | 9070 |
Flink (Apache 2024b) | 1.20.0 | 40 | 764 | 798 | 468 | 343 | 2413 |
Hbase (Apache 2024c) | 2.5.10 | 149 | 797 | 1272 | 600 | 365 | 3183 |
Jmeter (Apache 2024d) | 5.6.3 | 1 | 714 | 345 | 438 | 425 | 1923 |
Kafka (Apache 2024e) | 3.9.0 | 296 | 930 | 770 | 399 | 742 | 3137 |
Karaf (Apache 2024f) | 4.4.6 | 15 | 168 | 163 | 130 | 171 | 647 |
Wicket (Apache 2024g) | 9.18.0 | 10 | 169 | 188 | 133 | 166 | 666 |
Zookeeper (Apache 2023) | 3.5.6 | 37 | 251 | 796 | 387 | 304 | 1775 |
Average | - | 129 | 491 | 576 | 361 | 352 | 1910 |
Experiment
Datasets: To evaluate our approach, we collect nine widely used Java-based system software projects from GitHub3.
These systems have been extensively utilized in recent log-related research. Table 2 summarizes the log-level distribution across these systems. For each system, we deduplicate the data-augmentation labels and the original labels. Subsequently, we allocate 60%, 20%, and 20% of the log-level labels to form the disjoint training, validation, and test sets, respectively.
Experimental parameters' setup: (1) Approach parameter settings: we use 4 layers of RGCNConv, each with 64 neurons, and set the Dropout probability to 0.4 between the layers of the RGCN model. (2) Training settings: since the graph structure parsed from the code files is very large, we use batched training with a neighbor sampler (Hamilton et al. 2017), using a batch size of 32, 4 sampling layers, and 5 sampled neighbor nodes per layer. The sampled heterogeneous subgraphs are fed into the model for training.
These parameter values are set based on the results of comparative experiments, which are described in a later section.
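As a concrete illustration of these training settings, the snippet below configures mini-batch neighbor sampling with the reported values (batch size 32, 4 sampling layers, 5 neighbors per layer). It assumes PyTorch Geometric's NeighborLoader and a toy Data object standing in for the multi-relational graph; the sampler actually used by LLP-DAMG may differ in detail.

```python
import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader

# Toy multi-relational graph: 100 nodes, 400 random typed edges, 16-dim node features.
num_nodes, num_edges = 100, 400
data = Data(
    x=torch.randn(num_nodes, 16),
    edge_index=torch.randint(0, num_nodes, (2, num_edges)),
    edge_type=torch.randint(0, 8, (num_edges,)),   # eight edge types
)

# Nodes that carry a log statement and therefore need a level prediction.
log_nodes = torch.tensor([3, 17, 42, 64])

loader = NeighborLoader(
    data,
    num_neighbors=[5, 5, 5, 5],   # 4 sampling layers, 5 neighbors per layer
    batch_size=32,
    input_nodes=log_nodes,        # sample subgraphs around the target log nodes
    shuffle=True,
)

for batch in loader:
    # Each batch is a sampled subgraph that is fed into the MRGNN.
    print(batch.num_nodes, batch.edge_index.shape, batch.edge_type.shape)
    break
```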
Evaluation
Log-level prediction is a categorization task that predicts the log-level at a given location by understanding the code context syntax and semantics. We evaluate the performance of the approach in terms of accuracy (ACC), the area under the curve (AUC), and average ordered distance score (AOD). Similar to the usage in previous log-level prediction studies (Li et al. 2021b; Liu et al. 2022), the accuracy is the percentage of log levels that are correctly suggested in all suggested results. Higher accuracy means that the method can correctly predict more log levels.
ACC = (1/N) Σ_{i=1}^{N} I(y_i = ŷ_i) (14)

where N is the total number of samples, y_i is the true label of the i-th sample, and ŷ_i is the label predicted by the approach for the i-th sample. I is the indicator function, which equals 1 when y_i = ŷ_i and 0 otherwise.

The AUC, on the other hand, is a numerical metric that quantifies the classification performance. AUC values lie between 0 and 1; the higher the AUC, the better the performance of the classifier. Following prior work (Li et al. 2021b; Liu et al. 2022), we use the multiple-class version of the AUC defined by Hand and Till (2001).
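For reference, the accuracy and the multi-class AUC can be computed as in the sketch below, which uses scikit-learn's one-vs-one multi-class AUC corresponding to the Hand and Till (2001) formulation; the label and score arrays are placeholders for real model outputs.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Placeholder outputs: true levels (indices into the five classes) and predicted
# per-class probabilities for eight log statements.
rng = np.random.default_rng(0)
y_true = np.array([0, 1, 2, 3, 4, 2, 4, 1])
y_prob = rng.dirichlet(np.ones(5), size=len(y_true))   # rows sum to 1, like softmax output
y_pred = y_prob.argmax(axis=1)

acc = accuracy_score(y_true, y_pred)                    # Eq. (14)
auc = roc_auc_score(y_true, y_prob, multi_class="ovo",  # one-vs-one multi-class AUC
                    labels=list(range(5)))              # (Hand and Till 2001)
print(f"ACC = {acc:.3f}, AUC = {auc:.3f}")
```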
The AOD is used to calculate the average distance between the predicted log-level and the actual log-level with the following formula:
AOD = (1/N) Σ_{i=1}^{N} (1 − Dis(a_i, s_i) / MaxDis(a_i)) (15)

where N is the number of log statements, Dis(a_i, s_i) measures the distance between the actual log level a_i and the suggested log level s_i, and MaxDis(a_i) represents the maximum possible distance from the actual log level a_i.

Table 3. Performance comparison among DeepLV, TeLL and our proposed method across various systems
Systems | DeepLV | TeLL | Ours | ||||||
|---|---|---|---|---|---|---|---|---|---|
Accuracy | AUC | AOD | Accuracy | AUC | AOD | Accuracy | AUC | AOD | |
Cassandra | 0.606 | 0.842 | 0.805 | 0.635 | 0.884 | 0.812 | 0.694(+7.0%) | 0.923(+3.9%) | 0.874(+6.2%) |
ElasticSearch | 0.577 | 0.813 | 0.802 | 0.703 | 0.905 | 0.841 | 0.761(+5.8%) | 0.935(+3.0%) | 0.885(+2.4%) |
Flink | 0.652 | 0.851 | 0.838 | 0.729 | 0.925 | 0.863 | 0.744(+1.5%) | 0.905(-2%) | 0.842(-2.1%) |
Hbase | 0.603 | 0.842 | 0.817 | 0.707 | 0.921 | 0.873 | 0.725(+2.2%) | 0.913(-0.8%) | 0.858(-1.5%) |
Jmeter | 0.623 | 0.839 | 0.809 | 0.737 | 0.921 | 0.872 | 0.800(+6.3%) | 0.951(+3.0%) | 0.904(+3.2%) |
Kafka | 0.518 | 0.795 | 0.775 | 0.642 | 0.888 | 0.812 | 0.708(+6.6%) | 0.925(+3.6%) | 0.880(+6.8%) |
Karaf | 0.672 | 0.856 | 0.816 | 0.750 | 0.908 | 0.867 | 0.797(+4.7%) | 0.938(+3.0%) | 0.891(+1.4%) |
Wicket | 0.638 | 0.850 | 0.793 | 0.744 | 0.899 | 0.856 | 0.785(+4.1%) | 0.897(-0.2%) | 0.852(-0.4%) |
Zookeeper | 0.609 | 0.848 | 0.820 | 0.746 | 0.924 | 0.887 | 0.780(+3.4%) | 0.945(+2.1%) | 0.912(+2.5%) |
Average | 0.611 | 0.837 | 0.808 | 0.710 | 0.908 | 0.854 | 0.755(+4.5%) | 0.926(+1.8%) | 0.878(+2.1%) |
RQ1: How does the LLP-DAMG perform on log-level prediction tasks compared to state-of-the-art techniques?
We use the ACC, AUC, and AOD to measure the performance of the LLP-DAMG and state-of-the-art. Table 3 shows the performance of our method compared to two other state-of-the-art methods, DeepLV and TeLL, on nine different systems. The results show that our method achieves better performance in most systems.
In the Cassandra system, our method improves the ACC by 9.9% compared to DeepLV, the AUC by 3.9% compared to TeLL, and the AOD by 6.2%. For the ElasticSearch system, our method also performs well, improving the ACC, AUC, and AOD by 5.8%, 3.0%, and 2.4%, respectively. Other systems such as Flink, Hbase, and Jmeter show similar trends, and despite slight fluctuations in individual metrics, our method maintains a high level of performance overall.
It is worth noting that although in some systems (e.g., Flink and Hbase), the improvement of our method in the AUC and the AOD is not as significant as in other systems, it still improves in the ACC. This suggests that our approach is not only applicable to a wide range of system types, but also provides consistent improvements in a variety of performance metrics.
[See PDF for image]
Fig. 5
Comparison of classification accuracy with TeLL under zookeeper system
Figure 5 shows the ACC comparison between our approach and TeLL at different log levels on the Zookeeper system. Among the five log levels, namely Trace, Debug, Info, Warn, and Error, our approach outperforms TeLL at the Info, Warn, and Error levels, while it is inferior at the Trace and Debug levels. Specifically, our approach achieves an ACC of 0.87 at the Info level, which is a 103% improvement compared to TeLL, and an ACC of 0.65 at the Error level, which is a 22% improvement compared to TeLL. However, at the Trace and Debug levels, the ACC of our method is lower than that of TeLL. We examine the distribution of data labels and find that there are fewer Trace and Debug labels than labels of the other types, so we believe this may be due to label imbalance in the dataset. Overall, our approach performs better at most log levels, especially the higher-severity ones.
[See PDF for image]
Fig. 6
Log prediction data distribution chart
To further investigate the performance of the LLP-DAMG and TeLL on the log-level prediction task, we visualize the last-layer embeddings of TeLL and the LLP-DAMG in Fig. 6 (a) and (b), respectively. It can be seen in Fig. 6 (b) that the LLP-DAMG exhibits excellent classification results: it not only succeeds in clustering code snippets with the same log level into clear clusters, but also forms obvious boundaries between different clusters. In contrast, although TeLL in Fig. 6 (a) also shows some clustering tendency, it is slightly inferior in terms of the compactness of clusters and the separation between them. This visualization suggests that the LLP-DAMG may have superior performance in log-level prediction.
In summary, Table 3, Figs. 5 and 6 demonstrate the advantages of our method over existing techniques, especially in terms of improved prediction accuracy and reduced error. This finding provides solid support for subsequent studies and highlights the progress we have made in solving log-level prediction problems in the nine software systems.
Table 4. Performance of different LLP-DAMG variants in log-level prediction
Approach | ACC | ACC (%) | AUC | AUC (%) | AOD | AOD (%) |
|---|---|---|---|---|---|---|
w/o DA + GraphSAGE | 0.668 | -14.36% | 0.787 | -16.72% | 0.675 | -25.99% |
w/o TO + MRGNN | 0.754 | -3.39% | 0.914 | -3.32% | 0.885 | -2.97% |
w/o COD + MRGNN | 0.745 | -4.43% | 0.905 | -4.21% | 0.873 | -4.26% |
w/o LC + MRGNN | 0.719 | -7.82% | 0.873 | -7.64% | 0.848 | -6.98% |
w/o TOE + MRGNN | 0.725 | -7.04% | 0.878 | -7.09% | 0.852 | -6.59% |
w/o IRS + MRGNN | 0.699 | -10.43% | 0.838 | -11.30% | 0.837 | -8.27% |
w/ DA+GraphSAGE | 0.767 | -1.67% | 0.903 | -4.44% | 0.774 | -15.13% |
w/o DA + MRGNN | 0.736 | -5.64% | 0.875 | -7.41% | 0.792 | -13.16% |
LLP-DAMG(w/ DA + MRGNN) | 0.780 | - | 0.945 | - | 0.912 | - |
Note: DA, TO, COD, LC, TOE, and IRS respectively refer to data augmentation, transformation of operator, changing the order of declarations, loop conversion, terms of exchange, and adding irrelevant statements
RQ2: To what extent does the design of the different components in the LLP-DAMG affect the performance of log-level prediction tasks?
In the previous section, we verified the performance of the LLP-DAMG on log-level prediction. In this section, we perform ablation experiments to further validate the effectiveness of different components of the approach.
Table 4 shows the results of ablation experiments aimed at exploring the impact of each component on the performance of the final results. Specifically, we progressively remove different parts of the data augmentation and observe their impact on the ACC, AUC, and AOD metrics.
First, the configuration without data augmentation and with GraphSAGE yields an ACC of 0.668, an AUC of 0.787, and an AOD of 0.675. Next, we keep the MRGNN component but remove the operator transformation (TO); the ACC then reaches 0.754, the AUC 0.914, and the AOD 0.885, which shows that the operator transformation affects performance.

We continue by exploring the effectiveness of changing the order of local variable declarations (COD). After removing this transformation, the ACC decreases to 0.745, the AUC to 0.905, and the AOD to 0.873. This suggests that the declaration order transformation is also one of the important factors in improving performance.

After that, we analyze the role of loop conversion (LC). By removing it, the ACC decreases to 0.719, the AUC to 0.873, and the AOD to 0.848. This shows that loop conversion also contributes positively to the approach's performance.

Furthermore, we examine the importance of the terms of exchange (TOE). After removing this transformation, the ACC decreases to 0.725, the AUC to 0.878, and the AOD to 0.852. This suggests that swapping condition terms is also a key part of the performance improvement.
Finally, we consider all data augmentation strategies with the MRGNN together, which is known as the proposed LLP-DAMG approach. The ACC of the approach reaches the highest point 0.780, the AUC achieves 0.945, and the AOD is improved to 0.912, which fully proves the overall effectiveness of the data augmentation strategies and the MRGNN on the enhancement of performance of log-level prediction.
[See PDF for image]
Fig. 7
Effectiveness of the number of sampling layers and the number of samples per layer on the accuracy of log-level prediction
Figure 7 illustrates a heatmap of the effectiveness of the number of sampling layers and the number of samples per layer on the prediction accuracy of log levels. The horizontal axis of the heatmap indicates the number of nodes sampled per layer, from 1 to 9. The vertical axis of the heatmap indicates the layers of sampled neighbors, from 2 to 5. The value in each cell represents the approach accuracy under a particular combination of parameters. The color shades of the cells correlate with the accuracy, with darker colors indicating higher accuracy.
As shown in Fig. 7, the accuracy of the method gradually improves as the number of sampled nodes per layer and the number of sampling layers increase. After the number of sampling layers reaches 4 and the number of sampled nodes per layer reaches 5, the accuracy of the approach stabilizes and reaches its highest value of 0.78.
In summary, the ablation experiments reveal that all the data augmentation strategies and the MRGNN play an important role in accurately predicting log levels, and together they enable higher accuracy than state-of-the-art approaches.
[See PDF for image]
Fig. 8
Cross-system accuracy of the LLP-DAMG
RQ3: How well does the approach generalize?
In this section, we mainly compare the generalization ability of different methods. The logging habits of different software developers are different, and some developers do not have good logging practices. In some cases, there may not be enough log data in the system to train the automatic log-level prediction approach.
A predictive approach with good generalization capabilities performs accurate log-level prediction in different types of software systems. The generalization ability of such an approach reduces the negative impact of insufficient training data and helps developers to quickly and accurately determine log levels in the development of different software systems.
Specifically, we train the approach by using the data of the Zookeeper system and then evaluate the accuracy of the approach on all other eight systems.
Figure 8 shows the accuracy of our approach on the different systems. As can be seen from the figure, although our approach and TeLL are trained using data from the Zookeeper system only, both exhibit high accuracy, and the accuracy of our method outperforms TeLL on six systems. It is especially worth mentioning that on the Elasticsearch system, the accuracy of our approach improves by 12% over the benchmark model TeLL. This shows that our approach has good generalization ability and stability across systems.
RQ4: What are the overlaps and differences in prediction results between LLP-DAMG and baseline approaches?
In this RQ, we primarily investigate the overlaps and differences in log-level predictions between the LLP-DAMG and the baseline for the same logs. These differences suggest that the LLP-DAMG has advantages in capturing code semantics and structural information over the baseline methods.
[See PDF for image]
Fig. 9
Venn diagram of predicting log-level by the LLP-DAMG and baseline approaches on the Zookeeper dataset
From the data parsed by LLP-DAMG and the baseline methods from the Zookeeper dataset, we select 265 overlapping log-level labels as the test set. The remaining data is used for training and validation. Based on this data split, experiments are conducted with each approach.
Figure 9 presents a Venn diagram comparing the performance of LLP-DAMG with two baseline methods, DeepLV and TeLL, in the log-level prediction task. The overlapping areas between the circles represent the number of log levels predicted correctly by multiple approaches, while the non-overlapping areas indicate the number predicted correctly only by the corresponding individual approach. Our approach demonstrates advantages in both the total number of correct predictions and the number of samples correctly predicted only by our method, followed by TeLL. The higher overlap between our method and TeLL is attributed to their shared use of code representation graphs constructed from ASTs, which exhibit similar characteristics.
[See PDF for image]
Fig. 10
Examples of LLP-DAMG’s Advantages
We then present a case study. Figure 10 (a) demonstrates a code context whose log level is correctly predicted only by LLP-DAMG. Baseline approaches, influenced by the "Error" marked in the blue box and the log message "Unknown op", may predict the log level as Error. However, the LLP-DAMG incorporates data flow in its code representation, enabling it to extract the data flow highlighted in the red box and other related data flows. Through learning, the LLP-DAMG identifies that the "Unknown op" in the log statement refers to a transaction object containing attributes such as ClientId, time, and Type, and understands that the purpose of logging unknown operation types is to help developers or operators identify unhandled transaction types in the system. While this situation typically does not cause program errors, it needs to be recorded for subsequent analysis or improvement.
Additionally, Fig. 10 (b) presents a code example in RQ4 where only LLP-DAMG made the correct prediction. This example consists of log statements located within try/catch blocks, with most logs residing inside these blocks. Since most log levels in catch statements are typically “Warn” or “Error”, baseline approaches predict these log levels as Warn or Error. However, LLP-DAMG employs code data augmentation by adding try/catch blocks. This augmentation enables the LLP-DAMG to learn which exceptions within try/catch blocks require special attention from system maintainers.
Threats to validity
The threats to the validity of our approach mainly include four aspects. Firstly, the scalability of the current method is limited, as it focuses primarily on the Java language, and its AST parsing and semantic relation extraction are highly dependent on Java syntax features. If extended to other languages such as Python or C++, the AST conversion rules and relationship definitions would need to be redesigned. Specifically, the ASTs, CFGs, DFGs, and call relations we extract are designed and implemented based on the characteristics of Java programs. Implementing these program analysis techniques relies on the syntactic structures and semantic rules of the Java language.
Next, we did not compare with large language models in the log-level prediction task. There are significant differences between deep learning models and large language models in terms of computational resources and model complexity. Deep learning models are relatively simple in structure, with fewer parameters, and offer faster training and inference speeds. In contrast, large language models are structurally complex, with a massive number of parameters, resulting in slower training and inference speeds. Therefore, directly comparing their performance may be unfair and cannot fully reflect their respective strengths and limitations.
Moreover, regarding computational efficiency, data augmentation generates diverse AST structures through syntax conversion, which significantly improves the model's generalization ability but also increases the size of the code representation graph. Although data augmentation only generates additional semantically equivalent code during the training phase (without changing the inference speed during testing), the larger graph structures it introduces still increase the computational burden of the graph neural network, manifested in slower training and higher GPU memory usage. To address this challenge, LLP-DAMG adopts a neighbor sampling strategy, which randomly selects subgraphs from multiple layers of neighbors for training to reduce the memory requirements. However, this strategy may lose some contextual information and may have a negative impact on prediction accuracy. Therefore, in the future, this balance can be further optimized through dynamic sampling strategies, such as adaptive sampling based on node importance or edge-type frequency, to maximize information integrity while controlling computational overhead.
Additionally, our approach does not consider the call relationships between code files. These relationships may contain semantic information that could address the limitations of the current approach. For example, an object of an entity class defined in another file might be created, and its private methods might be invoked. Therefore, in future log-level prediction research, we plan to incorporate call relationships between code files.
Finally, GPU memory limitations impact the approach performance. Data augmentation increases the number of nodes and edges in the code representations, enhancing data diversity and model generalization but also expanding the computational graph scale, exacerbating GPU memory consumption. With only a single Nvidia RTX 3090 (24GB), larger graphs cause memory insufficiency, limiting batch size and forcing training on smaller subgraphs, potentially affecting final performance. Memory constraints also require compromises in data augmentation and graph construction, such as reducing augmentation complexity or graph scale, which limit augmentation effectiveness.
Conclusion
In this paper, we propose the LLP-DAMG for log-level prediction, which explores the potential of code syntax-enhanced multi-relational graphs for log-level prediction. First, we make structural changes to the AST to generate new code data and construct a joint graph structure containing eight types of relations on top of the AST. Then, we design the MRGNN based on the multi-relational graph so that different relations learn different weights. In this way, syntactic, semantic, and multi-relational information is aptly fused into the target log node to predict its level. We evaluate LLP-DAMG on nine large software systems. Experimental results show that our method outperforms existing methods in log-level prediction.
In future work, it is important to focus on the impact of call relationships on log-level prediction. Cross-file call relationships are particularly valuable, as they better capture the contextual semantics of object-oriented programming languages.
Acknowledgements
This research was supported by the China Scholarship Council (liujinmei [2023] No. 21), the China University Industry University Research Innovation Fund - new generation information technology innovation project (2023IT056) of the university science research and development center of the Ministry of Education.
Data Availability
The dataset used in this manuscript is publicly available system software source code. For detailed information about these datasets, please refer to Section 4.2 “Experiment” and access links.
Declarations
Competing Interests
The authors declare that they have no conflict of interest.
1 https://tree-sitter.github.io/tree-sitter/
2 https://www.tiobe.com/tiobe-index/
3 Zookeeper: https://github.com/apache/zookeeper; Cassandra: https://github.com/apache/cassandra; Flink: https://github.com/apache/flink; Hbase: https://github.com/apache/hbase; Jmeter: https://github.com/apache/jmeter; Kafka: https://github.com/apache/kafka; Wicket: https://github.com/apache/wicket
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Alon U, Zilberstein M, Levy O et al (2019) code2vec: learning distributed representations of code. Proc ACM Program Lang 3(POPL):1–29
Apache (2023) Zookeeper. https://github.com/apache/zookeeper
Apache (2024a) Cassandra. https://github.com/apache/cassandra
Apache (2024b) Flink. https://github.com/apache/flink
Apache (2024c) Hbase. https://github.com/apache/hbase
Apache (2024d) Jmeter. https://github.com/apache/jmeter
Apache (2024e) Kafka. https://github.com/apache/kafka
Apache (2024f) Karaf. https://github.com/apache/karaf
Apache (2024g) Wicket. https://github.com/apache/wicket
Ben-Nun T, Jakobovits AS, Hoefler T (2018) Neural code comprehension: a learnable representation of code semantics. Advances in neural information processing systems, vol 31
Chen TH, Syer MD, Shang W et al (2017) Analytics-driven load testing: an industrial experience report on load testing of large-scale systems. In: 2017 IEEE/ACM 39th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP). IEEE, pp 243–252
Chen J, Hou H, Gao J et al (2019) Rgcn: recurrent graph convolutional networks for target-dependent sentiment analysis. In: International conference on knowledge science, engineering and management. Springer, pp 667–675
Chen M, Tworek J, Jun H et al (2021) Evaluating large language models trained on code. arXiv:2107.03374
Ding Z, Tang Y, Cheng X et al (2023a) Logentext-plus: improving neural machine translation based logging texts generation with syntactic templates. ACM Trans Softw Eng Methodol 33
Ding Z, Tang Y, Li Y et al (2023b) On the temporal relations between logging and code. In: 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, pp 843–854
Elastic (2024) Elasticsearch. https://github.com/elastic/elasticsearch
Feng Z, Guo D, Tang D et al (2020) CodeBERT: a pre-trained model for programming and natural languages. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp 1536–1547
Fu Q, Zhu J, Hu W et al (2014) Where do developers log? An empirical study on logging practices in industry. In: Companion proceedings of the 36th international conference on software engineering, pp 24–33
Guo D, Lu S, Duan N et al (2022) Unixcoder: unified cross-modal pre-training for code representation. In: Proceedings of the 60th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp 7212–7225
Guo D, Ren S, Lu S et al (2020) Graphcodebert: pre-training code representations with data flow. In: International conference on learning representations
Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. Advances in neural information processing systems, vol 30
Hand DJ, Till RJ (2001) A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach Learn 45:171–186. https://doi.org/10.1023/A:1010920819831
He P, Zhu J, He S et al (2017) Towards automated log parsing for large-scale log data analysis. IEEE Trans Dependable Secure Comput 15
Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. 3rd International Conference on Learning Representations (ICLR)
Koroteev MV (2021) Bert: a review of applications in natural language processing and understanding. arXiv:2103.11943
Li H, Shang W, Hassan AE (2017) Which log level should developers choose for a new logging statement? Empir Softw Eng 22:1684–1716. https://doi.org/10.1007/s10664-016-9456-2
Li H, Chen TH, Shang W et al (2018) Studying software logging using topic models. Empir Softw Eng 23:2655–2694. https://doi.org/10.1007/s10664-018-9595-8
Li Z, Chen TH, Yang J et al (2021a) Studying duplicate logging statements and their relationships with code clones. IEEE Trans Softw Eng 48
Li Z, Li H, Chen TH et al (2021b) Deeplv: Suggesting log levels using ordinal based neural networks. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, pp 1461–1472
Li Z, Lu S, Guo D et al (2022) Automating code review activities by large-scale pre-training. In: Proceedings of the 30th ACM joint European software engineering conference and symposium on the foundations of software engineering, pp 1035–1047
Li M, Chen S, Fan G et al (2023) Robustness-enhanced assertion generation method based on code mutation and attack defense. In: International conference on collaborative computing: networking, applications and worksharing. Springer, pp 281–300
Li Y, Huo Y, Zhong R et al (2024a) Go static: contextualized logging statement generation. Proc ACM Softw Eng 1(FSE):609–630
Li Y, Huo Y, Jiang Z et al (2024b) Exploring the effectiveness of llms in automated logging statement generation: an empirical study. IEEE Trans Softw Eng
Liu J, Zeng J, Wang X et al (2022) Tell: log level suggestions via modeling multi-level code block information. In: Proceedings of the 31st ACM SIGSOFT international symposium on software testing and analysis, pp 27–38
Liu J, Zeng J, Wang X et al (2023) Learning graph-based code representations for source-level functional similarity detection. In: 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, pp 345–357
Mastropaolo A, Pascarella L, Bavota G (2022) Using deep learning to generate complete log statements. In: Proceedings of the 44th international conference on software engineering, pp 2279–2290
Mikolov T, Sutskever I, Chen K et al (2013) Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems, vol 26
Nagappan M, Wu K, Vouk MA (2009) Efficiently extracting operational profiles from execution logs using suffix arrays. In: 2009 20th International symposium on software reliability engineering. IEEE, pp 41–50
Ouatiti YE, Sayagh M, Kerzazi N et al (2022) An empirical study on log level prediction for multi-component systems. IEEE Trans Softw Eng 49
Paszke A, Gross S, Massa F et al (2019) Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, vol 32
Wang J, Li C, Han S et al (2017) Predictive maintenance based on event-log analysis: a case study. IBM J Res Dev 61
Xian Z, Huang R, Towey D et al (2024) Transformcode: a contrastive learning framework for code embedding via subtree transformation. IEEE Trans Softw Eng
Xu J, Cui Z, Zhao Y et al (2024) Unilog: Automatic logging via llm and in-context learning. In: Proceedings of the 46th IEEE/ACM international conference on software engineering, pp 1–12
Zhang J, Wang X, Zhang H et al (2019) A novel neural source code representation based on abstract syntax tree. In: 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, pp 783–794
Zhao X, Rodrigues K, Luo Y et al (2017) Log20: Fully automated optimal placement of log printing statements under specified overhead threshold. In: Proceedings of the 26th symposium on operating systems principles, pp 565–581
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/.