Abstract

Static analysis techniques, particularly those based on Abstract Syntax Trees (ASTs), are widely used, for example, to detect vulnerabilities and extract design patterns. Such analyses can be applied to individual translation units in C, which matters for this language in particular because its manual memory management makes it prone to memory-related bugs and vulnerabilities. However, scaling analyses across large codebases is challenging because of the complexity of C’s language features, build processes, and fragmented development environments. These complexities make it hard to gather all the dependencies of a source file, leaving it incomplete, that is, missing symbols (e.g., functions). This may prevent the code from being parsed into a valid AST (i.e., one that is standard-compliant in syntax and semantics). Additionally, some datasets inherently contain incomplete code, such as datasets of standalone functions or partial code generated by Large Language Models (LLMs).

Existing standard-compliant C/C++ compiler technologies (e.g., Clang and GCC) should be preferred over custom solutions in analysis tools and pipelines, to avoid the effort of developing, maintaining, and testing such solutions. Current robust parsing approaches handle incomplete or erroneous code in general but often introduce non-standard constructs and other undesirable by-products when building code models. An alternative is to iteratively fix incomplete code until a syntactically and semantically correct version (with respect to a C/C++ standard) is obtained. The TranslationUnitPatcher tool follows this approach but targets C++ only and lacks comprehensive patching capabilities for C-specific issues. It also has limited debugging support, configurability, and patching heuristics (e.g., a non-robust stop condition).

This thesis proposes an approach, “code mending”, inspired by TranslationUnitPatcher (from the Clava project), that likewise uses compiler diagnostics to guide the fixing of incomplete C code and generalizes to C++. The approach and its implementation, CMender, use a diagnostics-informed mending architecture to iteratively repair incomplete C code, ultimately enabling the extraction of standard ASTs without custom compiler frontends or robust parsing. CMender was implemented along with diag-exporter, a tool that extracts Clang compiler diagnostics more efficiently and in more structured detail than TranslationUnitErrorDumper (from the Clava project). CMender also adds debugging and further configuration capabilities, can process more than one diagnostic in a single iteration, and includes more safeguards against infinite loops while pinpointing missing symbols more accurately. The solution was tested against a set of open-source projects. Compared with TranslationUnitPatcher, CMender showed improvements such as higher success rates in certain situations, a lower overall running time, and the absence of timeouts. Nevertheless, challenges remain, such as handling complex diagnostics, enhancing heuristics, improving debugging and traceability, and optimizing resource usage. Additionally, integrating with Clang’s “AST” libraries may provide more context for mending complex code constructs.

This approach opens up new possibilities for scalable, automated code analysis without relying on robust parsing or custom compiler frontends, contributing to more efficient and reliable software development practices.
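
The diagnostics-guided loop described in the abstract can be illustrated with a toy sketch. This is not CMender's implementation: the `diagnose` function below is a hypothetical regex-based stand-in for real compiler diagnostics (which CMender obtains from Clang), the names `diagnose` and `mend` are invented for illustration, and the `int name();` stub is only an assumed guess at a missing function's declaration.

```python
import re

# Keywords that can precede "(" in C but are not function calls.
C_KEYWORDS = {"if", "while", "for", "switch", "return", "sizeof"}


def diagnose(source):
    """Toy stand-in for compiler diagnostics (hypothetical, not Clang):
    return the names of called functions that have no declaration."""
    called = re.findall(r"\b([A-Za-z_]\w*)\s*\(", source)
    declared = set(
        re.findall(r"\b(?:int|void|char)\s+([A-Za-z_]\w*)\s*\(", source)
    )
    missing = [n for n in called if n not in declared and n not in C_KEYWORDS]
    return list(dict.fromkeys(missing))  # de-duplicate, keep first-seen order


def mend(source, max_iterations=10):
    """Iteratively prepend stub declarations until no diagnostics remain.

    Every diagnostic reported in an iteration is handled in that iteration,
    and max_iterations acts as a safeguard against infinite loops.
    """
    for _ in range(max_iterations):
        missing = diagnose(source)
        if not missing:
            return source  # stop condition: nothing left to mend
        # Assumption: an int-returning stub is enough to make the name known.
        stubs = "".join(f"int {name}();\n" for name in missing)
        source = stubs + source
    raise RuntimeError("mending did not converge")


incomplete = "int total(int n) {\n    return scale(n) + offset(n);\n}\n"
print(mend(incomplete))
```

Here the standalone function `total` calls two undeclared functions, so a single iteration prepends one stub declaration for each and the next iteration finds nothing left to report. A real mending tool also has to handle missing types, macros, and struct members, which this sketch does not attempt.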

Details

1010268
Classification
Title
CMender: Code Mending Tool to Enable Large-Scale Analysis of Incomplete C Code
Number of pages
110
Publication year
2025
Degree date
2025
School code
5896
Source
MAI 87/5(E), Masters Abstracts International
ISBN
9798297964686
University/institution
Universidade do Porto (Portugal)
University location
Portugal
Degree
M.Eng.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32267243
ProQuest document ID
3271850977
Document URL
https://www.proquest.com/dissertations-theses/cmender-code-mending-tool-enable-large-scale/docview/3271850977/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
2 databases
  • ProQuest One Academic
  • ProQuest One Academic