Content area

Abstract

In recent years, the programming capabilities of large language models (LLMs) have garnered significant attention. Fuzz testing, a highly effective technique, plays a key role in enhancing software reliability and detecting vulnerabilities. However, traditional fuzz testing tools rely on manually crafted fuzz drivers, which can limit both testing efficiency and effectiveness. To address this challenge, we propose an automated fuzz testing method driven by a code knowledge graph and powered by an LLM-based intelligent agent system, referred to as CKGFuzzer. We approach fuzz driver creation as a code generation task, leveraging the knowledge graph of the code repository to automate the generation process within the fuzzing loop, while continuously refining both the fuzz driver and input seeds. The code knowledge graph is constructed through interprocedural program analysis, where each node in the graph represents a code entity, such as a function or a file. The knowledge graph-enhanced CKGFuzzer not only effectively resolves compilation errors in fuzz drivers and generates input seeds tailored to specific API usage scenarios, but also analyzes fuzz driver crash reports, assisting developers in improving code quality. By querying the knowledge graph of the code repository and learning from API usage scenarios, we can better identify testing targets and understand the specific purpose of each fuzz driver. We evaluated our approach using eight open-source software projects. The experimental results indicate that CKGFuzzer achieved an average improvement of 8.73% in code coverage compared to state-of-the-art techniques. Additionally, CKGFuzzer reduced the manual review workload in crash case analysis by 84.4% and successfully detected 11 real bugs (including nine previously unreported bugs) across the tested libraries.

Details

1009240
Title
CKGFuzzer: LLM-Based Fuzz Driver Generation Enhanced By Code Knowledge Graph
Publication title
arXiv.org; Ithaca
Publication year
2024
Publication date
Dec 20, 2024
Section
Computer Science
Publisher
Cornell University Library, arXiv.org
Source
arXiv.org
Place of publication
Ithaca
Country of publication
United States
University/institution
Cornell University Library arXiv.org
e-ISSN
2331-8422
Source type
Working Paper
Language of publication
English
Document type
Working Paper
Publication history
 
 
Online publication date
2024-12-23
Milestone dates
2024-11-18 (Submission v1); 2024-12-19 (Submission v2); 2024-12-20 (Submission v3)
Publication history
 
 
   First posting date
23 Dec 2024
ProQuest document ID
3130503919
Document URL
https://www.proquest.com/working-papers/ckgfuzzer-llm-based-fuzz-driver-generation/docview/3130503919/se-2?accountid=208611
Full text outside of ProQuest
Copyright
© 2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2024-12-24
Database
2 databases
  • ProQuest One Academic
  • ProQuest One Academic