Fuzz testing, or fuzzing, is a practical software testing technique that mitigates the risk of developers missing critical software failures or vulnerabilities (bugs) by systematically producing random, unexpected inputs to a program. Fuzzing has found thousands of critical vulnerabilities across a myriad of software projects, and it has seen widespread adoption in both industry and academia, with hundreds of fuzzers published since the early 1990s.
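To make the core loop concrete, the following minimal Java sketch shows the mutate-and-run cycle at the heart of mutation-based fuzzing. The toy target (which "crashes" on a zero first byte), the seed, and the single mutation operator are illustrative assumptions, not drawn from any real fuzzer.

```java
import java.util.Random;

public class MiniFuzzer {
    private static final Random rng = new Random();

    // Hypothetical target: a stand-in for the program under test.
    // Here it "crashes" when the first byte is zero.
    static void targetProgram(byte[] input) {
        if (input.length > 0 && input[0] == 0) {
            throw new IllegalStateException("parser crash");
        }
    }

    // One very simple mutation: overwrite a random byte with a random value.
    static byte[] mutate(byte[] seed) {
        byte[] out = seed.clone();
        out[rng.nextInt(out.length)] = (byte) rng.nextInt(256);
        return out;
    }

    public static void main(String[] args) {
        byte[] seed = "hello".getBytes();
        for (int i = 0; i < 100_000; i++) {
            byte[] input = mutate(seed);
            try {
                targetProgram(input); // a thrown exception signals a bug
            } catch (Throwable t) {
                System.out.println("bug-triggering input found at iteration " + i);
            }
        }
    }
}
```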
Despite this widespread adoption and success, fuzzers still miss critical software failures and vulnerabilities, such as the “Heartbleed” vulnerability. Determining the root cause of these misses is difficult, because evaluation remains a major challenge of fuzzer development. Best practices for evaluation are resource-intensive, time-consuming, and impractical for the average developer, as they often require decades of CPU time. A multitude of design decisions go into the development of a fuzzer, and rigorously evaluating all of these nuanced decisions is hard yet necessary, as the stochastic nature of fuzzers can make the interactions between individual design decisions difficult to predict. As a result, many of these design decisions within popular fuzzers are left unevaluated, as long as the fuzzer's performance is anecdotally “good enough.”
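For context, widely recommended evaluation practice compares fuzzers over many repeated trials using a nonparametric statistical test, because single runs of a stochastic process are unreliable. The Java sketch below computes the Mann-Whitney U statistic over per-trial coverage counts; the trial data and configuration names are invented for illustration.

```java
// Sketch: comparing two fuzzer configurations with the Mann-Whitney U test,
// a nonparametric test commonly recommended for fuzzer evaluation because
// per-trial results are not normally distributed. All numbers are invented.
public class RankSum {
    // Mann-Whitney U statistic: count of pairs (x, y) with x > y, ties as 0.5.
    static double mannWhitneyU(double[] a, double[] b) {
        double u = 0;
        for (double x : a)
            for (double y : b)
                u += (x > y) ? 1.0 : (x == y ? 0.5 : 0.0);
        return u;
    }

    public static void main(String[] args) {
        // Hypothetical final edge-coverage counts from 10 trials per configuration.
        double[] configA = {1200, 1250, 1190, 1300, 1275, 1220, 1260, 1240, 1285, 1230};
        double[] configB = {1205, 1210, 1195, 1215, 1200, 1225, 1190, 1208, 1212, 1198};
        double u = mannWhitneyU(configA, configB);
        // U ranges from 0 to |a|*|b| = 100; values far from 50 suggest a real difference.
        System.out.printf("U = %.1f out of %d%n", u, configA.length * configB.length);
    }
}
```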
This dissertation addresses the problem of fuzzers frequently missing critical software failures by focusing on the nuanced design decisions within the mutation stage of the fuzzing algorithm. First, this dissertation illustrates the importance of understanding the mutation stage of a fuzzer by evaluating several nuanced design decisions within today's most widely used fuzzer, AFL++.
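As a concrete illustration of the kind of design decision at stake, AFL-style fuzzers include a "havoc" stage that stacks a random number of randomly chosen mutation operators on each input. The Java sketch below mimics that structure; the specific operators, their weights, and the stacking depth are simplified assumptions rather than AFL++'s actual implementation.

```java
import java.util.Random;

// Sketch of havoc-style stacking: apply a random number of randomly chosen
// mutation operators to one input. Which operators exist, how they are
// weighted, and how deeply they stack are exactly the kinds of nuanced
// design decisions such an evaluation examines.
public class HavocMutator {
    private static final byte[] INTERESTING = {0, 1, -1, 127, -128}; // boundary values
    private final Random rng = new Random();

    byte[] havoc(byte[] seed) {
        byte[] buf = seed.clone();
        int stacked = 1 << (1 + rng.nextInt(5)); // stack 2, 4, 8, 16, or 32 mutations
        for (int i = 0; i < stacked; i++) {
            int pos = rng.nextInt(buf.length);
            switch (rng.nextInt(3)) {
                case 0: buf[pos] ^= (byte) (1 << rng.nextInt(8)); break;   // flip one random bit
                case 1: buf[pos] = (byte) rng.nextInt(256); break;         // random byte
                case 2: buf[pos] = INTERESTING[rng.nextInt(INTERESTING.length)]; break; // boundary value
            }
        }
        return buf;
    }
}
```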
Using the insights gained from evaluating the mutation stage of a popular fuzzer, this dissertation proposes a new grey-box fuzzer, CONFETTI, which uses a novel concolic-guidance mechanism to significantly increase the code coverage and bug-finding ability of a state-of-the-art fuzzer. The dissertation evaluates CONFETTI using state-of-the-art fuzzer evaluation practices. Finally, the dissertation proposes ways of leveraging state-of-the-art evaluation techniques, novel statistical methods, and stopping criteria so that fuzzer developers can conduct fuzzer evaluation in a more efficient, practical manner while reaching statistically sound conclusions about a fuzzer's ability to find bugs and achieve code coverage.
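While the dissertation's own stopping criterion is based on machine learning, the basic role of a stopping criterion can be illustrated with a much simpler rule. The Java sketch below implements a generic coverage-plateau rule (stop once no new coverage is seen for a fixed window); both the rule and its window size are illustrative assumptions, not the method proposed in this work.

```java
// Sketch of a generic coverage-plateau stopping rule: end a trial once no new
// coverage has been observed for a fixed time window. This is only an
// illustration of what a stopping criterion does; it is not the
// machine-learning-based criterion this dissertation proposes.
public class PlateauStopper {
    private final long windowMillis;           // assumed window, e.g., one hour
    private long lastProgress = System.currentTimeMillis();

    PlateauStopper(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // The fuzzer calls this whenever an input covers a new program edge.
    void recordNewCoverage() {
        lastProgress = System.currentTimeMillis();
    }

    // True once the trial has gone windowMillis without any new coverage.
    boolean shouldStop() {
        return System.currentTimeMillis() - lastProgress > windowMillis;
    }
}
```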
The contributions of this dissertation include: the first large-scale evaluation of the mutation stage of today's most popular fuzzer, AFL++, along with a published artifact enabling reproducibility of experiments; an open-source implementation of CONFETTI, a concolic-guided fuzzer for JVM applications that found 15 new bugs in several software projects, along with a published artifact; and the first study leveraging a novel fuzzer stopping criterion based on machine learning to make fuzzer evaluation more practical. These contributions highlight how understanding design decisions in the mutation stage can pinpoint why fuzzers miss critical software failures, and how altering a fuzzer's mutation strategy can significantly affect its ability to find software failures. Furthermore, this dissertation's study of stopping criteria in fuzzer evaluation proposes ways to make the science of fuzzer evaluation more practical for everyday developers, thereby increasing the bug-finding ability of fuzzers and the quality of software as a whole.