Abstract

Fuzzing is an important technique for generating inputs, and grammar-based fuzzing is used for constraining input generation for specialized domains that use context-free grammars. However, additional semantic constraints that are context-sensitive cannot be easily handled by grammar-based fuzzers. For example, in the C grammar, there is no way to specify that all variables must be defined before they are used.

Inspired by attribute grammars, we propose a lightweight DSL called PROGRMR that can extend the expressiveness of a grammar using programmable annotations. These annotations introduce concepts from imperative programming languages, such as program state, preconditions, and postconditions, that allow users to constrain generation based on context. This enables PROGRMR to constrain future expansions of the derivation tree using context from what has been expanded so far and provides developers with a familiar interface for writing semantic constraints. It can then be compiled into a custom input generator capable of producing well-formed and diverse inputs efficiently.
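The idea of state, preconditions, and postconditions steering grammar expansion can be illustrated with a minimal Python sketch. This is not PROGRMR syntax and not the thesis's implementation: the function names `generate_program` and `is_well_formed`, the toy statement forms, and the probabilities are all hypothetical, chosen only to show how a define-before-use constraint might be enforced during generation rather than by filtering afterwards.

```python
import random
import re


def generate_program(rng, num_statements=5):
    """Generate toy C-like statements in which every used variable is already declared.

    Hypothetical illustration only; it does not reproduce PROGRMR's DSL.
    """
    defined = []  # program state shared across the whole derivation
    lines = []

    def expand_expr():
        # Precondition on the "variable use" alternative:
        # it may only be chosen when at least one variable is declared.
        if defined and rng.random() < 0.7:
            return rng.choice(defined)
        return str(rng.randint(0, 9))

    for _ in range(num_statements):
        # Precondition on the "assignment" alternative: a target must exist.
        if defined and rng.random() < 0.5:
            target = rng.choice(defined)
            lines.append(f"{target} = {expand_expr()};")
        else:
            name = f"v{len(defined)}"
            # Expand the initializer before updating the state, so a
            # declaration can never refer to the variable it introduces.
            init = expand_expr()
            lines.append(f"int {name} = {init};")
            defined.append(name)  # postcondition: record the new declaration
    return lines


def is_well_formed(lines):
    """Check that each variable is declared on an earlier line than any use."""
    seen = set()
    for line in lines:
        is_decl = line.startswith("int ")
        lhs, rhs = line.rstrip(";").split(" = ")
        used = re.findall(r"v\d+", rhs)
        if not is_decl:
            used.append(lhs)  # an assignment target is also a use
        if any(name not in seen for name in used):
            return False
        if is_decl:
            seen.add(lhs.split()[1])
    return True


if __name__ == "__main__":
    program = generate_program(random.Random(0), num_statements=6)
    print("\n".join(program))
    print("well-formed:", is_well_formed(program))
```

Because the state is consulted before an alternative is chosen, every generated program satisfies the constraint by construction. A grammar-only generator would have to generate and discard invalid programs to get the same guarantee.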

We evaluated PROGRMR against the grammar-only generator Grammarinator and the SMT-based constrained input generator ISLa, and showed that PROGRMR can compactly express semantic constraints while achieving high throughput and diversity. Across five input domains (Scriptsize-C, CSV, MLIR, reStructuredText, and XML) whose semantic constraints go beyond what context-free grammars can express, PROGRMR requires an average of 22.2 annotations to encode all semantic constraints.

It generates fully well-formed inputs for all domains, with high throughput and input diversity. Compared to ISLa, PROGRMR achieves significant improvements, with up to 4016.60× higher throughput on the CSV domain and 29.24× more diversity on the Scriptsize-C domain.

Details

Title
PROGRMR: Grammar-Based Input Generation with Programmable Annotations
Number of pages
61
Publication year
2025
Degree date
2025
School code
0031
Source
MAI 87/6(E), Masters Abstracts International
ISBN
9798265451545
Advisor
Committee member
Van den Broeck, Guy; Palsberg, Jens
University/institution
University of California, Los Angeles
Department
Computer Science 0201
University location
United States -- California
Degree
M.S.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32286001
ProQuest document ID
3278434895
Document URL
https://www.proquest.com/dissertations-theses/progrmr-grammar-based-input-generation-with/docview/3278434895/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic