Abstract

Programming languages, much like natural languages, exhibit a high degree of repetitiveness and regularity, often referred to as the naturalness of software. This characteristic, combined with the improved ability of neural language models (NLMs) to learn statistically from such patterns, has led to their widespread adoption in software engineering (SE) tasks ranging from code generation to automated bug detection and program repair. While these applications of automated software engineering offer a useful proxy for assessing the downstream performance of NLMs, the ability of NLMs to reason about intrinsic program properties, such as structure, semantics, and execution behaviors, remains underexplored.

This dissertation addresses this gap through the lens of program analysis, using its formalisms to probe the reasoning capabilities of NLMs over intrinsic program behaviors. In general, analyzing programs entails either examining all possible behaviors based on program semantics (i.e., static analysis) or establishing precise execution behaviors by running the entire test suite (i.e., dynamic analysis), each with trade-offs in generalizability and scalability. As an alternative, we introduce a new paradigm of predictive program analysis, which aims to learn to analyze program behaviors from similar analyses of open-source software repositories. This approximation helps extend such analyses to partial programs, enables a static estimation of runtime behaviors, and facilitates multilingual program analysis, all at scale. Using dependence analysis as a representative setting, this dissertation investigates how NLMs can model program structure, semantics, and execution behaviors across three key dimensions: (i) the granularity of dependencies, ranging from inter-statement and variable-statement to inter-constraint dependencies; (ii) the nature of reasoning, spanning both static and dynamic program behaviors; and (iii) the reasoning modality, which involves reasoning in the latent space or through verbalized natural language explanations. Overall, these contributions show that predictive analysis can generalize, bridging the gap between static and dynamic analysis, while offering insights into how language models internalize reasoning about program behaviors.

Details

Title
Neural Modeling of Reasoning About Program Behaviors
Number of pages
185
Publication year
2025
Degree date
2025
School code
0382
Source
DAI-A 87/6(E), Dissertation Abstracts International
ISBN
9798265455963
Committee member
Ray, Baishakhi; Yang, Wei; Wei, Shiyi
University/institution
The University of Texas at Dallas
Department
Computer Science
University location
United States -- Texas
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32435089
ProQuest document ID
3279300678
Document URL
https://www.proquest.com/dissertations-theses/neural-modeling-reasoning-about-program-behaviors/docview/3279300678/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic