Abstract

This thesis presents a test-driven framework for enhancing the reliability of code generated by Large Language Models (LLMs), focusing on real-world applicability and minimal developer assistance. The system is designed to simulate a realistic development environment where no ground-truth implementations are available to the model, relying exclusively on textual artifacts such as documentation, docstrings, and test outcomes. This constraint ensures that every generated function is derived from semantic understanding rather than replication or pattern-matching.

A core innovation of this work is the integration of an iterative refinement loop, which introduces structured feedback into the code generation process. After producing an initial function from a natural language prompt, the model’s output is immediately tested. If failures occur, relevant error signals are extracted and used to update the prompt, allowing the model to revise its solution. This loop continues until the implementation passes all associated tests or a retry limit is reached. The system thus mirrors a human-like workflow of test-driven development and debugging.
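To make the loop concrete, the following is a minimal sketch of a generate-test-repair cycle of the kind described above. The names (refine_until_passing, generate, run_tests, TestReport) and the prompt wording are illustrative assumptions, not the thesis's actual interface; only the loop structure follows the abstract: generate a candidate, run the associated tests, append extracted error signals to the prompt, and retry until the tests pass or a retry limit is reached.

# Hypothetical sketch of the generate-test-repair loop; names are illustrative.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TestReport:
    passed: bool
    failures: str  # captured error signals (tracebacks, assertion messages)


def refine_until_passing(
    prompt: str,
    generate: Callable[[str], str],         # LLM call: prompt -> candidate code
    run_tests: Callable[[str], TestReport], # executes the tests covering that function
    max_retries: int = 3,
) -> Optional[str]:
    """Generate a candidate, test it, and revise it from failure feedback."""
    current_prompt = prompt
    for attempt in range(max_retries + 1):
        candidate = generate(current_prompt)
        report = run_tests(candidate)
        if report.passed:
            return candidate                # all associated tests pass
        # Feed the extracted error signals back so the model can revise its solution.
        current_prompt = (
            f"{prompt}\n\nAttempt {attempt + 1} failed these tests:\n"
            f"{report.failures}\nReturn a corrected implementation."
        )
    return None                             # retry limit reached without success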

To assess the contribution of this iterative process, the same framework is also evaluated in a non-iterative configuration, where each function is generated only once based on its prompt and tested without revision.
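In terms of the sketch above, this single-shot baseline amounts to disabling retries; the call below is a hypothetical illustration using stand-in callables, not the thesis's configuration interface.

# Non-iterative baseline: generate once, test once, no revision.
baseline = refine_until_passing(
    prompt="Implement parse_config(path) as described in its docstring.",
    generate=lambda p: "def parse_config(path):\n    ...",               # stand-in LLM call
    run_tests=lambda code: TestReport(passed=False, failures="AssertionError"),
    max_retries=0,
)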

The evaluation is conducted on entire Python repositories—not isolated functions—making the task significantly more complex. Functions are embedded in larger software structures, depend on shared state or class behavior, and are often tested only indirectly through multi-layered scenarios. The system parses these repositories to extract structural metadata, resolve function-to-test mappings, and build context-aware prompts that support both initial generation and iterative correction.
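As one illustration of this repository analysis, the sketch below collects function docstrings with Python's ast module, maps each function to test files by a simple name-based heuristic, and assembles a prompt from that context. The heuristic and the helper names are assumptions for exposition; the thesis's actual mapping and prompt construction are more involved.

# Minimal sketch of repository parsing and function-to-test mapping (illustrative only).
import ast
from pathlib import Path


def functions_with_docstrings(repo_root: str) -> dict[str, str]:
    """Collect qualified function names and their docstrings from a repository."""
    found: dict[str, str] = {}
    for path in Path(repo_root).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                found[f"{path.stem}.{node.name}"] = ast.get_docstring(node) or ""
    return found


def tests_referencing(function_name: str, repo_root: str) -> list[str]:
    """Find test files whose source mentions the target function's name."""
    short = function_name.split(".")[-1]
    return [
        str(p)
        for p in Path(repo_root).rglob("test_*.py")
        if short in p.read_text(encoding="utf-8")
    ]


def build_prompt(function_name: str, docstring: str, test_files: list[str]) -> str:
    """Assemble a context-aware prompt from the docstring and test locations."""
    tests = "\n".join(f"- {t}" for t in test_files) or "- (no tests found)"
    return (
        f"Implement `{function_name}`.\n"
        f"Docstring:\n{docstring}\n"
        f"It is exercised by:\n{tests}\n"
        "Return only the function implementation."
    )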

The results demonstrate that embedding LLMs into a feedback-rich environment substantially increases their capacity to produce robust, test-passing code. Despite added computational cost, the iterative approach leads to higher success rates across a diverse range of codebases, showing that language models, when guided by empirical signals and properly contextualized, can evolve from static generators into adaptive agents capable of producing functionally correct and maintainable code.

Details

Title
TIGER: Testing and Improving Generated Code With LLMs
Number of pages
100
Publication year
2025
Degree date
2025
School code
0799
Source
MAI 87/5(E), Masters Abstracts International
ISBN
9798263306212
Committee member
Gjomemo, Rigel; Kanich, Chris; Scanzio, Stefano
University/institution
University of Illinois at Chicago
Department
Computer Science
University location
United States -- Illinois
Degree
M.S.Comp.Sci.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32409329
ProQuest document ID
3271768012
Document URL
https://www.proquest.com/dissertations-theses/tiger-testing-improving-generated-code-with-llms/docview/3271768012/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic