
Abstract

With the rapid development of large language models (LLMs), these models have demonstrated not only impressive capabilities in formal reasoning tasks but also certain desirable behaviors similar to human thinking. The emergence of these cognitive-like patterns motivated me to leverage insights from cognitive science to better understand the remaining challenges and explore the boundaries of the capabilities of LLMs. In this thesis, I mainly focus on the challenges that LLMs face when engaged in autonomous scientific processes. Using AI to create autonomous researchers has the potential to accelerate scientific discovery. A prerequisite for this vision to become reality is evaluating how well an AI model can identify the underlying structure of a system from its behavior. We explore the question of whether an LLM can learn from passive observations and actively collect informative data to refine its own hypotheses. To answer this question, we investigate the ability of LLMs to reverse-engineer three types of black-box systems chosen to represent problems that might appear in different domains of research: list mapping programs, formal languages, and mathematical equations. We use Bayesian models as a normative reference to quantify the gap between LLMs and optimal inference under a given observation space. Through extensive experiments, we show that while LLMs have difficulty reverse-engineering these systems from observations alone, data generated by LLM-driven interventions can effectively improve the models' own performance. By testing edge cases, the LLM is able to refine its own hypotheses and avoid failure modes such as overcomplication, where the LLM falsely assumes prior knowledge about the black box, and overlooking, where the LLM fails to incorporate observations. These insights provide practical guidance for helping LLMs more effectively reverse-engineer black-box systems, supporting their use in making new discoveries.

Details

Title
Understanding the Reverse Engineering Abilities of Large Language Models
Author
Number of pages
59
Publication year
2025
Degree date
2025
School code
0181
Source
MAI 86/12(E), Masters Abstracts International
ISBN
9798280750371
University/institution
Princeton University
Department
Computer Science
University location
United States -- New Jersey
Degree
M.S.E.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32038351
ProQuest document ID
3217055681
Document URL
https://www.proquest.com/dissertations-theses/understanding-reverse-engineering-abilities-large/docview/3217055681/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic