
Abstract

As machine learning systems increasingly shape outcomes in high-stakes domains, the need to understand, trust, and effectively guide their decision-making grows urgent. This dissertation advances the field of machine learning explainability, offering a cohesive framework for building AI systems whose underlying reasoning is transparent, resilient, and actionable. It examines three critical frontiers: explainability under adversarial robustness, scalable rationale generation for large language models (LLMs), and LLM behavior under iterative prompting, illuminating how explanations can inform, protect, and empower stakeholders.

The first part reveals how adversarial training, while bolstering model security, can inadvertently undermine the provision of meaningful, low-cost algorithmic recourse. This tension exposes trade-offs between securing decision boundaries and preserving explanations that help individuals improve their predicted outcomes. The second part introduces a novel approach to scaling explanations without human annotation, integrating post hoc attributions from smaller, more interpretable proxy models directly into LLM prompting. This not only reduces the need for manual rationales but also demonstrates that automatically generated explanations can actively guide complex models toward more coherent and well-founded reasoning.

The final part focuses on decoding LLM behavior through iterative prompting. While one might expect repeated user-model interactions to improve understanding and truthfulness, naïve iterative prompting can paradoxically degrade factual alignment and confidence calibration. By carefully analyzing how LLMs respond to iterative queries, the dissertation uncovers new insights into model tendencies, including over-apologizing and sycophantic patterns, and develops strategies to mitigate these issues. This examination shows that how we interact with models—how we request, refine, and interpret explanations—fundamentally shapes model reliability and clarity.

Collectively, these contributions emphasize that robust, scalable, and iteratively refined explanations are both feasible and vital. By reconciling adversarial defenses with user-friendly recourse, automating rationales for complex models, and decoding LLM behaviors through iterative engagement, the dissertation provides a principled path toward AI systems whose inner workings can be understood, trusted, and responsibly guided by human stakeholders.

Details

Business indexing term
Title
From Understanding to Improving Artificial Intelligence: New Frontiers in Machine Learning Explanations
Author
Number of pages
189
Publication year
2025
Degree date
2025
School code
0084
Source
DAI-B 86/8(E), Dissertation Abstracts International
ISBN
9798304962827
Committee member
Doshi-Velez, Finale; du Pin Calmon, Flavio
University/institution
Harvard University
Department
Engineering and Applied Sciences - Computer Science
University location
United States -- Massachusetts
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
31631968
ProQuest document ID
3168214797
Document URL
https://www.proquest.com/dissertations-theses/understanding-improving-artificial-intelligence/docview/3168214797/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic