Abstract

In recent years, rapid advancements in large language models (LLMs) have steadily shifted their applications from simple chatbots to increasingly complex, autonomous agents. Agentic applications require LLMs to interact with a broad range of external information sources, tools, and environments to solve intricate tasks with minimal human oversight—posing significant challenges to their reliability. This dissertation presents a series of contributions toward (more) reliable agentic LLMs.

Firstly, we explore how LLMs can be made more robust when incorporating external references—an essential capability for many agentic applications. We introduce chain-of-defensive-thought, a simple yet effective technique that instructs LLMs to generate a chain of thought mimicking a structured reasoning process of cross-checking. This highly accessible approach significantly improves the robustness of a wide range of LLMs against reference corruption. Importantly, it highlights a promising direction: exploiting the reasoning abilities of LLMs for robustness on tasks that are not necessarily reasoning-centric, which is a timely insight given the growing interest in LLM reasoning and the increasing reliability demands of agentic applications.

Secondly, we examine the reliability of tool use in agentic LLMs. While external tools can dramatically extend the capabilities of LLMs, the current paradigm—where models choose tools based solely on text descriptions—proves fragile. We demonstrate how strategic edits to tool descriptions can substantially bias tool usage, revealing a vulnerability in standard tool/function-calling protocols. These findings underscore the need for a grounded mechanism for agentic LLMs to select and utilize tools and resources.

Finally, we address the reliability of LLM evaluations, particularly in the presence of test set contamination, where models may (knowingly or not) train on test data prior to evaluation. We propose DyePack, a novel framework that repurposes backdoor techniques into a principled mechanism for identifying such contamination. DyePack operates without requiring access to model internals and supports both multiple-choice and open-ended tasks. More importantly, it provides provable guarantees by enabling exact false positive rate (FPR) computation before flagging any model as contaminated—effectively preventing false accusations while offering strong evidence for every case detected. This positions DyePack as a powerful tool for maintaining the integrity of open benchmarks and safeguarding our pathway toward reliable agentic LLMs.

Details

Title
Towards Reliable Agentic LLMs
Author
Wang, Wenxiao
Publication year
2025
Publisher
ProQuest Dissertations & Theses
ISBN
9798293837601
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
3250301036
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.