Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks. However, generating responses that accurately meet the requirements of execution-based scenarios remains challenging. This challenge stems fundamentally from the pretraining objectives of LLMs, where next-token prediction is not inherently execution-aware, making customization in the post-training stage essential. This thesis explores a comprehensive approach to customizing LLMs for execution-driven contexts through three interconnected works.
First, we investigate the potential of execution-derived feedback for enhancing LLM performance on specification-rich tasks. We propose a framework that synthesizes training data by pairing natural language intents with specifications derived from execution. These specifications ground the model's understanding of task requirements while also resolving the ambiguity often present in natural language prompts. Our evaluations demonstrate substantial improvements in generating solutions that comply with execution specifications in data science notebooks and tool automation.
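As a toy illustration of this idea, the sketch below turns one natural-language intent and a candidate solution into a training example whose specification is an executable assertion recorded by actually running the code. The `solve` entry point and all other names are hypothetical stand-ins, not the framework's actual interface.

```python
# Hypothetical sketch: derive an executable specification from execution.
def synthesize_example(intent: str, candidate_code: str) -> dict:
    """Run candidate code and record its observed behavior as a spec."""
    namespace: dict = {}
    exec(candidate_code, namespace)          # execute the candidate solution
    result = namespace["solve"](3)           # probe an assumed entry point
    spec = f"assert solve(3) == {result!r}"  # execution-derived specification
    return {"intent": intent, "spec": spec, "solution": candidate_code}

example = synthesize_example(
    "Return the square of an integer.",
    "def solve(x):\n    return x * x",
)
print(example["spec"])  # assert solve(3) == 9
```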
As synthetic data generation scales, we observe diminishing returns when fine-tuning a single model. To address this limitation, our second work explores how to make full use of abundant synthetic data by training multiple specialized model variants. We use influence functions to efficiently group synthetic data, promoting diversity in LLM outputs while maintaining quality. Experiments show that our approach diversifies foundation model responses while maintaining high quality in the code generation domain and on several natural language understanding tasks.
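The following is a minimal sketch of the grouping step, assuming influence scores have already been computed: each synthetic example is represented by a vector of estimated influences on a set of validation anchors, and clustering those vectors partitions the data into groups that each fine-tune one specialized variant. The shapes, the random stand-in values, and the use of k-means are illustrative assumptions, not the exact method.

```python
# Hedged sketch: group synthetic examples by precomputed influence vectors;
# each group then fine-tunes its own specialized model variant.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# influence[i, j]: estimated influence of synthetic example i on
# validation anchor j (random stand-in values for illustration)
influence = rng.normal(size=(1000, 32))

n_variants = 4
labels = KMeans(n_clusters=n_variants, n_init=10, random_state=0).fit_predict(influence)

# indices of the synthetic examples assigned to each variant
groups = [np.flatnonzero(labels == k) for k in range(n_variants)]
print([len(g) for g in groups])
```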
Finally, we address the practical challenges of deploying these customized models with fLoRA (Fast Low-Rank Adaptation). fLoRA enables efficient, real-time serving of task-specific model variants through batched low-rank adaptation, making it computationally feasible to maintain and deploy multiple specialized models while ensuring that the most appropriate variant is selected for a given execution-based task. Together, these contributions form a holistic strategy for customizing LLMs in execution-driven scenarios, enhancing their accuracy, diversity, and efficiency across applications in code generation and tool automation.
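To make the serving idea concrete, here is a generic sketch of batched per-example low-rank adaptation in PyTorch: every request in one batch carries its own adapter factors, so a single forward pass serves many specialized variants over a shared base weight. This illustrates the general batching pattern only, not fLoRA's actual kernel or optimizations.

```python
# Generic sketch (not the fLoRA kernel): one batch, one adapter per request.
import torch

batch, d_in, d_out, rank = 8, 64, 64, 4
W = torch.randn(d_out, d_in)         # shared base weight
A = torch.randn(batch, rank, d_in)   # per-request down-projections
B = torch.randn(batch, d_out, rank)  # per-request up-projections
x = torch.randn(batch, d_in)

base = x @ W.T                                       # shared computation
delta = torch.bmm(B, torch.bmm(A, x.unsqueeze(-1)))  # per-request low-rank update
y = base + delta.squeeze(-1)
print(y.shape)  # torch.Size([8, 64])
```

Because the base projection is computed once and shared across the batch, only the small low-rank terms differ per request, which is what makes serving many variants at once tractable.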