Abstract

This thesis examines the accuracy and usefulness of large language models (LLMs) as intelligent tax advisors for individual tax preparation in the United States. The study assesses the performance of LLMs from OpenAI, Anthropic, and DeepSeek, particularly when employing a Retrieval-Augmented Generation (RAG) approach that grounds responses in reputable tax sources. The study focuses on common tax-related inquiries about income reporting, credits, deductions, and special tax treatment under IRS Form 1040.

The GEval test framework, which provides standardized measures of the factual accuracy of generated answers, was used to verify the models' accuracy. Additionally, VITA (Volunteer Income Tax Assistance) tax preparers from Southeastern Louisiana University offered qualitative feedback, testing the models against actual tax situations to assess their usability, readability, and potential integration into tax aid services for taxpayers.

The results show that although retrieval-augmented LLMs can generate responses that are, overall, accurate and informative, limitations remain, especially in edge cases and intricate filing scenarios. The study demonstrates the potential of RAG-based LLMs to support taxpayer education and tax preparers, while underscoring the need for further validation, regulation, and prudent deployment in public tax assistance programs.
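The RAG approach described above retrieves relevant passages from trusted tax sources and supplies them to the model as context before it answers. A minimal sketch of that flow is shown below; the tiny corpus, the `retrieve()` and `build_prompt()` helpers, and the word-overlap retriever are illustrative assumptions, not the thesis author's actual implementation or data.

```python
# Minimal sketch of a Retrieval-Augmented Generation (RAG) flow:
# retrieve passages from a tax-source corpus, then prepend them to the
# user's question before it is sent to an LLM. All names and passages
# here are hypothetical stand-ins, not the thesis's actual pipeline.

# Tiny stand-in corpus of IRS-style passages (illustrative text only).
CORPUS = [
    "Form 1040 is the standard U.S. individual income tax return.",
    "The Earned Income Tax Credit is a refundable credit for eligible workers.",
    "Taxpayers must report wages, salaries, and tips as income on Form 1040.",
]

def tokens(text: str) -> list[str]:
    """Lowercase and strip simple punctuation for word matching."""
    for ch in ",.?":
        text = text.replace(ch, " ")
    return text.lower().split()

def retrieve(question: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank passages by word overlap with the question (a crude retriever;
    a real system would use dense embeddings or a search index)."""
    q_words = set(tokens(question))
    def score(passage: str) -> int:
        return len(q_words & set(tokens(passage)))
    return sorted(corpus, key=score, reverse=True)[:k]

def build_prompt(question: str, passages: list[str]) -> str:
    """Augment the question with retrieved context for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."

question = "How do I report wages on Form 1040?"
passages = retrieve(question, CORPUS)
prompt = build_prompt(question, passages)
print(prompt)
```

In a full pipeline, `prompt` would be sent to an LLM API, and frameworks such as GEval would then score the generated answer against reference material for factual accuracy.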

Details

Title
Use of Large Language Models as Tax Guide: Case Study Using RAG Technique
Author
Paudel, Swastika
Publication year
2025
Publisher
ProQuest Dissertations & Theses
ISBN
9798314857830
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
3201107926
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.