Abstract

This thesis examines the accuracy and usefulness of large language models (LLMs) as intelligent tax advisors for individual tax preparation in the United States. The study assesses the performance of LLMs from OpenAI, Anthropic, and DeepSeek, particularly when employing a Retrieval-Augmented Generation (RAG) approach that grounds responses in reputable tax sources. The study focuses on common tax-related inquiries about income reporting, credits, deductions, and special tax treatment under IRS Form 1040.

The GEval test framework, which provides standardized measures of the factual accuracy of generated answers, was used to verify the models' accuracy. Additionally, VITA (Volunteer Income Tax Assistance) tax preparers from Southeastern Louisiana University offered qualitative feedback, testing the models against actual tax situations to assess their usability, readability, and potential integration into tax aid services for taxpayers.

The results show that although retrieval-augmented LLMs can generate responses that are, overall, accurate and informative, limitations remain, especially in edge cases and intricate filing scenarios. The study demonstrates the potential of RAG-based LLMs to support taxpayer education and tax preparers, while underscoring the need for further validation, regulation, and prudent deployment in public tax assistance programs.
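The RAG approach described above retrieves relevant passages from trusted tax sources and supplies them to the model as context before it answers. A minimal sketch of that flow is shown below; the tiny corpus, the `retrieve()` and `build_prompt()` helpers, and the word-overlap retriever are illustrative assumptions, not the thesis author's actual implementation or data.

```python
# Minimal sketch of a Retrieval-Augmented Generation (RAG) flow:
# retrieve passages from a tax-source corpus, then prepend them to the
# user's question before it is sent to an LLM. All names and passages
# here are hypothetical stand-ins, not the thesis's actual pipeline.

# Tiny stand-in corpus of IRS-style passages (illustrative text only).
CORPUS = [
    "Form 1040 is the standard U.S. individual income tax return.",
    "The Earned Income Tax Credit is a refundable credit for eligible workers.",
    "Taxpayers must report wages, salaries, and tips as income on Form 1040.",
]

def tokens(text: str) -> list[str]:
    """Lowercase and strip simple punctuation for word matching."""
    for ch in ",.?":
        text = text.replace(ch, " ")
    return text.lower().split()

def retrieve(question: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank passages by word overlap with the question (a crude retriever;
    a real system would use dense embeddings or a search index)."""
    q_words = set(tokens(question))
    def score(passage: str) -> int:
        return len(q_words & set(tokens(passage)))
    return sorted(corpus, key=score, reverse=True)[:k]

def build_prompt(question: str, passages: list[str]) -> str:
    """Augment the question with retrieved context for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."

question = "How do I report wages on Form 1040?"
passages = retrieve(question, CORPUS)
prompt = build_prompt(question, passages)
print(prompt)
```

In a full pipeline, `prompt` would be sent to an LLM API, and frameworks such as GEval would then score the generated answer against reference material for factual accuracy.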

Details

Title
Use of Large Language Models as Tax Guide: Case Study Using RAG Technique
Author
Paudel, Swastika
Publication year
2025
Publisher
ProQuest Dissertations & Theses
ISBN
9798314857830
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
3201107926
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.