Timely and accurate access to financial data is crucial for empirical research in accounting and finance. However, current data collection processes are often manual, inconsistent, and difficult to scale. This study asks: How can large language models (LLMs) be used effectively to automate financial data collection? Using the design science research methodology (DSRM), the author develops a modular architecture that integrates a real-time search API and auxiliary information processing into LLM workflows. The study applies this architecture to two tasks: extracting ESG report release dates and identifying the tickers of customer firms listed in COMPUSTAT. The system achieves 96% and 95% accuracy on these tasks, respectively, which is comparable to human performance. This study advances LLM applications in accounting by providing a scalable, practical framework for automating financial data retrieval.
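
The modular design described above can be illustrated with a minimal sketch. It assumes three swappable components: a search step, an auxiliary processing step, and an LLM call. Every name here (`CollectionPipeline`, `stub_search`, `dedupe_snippets`, `stub_llm`) is hypothetical and does not come from the study, and the stubs return placeholder text rather than calling any real search API or model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical component interfaces; none of these names come from the study.
SearchFn = Callable[[str], List[str]]   # query -> raw text snippets
AuxFn = Callable[[List[str]], str]      # snippets -> condensed context
LLMFn = Callable[[str], str]            # prompt -> raw model answer


@dataclass
class CollectionPipeline:
    """Chains search, auxiliary processing, and an LLM call."""
    search: SearchFn
    preprocess: AuxFn
    llm: LLMFn

    def run(self, query: str, question: str) -> str:
        snippets = self.search(query)        # 1. retrieve fresh evidence
        context = self.preprocess(snippets)  # 2. clean and condense it
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return self.llm(prompt).strip()      # 3. ask the model, tidy output


def stub_search(query: str) -> List[str]:
    # Placeholder for a real-time search API; returns dummy snippets,
    # including a blank and a duplicate to exercise the cleaning step.
    snippet = f"Placeholder snippet for: {query}"
    return [snippet, "", snippet]


def dedupe_snippets(snippets: List[str]) -> str:
    # One possible auxiliary step: drop blanks and exact duplicates,
    # preserving the original order.
    seen: List[str] = []
    for s in snippets:
        s = s.strip()
        if s and s not in seen:
            seen.append(s)
    return "\n".join(seen)


def stub_llm(prompt: str) -> str:
    # Placeholder for an LLM call; a real client would send `prompt`
    # to a model and return its completion.
    return " UNKNOWN \n"


pipeline = CollectionPipeline(stub_search, dedupe_snippets, stub_llm)
answer = pipeline.run(
    query="example firm ESG report",
    question="When was the ESG report released?",
)
print(answer)
```

Because each component is passed in as a plain callable, a different search provider, cleaning step, or model could be substituted without changing `run`, which is the kind of modularity the abstract refers to.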
