Content area

Abstract

Background:ChatGPT, a conversational artificial intelligence developed by OpenAI, has rapidly become an invaluable tool for researchers. With the recent integration of Python code interpretation into the ChatGPT environment, there has been a significant increase in the potential utility of ChatGPT as a research tool, particularly in terms of data analysis applications.

Objective:This study aimed to assess ChatGPT as a data analysis tool and provide researchers with a framework for applying ChatGPT to data management tasks, descriptive statistics, and inferential statistics.

Methods:A subset of the National Inpatient Sample was extracted. Data analysis trials were divided into data processing, categorization, and tabulation, as well as descriptive and inferential statistics. For data processing, categorization, and tabulation assessments, ChatGPT was prompted to reclassify variables, subset variables, and present data, respectively. Descriptive statistics assessments included mean, SD, median, and IQR calculations. Inferential statistics assessments were conducted at varying levels of prompt specificity (“Basic,” “Intermediate,” and “Advanced”). Specific tests included chi-square, Pearson correlation, independent 2-sample t test, 1-way ANOVA, Fisher exact, Spearman correlation, Mann-Whitney U test, and Kruskal-Wallis H test. Outcomes from consecutive prompt-based trials were assessed against expected statistical values calculated in Python (Python Software Foundation), SAS (SAS Institute), and RStudio (Posit PBC).

Results:ChatGPT accurately performed data processing, categorization, and tabulation across all trials. For descriptive statistics, it provided accurate means, SDs, medians, and IQRs across all trials. Inferential statistics accuracy against expected statistical values varied with prompt specificity: 32.5% accuracy for “Basic” prompts, 81.3% for “Intermediate” prompts, and 92.5% for “Advanced” prompts.

Conclusions:ChatGPT shows promise as a tool for exploratory data analysis, particularly for researchers with some statistical knowledge and limited programming expertise. However, its application requires careful prompt construction and human oversight to ensure accuracy. As a supplementary tool, ChatGPT can enhance data analysis efficiency and broaden research accessibility.

Details

1009240
Business indexing term
Title
ChatGPT for Univariate Statistics: Validation of AI-Assisted Data Analysis in Healthcare Research
Publication title
Volume
27
First page
e63550
Publication year
2025
Publication date
2025
Section
Chatbots and Conversational Agents
Publisher
Gunther Eysenbach MD MPH, Associate Professor
Place of publication
Toronto
Country of publication
Canada
e-ISSN
1438-8871
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
 
 
Online publication date
2025-02-07
Milestone dates
2024-06-23 (Preprint first published); 2024-06-23 (Submitted); 2024-10-19 (Revised version received); 2024-11-24 (Accepted); 2025-02-07 (Published)
Publication history
 
 
   First posting date
07 Feb 2025
ProQuest document ID
3222367933
Document URL
https://www.proquest.com/scholarly-journals/chatgpt-univariate-statistics-validation-ai/docview/3222367933/se-2?accountid=208611
Copyright
© 2025. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-11-07
Database
ProQuest One Academic