Content area
This study examines the efficacy of artificial intelligence (AI) in creating parallel test items compared with human-made ones. Two test forms were developed: one consisting of 20 existing human-made items and another of 20 new items generated with ChatGPT assistance. Expert reviews confirmed the content parallelism of the two forms. Forty-three university students then completed all 40 items from both forms, presented in random order, as a final test. Statistical analyses of student performance indicated comparability between the ChatGPT-assisted and human-made test forms. Despite limitations such as the small sample size and reliance on classical test theory (CTT), the findings suggest ChatGPT's potential to assist teachers in test item creation, reducing workload and saving time. These results highlight ChatGPT's value in educational assessment and underscore the need for further research and development in this area.
Details
Test Theory;
Literature Reviews;
Item Banks;
Language Skills;
Test Format;
Computers;
English (Second Language);
Language Tests;
Test Items;
Teacher Made Tests;
Feedback (Response);
Natural Language Processing;
Psychometrics;
Artificial Intelligence;
Educational Assessment;
Comparative Education;
Comparative Analysis;
Language Processing;
Learner Engagement;
Logical Thinking
College students;
Artificial intelligence;
Language;
Students;
Verbal communication;
Quantitative psychology;
Classical test theory;
Cognitive models;
Listening;
Natural language processing;
Automation;
Research & development (R&D);
Feedback;
Learning;
Teachers;
Chatbots;
Cognition & reasoning;
Tests;
Linguistics education;
TESOL;
Comparative studies;
Efficacy;
Humans;
Comparative analysis
1 Dongduk Women’s University, Seoul, South Korea (GRID:grid.412059.b) (ISNI:0000 0004 0532 5816)