Content area
The automation of educational and instructional assessment plays a crucial role in enhancing the quality of teaching management. In physics education, calculation problems with intricate problem-solving ideas pose challenges to the intelligent grading of tests. This study explores the automatic grading of physics problems through a combination of large language models and prompt engineering. By comparing the performance of four prompt strategies (one-shot, few-shot, chain of thought, tree of thought) within two large-model frameworks, ERNIEBot-4-turbo and GPT-4o, this study finds that the tree-of-thought prompt better assesses calculation problems with complex ideas (N = 100, ACC ≥ 0.9, kappa > 0.8) and reduces the performance gap between the models. This research provides valuable insights for the automation of assessment in physics education.
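The abstract reports agreement between model-assigned and human-assigned grades using accuracy (ACC) and Cohen's kappa. As a minimal sketch of how such agreement metrics are computed, assuming a hypothetical ordinal grading rubric (the study's actual scoring scale and data are not given here):

```python
from collections import Counter

def accuracy(human, model):
    """Fraction of items where the model grade matches the human grade."""
    return sum(h == m for h, m in zip(human, model)) / len(human)

def cohens_kappa(human, model):
    """Chance-corrected agreement between two raters on the same items."""
    n = len(human)
    po = sum(h == m for h, m in zip(human, model)) / n  # observed agreement
    h_counts = Counter(human)
    m_counts = Counter(model)
    # Expected agreement under independence of the two raters
    pe = sum(h_counts[c] * m_counts.get(c, 0) for c in h_counts) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical grades on a 3-level rubric (0 = wrong, 1 = partial, 2 = correct)
human = [2, 2, 1, 0, 2, 1, 2, 0, 1, 2]
model = [2, 2, 1, 0, 2, 1, 2, 1, 1, 2]
print(accuracy(human, model))                  # 0.9
print(round(cohens_kappa(human, model), 3))    # 0.836
```

Kappa discounts the agreement expected by chance alone, which is why the paper pairs it with raw accuracy: an ACC ≥ 0.9 with kappa > 0.8 indicates agreement well beyond what matching grade distributions would produce at random.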
Details
Chinese;
Course Selection (Students);
Learning Strategies;
Active Learning;
Computer Science Education;
Medical Education;
Measurement Techniques;
Engineering;
Grading;
Licensing Examinations (Professions);
Comprehension;
Program Evaluation;
Governance;
Feedback (Response);
Accuracy;
Intelligence;
Educational Objectives;
English Language Learners;
Educational Administration;
Science Curriculum;
Learner Engagement;
Physics;
Graphs;
Programming
; Zhang, Jianwei 2; Qi, Dizhi 2; Cui, Wenqian 2
1 Guohao College, Tongji University, Shanghai 200082, China
2 School of Physics Science and Engineering, Tongji University, Shanghai 200082, China