
Abstract

The performance of seven large language models (LLMs) in generating programming code using various prompt strategies, programming languages, and task difficulties is systematically evaluated. GPT-4 substantially outperforms other LLMs, including Gemini Ultra and Claude 2. The coding performance of GPT-4 varies considerably with different prompt strategies. In most LeetCode and GeeksforGeeks coding contests evaluated in this study, GPT-4, employing the optimal prompt strategy, outperforms 85 percent of human participants in a competitive environment, many of whom are students and professionals with moderate programming experience. GPT-4 demonstrates strong capabilities in translating code between different programming languages and in learning from past errors. The computational efficiency of the code generated by GPT-4 is comparable to that of human programmers. GPT-4 is also capable of handling broader programming tasks, including front-end design and database operations. These results suggest that GPT-4 has the potential to serve as a reliable assistant in programming code generation and software development. A programming assistant is designed based on an optimal prompt strategy to facilitate the practical use of LLMs for programming.
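The abstract's closing point, a programming assistant built on the best-performing prompt strategy, can be illustrated with a short sketch. The Python code below is a minimal, hypothetical illustration, not the authors' implementation: the model name "gpt-4", the prompt wording, the three-attempt repair budget, and the helper names are all assumptions. It calls the OpenAI chat-completions API and feeds execution errors back to the model, loosely mirroring the "learning from past errors" setting described above.

# Illustrative sketch of an LLM coding assistant with error feedback.
# Assumptions (not from the paper): the OpenAI Python SDK (>= 1.0), the
# model name "gpt-4", and this prompt wording are placeholders; the
# paper's actual optimal prompt strategy may differ.
import subprocess
import sys
import tempfile

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_code(task: str, feedback: str = "") -> str:
    # Ask the model for a self-contained Python solution to the task.
    prompt = f"Solve this programming task in Python:\n\n{task}\n"
    if feedback:
        # Feed the previous error back so the model can revise its answer,
        # mirroring the "learning from past errors" setting in the abstract.
        prompt += f"\nYour previous attempt failed with:\n{feedback}\nFix it."
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def extract_code(reply: str) -> str:
    # Pull the code out of a fenced block if the model used one.
    if "```" in reply:
        reply = reply.split("```")[1]
        reply = reply.removeprefix("python").lstrip("\n")
    return reply


def run_candidate(code: str) -> str:
    # Execute the candidate and return stderr; an empty string means success.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, timeout=30)
    except subprocess.TimeoutExpired:
        return "Execution timed out after 30 seconds."
    return result.stderr


def solve(task: str, max_attempts: int = 3) -> str:
    # Generate, test, and iteratively repair a solution.
    feedback = ""
    for _ in range(max_attempts):
        code = extract_code(generate_code(task, feedback))
        feedback = run_candidate(code)
        if not feedback:
            return code
    raise RuntimeError("No working solution within the attempt budget.")

In this sketch the repair loop simply appends the interpreter's stderr to the next prompt; a production assistant would also run task-specific test cases rather than only checking that the script executes without error.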

Details

Title
Comparing Large Language Models and Human Programmers for Generating Programming Code
Author
Hou, Wenpin 1; Ji, Zhicheng 2

1 Department of Biostatistics, Mailman School of Public Health, Columbia University, New York City, NY, USA
2 Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
Publication title
Advanced Science
Volume
12
Issue
8
Publication year
2025
Publication date
Feb 1, 2025
Section
Research Article
Publisher
John Wiley & Sons, Inc.
Place of publication
Weinheim
Country of publication
Germany
e-ISSN
2198-3844
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2024-12-30
Milestone dates
2024-10-02 (manuscriptReceived); 2024-12-11 (manuscriptRevised); 2024-12-30 (publishedOnlineEarlyUnpaginated); 2025-02-24 (publishedOnlineFinalForm)
First posting date
30 Dec 2024
ProQuest document ID
3170006747
Document URL
https://www.proquest.com/scholarly-journals/comparing-large-language-models-human-programmers/docview/3170006747/se-2?accountid=208611
Copyright
© 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-05-16
Database
ProQuest One Academic