Content area

Abstract

This study presents a comprehensive comparative analysis of six contemporary artificial intelligence models for Python code generation using the HumanEval benchmark. The evaluated models are GPT-3.5 Turbo, GPT-4 Omni, Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude Sonnet 4, and Claude Opus 4. A total of 164 Python programming problems were used to assess model performance through a multi-faceted methodology incorporating automated functional correctness evaluation via the Pass@1 metric, cyclomatic complexity analysis, maintainability index calculation, and lines-of-code assessment. The results indicate that Claude Sonnet 4 achieved the highest performance with a success rate of 95.1%, followed closely by Claude Opus 4 at 94.5%. Across all metrics, Anthropic's Claude models consistently outperformed OpenAI's GPT models by margins exceeding 20%, and statistical analysis confirmed significant differences between the model families (p < 0.001). The Claude models generated more sophisticated and maintainable solutions with superior syntactic accuracy, whereas the GPT models tended to adopt simpler strategies but exhibited notable limitations in reliability. These findings offer evidence-based guidance for selecting AI-powered coding assistants in professional software development contexts.
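The Pass@1 metric mentioned in the abstract is conventionally computed with the unbiased pass@k estimator introduced with the HumanEval benchmark; the sketch below illustrates that estimator under the standard formulation (the exact sampling setup used in this study is not stated in the record, so the parameter values in the comments are illustrative only).

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval benchmark:
    n = samples generated for a problem, c = samples passing all
    unit tests, k = budget. Returns the estimated probability that
    at least one of k samples is correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per problem, Pass@1 reduces to the solved fraction:
# e.g. 156 of 164 HumanEval problems solved gives
# pass_at_k(164, 156, 1) ≈ 0.951, matching a 95.1% success rate.
```

The benchmark-level Pass@1 score is then the mean of this per-problem estimate over all 164 problems.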

Details

Title
Comparative Analysis of AI Models for Python Code Generation: A HumanEval Benchmark Study
Author
Bayram, Ali 1; Menekse Dalveren, Gonca Gokce 1; Derawi, Mohammad 2

1 Department of Computer Engineering, Izmir Bakircay University, 35665 Izmir, Turkey; [email protected] (A.B.); [email protected] (G.G.M.D.)
2 Department of Electronic Systems, Norwegian University of Science and Technology, 2815 Gjøvik, Norway
Publication title
Volume
15
Issue
18
First page
9907
Number of pages
18
Publication year
2025
Publication date
2025
Publisher
MDPI AG
Place of publication
Basel
Country of publication
Switzerland
e-ISSN
2076-3417
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-09-10
Milestone dates
2025-08-25 (Received); 2025-09-08 (Accepted)
First posting date
10 Sep 2025
ProQuest document ID
3254469735
Document URL
https://www.proquest.com/scholarly-journals/comparative-analysis-ai-models-python-code/docview/3254469735/se-2?accountid=208611
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-09-26
Database
ProQuest One Academic