
Abstract

This study introduces the Inference Index In Testing Model Effectiveness (INFINITE) methodology, which defines a new Inference Index (InI) for evaluating the performance of Large Language Models (LLMs) in code generation tasks. The InI index provides a comprehensive assessment focusing on three key components: efficiency, consistency, and accuracy. This approach encapsulates time-based efficiency, response quality, and the stability of model outputs, offering a thorough understanding of LLM performance beyond traditional accuracy metrics. We apply this methodology to compare OpenAI’s GPT-4o (GPT), OpenAI o1-pro (OAI1), and OpenAI o3-mini-high (OAI3) in generating Python code for two tasks: a data-cleaning and statistical computation task and a Long Short-Term Memory (LSTM) model generation task for forecasting meteorological variables such as temperature, relative humidity, and wind speed. Our findings demonstrate that GPT outperforms OAI1 and performs comparably to OAI3 in terms of accuracy and workflow efficiency. The study reveals that, with effective prompting and refinement, LLM-assisted code generation can produce results similar to expert-designed models. GPT’s performance advantage highlights the benefits of widespread use and user feedback. These findings contribute to advancing AI-assisted software development, providing a structured approach for evaluating LLMs in coding tasks and laying the groundwork for future studies on broader model comparisons and expanded assessment frameworks.
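The abstract names the three components of the InI index (time-based efficiency, consistency of outputs, and accuracy) but does not reproduce the formula here, so the Python sketch below is purely illustrative: it assumes a simple weighted combination of normalized component scores, and the function name, parameters (response_time_s, max_time_s, weights), and example numbers are hypothetical choices for this record rather than values taken from the paper.

```python
# Illustrative sketch of an InI-style score combining the three components
# named in the abstract. The normalization and equal weighting below are
# assumptions for demonstration, not the formula defined in the paper.

def inference_index(
    response_time_s: float,      # wall-clock time the model took to produce the code
    max_time_s: float,           # slowest observed time, used to normalize efficiency
    consistency: float,          # fraction of repeated runs yielding equivalent output, in [0, 1]
    accuracy: float,             # fraction of task requirements met by the generated code, in [0, 1]
    weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Return a score in [0, 1]; higher means better overall performance."""
    # Map response time to an efficiency score: faster responses score closer to 1.
    efficiency = 1.0 - min(response_time_s / max_time_s, 1.0)
    w_eff, w_con, w_acc = weights
    return w_eff * efficiency + w_con * consistency + w_acc * accuracy


# Comparing two models on the same task (numbers are made up for illustration).
print(inference_index(response_time_s=42.0, max_time_s=120.0, consistency=0.9, accuracy=0.95))
print(inference_index(response_time_s=95.0, max_time_s=120.0, consistency=0.7, accuracy=0.85))
```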

Details

Title
Evaluating Large Language Models in Code Generation: INFINITE Methodology for Defining the Inference Index
Publication title
Applied Sciences
Volume
15
Issue
7
First page
3784
Publication year
2025
Publication date
2025
Publisher
MDPI AG
Place of publication
Basel
Country of publication
Switzerland
e-ISSN
2076-3417
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-03-30
Milestone dates
2025-02-17 (Received); 2025-03-27 (Accepted)
First posting date
30 Mar 2025
ProQuest document ID
3188782749
Document URL
https://www.proquest.com/scholarly-journals/evaluating-large-language-models-code-generation/docview/3188782749/se-2?accountid=208611
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-04-11
Database
ProQuest One Academic