
Abstract

Context: Source code translation enables cross-platform compatibility, code reusability, legacy system migration, and developer collaboration. Numerous state-of-the-art techniques have emerged to address the demand for efficient and accurate translation methodologies.

Objective: This study compares the code translation capabilities of Large Language Models (LLMs), specifically DeepSeek R1 and ChatGPT 4.1, evaluating their proficiency in translating code between programming languages. We systematically assess model outputs through quantitative and qualitative measures, focusing on translation accuracy, execution efficiency, and conformity to coding standards. By examining each model's strengths and limitations, this work provides insights into their applicability for various translation scenarios and contributes to the discourse on the potential of LLMs in software engineering.

Method: We evaluated the quality of translations produced by ChatGPT 4.1 and DeepSeek R1 using the SonarQube Analyzer, identifying strengths and weaknesses through comprehensive software metrics covering translation accuracy, code quality, and clean code attributes. SonarQube's framework enables objective quantification of maintainability, reliability, technical debt, and code smells, which are critical factors in software quality measurement. The protocol involved randomly sampling 500 code instances from 1695 Java programming problems. Both models translated the Java samples to Python, and the outputs were then analysed quantitatively using SonarQube metrics to evaluate adherence to software engineering best practices.
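
For illustration, a minimal sketch of this sampling-and-translation protocol in Python follows; it is not the authors' published pipeline. The dataset location, prompt wording, random seed, and the call_model() helper are hypothetical placeholders; only the sample size (500 of 1695 problems) and the Java-to-Python direction come from the study.

import random
from pathlib import Path

DATASET_DIR = Path("java_problems")  # assumed: one .java file per problem (1695 total)
OUTPUT_DIR = Path("translations")    # one Python file per model per sampled problem
SAMPLE_SIZE = 500

PROMPT = "Translate the following Java program to idiomatic Python:\n\n{code}"

def call_model(model_name: str, prompt: str) -> str:
    """Hypothetical stand-in for an API call to DeepSeek R1 or ChatGPT 4.1."""
    raise NotImplementedError

def main() -> None:
    random.seed(42)  # fixed seed so the 500-problem sample is reproducible
    problems = sorted(DATASET_DIR.glob("*.java"))
    sample = random.sample(problems, SAMPLE_SIZE)  # uniform sample without replacement
    for model in ("deepseek-r1", "chatgpt-4.1"):
        model_dir = OUTPUT_DIR / model
        model_dir.mkdir(parents=True, exist_ok=True)
        for java_file in sample:
            translation = call_model(model, PROMPT.format(code=java_file.read_text()))
            (model_dir / f"{java_file.stem}.py").write_text(translation)

if __name__ == "__main__":
    main()

Keeping each model's output in its own directory lets SonarQube scan the two sets as independent projects, so maintainability, reliability, technical debt, and code-smell measures are computed separately per model.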

Results: This comparative analysis reveals the capabilities and limitations of state-of-the-art LLM-based translation systems, providing developers, researchers, and practitioners with actionable guidance for model selection. Identified gaps highlight future research directions in automated code translation. Results demonstrate that DeepSeek R1 consistently generates code of superior software quality compared to ChatGPT 4.1 across SonarQube metrics.
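
The per-model measures behind such a comparison can be retrieved from a SonarQube server through its Web API. The sketch below is illustrative rather than the authors' tooling: the server URL, token, and project keys are hypothetical (matching the earlier sketch), while the metric keys (bugs, code_smells, sqale_index, reliability_rating, sqale_rating) are standard SonarQube identifiers for reliability issues, code smells, technical debt, and the reliability and maintainability ratings.

import requests

SONAR_URL = "http://localhost:9000"  # assumed local SonarQube server
TOKEN = "your-sonarqube-user-token"  # hypothetical authentication token
METRICS = "bugs,code_smells,sqale_index,reliability_rating,sqale_rating"

def fetch_measures(project_key: str) -> dict[str, str]:
    # GET api/measures/component returns the requested metric values
    # for one analysed project.
    resp = requests.get(
        f"{SONAR_URL}/api/measures/component",
        params={"component": project_key, "metricKeys": METRICS},
        auth=(TOKEN, ""),  # SonarQube accepts a token as the basic-auth username
    )
    resp.raise_for_status()
    return {m["metric"]: m["value"] for m in resp.json()["component"]["measures"]}

for key in ("deepseek-r1-translations", "chatgpt-4.1-translations"):
    print(key, fetch_measures(key))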

Details

Title
Analysing Software Quality of AI-Translated Code: A Comparative Study of Large Language Models Using Static Analysis
Author
Bhutani, Vikram 1; Toosi, Farshad Ghassemi 1; Buckley, Jim 2

 1 Computer Science Department, Munster Technological University, Cork, Ireland
 2 Computer Science Department, University of Limerick, Limerick, Ireland
Pages
105-121
Publication year
2025
Publication date
2025
Publisher
De Gruyter Brill Sp. z o.o., Paradigm Publishing Services
ISSN
2255-8683
e-ISSN
2255-8691
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3264376368
Copyright
© 2025. This work is published under http://creativecommons.org/licenses/by/4.0 (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.