Quantifying cross-language code reuse via function-level clone detection

Abstract

Code reuse through cloning is common in software development, yet excessive or unchecked cloning can harm maintainability and raise plagiarism concerns. Detecting the proportion of reused (cloned) code in a software project is challenging, especially across programming languages. This paper defines code reuse proportion detection as measuring how much of the code in a target program is cloned, that is, identical or similar to code from elsewhere. Existing code clone detection techniques perform well in single-language settings but struggle with cross-language clones and do not directly quantify reuse proportion. To address these gaps, we propose a cross-language, function-level code clone detection approach based on a dual-embedding Siamese neural network. Our method represents Java and Python code using a unified abstract syntax structure and semantic embeddings, then uses a Siamese deep network to learn language-agnostic similarities. We also introduce a metric that quantifies the clone-based reuse ratio for each function or program. Experiments on three public datasets (a Java clone benchmark, a Python code clone corpus, and a cross-language Java–Python clone dataset) show that our approach outperforms ten baseline methods, including state-of-the-art and classical clone detectors. Ablation studies confirm that each component (structural embeddings, cross-language alignment, and contrastive learning) contributes to the performance gains. Our model achieves new state-of-the-art accuracy in code clone detection, enabling precise measurement of code reuse. These results demonstrate that the proposed approach can effectively detect cross-language code clones and quantify reuse proportion, benefiting software plagiarism detection and code quality assessment in multi-language projects.
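
The abstract does not spell out how the reuse ratio is computed, so the following is only an illustrative sketch of one plausible reading, not the authors' metric. It assumes each function already has a language-agnostic embedding (standing in for the output of a Siamese encoder), treats a target function as cloned when its cosine similarity to any reference function reaches a threshold, and reports the cloned share of the target's lines of code. The names `FunctionRecord` and `reuse_proportion`, the line-count weighting, and the 0.8 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FunctionRecord:
    """Hypothetical record for one function: name, size, and embedding."""
    name: str
    loc: int                      # lines of code in the function
    embedding: Sequence[float]    # assumed language-agnostic vector


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reuse_proportion(
    target: Sequence[FunctionRecord],
    reference: Sequence[FunctionRecord],
    threshold: float = 0.8,       # assumed cut-off, not taken from the paper
) -> float:
    """Share of the target's lines that sit in functions flagged as clones."""
    total_loc = sum(f.loc for f in target)
    if total_loc == 0:
        return 0.0
    cloned_loc = sum(
        f.loc
        for f in target
        if any(cosine(f.embedding, r.embedding) >= threshold for r in reference)
    )
    return cloned_loc / total_loc


if __name__ == "__main__":
    # Toy vectors: one Python function matches a Java function, one does not.
    python_program = [
        FunctionRecord("parse_args", 30, (1.0, 0.0)),
        FunctionRecord("render", 10, (0.0, 1.0)),
    ]
    java_corpus = [FunctionRecord("parseArgs", 28, (1.0, 0.0))]
    print(reuse_proportion(python_program, java_corpus))
```

Weighting by lines of code rather than counting functions keeps a large cloned function from being offset by many small original helpers; a per-function count would be an equally defensible choice under the same assumptions.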

Details

Title

Quantifying cross-language code reuse via function-level clone detection

Author

Rong, Yi¹; Zhou, Yan²

¹ The University of New South Wales, School of Education, New South Wales, Australia (GRID:grid.1005.4) (ISNI:0000 0004 4902 0432)
² South China Agricultural University, College of Mathematics and Informatics, Guangdong, China (GRID:grid.20561.30) (ISNI:0000 0000 9546 5767)

Pages

327

Publication year

2025

Publication date

Dec 2025

Publisher

Springer Nature B.V.

e-ISSN

1319-1578

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1007/s44443-025-00362-2

ProQuest document ID

3274025682

© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
