Abstract

Code reuse through cloning is common in software development, yet excessive or unchecked cloning can harm maintainability and raise plagiarism concerns. Detecting the proportion of reused (cloned) code in a software project, especially across different programming languages, is a challenging task. This paper defines code reuse proportion detection as measuring how much code in a target program is cloned (identical or similar) from elsewhere. Existing code clone detection techniques perform well in single-language settings but struggle with cross-language clones and do not directly quantify reuse proportion. To address these gaps, we propose a novel cross-language function-level code clone detection approach using a dual embedding Siamese neural network. Our method represents code in Java and Python using a unified abstract syntax structure and semantic embeddings, then uses a Siamese deep network to learn language-agnostic similarities. We also introduce a metric to quantify the clone-based reuse ratio for each function or program. Experiments on three public datasets (including a Java clone benchmark, a Python code clone corpus, and a cross-language Java–Python clone dataset) show that our approach outperforms ten baseline methods, including state-of-the-art and classical clone detectors. Ablation studies confirm the contribution of each component (structural embeddings, cross-language alignment, and contrastive learning) to performance gains. Our model achieves new state-of-the-art accuracy in code clone detection, enabling precise measurement of code reuse. These results demonstrate that the proposed approach can effectively detect cross-language code clones and quantify reuse proportion, benefiting software plagiarism detection and code quality assessment in multi-language projects.

Details

Title
Quantifying cross-language code reuse via function-level clone detection
Author
Rong, Yi 1 ; Zhou, Yan 2 

 The University of New South Wales, School of Education, New South Wales, Australia (GRID:grid.1005.4) (ISNI:0000 0004 4902 0432) 
 South China Agricultural University, College of Mathematics and Informatics, Guangdong, China (GRID:grid.20561.30) (ISNI:0000 0000 9546 5767) 
Pages
327
Publication year
2025
Publication date
Dec 2025
Publisher
Springer Nature B.V.
e-ISSN
13191578
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3274025682
Copyright
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.