Abstract

Accurately evaluating text similarity remains a fundamental challenge in Natural Language Processing (NLP), with widespread applications in plagiarism detection, information retrieval, semantic analysis, and recommendation systems. Traditional approaches often suffer from overfitting, stagnation in local optima, and difficulty capturing deep semantic relationships. To address these challenges, this paper introduces an Intelligent Text Similarity Assessment Model that integrates Robustly Optimized Bidirectional Encoder Representations from Transformers (RoBERTa) with Chaotic Sand Cat Swarm Optimization (CHSCSO), a novel swarm-intelligence optimization method inspired by chaotic dynamics. The model uses RoBERTa’s contextual embeddings to extract deep semantic representations and CHSCSO’s controlled chaotic perturbations to optimize hyperparameters dynamically. This integration enhances model generalization, mitigates overfitting, and improves the trade-off between exploration and exploitation during training. CHSCSO refines the parameter search space by employing chaotic maps, making the training process more adaptive and efficient. Extensive experiments on multiple benchmark datasets, including Semantic Textual Similarity (STS) and Textual Entailment (TE), demonstrate the model’s superiority over standard RoBERTa fine-tuning and conventional baselines, with cosine similarity scores clustered at 0.996. The optimized model achieves higher accuracy, improved stability, and faster convergence in text similarity tasks.
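The idea of driving a hyperparameter search with a chaotic map can be illustrated with a minimal sketch. This is not CHSCSO itself: the abstract does not say which chaotic map the authors use or how it is combined with the sand cat swarm update, so the logistic map, the function names (`logistic_map`, `chaotic_search`), the seed value, and the toy objective below are all assumptions made for illustration only.

```python
def logistic_map(x, r=4.0):
    """One step of the logistic map (assumed here; the paper's map is not stated)."""
    return r * x * (1.0 - x)


def chaotic_search(objective, low, high, steps=200, seed=0.37):
    """Minimize `objective` over [low, high] using candidates from a chaotic sequence.

    The chaotic sequence replaces uniform random sampling: each iterate in (0, 1)
    is scaled into the search interval and kept if it improves the best score.
    """
    x = seed
    best_value = low + x * (high - low)
    best_score = objective(best_value)
    for _ in range(steps):
        x = logistic_map(x)                   # next chaotic iterate
        candidate = low + x * (high - low)    # scale into the search interval
        score = objective(candidate)
        if score < best_score:
            best_value, best_score = candidate, score
    return best_value, best_score


def toy_objective(rate):
    """Hypothetical stand-in for a validation loss; smallest at rate = 0.3."""
    return (rate - 0.3) ** 2


best_rate, best_loss = chaotic_search(toy_objective, low=0.0, high=1.0)
```

In a real setting the objective would be a validation metric of the fine-tuned model evaluated at the candidate hyperparameter value, which is far more expensive than this toy function.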

Details

Title
Intelligent text similarity assessment using Roberta with integrated chaotic perturbation optimization techniques
Author
Hassan, Esraa 1; Talaat, Amira Samy 2; Elsabagh, M. A. 1

1 Kafrelsheikh University, Department of Machine Learning and Information Retrieval, Faculty of Artificial Intelligence, Kafrelsheikh, Egypt (GRID:grid.411978.2) (ISNI:0000 0004 0578 3577)
2 Electronics Research Institute, Computers and Systems Department, Cairo, Egypt (GRID:grid.463242.5) (ISNI:0000 0004 0387 2680)
Publication title
Volume
12
Issue
1
Pages
164
Publication year
2025
Publication date
Jul 2025
Publisher
Springer Nature B.V.
Place of publication
Heidelberg
Country of publication
Netherlands
e-ISSN
2196-1115
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-07-11
Milestone dates
2025-06-30 (Registration); 2025-02-22 (Received); 2025-06-30 (Accepted)
First posting date
11 Jul 2025
ProQuest document ID
3229357039
Document URL
https://www.proquest.com/scholarly-journals/intelligent-text-similarity-assessment-using/docview/3229357039/se-2?accountid=208611
Copyright
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-11-14
Database
ProQuest One Academic