Abstract

The emergence of large language models (LLMs) has reshaped the trajectory of NLP research. Transformer architectures with attention mechanisms, increased computational power, and massive datasets have led to the emergence of pre-trained large language models (PLLMs), which offer promising possibilities for multilingual applications in low-resource settings. However, the scarcity of annotated resources and suitably pre-trained models remains a significant hurdle for abstractive summarization of legal texts in low-resource languages, particularly Urdu. This study presents a transfer learning approach using pre-trained multilingual models (mBART and mT5 in its Small, Base, and Large variants) to generate abstractive summaries of Urdu legal texts. A curated dataset was developed with legal experts, who produced the ground-truth summaries. The models were fine-tuned on this domain-specific corpus to adapt them to low-resource legal summarization. The experimental results show that mT5-Large, fine-tuned on Urdu legal texts, outperforms all other evaluated models across standard summarization metrics, achieving ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.7889, 0.5961, and 0.7813, respectively. This indicates a strong capacity to generate fluent, coherent, and legally accurate summaries. mT5-Base follows closely (ROUGE-1 = 0.7774), while mT5-Small shows moderate performance (ROUGE-1 = 0.6406), with reduced fidelity in capturing legal structure. mBART50, despite being fine-tuned on the same legal corpus, performs worse (ROUGE-1 = 0.5914), revealing its relative limitations in this domain. Notably, models trained or fine-tuned on non-legal, out-of-domain data, such as urT5 (ROUGE-1 = 0.3912), mT5-XLSUM (ROUGE-1 = 0.0582), and mBART50 (XLSUM) (ROUGE-1 = 0.0545), generalize poorly to legal summaries, underscoring the necessity of domain adaptation in low-resource legal contexts.
These findings highlight the effectiveness of fine-tuning multilingual LLMs for domain-specific tasks. The gains in legal summarization demonstrate the practical value of transfer learning in low-resource settings and the broader potential of AI-driven tools for legal document processing, information retrieval, and decision support.
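The ROUGE-1 scores reported above measure unigram overlap between a generated summary and its expert-written reference. As an illustration only (not the authors' evaluation code, which the record does not describe), a minimal ROUGE-1 F1 computation can be sketched in pure Python; the example sentences below are hypothetical:

```python
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall
    between a candidate summary and a reference summary."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    # Multiset intersection counts each shared word up to its minimum count.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical candidate vs. reference: 4 of 5 unigrams overlap.
score = rouge1_f("the court granted the appeal",
                 "the court allowed the appeal")
```

Production evaluations typically also apply tokenization, stemming, and ROUGE-2/ROUGE-L variants, usually via an established ROUGE implementation rather than a hand-rolled one.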

Details

Title
Transformer-Based Abstractive Summarization of Legal Texts in Low-Resource Languages
Author
Masih Salman 1; Hassan Mehdi 1; Gillani, Fahad Labiba 2; Hassan Bilal 3

1 Department of Computer Science, Faculty of Computing and Artificial Intelligence (FCAI), Air University Sector E-9, Islamabad 44000, Pakistan; [email protected] (M.H.)
2 Department of Computer Science, National University of Computer & Emerging Sciences, Islamabad 44000, Pakistan; [email protected]
3 Faculty of Engineering & Environment, Northumbria University, London Campus, London E1 7HT, UK
Publication title
Electronics
Volume
14
Issue
12
First page
2320
Number of pages
22
Publication year
2025
Publication date
2025
Publisher
MDPI AG
Place of publication
Basel
Country of publication
Switzerland
Publication subject
e-ISSN
2079-9292
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-06-06
Milestone dates
2025-04-23 (Received); 2025-05-28 (Accepted)
First posting date
06 Jun 2025
ProQuest document ID
3223907975
Document URL
https://www.proquest.com/scholarly-journals/transformer-based-abstractive-summarization-legal/docview/3223907975/se-2?accountid=208611
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-06-25
Database
ProQuest One Academic