This study presents a hybrid artificial intelligence model designed to enhance translation quality for low-resource languages, specifically targeting the Hakka language. The proposed model integrates phrase-based machine translation (PBMT) and neural machine translation (NMT) within a recursive learning framework. The methodology consists of three stages: (1) initial translation using PBMT, in which Hakka corpus data are structured into a parallel dataset; (2) NMT training with a Transformer architecture on the generated parallel corpus; and (3) recursive translation refinement, in which each round of translation output is added to the training dataset to further improve model accuracy. Preprocessing techniques are applied to clean and optimize the dataset, reducing noise and improving sentence segmentation. A BLEU score evaluation compares the effectiveness of PBMT and NMT across various corpus sizes and shows that, while PBMT performs well with limited data, the Transformer-based NMT achieves superior results as training data increases. The findings highlight the advantages of a hybrid approach in overcoming data scarcity for minority languages. The architecture also provides a blueprint for institutions that digitize, translate, or teach under-resourced languages: because it adapts to multilingual inputs and preserves cultural expressions, it is well suited to applications in education, community media, cultural preservation, and government-supported language revitalization initiatives. This research contributes to machine translation methodology by proposing a scalable framework for improving linguistic accessibility in under-resourced languages.
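The three-stage loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `make_pbmt`, `train_nmt_stub`, and `recursive_refinement` are hypothetical, the phrase table uses placeholder tokens rather than real Hakka data, and the "NMT" step is a stand-in that only memorizes sentence pairs; a real system would train a Transformer at that point.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Translator = Callable[[str], str]
ParallelCorpus = List[Tuple[str, str]]


def make_pbmt(phrase_table: Dict[str, str], max_len: int = 3) -> Translator:
    """Toy phrase-based translator: greedy longest-match lookup over tokens."""
    def translate(sentence: str) -> str:
        tokens = sentence.split()
        out: List[str] = []
        i = 0
        while i < len(tokens):
            # Try the longest candidate phrase first, then shorter ones.
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                phrase = " ".join(tokens[i:i + n])
                if phrase in phrase_table:
                    out.append(phrase_table[phrase])
                    i += n
                    break
            else:
                out.append(tokens[i])  # unknown token is copied through
                i += 1
        return " ".join(out)
    return translate


def train_nmt_stub(corpus: ParallelCorpus, fallback: Translator) -> Translator:
    """ASSUMPTION: placeholder for Transformer training.

    Memorizes the sentence pairs it has seen and falls back to the
    phrase-based translator for anything else.
    """
    memory = dict(corpus)
    return lambda sentence: memory.get(sentence, fallback(sentence))


def recursive_refinement(
    seed_sentences: Iterable[str],
    monolingual_batches: Iterable[List[str]],
    pbmt: Translator,
) -> Tuple[Translator, ParallelCorpus]:
    # Stage 1: PBMT turns monolingual seed text into a parallel corpus.
    corpus: ParallelCorpus = [(s, pbmt(s)) for s in seed_sentences]
    # Stage 2: train the (stub) NMT model on that corpus.
    model = train_nmt_stub(corpus, pbmt)
    # Stage 3: translate new text, grow the corpus, retrain, repeat.
    for batch in monolingual_batches:
        corpus.extend([(s, model(s)) for s in batch])
        model = train_nmt_stub(corpus, pbmt)
    return model, corpus


if __name__ == "__main__":
    pbmt = make_pbmt({"a b": "X", "c": "Y"})  # placeholder phrase table
    model, corpus = recursive_refinement(["a b"], [["c"], ["a b c"]], pbmt)
    print(len(corpus), model("a b c"))
```

Each pass through the loop adds machine-translated pairs to the training set, which is the mechanism the abstract calls recursive translation refinement. In practice the generated pairs would presumably be filtered for quality before retraining, but the abstract does not describe such a step.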
Details
Language revitalization;
Hakka Chinese;
Grammatical aspect;
Artificial intelligence;
Recursion;
Deep learning;
Minority languages;
Communication;
Minority & ethnic groups;
Parallel corpora;
Corpus linguistics;
Machine translation;
Neural networks;
Linguistics;
Natural language processing;
Literature reviews;
Multiculturalism & pluralism;
Segmentation;
Dialects;
Translation methods and strategies;
Cultural heritage;
Training;
Scarcity;
Cultural maintenance;
Access;
Learning;
Data;
Frame analysis;
Languages;
Preservation;
Translation;
Public policy
Yu-Hsun, Lin 2; Yun-Hsiang, Hsu 1; I-Hsin, Fan 3
1 Department of Culture Creativity and Digital Marketing, College of Hakka Studies, National United University, Miaoli 36063, Taiwan; [email protected] (C.-C.C.); [email protected] (Y.-H.H.)
2 Department of Business and Management, College of Management and Design, Ming Chi University of Technology, New Taipei 243303, Taiwan
3 Department of Cultural Tourism, College of Hakka Studies, National United University, Miaoli 36063, Taiwan; [email protected]