Content area

Abstract

Cantonese automatic speech recognition (ASR) faces persistent challenges due to its nine lexical tones, extensive phonological variation, and the scarcity of professionally transcribed corpora. To address these issues, we propose a lightweight and data-efficient framework that leverages weak phonetic supervision (WPS) in conjunction with two pho-neme-aware augmentation strategies. (1) Dynamic Boundary-Aligned Phoneme Dropout progressively removes entire IPA segments according to a curriculum schedule, simulating real-world phenomena such as elision, lenition, and tonal drift while ensuring training stability. (2) Phoneme-Aware SpecAugment confines all time- and frequency-masking operations within phoneme boundaries and prioritizes high-attention regions, thereby preserving intra-phonemic contours and formant integrity. Built on the Whistle encoder—which integrates a Conformer backbone, Connectionist Temporal Classification–Conditional Random Field (CTC-CRF) alignment, and a multi-lingual phonetic space—the approach requires only a grapheme-to-phoneme lexicon and Montreal Forced Aligner outputs, without any additional manual labeling. Experiments on the Cantonese subset of Common Voice demonstrate consistent gains: Dynamic Dropout alone reduces phoneme error rate (PER) from 17.8% to 16.7% with 50 h of speech and 16.4% to 15.1% with 100 h, while the combination of the two augmentations further lowers PER to 15.9%/14.4%. These results confirm that structure-aware phoneme-level perturbations provide an effective and low-cost solution for building robust Cantonese ASR systems under low-resource conditions.

Details

1009240
Business indexing term
Title
Phoneme-Aware Augmentation for Robust Cantonese ASR Under Low-Resource Conditions
Author
Zhang Lusheng 1 ; Wu, Shie 1 ; Wang Zhongxun 1 

 School of Physics and Electronic Information, Yantai University, Yantai 264005, China; [email protected] (L.Z.); [email protected] (S.W.), Shandong Data Open Innovation Application Laboratory of Smart Grid Advanced Technology, Yantai University, Yantai 264005, China 
Publication title
Symmetry; Basel
Volume
17
Issue
9
First page
1478
Number of pages
19
Publication year
2025
Publication date
2025
Publisher
MDPI AG
Place of publication
Basel
Country of publication
Switzerland
Publication subject
e-ISSN
20738994
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
 
 
Online publication date
2025-09-08
Milestone dates
2025-07-22 (Received); 2025-09-02 (Accepted)
Publication history
 
 
   First posting date
08 Sep 2025
ProQuest document ID
3254649183
Document URL
https://www.proquest.com/scholarly-journals/phoneme-aware-augmentation-robust-cantonese-asr/docview/3254649183/se-2?accountid=208611
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-11-07
Database
ProQuest One Academic