Cantonese automatic speech recognition (ASR) faces persistent challenges due to its nine lexical tones, extensive phonological variation, and the scarcity of professionally transcribed corpora. To address these issues, we propose a lightweight, data-efficient framework that combines weak phonetic supervision (WPS) with two phoneme-aware augmentation strategies. (1) Dynamic Boundary-Aligned Phoneme Dropout progressively removes entire IPA segments according to a curriculum schedule, simulating real-world phenomena such as elision, lenition, and tonal drift while maintaining training stability. (2) Phoneme-Aware SpecAugment confines all time- and frequency-masking operations within phoneme boundaries and prioritizes high-attention regions, thereby preserving intra-phonemic contours and formant integrity. The approach is built on the Whistle encoder, which integrates a Conformer backbone, Connectionist Temporal Classification–Conditional Random Field (CTC-CRF) alignment, and a multilingual phonetic space; it requires only a grapheme-to-phoneme lexicon and Montreal Forced Aligner outputs, with no additional manual labeling. Experiments on the Cantonese subset of Common Voice show consistent gains: Dynamic Dropout alone reduces the phoneme error rate (PER) from 17.8% to 16.7% with 50 h of speech and from 16.4% to 15.1% with 100 h, and combining the two augmentations further lowers PER to 15.9% and 14.4%, respectively. These results confirm that structure-aware, phoneme-level perturbations provide an effective, low-cost route to robust Cantonese ASR under low-resource conditions.
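The two augmentations described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the frame-level segment format, the linear curriculum (`dropout_prob`, `warmup_steps`, `p_max`), the per-segment `attention` scores, and the mask-size caps are all hypothetical choices. In particular, the dropout here removes a segment's frames and its IPA label together, which is only one plausible reading of "removes entire IPA segments".

```python
import numpy as np

# Hypothetical alignment format: MFA intervals converted to frame indices,
# (start_frame, end_frame_exclusive, ipa_label), tiling the utterance from 0.
Segments = list[tuple[int, int, str]]


def dropout_prob(step: int, warmup_steps: int = 10_000, p_max: float = 0.1) -> float:
    """Assumed linear curriculum: drop probability grows from 0 to p_max."""
    return p_max * min(1.0, step / warmup_steps)


def boundary_aligned_phoneme_dropout(feats: np.ndarray, segments: Segments,
                                     step: int, rng: np.random.Generator):
    """Remove whole aligned phoneme segments (frames and label together)."""
    p = dropout_prob(step)
    kept_feats, kept_segs, offset = [], [], 0
    for start, end, label in segments:
        if rng.random() < p:
            continue  # elision-like deletion: segment leaves audio and labels
        kept_feats.append(feats[start:end])
        kept_segs.append((offset, offset + (end - start), label))
        offset += end - start
    if not kept_segs:  # stability guard: never return an empty utterance
        return feats, list(segments)
    return np.concatenate(kept_feats, axis=0), kept_segs


def phoneme_aware_specaugment(feats: np.ndarray, segments: Segments,
                              attention, rng: np.random.Generator,
                              n_masks: int = 2, max_t: int = 10, max_f: int = 8):
    """Time/frequency masks confined to phoneme segments sampled by attention.

    `attention` is a hypothetical non-negative score per segment (positive sum).
    """
    out = feats.copy()
    n_bins = feats.shape[1]
    weights = np.asarray(attention, dtype=float)
    weights = weights / weights.sum()
    size = min(n_masks, int(np.count_nonzero(weights)))
    for idx in rng.choice(len(segments), size=size, replace=False, p=weights):
        start, end, _ = segments[idx]
        seg_len = end - start
        # Time mask: at most half the segment, so part of the contour survives.
        width = min(max_t, seg_len // 2)
        if width >= 1:
            t = int(rng.integers(1, width + 1))
            t0 = start + int(rng.integers(0, seg_len - t + 1))
            out[t0:t0 + t, :] = 0.0
        # Frequency mask: applied only over this segment's frames.
        f = int(rng.integers(1, min(max_f, n_bins) + 1))
        f0 = int(rng.integers(0, n_bins - f + 1))
        out[start:end, f0:f0 + f] = 0.0
    return out


rng = np.random.default_rng(0)
feats = rng.standard_normal((50, 80))  # 50 frames x 80 mel bins (toy data)
segs = [(0, 10, "s"), (10, 25, "i"), (25, 40, "k"), (40, 50, "ɐ")]
aug = phoneme_aware_specaugment(feats, segs, [0.0, 1.0, 0.0, 0.0], rng, n_masks=1)
dropped, kept = boundary_aligned_phoneme_dropout(aug, segs, step=10_000, rng=rng)
```

Sampling segments in proportion to `attention` is one simple way to prioritize high-attention regions, and capping each time mask at half the segment length is an assumed way to keep part of every masked phoneme visible; both are design choices of this sketch rather than details given in the abstract.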
Keywords: Accuracy; Phonology; Conditional random fields; Cantonese; Phonetics; Phonemes; Masking; Phonemics; Voice recognition; Supervision; Tone; Speech recognition; Multilingualism; Robustness (mathematics); Annotations; Acoustics; Automatic speech recognition; Speech; Grapheme phoneme correspondence; Chinese languages; Reduction (Phonological or Phonetic); Semantics; Cultural heritage; Experiments; Dropping out; Classification; Scarcity; Contours; Augmentation; Morality; Curricula
1 School of Physics and Electronic Information, Yantai University, Yantai 264005, China; [email protected] (L.Z.); [email protected] (S.W.), Shandong Data Open Innovation Application Laboratory of Smart Grid Advanced Technology, Yantai University, Yantai 264005, China