Multilingual people often code-switch, mixing two languages within the same sentence (intra-sentential code-switching) or alternating between sentences (inter-sentential code-switching). In this thesis, we address the dual challenge that code-switching poses for speech recognition in both acoustic modeling and language modeling.
The acoustic modeling challenge stems from the lack of labeled code-switching data. We propose a novel asymmetric pronunciation and acoustic modeling approach that uses a single set of models trained on a small amount of accented data in the second language and monolingual data in the main language. We tested the proposed asymmetric acoustic models on inter-sentential and intra-sentential code-switching test sets and showed that our approach significantly outperforms previous approaches that use a limited amount of code-switched data or rely on adaptation.
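The abstract does not spell out the asymmetric modeling recipe. As a rough illustration only, assume that second-language words are re-expressed in the main-language phone inventory so that one set of acoustic models can cover both languages. A shared lexicon could then be built as below. The phone mapping `L2_TO_L1_PHONE` and all helper names are hypothetical, and the English phones are written in ARPAbet-style symbols.

```python
# Hypothetical mapping from English (second-language) phones to the closest
# Mandarin (main-language) phones. A real system would derive it from data.
L2_TO_L1_PHONE = {
    "TH": "s",
    "DH": "d",
    "V": "w",
    "R": "r",
    "IY": "i",
    "K": "k",
    "AE": "a",
    "T": "t",
}


def map_pronunciation(l2_phones, mapping):
    """Rewrite a second-language pronunciation using main-language phones.

    Phones without an entry in the mapping are kept unchanged, so they
    would still need their own acoustic models.
    """
    return [mapping.get(p, p) for p in l2_phones]


def build_shared_lexicon(l1_lexicon, l2_lexicon, mapping):
    """Merge both lexicons into one dictionary over a (mostly) shared phone set."""
    shared = dict(l1_lexicon)  # main-language entries are used as-is
    for word, phones in l2_lexicon.items():
        shared[word] = map_pronunciation(phones, mapping)
    return shared


if __name__ == "__main__":
    l1 = {"nihao": ["n", "i", "h", "ao"]}
    l2 = {"cat": ["K", "AE", "T"], "the": ["DH", "AH"]}
    print(build_shared_lexicon(l1, l2, L2_TO_L1_PHONE))
```

Writing the mapping by hand is only for illustration. In practice it would be estimated from the accented second-language data rather than fixed in advance.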
The language modeling challenge is predicting the code-switching point. It is generally accepted by linguists that code-switching follows the Inversion Transduction Grammar Constraint, under which a switch does not violate the grammar of either language. Under another constraint, the Functional Head Constraint, code-switching is forbidden between a functional head and its complement. However, neither of these linguistic constraints has previously been modeled computationally or incorporated into code-switching speech recognition. We propose the first computational approach to incorporating these linguistic constraints into a statistical language model for speech recognition, using a weighted finite-state transducer (WFST) framework so that constraints such as the Inversion Transduction Grammar Constraint and the Functional Head Constraint can be incorporated.
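To make the Functional Head Constraint concrete, the sketch below checks language-tagged hypotheses and rejects any that switch language directly after a functional head. It is a simplified, unweighted stand-in for a constraint in a WFST framework, not the thesis's implementation. It assumes each token carries a language tag and a hypothetical `is_head` flag, and it approximates a head's complement by the next token only, whereas the real constraint applies to the whole complement and needs a parse. Because the only state carried between tokens is the previous language and whether the previous token was a head, the check is a finite-state acceptor.

```python
from typing import List, Tuple

# Each token: (word, language tag, is_functional_head). The tags are hypothetical.
Token = Tuple[str, str, bool]


def satisfies_fhc(tokens: List[Token]) -> bool:
    """Return True if no language switch directly follows a functional head.

    Simplification: a head's complement is approximated by the next token.
    """
    prev_lang = None
    prev_is_head = False
    for _word, lang, is_head in tokens:
        if prev_lang is not None and prev_is_head and lang != prev_lang:
            return False  # switch between a functional head and its complement
        prev_lang, prev_is_head = lang, is_head
    return True


def filter_hypotheses(hyps: List[List[Token]]) -> List[List[Token]]:
    """Keep only the hypotheses that pass the (simplified) constraint."""
    return [h for h in hyps if satisfies_fhc(h)]


if __name__ == "__main__":
    # Switch right after the determiner "the": rejected.
    bad = [("I", "en", False), ("like", "en", False), ("the", "en", True), ("电脑", "zh", False)]
    # Switch after the lexical verb "喜欢", then "the laptop" in English: accepted.
    good = [("我", "zh", False), ("喜欢", "zh", False), ("the", "en", True), ("laptop", "en", False)]
    print(satisfies_fhc(bad), satisfies_fhc(good))
```

In a full system, such a filter would be applied to the paths of the search network, and the language model weights would still rank the surviving paths.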
We propose the first statistical code-switching language model that integrates the syntactic Inversion Transduction Grammar Constraint through a chunk segmentation model and a chunk translation model. We also propose a constrained code-switching language model with the Functional Head Constraint, obtained by first expanding the search network with a translation model and then restricting it to permissible paths using parsing. Experimental results on lecture speech and lunch conversation datasets show that our systems reduce word error rates compared to previous approaches. Our proposed approaches delay code-switching boundary decisions to avoid propagating errors. We also address the scarcity of code-switching data by using bilingual data through language borrowing.
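The chunk-based idea can be illustrated with a toy generator. Assume a monolingual sentence has already been segmented into chunks and that a small chunk translation table is available. The generator then enumerates every sentence in which each chunk is either kept or replaced whole by its translation. Because no chunk is ever split, every language switch falls on a chunk boundary. This sketch is an assumption-laden illustration, not the thesis's model. `code_switch_variants` and the translation table are hypothetical, and it ignores the probabilities of the segmentation and translation models as well as the reordering that an inversion transduction grammar allows.

```python
from itertools import product
from typing import Dict, List


def code_switch_variants(chunks: List[str], chunk_table: Dict[str, str]) -> List[str]:
    """Enumerate sentences where each chunk is kept or translated as a unit.

    Switches can only occur at chunk boundaries because a chunk is either
    left in the original language or replaced entirely by its translation.
    """
    options = []
    for chunk in chunks:
        alts = [chunk]
        if chunk in chunk_table:
            alts.append(chunk_table[chunk])
        options.append(alts)
    # product() yields one choice per chunk, with the last chunk varying fastest.
    return [" ".join(choice) for choice in product(*options)]


if __name__ == "__main__":
    chunks = ["我们", "明天", "开会"]
    table = {"开会": "have a meeting"}  # hypothetical chunk translation table
    for sentence in code_switch_variants(chunks, table):
        print(sentence)
```

A statistical version would attach a probability to each segmentation and each chunk translation, so that the generated variants can be weighted inside the language model rather than simply listed.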