
Abstract

Dialect is one of the most important factors, next to gender, influencing speech system performance, including automatic speech recognition. In this thesis, we address novel advances in automatic dialect classification. The goal is to detect and classify dialect information in continuous speech and text. Based on the available resources, we approach this problem from three directions. First, when transcripts are available for the audio training data, we propose a Word-based Dialect Classification (WDC) framework, which turns a single text-independent decision problem into multiple text-dependent decision problems. The basic WDC system is further enhanced by the AdaBoost algorithm, a dialect dependence measure, and Context Adaptive Training (CAT), which is proposed to address the data sparseness problem in training. The WDC systems outperform the state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) based system by 38%-70% on three English dialect corpora. Second, when no transcripts are available for training, a Gaussian Mixture Model (GMM) based algorithm is proposed. Here, Frame Selection (FS), Mixture Selection (MS), and Minimum Classification Error (MCE) methods are proposed and applied to train the GMM discriminatively. The discriminatively trained GMM classifiers outperform the baseline maximum-likelihood-estimated GMM classifier by 33%-43% on both English and Spanish corpora. Third, since dialect information is expressed not only at the pronunciation level but also at the word selection and grammar levels, text processing is also considered. Topic-specific documents from the United States, Australia, and the United Kingdom are collected and analyzed for dialects of English. An n-gram language model based classifier and a Conditional Random Fields (CRF) based classifier are applied to determine the dialect origin of documents. It is shown that strong dialect differences are contained in the text structure of these documents. The n-gram language model based classifier achieves 82%-92% classification accuracy across five document topics. The CRF based classifier obtains a 10%-20% improvement over the n-gram language model based classifier. Finally, a comparison between the proposed automatic systems and human listener performance in dialect classification is conducted. The CRF is also applied to unsupervised dialect classification, and issues concerning data mismatch and training data size in dialect classification are explored. The algorithms formulated in this thesis, and the resulting observations, provide important contributions to the advancement of effective systems for dialect classification at the acoustic level. It is also shown that text structure can contribute to effective dialect classification systems.
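
To make the text-based direction concrete, the following is a minimal sketch of an n-gram language model based dialect classifier (here a bigram model with add-one smoothing). It is an illustration under stated assumptions, not the thesis implementation: the class name, whitespace tokenization, smoothing choice, and toy training data are all hypothetical.

# Minimal sketch (not the author's system): a bigram language model per
# dialect, with add-one smoothing; the test document is assigned to the
# dialect whose model gives it the highest log-likelihood.
from collections import Counter
import math

def bigrams(tokens):
    """Yield (previous, current) word pairs, starting from a start symbol."""
    prev = "<s>"
    for tok in tokens:
        yield prev, tok
        prev = tok

class BigramDialectClassifier:
    def __init__(self):
        self.models = {}  # dialect -> (bigram counts, unigram counts, vocabulary)

    def train(self, dialect, documents):
        """Count bigrams and unigrams from a list of token lists for one dialect."""
        bi, uni, vocab = Counter(), Counter(), set()
        for tokens in documents:
            vocab.update(tokens)
            for prev, cur in bigrams(tokens):
                bi[(prev, cur)] += 1
                uni[prev] += 1
        self.models[dialect] = (bi, uni, vocab)

    def log_likelihood(self, dialect, tokens):
        """Add-one smoothed bigram log-likelihood of a token sequence."""
        bi, uni, vocab = self.models[dialect]
        v = len(vocab) + 1
        return sum(
            math.log((bi[(prev, cur)] + 1) / (uni[prev] + v))
            for prev, cur in bigrams(tokens)
        )

    def classify(self, tokens):
        """Return the dialect whose model assigns the highest likelihood."""
        return max(self.models, key=lambda d: self.log_likelihood(d, tokens))

# Illustrative usage with toy data; the actual corpora in the thesis are
# topic-specific documents from the US, Australia, and the UK.
clf = BigramDialectClassifier()
clf.train("en-US", [["the", "truck", "is", "on", "the", "freeway"]])
clf.train("en-GB", [["the", "lorry", "is", "on", "the", "motorway"]])
print(clf.classify(["a", "lorry", "on", "the", "motorway"]))  # expected: "en-GB"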

Details

Title
Automatic dialect classification: Advances for read and spontaneous speech, and printed text
Author
Huang, Rongqing
Year
2006
Publisher
ProQuest Dissertations & Theses
ISBN
978-0-542-94318-8
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
305354799
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.