Copyright © 2022 Muhammad Ahmed Hassan et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0/

Abstract

Automatic speech recognition (ASR) offers a convenient and fast mode of communication between humans and computers, and its accuracy has improved over time. However, in the majority of ASR systems, the models have been trained on native English accents. While they serve native English speakers best, their accuracy drops drastically for non-native English accents. Our proposed model addresses this limitation for non-native English accents. We fine-tuned the DeepSpeech2 model, pretrained on the LibriSpeech dataset of native English accents. We retrained the model on a subset of the Common Voice dataset containing only South Asian accents, using the proposed novel loss function. We experimented with three different layer configurations of the model to learn the best features for South Asian accents. Three evaluation metrics were used: word error rate (WER), match error rate (MER), and word information loss (WIL). The results show that DeepSpeech2 can perform well on South Asian accents if the weights of the initial convolutional layers are retained while the weights of the deeper layers in the model (i.e., the RNN and fully connected layers) are updated. Our model gave a WER of 18.08%, which is the lowest error achieved for non-native English accents in comparison with the original model.
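
The abstract names the three evaluation metrics but does not give their formulas. As an illustration only, the sketch below computes them from a word-level edit-distance alignment using the commonly cited definitions, which are an assumption here and not taken from the article: with H hits, S substitutions, D deletions, and I insertions, WER = (S + D + I) / (H + S + D), MER = (S + D + I) / (H + S + D + I), and WIL = 1 - H^2 / ((H + S + D)(H + S + I)). The function names `align_counts` and `error_rates` are hypothetical.

```python
def align_counts(reference: str, hypothesis: str):
    """Return (hits, substitutions, deletions, insertions) for one
    minimum-cost word alignment of hypothesis against reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = (cost, hits, subs, dels, ins) for ref[:i] vs hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, 0, i, 0)          # delete every reference word
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, 0, j)          # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            c, h, s, d, n = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, h + 1, s, d, n)      # hit
            else:
                diag = (c + 1, h, s + 1, d, n)  # substitution
            c, h, s, d, n = dp[i - 1][j]
            deletion = (c + 1, h, s, d + 1, n)
            c, h, s, d, n = dp[i][j - 1]
            insertion = (c + 1, h, s, d, n + 1)
            # Ties in cost are broken in the order: diagonal, deletion,
            # insertion. Other tie-breaks can change MER and WIL slightly.
            dp[i][j] = min(diag, deletion, insertion, key=lambda t: t[0])
    return dp[-1][-1][1:]


def error_rates(reference: str, hypothesis: str) -> dict:
    """Compute WER, MER, and WIL under the definitions assumed above."""
    hits, subs, dels, ins = align_counts(reference, hypothesis)
    ref_len = hits + subs + dels            # words in the reference
    hyp_len = hits + subs + ins             # words in the hypothesis
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    errors = subs + dels + ins
    wer = errors / ref_len
    mer = errors / (hits + errors)
    # With an empty hypothesis there are no hits, so all information is lost.
    wil = 1.0 if hyp_len == 0 else 1.0 - (hits * hits) / (ref_len * hyp_len)
    return {"wer": wer, "mer": mer, "wil": wil}


if __name__ == "__main__":
    print(error_rates("the cat sat on the mat", "the cat sit on mat"))
```

In this toy pair there are four hits, one substitution ("sat" to "sit"), and one deletion ("the"), so WER and MER are both 2/6 and WIL is 1 - 16/30. WER can exceed 1 when there are many insertions, whereas MER and WIL stay within [0, 1].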

Details

Title
Improvement in Automatic Speech Recognition of South Asian Accent Using Transfer Learning of DeepSpeech2
Author
Muhammad Ahmed Hassan 1; Rehmat, Asim 2; Ghani Khan, Muhammad Usman 3; Yousaf, Muhammad Haroon 4

1 Al-Khawarizmi Institute of Computer Science, University of Engineering and Technology, Lahore, Pakistan
2 Department of Computer Engineering, University of Engineering and Technology, Lahore, Pakistan
3 Department of Computer Science, University of Engineering and Technology, Lahore, Pakistan
4 Department of Computer Engineering, University of Engineering and Technology, Taxila, Pakistan
Editor
Muhammad Fazal Ijaz
Publication year
2022
Publication date
2022
Publisher
John Wiley & Sons, Inc.
ISSN
1024123X
e-ISSN
15635147
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2725129619