© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

This research addresses the out-of-vocabulary problem caused by Uyghur spelling errors in Uyghur–Chinese machine translation, with the aim of improving translation quality. The paper assesses three spelling correction methods based on machine translation: (1) using a Bilingual Evaluation Understudy (BLEU) score; (2) using a Chinese language model; and (3) using a bilingual language model. Using the BLEU score for spelling correction achieved the best results in both the spelling correction task and the machine translation task: spelling correction reached a maximum F1 score of 0.72, and the BLEU score of the translation output increased by 1.97 points relative to the baseline system. However, the BLEU-based method requires a bilingual parallel corpus; it is therefore a supervised method, suited to corpus pre-processing. Unsupervised spelling correction can be performed with either a Chinese language model or a bilingual language model. These two methods can be easily extended to other languages, such as Arabic.
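
The abstract does not spell out how a correction is chosen, so the following is only a minimal sketch of the general idea it suggests: translate each candidate correction and keep the one whose translation scores highest. Everything named here is an assumption made for illustration: `pick_correction`, `unigram_overlap`, the toy lookup table, and the placeholder tokens are hypothetical, and `unigram_overlap` is a crude clipped unigram precision standing in for BLEU, not BLEU itself and not the authors' scoring.

```python
from collections import Counter
from typing import Callable, Iterable, List


def unigram_overlap(hypothesis: List[str], reference: List[str]) -> float:
    """Clipped unigram precision: a crude stand-in for BLEU, not BLEU itself."""
    if not hypothesis:
        return 0.0
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    # Each hypothesis token is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    return matched / len(hypothesis)


def pick_correction(
    candidates: Iterable[str],
    translate: Callable[[str], List[str]],
    score: Callable[[List[str]], float],
) -> str:
    """Return the candidate whose translation receives the highest score."""
    return max(candidates, key=lambda cand: score(translate(cand)))


# Toy data: placeholder tokens only, not real Uyghur or Chinese.
# The hypothetical "translator" is a plain lookup table.
toy_table = {
    "cand_a": ["t1", "t2", "t3"],
    "cand_b": ["t1", "<unk>", "t3"],
}
reference = ["t1", "t2", "t3"]

best = pick_correction(
    toy_table.keys(),
    translate=lambda cand: toy_table[cand],
    score=lambda hyp: unigram_overlap(hyp, reference),
)
print(best)
```

Because the score is passed in as a function, the same selection loop would also accept a scorer that needs no reference translation, such as a target-side language model score, which corresponds to the unsupervised setting the abstract mentions.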

Details

Title
Spelling Correction of Non-Word Errors in Uyghur–Chinese Machine Translation
Author
Dong, Rui 1; Yang, Yating 2; Jiang, Tonghai 3

1 Xinjiang Technical Institute of Physics and Chemistry Chinese Academy of Science, Urumqi 830011, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China; University of the Chinese Academy of Sciences, Beijing 100049, China
2 Xinjiang Technical Institute of Physics and Chemistry Chinese Academy of Science, Urumqi 830011, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
3 Xinjiang Technical Institute of Physics and Chemistry Chinese Academy of Science, Urumqi 830011, China
First page
202
Publication year
2019
Publication date
2019
Publisher
MDPI AG
e-ISSN
2078-2489
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2548516208