Full Text

Turn on search term navigation

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Lip reading technology refers to the analysis of the visual information of the speaker’s mouth movements to recognize the content of the speaker’s speech. As one of the important aspects of human–computer interaction, the technology of lip reading has gradually become popular with the development of deep learning in recent years. At present, most the lip reading networks are very complex, with very large numbers of parameters and computation, and the model generated by training needs to occupy large memory, which brings difficulties for devices with limited storage capacity and computation power, such as mobile terminals. Based on the above problems, this paper optimizes and improves GhostNet, a lightweight network, and improves on it by proposing a more efficient Efficient-GhostNet, which achieves performance improvement while reducing the number of parameters through a local cross-channel interaction strategy, without dimensionality reduction. The improved Efficient-GhostNet is used to perform lip spatial feature extraction, and then the extracted features are inputted to the GRU network to obtain the temporal features of the lip sequences, and finally for prediction. We used Asian volunteers for the recording of the dataset in this paper, while also adopting data enhancement for the dataset, using the angle transformation of the dataset to deflect the recording process of the recorder by 15 degrees each to the left and right, in order to be able to enhance the robustness of the network and better reduce the influence of other factors, as well as to improve the generalization ability of the model so that the model can be more consistent with recognition scenarios in real life. Experiments prove that the improved Efficient-GhostNet + GRU model can achieve the purpose of reducing the number of parameters with comparable accuracy.

Details

Title
Research on a Lip Reading Algorithm Based on Efficient-GhostNet
Author
Zhang, Gaoyan; Lu, Yuanyao
First page
1151
Publication year
2023
Publication date
2023
Publisher
MDPI AG
e-ISSN
20799292
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2785187073
Copyright
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.