Abstract

Automatic speech segmentation, an important part of automatic speech recognition (ASR) systems, is highly sensitive to noise. Noise arises from changes in the communication channel, the background, the speaking level, and so on. In recent years, many researchers have proposed noise cancellation techniques and have added visual features from the speaker's face to reduce the effect of noise on ASR systems. Removing noise from audio signals depends on the type of noise, so it cannot serve as a general solution. Adding visual features compensates for this limitation, but advanced methods of this type require manual extraction of the visual features. In this paper we propose a completely automatic system that uses optical flow vectors from the speaker's image sequence to obtain visual features. Hidden Markov Models are then trained to segment the audio signal using audio features together with visual features derived from the extracted optical flow. The resulting segmentation system is fully automatic and more robust to noise.
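
The abstract does not specify the model topology or the feature set, so the following is only a minimal sketch of the general idea: Viterbi decoding over a two-state HMM (silence and speech) whose observations combine one audio feature with one visual feature. The function name `viterbi_segment`, the two-state layout, the diagonal Gaussian emissions, and all numeric parameters are illustrative assumptions, not values from the paper. The visual feature stands in for a mean optical flow magnitude around the mouth, which a real system would compute from the image sequence.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_segment(frames, means, variances, log_trans, log_init):
    """Return the most likely state sequence for a list of feature frames.

    frames: list of (audio_energy, flow_magnitude) tuples, one per frame.
    means, variances: per-state tuples of per-feature Gaussian parameters.
    log_trans[p][s]: log probability of moving from state p to state s.
    """
    n_states = len(means)

    def emit(state, frame):
        # Diagonal covariance: features are treated as independent given the state.
        return sum(
            log_gauss(x, m, v)
            for x, m, v in zip(frame, means[state], variances[state])
        )

    score = [log_init[s] + emit(s, frames[0]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for frame in frames[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda p: prev[p] + log_trans[p][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + emit(s, frame))
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical parameters (assumed for illustration, not taken from the paper).
SILENCE, SPEECH = 0, 1
means = [(0.1, 0.2), (0.8, 1.5)]          # (audio energy, flow magnitude) per state
variances = [(0.05, 0.2), (0.05, 0.2)]
stay, switch = math.log(0.9), math.log(0.1)
log_trans = [[stay, switch], [switch, stay]]
log_init = [math.log(0.5), math.log(0.5)]

# The fourth frame has low audio energy but strong mouth motion.
frames = [(0.1, 0.1), (0.2, 0.3), (0.9, 1.4), (0.3, 1.6), (0.8, 1.2), (0.1, 0.2)]
labels = viterbi_segment(frames, means, variances, log_trans, log_init)
print(labels)  # [0, 0, 1, 1, 1, 0]
```

In this toy input the fourth frame's audio energy alone is closer to the silence model, yet the frame is still labelled as speech because the visual feature and the transition prior outweigh it. That is the kind of noise robustness the abstract attributes to adding visual features, shown here only on made-up numbers.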

Details

Title
Automatic Speech Segmentation Based On Audio and Optical Flow Visual Classification
Author
Torabi, Behnam; Ahmad Reza Naghsh Nilchi
Pages
43-49
Publication year
2014
Publication date
Oct 2014
Publisher
Modern Education and Computer Science Press
ISSN
2074-9074
e-ISSN
2074-9082
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
1626456616
Copyright
Copyright Modern Education and Computer Science Press Oct 2014