Abstract

An emotion estimation method based on the Mel-frequency spectrum, voice power level, and pitch frequency of human voices, trained through CNN (Convolutional Neural Network) learning processes, is proposed. Frequency spectra alone are commonly used for emotion estimation. The proposed method utilizes not only the Mel-frequency spectrum but also the voice pressure level (voice power level) and pitch frequency to improve emotion estimation accuracy. These features are used in CNN learning processes with training samples from an emotional speech corpus provided by Keio University, together with our own training samples collected by our students. In these processes, the target emotion is divided into two categories: confident and non-confident. Through experiments, it is found that the proposed method outperforms the traditional method using only the Mel-frequency spectrum by 15%.
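To make the described pipeline concrete, the following is a minimal Python sketch of one way the three features (Mel-frequency spectrum, voice power level, pitch frequency) could be extracted and fed to a small CNN for the confident / non-confident decision. It is an illustration under assumptions, not the authors' published implementation: the use of librosa and PyTorch, the channel-stacking fusion, the network shape, and all hyperparameters (N_MELS, FRAMES, filter counts) are assumptions for the sketch.

# Minimal sketch of a three-feature CNN classifier for confident / non-confident
# speech. Libraries, feature fusion, network shape, and hyperparameters are
# illustrative assumptions, not the authors' published configuration.
import librosa
import numpy as np
import torch
import torch.nn as nn

N_MELS = 64
FRAMES = 128  # fixed number of frames per training sample (assumption)

def extract_features(wav_path: str, sr: int = 16000) -> torch.Tensor:
    """Return a (3, N_MELS, FRAMES) tensor: Mel spectrum, power level, pitch."""
    y, _ = librosa.load(wav_path, sr=sr)

    # Channel 1: log-Mel spectrogram (the baseline feature).
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS)
    mel = librosa.power_to_db(mel, ref=np.max)

    # Channel 2: frame-wise RMS energy as a stand-in for voice power level,
    # broadcast over the Mel axis so all channels share one 2-D shape.
    rms = librosa.feature.rms(y=y)[0]
    power = np.tile(rms, (N_MELS, 1))

    # Channel 3: pitch (fundamental) frequency estimated with pYIN;
    # unvoiced frames are set to 0 Hz. Per-channel normalization is omitted.
    f0, _, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
    pitch = np.tile(np.nan_to_num(f0), (N_MELS, 1))

    def fit(x):  # crop or zero-pad each channel to FRAMES columns
        x = x[:, :FRAMES]
        return np.pad(x, ((0, 0), (0, FRAMES - x.shape[1])))

    stacked = np.stack([fit(mel), fit(power), fit(pitch)]).astype(np.float32)
    return torch.from_numpy(stacked)

class EmotionCNN(nn.Module):
    """Small CNN mapping the 3-channel feature map to confident / non-confident."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, 2)  # two classes: confident, non-confident

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

if __name__ == "__main__":
    model = EmotionCNN()
    x = torch.randn(4, 3, N_MELS, FRAMES)  # dummy batch of feature maps
    print(model(x).shape)                   # torch.Size([4, 2])

Stacking the three features as image-like channels is only one plausible fusion strategy; the paper may combine the Mel spectrum, power level, and pitch differently (for example, as separate network inputs).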

Details

Title
Emotion Estimation Method with Mel-frequency Spectrum, Voice Power Level and Pitch Frequency of Human Voices through CNN Learning Processes
Author
Haruta, Taiga; Oda, Mariko; Arai, Kohei
Publication year
2022
Publication date
2022
Publisher
Science and Information (SAI) Organization Limited
ISSN
2158-107X
e-ISSN
2156-5570
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2758768532
Copyright
© 2022. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.