Abstract

In this paper, we investigate two neural architectures for gender detection and speaker identification using Mel-frequency cepstral coefficient (MFCC) features, which do not cover voice-related characteristics. One of our goals is to compare the two architectures, multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs), on both tasks under various settings and to learn gender- and speaker-specific features automatically. The experimental results reveal that models using z-score normalization and the Gramian matrix transformation obtain better results than models that use only max-min normalization of the MFCCs. In terms of training time, the MLP requires more training epochs to converge than the CNN. Further experiments show that MLPs outperform CNNs on both tasks in terms of generalization error.
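
The three feature transformations named in the abstract can be illustrated with a minimal NumPy sketch. This is an illustration under assumptions, not the authors' implementation: the abstract does not define the Gramian transformation, so the sketch assumes it is the Gram matrix of the MFCC matrix over the coefficient dimension, averaged over frames. The function names, the 13 x 200 shape, and the random stand-in data are all hypothetical.

```python
import numpy as np


def min_max_normalize(x, eps=1e-8):
    """Scale each coefficient row to [0, 1] (max-min normalization)."""
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    return (x - lo) / (hi - lo + eps)


def z_score_normalize(x, eps=1e-8):
    """Standardize each coefficient row to zero mean and unit variance."""
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + eps)


def gram_matrix(x):
    """Assumed Gramian transformation: X X^T averaged over frames.

    Maps a (n_coeffs, n_frames) matrix to a (n_coeffs, n_coeffs) matrix.
    """
    return x @ x.T / x.shape[1]


# Stand-in for real MFCCs: 13 coefficients over 200 frames (random data,
# used only so the sketch runs without an audio file).
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(13, 200))

scaled = min_max_normalize(mfcc)
standardized = z_score_normalize(mfcc)
gram = gram_matrix(standardized)
```

Under this assumed definition, the Gram matrix has a fixed size that depends only on the number of coefficients, not on the number of frames, so utterances of different lengths map to inputs of the same shape.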

Details

Title
Neural architectures for gender detection and speaker identification
Author
Mamyrbayev, Orken 1; Toleu, Alymzhan 2; Tolegen, Gulmira 2; Mekebayev, Nurbapa 1

1 Institute of Information and Computational Technologies, Almaty, Kazakhstan; al-Farabi Kazakh National University, Almaty, Kazakhstan
2 Institute of Information and Computational Technologies, Almaty, Kazakhstan
Publication year
2020
Publication date
Jan 2020
Publisher
Taylor & Francis Ltd.
e-ISSN
2331-1916
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2488092027
Copyright
© 2020 The Author(s). This open access article is distributed under a Creative Commons Attribution (CC-BY) 4.0 license. This work is licensed under the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.