Neural architectures for gender detection and speaker identification

Abstract

In this paper, we investigate two neural architectures for gender detection and speaker identification using Mel-frequency cepstral coefficient (MFCC) features, which do not cover voice-related characteristics. One of our goals is to compare two neural architectures, multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs), on both tasks under various settings, and to learn gender- and speaker-specific features automatically. The experimental results reveal that models using z-score normalization and the Gramian matrix transformation obtain better results than models that use only max-min normalization of the MFCCs. In terms of training time, the MLP requires more training epochs to converge than the CNN. Further experimental results show that MLPs outperform CNNs on both tasks in terms of generalization error.
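The three MFCC preprocessing steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say along which axis the normalization is applied or how the Gramian matrix is formed, so the per-coefficient normalization and the definition G = XᵀX used here are assumptions, as are the function names and the random stand-in for a real MFCC matrix.

```python
import numpy as np


def min_max_normalize(x, eps=1e-12):
    """Scale each coefficient (column) to the range [0, 1]."""
    lo = x.min(axis=0, keepdims=True)
    hi = x.max(axis=0, keepdims=True)
    return (x - lo) / (hi - lo + eps)


def z_score(x, eps=1e-12):
    """Standardize each coefficient (column) to zero mean and unit variance."""
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True)
    return (x - mean) / (std + eps)


def gram_matrix(x):
    """Gram matrix of the columns of x (assumed definition: G = X^T X).

    For x of shape (frames, coefficients) the result has shape
    (coefficients, coefficients), independent of the utterance length.
    """
    return x.T @ x


# Hypothetical stand-in for an MFCC matrix: 100 frames x 13 coefficients.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(100, 13))

scaled = min_max_normalize(mfcc)     # max-min normalization only
standardized = z_score(mfcc)         # z-score normalization
gram = gram_matrix(standardized)     # z-score followed by Gram matrix
```

Under this assumed definition, the Gram matrix has a fixed size regardless of the number of frames, which makes it convenient as a fixed-size input to an MLP or a CNN.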

Details

Title

Neural architectures for gender detection and speaker identification

Author

Mamyrbayev, Orken¹; Toleu, Alymzhan²; Tolegen, Gulmira²; Mekebayev, Nurbapa¹

¹ Institute of Information and Computational Technologies, Almaty, Kazakhstan; al-Farabi Kazakh National University, Almaty, Kazakhstan
² Institute of Information and Computational Technologies, Almaty, Kazakhstan

Publication year

2020

Publication date

Jan 2020

Publisher

Taylor & Francis Ltd.

e-ISSN

23311916

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1080/23311916.2020.1727168

ProQuest document ID

2488092027

© 2020 The Author(s). This open access article is distributed under a Creative Commons Attribution (CC-BY) 4.0 license. This work is licensed under the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
