Abstract

In this work, we take on the challenging task of building a single text-to-speech (TTS) synthesis system capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. Through a novel integration of massively multilingual pretraining and meta learning to approximate language representations, our approach enables zero-shot speech synthesis in languages without any available data. We validate our system's performance through objective measures and human evaluation across a diverse linguistic landscape. By releasing our code and models publicly, we aim to empower communities with limited linguistic resources and foster further innovation in speech technology.

Details

1009240
Title
Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Publication title
arXiv.org; Ithaca
Publication year
2024
Publication date
Jun 10, 2024
Section
Computer Science; Electrical Engineering and Systems Science
Publisher
Cornell University Library, arXiv.org
Source
arXiv.org
Place of publication
Ithaca
Country of publication
United States
University/institution
Cornell University Library, arXiv.org
e-ISSN
2331-8422
Source type
Working Paper
Language of publication
English
Document type
Working Paper
Publication history
Online publication date
2024-06-11
Milestone dates
2024-06-10 (Submission v1)
First posting date
11 Jun 2024
ProQuest document ID
3066577103
Document URL
https://www.proquest.com/working-papers/meta-learning-text-speech-synthesis-over-7000/docview/3066577103/se-2?accountid=208611
Copyright
© 2024. This work is published under http://creativecommons.org/licenses/by-nc-sa/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2024-06-12
Database
ProQuest One Academic