Efektivní neuronová syntéza řeči / Efficient neural speech synthesis

While recent neural sequence-to-sequence models have greatly improved the quality of speech synthesis, there has not been a system capable of fast training, fast inference and high-quality audio synthesis at the same time. In this thesis, we present a neural speech synthesis system capable of high-quality faster-than-real-time spectrogram synthesis, with low requirements on computational resources and fast training time. Our system consists of a teacher and a student network. The teacher model is used to extract the alignment between the text to synthesize and the corresponding spectrogram. The student uses the alignments from the teacher model to efficiently synthesize mel-scale spectrograms from a phonemic representation of the input text. Both systems utilize simple convolutional layers. We train both systems on the English LJSpeech dataset. The quality of samples synthesized by our model was rated significantly higher than that of baseline models. Our model can be efficiently trained on a single GPU and can run in real time even on a CPU.
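
The abstract does not say how the teacher's alignments are passed to the student. One common approach, shown here only as an illustrative sketch and not as the thesis's actual method, is to assume the alignment is an attention matrix with one row per spectrogram frame and one column per phoneme, and to reduce it to per-phoneme durations by assigning each frame to its most-attended phoneme. The function names and the toy matrix below are hypothetical.

```python
import numpy as np


def durations_from_alignment(attention):
    """Count how many spectrogram frames attend most strongly to each phoneme.

    `attention` is assumed to have shape (n_frames, n_phonemes); this layout
    is an assumption made for the sketch, not something stated in the abstract.
    """
    attention = np.asarray(attention)
    n_phonemes = attention.shape[1]
    # For every frame, pick the phoneme with the highest attention weight.
    best_phoneme = attention.argmax(axis=1)
    # Count the frames assigned to each phoneme (zero if none were assigned).
    return np.bincount(best_phoneme, minlength=n_phonemes)


def expand_by_duration(phoneme_ids, durations):
    """Repeat each phoneme id once per frame it covers, giving a frame-level sequence."""
    return np.repeat(np.asarray(phoneme_ids), durations)


# Toy alignment: 6 frames, 3 phonemes (hypothetical values).
toy_attention = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.2, 0.8],
    [0.0, 0.1, 0.9],
    [0.0, 0.3, 0.7],
]
toy_durations = durations_from_alignment(toy_attention)
toy_frames = expand_by_duration([10, 11, 12], toy_durations)
```

With durations in hand, a student network could process all frames in parallel instead of generating them one at a time, which is one way a model can reach faster-than-real-time synthesis.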

http://www.nusl.cz/ntk/nusl-415974

Identifier	oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:415974
Date	January 2020
Creators	Vainer, Jan
Contributors	Dušek, Ondřej; Hajič, Jan
Source Sets	Czech ETDs
Language	English
Detected Language	English
Type	info:eu-repo/semantics/masterThesis
Rights	info:eu-repo/semantics/restrictedAccess
