
Italianising English words with G2P techniques in TTS voices. An evaluation of different models

Text-to-speech (TTS) voices have become markedly more natural and now sound closer to human speech than ever. Among the problems that persist, however, is the pronunciation of foreign words. The experiments conducted in this thesis use grapheme-to-phoneme (G2P) models to address this issue and, more specifically, to adjust the erroneous pronunciation of English words to an Italian English accent in Italian-speaking voices. In the first stage, we curated a dataset of words collected during recording sessions in which an Italian voice actor read general conversational sentences, and we manually transcribed their pronunciation in Italian English. In the second stage, we augmented the dataset by collecting the most common surnames in Great Britain and the United States, transcribing them phonetically with a rule-based phoneme-mapping algorithm previously deployed by the company, and then manually adjusting the pronunciations to Italian English. In the third stage, we performed 10-fold cross-validation on the curated dataset with the massively multilingual ByT5 model (a Transformer G2P model pre-trained on 100 languages), its tokenizer-dependent versions T5_base and T5_small, and an LSTM with attention based on OpenNMT. The results show that augmenting the data benefitted every model. In terms of phoneme error rate (PER), word error rate (WER) and accuracy, the Transformer-based ByT5_small strongly outperformed its T5_small and T5_base counterparts, even with one-third or two-thirds of the training data. The second-best model, the LSTM with attention built with the OpenNMT framework, also outperformed the T5 models, achieved the second-best accuracy in our experiments, and was the 'lightest' in terms of trainable parameters (2M), compared with ByT5 (299M) and the T5 models (60M and 200M).
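The abstract reports PER, WER and accuracy without defining them. The sketch below shows one common way these metrics are computed in G2P evaluation; it is an assumption, not necessarily the thesis's exact procedure. PER is taken here as the total phoneme-level edit distance divided by the number of reference phonemes, WER as the share of words whose predicted transcription differs from the reference, and accuracy as 1 - WER. The function names and the two sample transcriptions are invented for illustration and do not come from the thesis dataset.

```python
from typing import Dict, List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def g2p_scores(refs: List[List[str]], hyps: List[List[str]]) -> Dict[str, float]:
    """PER, WER and accuracy under the assumed definitions above."""
    total_phonemes = sum(len(r) for r in refs)
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    wrong_words = sum(1 for r, h in zip(refs, hyps) if r != h)
    wer = wrong_words / len(refs)
    return {
        "per": total_edits / total_phonemes,
        "wer": wer,
        "accuracy": 1.0 - wer,
    }


# Invented example: two reference transcriptions and two model outputs.
refs = [["s", "m", "i", "t"], ["d", "ʒ", "o", "n", "s"]]
hyps = [["s", "m", "i", "t"], ["d", "ʒ", "o", "n"]]
scores = g2p_scores(refs, hyps)
```

With these invented inputs, one of nine reference phonemes is wrong and one of two words contains an error. Under 10-fold cross-validation, scores of this kind would typically be computed per fold and then averaged.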

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-531125
Date: January 2024
Creators: Grassini, Francesco
Publisher: Uppsala universitet, Institutionen för lingvistik och filologi
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
