Text Normalization for Text-to-Speech

Text normalization plays a crucial role in text-to-speech systems: it ensures that the input text is in an appropriate format and consists of standardized words before grapheme-to-phoneme conversion. This study assessed the performance of five text normalization systems based on different methods. The systems were evaluated on the English Google text normalization dataset, using the similarity between the ground truth and the normalized output of each system. Because the evaluation revealed multiple issues with the ground truth, the original similarity scores had to be manually re-scored. The re-scoring was carried out on a sample semi-randomly extracted from the evaluation dataset. According to the results, the systems rank by accuracy, from highest to lowest, as follows: the Duplex system, the Hybrid system, the VT system, the RS system, and the WFST system. Of the two rule-based systems from ReadSpeaker, the VT system performed slightly better than the RS system, with only a small difference in the original similarity score. An analysis of the error patterns produced during normalization provided insights into the strengths and limitations of these systems. The findings contribute to the refinement of internal rules, leading to improved accuracy and effectiveness of text normalization in text-to-speech applications.

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-503889
Date: January 2023
Creators: Zhang, Zhaorui
Publisher: Uppsala universitet, Institutionen för lingvistik och filologi
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
