Return to search

FPGA-based Implementation of Concatenative Speech Synthesis Algorithm

The main aim of a text-to-speech synthesis system is to convert ordinary text into an acoustic signal that is indistinguishable from human speech. This thesis presents an architecture to implement a concatenative speech synthesis algorithm targeted to FPGAs. Many current text-to-speech systems are based on the concatenation of acoustic units of recorded speech. Current concatenative speech synthesizers are capable of producing highly intelligible speech. However, the quality of speech often suffers from discontinuities between the acoustic units, due to contextual differences. This is the easiest method to produce synthetic speech. It concatenates prerecorded acoustic elements and forms a continuous speech element. The software implementation of the algorithm is performed in C whereas the hardware implementation is done in structural VHDL. A database of acoustic elements is formed first with recording sounds for different phones. The architecture is designed to concatenate acoustic elements corresponding to the phones that form the target word. Target word corresponds to the word that has to be synthesized. This architecture doesn't address the form discontinuities between the acoustic elements as its ultimate goal is the synthesis of speech. The Hardware implementation is verified on a Virtex (v800hq240-4) FPGA device.

Identiferoai:union.ndltd.org:USF/oai:scholarcommons.usf.edu:etd-2327
Date29 October 2003
CreatorsBamini, Praveen Kumar
PublisherScholar Commons
Source SetsUniversity of South Flordia
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceGraduate Theses and Dissertations
Rightsdefault

Page generated in 0.0022 seconds