A Design of Portuguese Speech Recognition System

IBM, a well-known computer giant, and Nuance, a renowned speech technology firm, have offered numerous speech recognition applications in recent years. The ties between these two companies and the automobile and communication industries, as well as eight other major industries (banking, electronics, energy/utilities, medical/life science, insurance, media/entertainment, retail, and travel and transportation), have expanded and flourished. The maturity of these speech technologies has brought a level of convenience to daily life that we could not have imagined before. In April 2011, the world-class manufacturer Foxconn decided to invest 12 billion US dollars to build iPhone/iPad factories in Brazil, the largest Portuguese-speaking country in the world. Our objective is to build a language system that helps us learn Portuguese, savor the beauty of Portuguese-speaking culture, and widen our horizons for travel and living.
This thesis investigates design and implementation strategies for a Portuguese speech recognition system. Training and recognition are based on the speech features of 303 common Portuguese mono-syllables. A training database of 10 utterances per mono-syllable is built by applying Portuguese pronunciation rules. The 10 utterances are collected by reading the same mono-syllables in 5 rounds, pronouncing each one twice per round with different tones.
The first pronunciation uses the high pitch of tone 1, and the second uses the falling pitch of tone 4. Mel-frequency cepstral coefficients and linear predictive cepstral coefficients serve as the two syllable feature models, and a hidden Markov model serves as the recognition model. On a personal computer with a 2.2 GHz AMD Athlon XP 2800+ processor running Ubuntu 9.04, a correct phrase recognition rate of 87.26% is reached using phonotactic rules on a Portuguese phrase database with a 3,900-phrase vocabulary. The average computation time of the Portuguese phrase system is less than 1.5 seconds, and the training time of the system is about two hours.
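To illustrate one of the two feature models named above, the following is a minimal sketch of the standard recursion that converts linear prediction coefficients into cepstral coefficients. It is not taken from the thesis: the function name `lpc_to_cepstrum`, the unit-gain all-pole model, and the example coefficient values are assumptions made for illustration, and the abstract does not state the prediction order or the number of cepstral coefficients actually used.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC predictor coefficients to cepstral coefficients.

    a      : list [a_1, ..., a_p] for the assumed all-pole model
             H(z) = 1 / (1 - sum_k a_k z^-k)  (unit gain, an assumption)
    n_ceps : number of cepstral coefficients c_1..c_n_ceps to return

    Recursion (hypothetical helper, standard textbook form):
      c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}      for n <= p
      c_n =       sum_{k=n-p}^{n-1} (k/n) c_k a_{n-k}    for n >  p
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        # The direct a_n term only exists while n is within the LPC order.
        acc = a[n - 1] if n <= p else 0.0
        # Lists are 0-indexed, so c_k is c[k - 1] and a_{n-k} is a[n - k - 1].
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


if __name__ == "__main__":
    # Illustrative second-order predictor with poles at 0.5 and 0.25:
    # a_1 = 0.5 + 0.25, a_2 = -(0.5 * 0.25).
    print(lpc_to_cepstrum([0.75, -0.125], 3))
```

For a predictor with real poles r_1 and r_2, the cepstral coefficients have the closed form c_n = (r_1^n + r_2^n) / n, which gives a quick sanity check on the recursion.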

Identifier: oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0812111-144756
Date: 12 August 2011
Creators: Kuo, Bo-yu
Contributors: Er-Hui Lu, Tsung Lee, Xiao-Song Bo, Chih-Chien Chen, Chii-Maw Uang
Publisher: NSYSU
Source Sets: NSYSU Electronic Thesis and Dissertation Archive
Language: Cholon
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0812111-144756
Rights: user_define, Copyright information available at source archive