A Design of Portuguese Speech Recognition System

IBM, a well-known computer giant, and Nuance, a renowned speech technology firm, have offered numerous speech recognition applications in recent years. The ties between these two companies and the automobile and communication industries, as well as eight other leading industries (banking, electronics, energy/utilities, medical/life science, insurance, media/entertainment, retail, and travel and transportation), have expanded greatly. The maturing of these speech technologies has brought a level of convenience to daily life that could not have been imagined before. In April 2011, the world-class manufacturer Foxconn decided to invest 12 billion US dollars to build iPhone/iPad factories in Brazil, the largest Portuguese-speaking country in the world. Our objective is to build a language system that helps us learn Portuguese, appreciate the beauty of its culture, and broaden our horizons in travel and living.
This thesis investigates design and implementation strategies for a Portuguese speech recognition system. The system uses the speech features of 303 common Portuguese mono-syllables as the basis for training and recognition. A training database of 10 utterances per mono-syllable is established by applying Portuguese pronunciation rules. The 10 utterances are collected by reading each mono-syllable in 5 rounds, twice per round with different tones.
The first pronunciation has the high pitch of tone 1, while the second has the falling pitch of tone 4. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On a personal computer with a 2.2 GHz AMD Athlon XP 2800+ processor running the Ubuntu 9.04 operating system, a correct phrase recognition rate of 87.26% is reached using phonotactical rules for a 3,900-vocabulary Portuguese phrase database. The average computation time for the Portuguese phrase system is less than 1.5 seconds, and the training time for the system is about two hours.
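As an illustration of one of the two feature types named above, the following is a minimal sketch of the standard recursion that converts linear prediction coefficients into cepstral coefficients. The abstract does not give the prediction order, the number of cepstral coefficients, or the sign convention used in the thesis, so the function name `lpc_to_cepstrum`, its parameters, and the convention x[n] ≈ a_1 x[n-1] + ... + a_p x[n-p] are all assumptions made for this example, not the author's implementation.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients to cepstral coefficients c_1..c_n_ceps.

    Assumed convention (not stated in the source): a[k-1] holds a_k for the
    all-pole model H(z) = 1 / (1 - sum_k a_k z^-k). The gain term c_0 is
    omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is left unused
    for n in range(1, n_ceps + 1):
        # The direct term a_n exists only while n is within the LPC order.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k/n) * c_k * a_(n-k), with 1 <= n-k <= p.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


# Single-pole check: for H(z) = 1 / (1 - a z^-1) the cepstrum is c_n = a^n / n.
coeffs = lpc_to_cepstrum([0.5], 3)
```

For the single-pole case the recursion can be checked by hand against the series expansion of -ln(1 - a z^-1), which gives c_n = a^n / n.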

http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0812111-144756

Phonotactics

Portuguese Speech recognition system

Linear predicted cepstral coefficients

Hidden Markov model

Mel-frequency cepstral coefficients

Identifier	oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0812111-144756
Date	12 August 2011
Creators	Kuo, Bo-yu
Contributors	Er-Hui Lu, Tsung Lee, Xiao-Song Bo, Chih-Chien Chen, Chii-Maw Uang
Publisher	NSYSU
Source Sets	NSYSU Electronic Thesis and Dissertation Archive
Language	Cholon
Detected Language	English
Type	text
Format	application/pdf
Source	http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0812111-144756
Rights	user_define, Copyright information available at source archive
