A Design of Trilingual Speech Recognition System for Chinese, Portuguese and Hindi

The BRICS countries (Brazil, Russia, India, China and South Africa) have contributed significantly to global economic growth in recent years. China has not only the world's largest population but also one of its richest histories. Rapid development in all respects in recent years, including expanded economic trade with Taiwan, has placed China among the world's superpowers. Brazil is the largest Portuguese-speaking country in the world, and in 2011 the world-class manufacturer Foxconn Technology decided to build a factory there to produce Apple iPads and iPhones. India's software, telecommunications and aviation industries have flourished over the past decade, and offshore outsourcing and consulting have become popular because of the cost-reduction policies of Western companies. The Chinese-, Portuguese- and Hindi-speaking populations together exceed 1.573 billion people, more than 22% of the world population. Our objective is therefore to build a trilingual speech recognition system that supports verbal communication and cultural understanding among speakers of these languages.
This thesis investigates design and implementation strategies for a trilingual speech recognition system for Chinese, Portuguese and Hindi. Based on the pronunciation rules of each language, 404 Chinese, 515 Portuguese and 244 Hindi common monosyllables are selected as the basic units for speech training and recognition. Mel-frequency cepstral coefficients (MFCCs) and linear predictive cepstral coefficients (LPCCs) are used as the two syllable feature models, and a hidden Markov model (HMM) is used as the recognition model. On a personal computer with a 2.2 GHz AMD Athlon XP 2800+ processor running Ubuntu 9.04, phrase recognition rates of 87.69%, 85.14% and 86.74% are achieved using phonotactic rules on phrase databases of 82,000 Chinese, 30,000 Portuguese and 3,900 Hindi phrases, respectively. In addition, a trilingual language-and-speech recognition system is developed for 300 common words, 100 from each language, and reaches a correct language-phrase recognition rate of 98%. The average computation time of each system is within 2 seconds.
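The HMM recognition step summarized above can be illustrated with a minimal sketch. The abstract does not state the HMM topology, number of states, emission densities or training procedure, so everything here is an assumption: each syllable gets a left-to-right HMM with diagonal-Gaussian emissions over feature frames (which would be MFCC or LPCC vectors in practice), an utterance is scored against every syllable model with the log-domain forward algorithm, and the label with the highest likelihood wins. The names `SyllableHMM`, `left_to_right`, `forward_log_likelihood` and `recognize`, the syllable labels `ma`/`ba`, and the toy one-dimensional parameters are all hypothetical. Baum-Welch training, feature extraction and the phonotactic phrase search are omitted.

```python
import math
from typing import Dict, List, Sequence

NEG_INF = -math.inf  # log(0), used for forbidden transitions


class SyllableHMM:
    """Hypothetical per-syllable HMM with diagonal-Gaussian emissions."""

    def __init__(self, log_init: List[float], log_trans: List[List[float]],
                 means: List[List[float]], variances: List[List[float]]):
        self.log_init = log_init        # log initial-state probabilities
        self.log_trans = log_trans      # log transition matrix (N x N)
        self.means = means              # per-state mean feature vector
        self.variances = variances      # per-state diagonal variances

    def log_emission(self, state: int, frame: Sequence[float]) -> float:
        """Log density of one feature frame under the state's Gaussian."""
        total = 0.0
        for x, mu, var in zip(frame, self.means[state], self.variances[state]):
            total -= 0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
        return total


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def left_to_right(means: List[List[float]], variances: List[List[float]],
                  stay: float = 0.6) -> SyllableHMM:
    """Build a left-to-right model: each state loops or advances one step."""
    n = len(means)
    log_init = [0.0] + [NEG_INF] * (n - 1)   # always start in state 0
    log_trans = [[NEG_INF] * n for _ in range(n)]
    for i in range(n - 1):
        log_trans[i][i] = math.log(stay)
        log_trans[i][i + 1] = math.log(1.0 - stay)
    log_trans[n - 1][n - 1] = 0.0            # last state self-loops
    return SyllableHMM(log_init, log_trans, means, variances)


def forward_log_likelihood(model: SyllableHMM,
                           frames: Sequence[Sequence[float]]) -> float:
    """log P(frames | model) via the forward algorithm, summed over end states."""
    n = len(model.log_init)
    alpha = [model.log_init[i] + model.log_emission(i, frames[0])
             for i in range(n)]
    for frame in frames[1:]:
        alpha = [log_sum_exp([alpha[i] + model.log_trans[i][j]
                              for i in range(n)])
                 + model.log_emission(j, frame)
                 for j in range(n)]
    return log_sum_exp(alpha)


def recognize(models: Dict[str, SyllableHMM],
              frames: Sequence[Sequence[float]]) -> str:
    """Return the syllable label whose HMM best explains the frames."""
    return max(models, key=lambda lab: forward_log_likelihood(models[lab], frames))


if __name__ == "__main__":
    # Toy one-dimensional "features"; a real system would use MFCC/LPCC vectors.
    models = {
        "ma": left_to_right([[0.0], [2.0]], [[1.0], [1.0]]),
        "ba": left_to_right([[5.0], [7.0]], [[1.0], [1.0]]),
    }
    print(recognize(models, [[0.1], [0.2], [1.9], [2.1]]))  # → ma
```

Working in the log domain avoids the numerical underflow that multiplying many small probabilities across a long utterance would otherwise cause; the same scoring loop applies unchanged whether the frames are MFCC or LPCC vectors.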

Identifier: oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0910112-155529
Date: 10 September 2012
Creators: Wang, Yu-an
Contributors: Sheau-Shong Bor, Chih-Chien Chen, Chii-Maw Uang
Publisher: NSYSU
Source Sets: NSYSU Electronic Thesis and Dissertation Archive
Language: Cholon
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0910112-155529
Rights: user_define, Copyright information available at source archive