Global ETD Search

71	A Design of Portuguese Speech Recognition System
Kuo, Bo-yu
12 August 2011
IBM, a well-known computer giant, and Nuance, a renowned speech technology firm, have offered numerous speech recognition applications in recent years. The ties between these two companies and the automobile, communication, and eight other dominant industries (banking, electronics, energy/utilities, medical/life science, insurance, media/entertainment, retail, and travel and transportation) have expanded vastly and flourished. The maturity of these speech technologies has brought a level of everyday comfort that could not have been imagined before. In April 2011, the world-class manufacturer Foxconn decided to invest 12 billion US dollars to build iPhone/iPad factories in Brazil, the largest Portuguese-speaking country in the world. It is our objective to build a language system that can help us to learn Portuguese, to savor the beauty of the culture, and to widen our vision of travel and living. This thesis investigates the design and implementation strategies for a Portuguese speech recognition system. It uses the speech features of 303 common Portuguese mono-syllables as the basis for training and recognition. A training database of 10 utterances per mono-syllable is established by applying Portuguese pronunciation rules. These 10 utterances are collected by reading the same mono-syllables for 5 rounds, twice per round with different tones. The first pronounced pattern has the high pitch of tone 1, while the second has the falling pitch of tone 4. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On an AMD 2.2 GHz Athlon XP 2800+ personal computer running Ubuntu 9.04, a correct phrase recognition rate of 87.26% is reached using phonotactical rules for a 3,900-vocabulary Portuguese phrase database. The average computation time for the Portuguese phrase system is less than 1.5 seconds, and the training time for the system is about two hours.
Keywords: Phonotactics, Portuguese Speech recognition system, Linear predicted cepstral coefficients, Hidden Markov model, Mel-frequency cepstral coefficients
72	A Design of Russian Speech Recognition System
Wu, Yin-Jie
19 August 2011
Language plays an important role in understanding people, their history, their culture and even their technology. Many countries have recently developed space technology, and Russia ranks at the top of the world. In 1998 Russia launched Zarya, the first module of the International Space Station (ISS), and was deeply involved in the development of the ISS with the U.S. Since the end of World War II, Russia has been one of the five permanent members of the United Nations Security Council. It later became a member of the G8, an economic forum of eight industrially advanced nations. For these reasons, it is our objective to build a language system that can help us to learn Russian, to taste the beauty of its culture, and to widen our vision of technologies. This thesis investigates the design and implementation strategies for a Russian speech recognition system. It uses the speech features of 514 common Russian mono-syllables as the basis for training and recognition. A training database of 12 utterances per mono-syllable is established by applying Russian pronunciation rules. These 12 utterances are collected by reading the same mono-syllables for 6 rounds, twice per round with different tones. The first pronounced pattern has the high pitch of tone 1, while the second has the falling pitch of tone 4. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On an AMD 2.2 GHz Athlon XP 2800+ personal computer running Ubuntu 9.04, correct phrase recognition rates of 86.90% and 94.83% are reached using phonotactical rules for, respectively, a 3,900-vocabulary Russian phrase database for TORFL (Test of Russian as a Foreign Language) and a database of 600 Russian person names. The average computation time for each system is less than 1.5 seconds, and the training time for the systems is about three hours.
Keywords: Linear predicted cepstral coefficients, Hidden Markov model, Mel-frequency cepstral coefficients, Phonotactics, Russian speech recognition system
73	A Design of Arabic Speech Recognition System
Lee, Shih-Chung
19 August 2011
The Arab world is one of the most spectacular regions on earth, especially for its history of over 2,800 years, its Islamic religion and its magnificent culture. It consists of 24 countries and territories where people speak Arabic. The Arabic-speaking population is approximately 221 million, ranking fourth in the world according to 2009 statistics from the Summer Institute of Linguistics, USA. Since 1973, petroleum embargoes imposed by the Arab world have influenced the global economy and seriously hurt national security. This kind of fossil energy will remain irreplaceable until an efficient green energy alternative becomes feasible. It is our objective to build a language system that can help us to learn Arabic, to appreciate the beauty of its culture, and to widen our vision of religions. This thesis investigates the design and implementation strategies for an Arabic speech recognition system. It uses the speech features of 302 common Arabic mono-syllables as the basis for training and recognition. A training database of 10 utterances per mono-syllable is established by applying Arabic pronunciation rules. These 10 utterances are collected by reading the same mono-syllables for 5 rounds, twice per round with different tones. The first pronounced pattern has the high pitch of tone 1, while the second has the falling pitch of tone 4. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On an AMD 2.2 GHz Athlon XP 2800+ personal computer running Ubuntu 9.04, correct phrase recognition rates of 86.31% and 93.90% are reached using phonotactical rules for, respectively, a 3,600-vocabulary Arabic phrase database and a database of 590 person names of Arabic figures. The average computation time for each system is less than 1 second, and the training time for the systems is about two hours.
Keywords: linear predicted cepstral coefficients, Mel-frequency cepstral coefficients, Arabic speech recognition system, phonotactics, hidden Markov model
74	A Design of Italian Speech Recognition System
Lin, Wei-cheng
22 August 2011
The European Union (EU) was established on November 1, 1993, under the Maastricht Treaty signed on February 7, 1992. This economic and political community consists of 27 member states, primarily located in Europe. It operates through a supranational and intergovernmental system, including the European Commission, the Council, the Parliament and the Central Bank, to transform itself from a set of jointly developing economic regions into a single market with economic and political integration. Italy is one of the six founding countries of the EU and also a member of the G8, the eight industrially advanced nations of the world, and it is a force to be reckoned with. It is our objective to build a language system that can help us to learn Italian more effectively, to promote our competency in intercultural understanding, and to widen our vision of travel and living. This thesis investigates the design and implementation strategies for an Italian speech recognition system. It uses the speech features of 370 common Italian mono-syllables as the basis for training and recognition. A training database of 10 utterances per mono-syllable is established by applying Italian pronunciation rules. These 10 utterances are collected by reading the same mono-syllables for 5 rounds, twice per round with different tones. The first pronounced pattern has the high pitch of tone 1, while the second has the falling pitch of tone 4. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On an AMD 2.2 GHz Athlon XP 2800+ personal computer running Ubuntu 9.04, correct phrase recognition rates of 88.35% and 89.32% are reached using phonotactical rules for, respectively, a 4,000-vocabulary Italian phrase database and a 3,304-word database for the Italian Language Proficiency Test. The average computation time for each system is less than 1.5 seconds, and the training time for the systems is about two hours.
Keywords: Phonotactics, Hidden Markov model, Linear predicted cepstral coefficients, Italian speech recognition system, Mel-frequency cepstral coefficients
75	A Design of Turkish Speech Recognition System
Chen, Guan-lun
22 August 2011
The Republic of Turkey, founded in 1923, is well known for its ancient history, its abundant cultural heritage and its location at the junction of the Asian and European continents. Istanbul, formerly named Constantinople or Byzantium, is the largest city in the country. It was established by Constantinus I Magnus in A.D. 330, during the era of the Roman Empire, to serve as a well-fortified city like Rome. Numerous attractions in historical architecture, ancient music, gourmet cuisine, and art collections can be explored and appreciated. It is our objective to build a language system that can help us to learn Turkish, to savor the beauty of the culture, and to widen our vision of travel and living. This thesis investigates the design and implementation strategies for a Turkish speech recognition system. It uses the speech features of 395 common Turkish mono-syllables as the basis for training and recognition. A training database of 12 utterances per mono-syllable is established by applying Turkish pronunciation rules. These 12 utterances are collected by reading the same mono-syllables for 6 rounds, twice per round with different tones. The first pronounced pattern has the high pitch of tone 1, while the second has the falling pitch of tone 4. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On an AMD 2.8 GHz Athlon X2 2400 personal computer running Ubuntu 9.04, a correct phrase recognition rate of 87.29% is reached using phonotactical rules for a 3,644-vocabulary Turkish phrase database. The average computation time for the system is less than 1.5 seconds, and the training time for the system is about two hours.
Keywords: Turkish speech recognition system, Hidden Markov model, Linear predicted cepstral coefficients, Mel-frequency cepstral coefficients, phonotactics
76	A Design of Recognition Rate Improving Strategy For English Speech Recognition System
Hung, Ming-Chang
27 August 2011
Britain established maritime hegemony in 1588. Along with British colonial activity, the English language spread to North America, India, Africa and Australia. After the end of World War I in 1918, the U.S. became the most powerful nation in the world economy, and at the same time the world financial center shifted from London to New York. After World War II ended in 1945, the U.S. played an indispensable role in every aspect of international politics, economy and technology. The United Nations, founded on October 24, 1945, adopted English, Chinese, French, Spanish, Arabic and Russian as its six working languages. These historical events drove a succession of language expansions and made English the most widely used international language. Besides its political, economic and technological strength, Britain owns the largest comprehensive museum in the world, the British Museum. The museum, located in London and founded in 1753, holds more than 13 million archaeological and cultural relics from around the world. Britain's cultural resources are remarkably rich. It is our objective to build a language system that can help us to learn English more effectively and to widen our vision of living at the same time. This thesis investigates recognition rate improvement strategies for an English speech recognition system. It uses the speech features of 989 common English mono-syllables as the basis for training and recognition. A training database is established by reading each mono-syllable for 14 rounds. Each of the 989 mono-syllables is read consecutively, with two different tones in alternate rounds. The odd rounds have the high pitch of tone 1, while the even rounds have the falling pitch of tone 4. The pitch period frame method is applied to enhance the accuracy of end point detection. Mel-frequency cepstral coefficients and linear predictive cepstral coefficients are used as the two feature models, and a hidden Markov model is used as the recognition model. The number of HMM states is adjusted to 10, and the phonotactical rule is used to improve the recognition rate. On a notebook computer with a Core™ i5 CPU M450 at a 2.4 GHz clock rate running Fedora 14, a 92.94% correct phrase recognition rate is reached for a 6,812-phrase English database. The average computation time for each phrase is within 1.5 seconds.
Keywords: Mel-frequency cepstral coefficients, Phonotactics, Hidden Markov model, Linear predictive cepstral coefficients, English speech recognition system
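The recording scheme in the abstract above (989 mono-syllables, 14 rounds, tone 1 on odd rounds and tone 4 on even rounds) can be sketched as a simple schedule generator. This is only an illustration of the arithmetic: the function name, the tuple layout and the ordering of syllables within a round are assumptions, not details taken from the thesis.

```python
def recording_schedule(num_syllables, num_rounds):
    """Yield (syllable_index, round_number, tone) for every utterance.

    Odd rounds use tone 1 (high pitch) and even rounds use tone 4
    (falling pitch), as stated in the abstract. The ordering within a
    round is an assumption made for this sketch.
    """
    for round_number in range(1, num_rounds + 1):
        tone = 1 if round_number % 2 == 1 else 4
        for syllable_index in range(num_syllables):
            yield (syllable_index, round_number, tone)


# Figures from the abstract: 989 mono-syllables, 14 rounds.
schedule = list(recording_schedule(989, 14))
tone1_count = sum(1 for _, _, tone in schedule if tone == 1)
tone4_count = sum(1 for _, _, tone in schedule if tone == 4)
print(len(schedule), tone1_count, tone4_count)
```

Under these assumptions the training set holds 989 × 14 utterances, split evenly between the two tones.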
77	Using Latin Square Design To Evaluate Model Interpolation And Adaptation Based Emotional Speech Synthesis
Hsu, Chih-Yu
19 July 2012
In this thesis, we use a hidden Markov model, which can synthesize speech of acceptable quality from a small corpus, to implement a speech synthesis system for Chinese. Emotional speech is then synthesized by exploiting the flexibility of the parametric speech representation in this model. We apply model interpolation and model adaptation to synthesize speech ranging from neutral to a particular emotion without the target speaker's emotional speech. In model interpolation, we use a monophone-based Mahalanobis distance to select, from a pool of speakers, the emotional models that are close to the target speaker, and we estimate the interpolation weights used to synthesize emotional speech. In model adaptation, we collect abundant data to train average voice models for each individual emotion. These models are adapted to the target speaker's specific emotional models by the CMLLR method. In addition, we design a Latin-square evaluation to reduce systematic offset in the subjective tests, making the results more credible and fair. We synthesize emotional speech covering happiness, anger and sadness, and use the Latin square design to evaluate performance in three respects: similarity, naturalness and emotional expression. Based on the results, we make a comprehensive comparison of the two methods in emotional speech synthesis and draw conclusions.
Keywords: model interpolation, Latin-square design, hidden Markov model, model adaptation, emotional speech synthesis, Mahalanobis distance
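As a rough illustration of how a Latin square can counterbalance presentation order in a subjective listening test, the sketch below builds a cyclic square over the three emotions named in the abstract. The abstract does not say which factors the thesis arranged in its square, so treating the emotions as the conditions, the listener-group labels and the function name are all assumptions.

```python
def cyclic_latin_square(conditions):
    """Return an n x n Latin square built by cyclic shifts.

    Each condition appears exactly once in every row (one listener
    group's presentation order) and once in every column (one position
    in the order), which balances out position effects.
    """
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]


# Hypothetical use: three emotions as the conditions to counterbalance.
emotions = ["happiness", "anger", "sadness"]
square = cyclic_latin_square(emotions)
for group, order in enumerate(square, start=1):
    print(f"listener group {group}: {' -> '.join(order)}")
```

A cyclic construction is the simplest choice; a randomized Latin square would serve equally well if the order of conditions should not be predictable.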
78	A Design of Trilingual Speech Recognition System for Chinese, Russian and Thai
Pan, Hao-Ming
10 September 2012
Economic growth rate is an index of a nation's gross productivity. China, Russia and Thailand are among the nations whose economic growth rates exceed the global average. In recent years, rapid development in China, including enhanced relations with Taiwan, has made it a member of BRICS, the five leading emerging countries in the world. Russia has played an important role in international society over the past decades. It is a member of the G8, the group of eight major industrial nations, and its language, Russian, is one of the six official languages of the United Nations. According to the statistics of the Taiwan Funds, Russia and Thailand are the top two countries in investment growth. Thailand, located in the middle of the Southeast Asian peninsula, together with Malaysia and the Philippines, is one of the founding members of ASEAN, the ten-member Association of Southeast Asian Nations. Owing to industrial and household needs, Taiwan has offered job opportunities to foreign workers from Southeast Asian countries. Therefore, it is our objective to design a trilingual speech recognition system for Chinese, Russian and Thai to meet the needs of language learning and household living. This system uses 404 Chinese, 611 Russian and 123 Thai common mono-syllables, selected according to their pronunciation rules, as the basis for speech training and recognition. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On an AMD 2.2 GHz Athlon XP 2800+ personal computer running Ubuntu 9.04, correct phrase recognition rates of 88.87%, 84.31% and 87.58% are reached using phonotactical rules for the 82,000 Chinese, 31,883 Russian and 3,809 Thai phrase databases respectively. Furthermore, a trilingual language-speech recognition system for 300 common words, composed of 100 words from each language, is developed. A 98.66% correct language-phrase recognition rate is obtained.
Keywords: Hidden Markov model, Mel-frequency cepstral coefficients, Linear predicted cepstral coefficients, Phonotactic, Speech recognition
79	A Design of Trilingual Speech Recognition System for Chinese, Taiwanese and Cantonese
Zheng, Po-Xin
10 September 2012
Mandarin Chinese, Taiwanese and Cantonese all belong to the Chinese language family. According to statistics from the Summer Institute of Linguistics, USA, Chinese languages are spoken by a population of over 1.2 billion, ranking first in the world. The regions where these three languages are spoken have played an important role in the global economy. For example, Hong Kong and Taiwan both have flourishing harbors for international trade. Furthermore, Mandarin Chinese, Taiwanese and Cantonese are the most influential of the seven Chinese dialects. Mandarin Chinese was admitted as a language by the United Nations in its early years, while Cantonese was accepted in 2006. Cantonese is spoken in many Western countries. It is the fourth language in Australia as well as the third language in Canada and America. From the point of view of phonetics, these three languages are all tonal languages, in which words or phrases uttered with different pitch or duration have distinct lexical meanings. This thesis investigates the design and implementation strategies for a trilingual speech recognition system for Chinese, Taiwanese and Cantonese. Based on their pronunciation rules and tonal properties, common mono-syllables for each language are selected and used as the basis for speech training and recognition. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On an AMD Athlon XP 2800+ personal computer running Ubuntu 9.04, correct recognition rates of 88.03%, 86.00% and 86.79% are reached using phonotactical rules for the 82,000 Chinese, 5,129 Taiwanese and 3,051 Cantonese phrase databases respectively. Furthermore, a trilingual language-speech recognition system for 300 common words, composed of 100 words from each language, is developed. A 97.66% correct language-phrase recognition rate is obtained.
Keywords: Phonotactics, Hidden Markov model, Linear predicted cepstral coefficients, Mel-frequency cepstral coefficients, Speech recognition
80	A Design of Trilingual Speech Recognition System for Chinese, English and Vietnamese
Tzeng, Yi-Ying
10 September 2012
History, culture and economy constitute the foundation of language. Mandarin Chinese is our native language, spoken by over 1.2 billion people, the largest speaker population in the world. In recent years, the emerging China has not only possessed market and labor forces but has also developed the Chinese cultural circle in Asia. British history and American politics made English the most influential language of the 20th century. Vietnam has been under the profound influence of Chinese culture. Its economic reform and opening in the past decade brought it tremendous foreign investment, including investment from Taiwan. It is our objective to establish a trilingual system for travel, living and speech learning. This thesis investigates the design and implementation strategies for a trilingual speech recognition system for Chinese, English and Vietnamese. It uses the speech features of 404 Chinese, 925 English and 154 Vietnamese mono-syllables as the basis for training and recognition. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On an AMD XP 2800+ personal computer running Ubuntu 9.04, correct rates of 88.16%, 82.74% and 87.45% are reached using phonotactical rules for the 82,000 Chinese, 30,795 English and 3,300 Vietnamese phrase databases respectively. The computation for each system can be completed within 2 seconds. Furthermore, a trilingual language-speech recognition system for 300 common words, composed of 100 words from each language, is developed. A 98% correct language-phrase recognition rate is obtained with a computation time of less than 2 seconds.
Keywords: Mel-frequency cepstral coefficients, Hidden Markov model, Phonotactic, Linear predicted cepstral coefficients, Speech recognition
