Global ETD Search

11	A Design of Arabic Speech Recognition System Lee, Shih-Chung 19 August 2011 The Arab world is one of the most spectacular regions on earth, notable for its history of over 2,800 years, the Islamic religion, and a magnificent culture. It consists of 24 countries and territories where people speak Arabic. The Arabic-speaking population is approximately 221 million, ranked fourth according to the 2009 statistics of the Summer Institute of Linguistics, USA. Since 1973, petroleum embargoes imposed by the Arab world have influenced the global economy and seriously hurt national security. This kind of fossil energy remains irreplaceable until an efficient green energy alternative becomes feasible. It is our objective to build a language system that can help us to learn Arabic, to appreciate the beauty of Arab culture, and to widen our vision of religions. This thesis investigates the design and implementation strategies for an Arabic speech recognition system. It utilizes the speech features of the 302 common Arabic mono-syllables as the major training and recognition methodology. A training database of 10 utterances per mono-syllable is established by applying Arabic pronunciation rules. These 10 utterances are collected by reading the same mono-syllables for 5 rounds, twice per round with different tones. The first pronounced pattern has the high pitch of tone 1, while the second has the falling pitch of tone 4. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On an AMD 2.2 GHz Athlon XP 2800+ personal computer running Ubuntu 9.04, correct phrase recognition rates of 86.31% and 93.90% can be reached using phonotactical rules for a 3,600 vocabulary Arabic phrase database and a 590 person name database for Arabic figures, respectively. The average computation time for each system is less than 1 second, and the training time for the systems is about two hours.
Keywords: linear predicted cepstral coefficients; Mel-frequency cepstral coefficients; Arabic speech recognition system; phonotactics; hidden Markov model
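The recording scheme described in entry 11 above (5 rounds, each mono-syllable read twice per round, once with tone 1 and once with tone 4) can be sketched as a simple enumeration. This is only an illustration of the arithmetic, not the thesis's tooling: the function name `recording_plan` and the placeholder syllable labels are assumptions introduced here.

```python
from itertools import product

def recording_plan(syllables, rounds=5, tones=(1, 4)):
    """Return one (round, syllable, tone) entry per utterance to be recorded.

    Hypothetical helper: each round reads every syllable once per tone.
    """
    return [
        (r, s, t)
        for r, s, t in product(range(1, rounds + 1), syllables, tones)
    ]

# Placeholder labels stand in for the 302 common Arabic mono-syllables.
syllables = [f"syl{i:03d}" for i in range(302)]
plan = recording_plan(syllables)

# 5 rounds x 2 tones gives the 10 utterances per mono-syllable.
per_syllable = len(plan) // len(syllables)
print(per_syllable, len(plan))
```

Under these assumptions the full training database holds 302 × 10 utterances. Entries that use 6 rounds (for example the Turkish system) would correspond to `rounds=6`, giving 12 utterances per mono-syllable.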
12	A Design of Italian Speech Recognition System Lin, Wei-cheng 22 August 2011 The European Union (EU) was established on November 1, 1993, under the Maastricht Treaty signed on February 7, 1992. This economic and political community consists of 27 member states, primarily located in Europe. It operates through a supranational and intergovernmental system, including the European Commission, the Council, the Parliament and the Central Bank, to transform itself from a set of joint economic development regions into a single market with economic and political integration. Italy is one of the six founding countries of the EU and a member of the G8, the group of eight industrially advanced nations, and is a force to be reckoned with. It is our objective to build a language system that can help us to learn Italian more effectively, to promote our competency in intercultural understanding, and to widen our vision of travel and living. This thesis investigates the design and implementation strategies for an Italian speech recognition system. It utilizes the speech features of the 370 common Italian mono-syllables as the major training and recognition methodology. A training database of 10 utterances per mono-syllable is established by applying Italian pronunciation rules. These 10 utterances are collected by reading the same mono-syllables for 5 rounds, twice per round with different tones. The first pronounced pattern has the high pitch of tone 1, while the second has the falling pitch of tone 4. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On an AMD 2.2 GHz Athlon XP 2800+ personal computer running Ubuntu 9.04, correct phrase recognition rates of 88.35% and 89.32% can be reached using phonotactical rules for a 4,000 vocabulary Italian phrase database and a 3,304 word database for the Italian Language Proficiency Test, respectively. The average computation time for each system is less than 1.5 seconds, and the training time for the systems is about two hours.
Keywords: Phonotactics; Hidden Markov model; Linear predicted cepstral coefficients; Italian speech recognition system; Mel-frequency cepstral coefficients
13	A Design of Turkish Speech Recognition System Chen, Guan-lun 22 August 2011 The Republic of Turkey, founded in 1923, is a well-known ancient country with abundant cultural heritage and a strategic location at the junction of the Asian and European continents. Istanbul, formerly known as Constantinople or Byzantium, is the largest city of the country. It was established by Constantinus I Magnus in A.D. 330 during the era of the Roman Empire, to serve as a well-fortified castle like Rome. Numerous attractions in historical architecture, ancient music, gourmet cuisine, and art collections can be explored and appreciated. It is our objective to build a language system that can help us to learn Turkish, to savor the beauty of Turkish culture, and to widen our vision of travel and living. This thesis investigates the design and implementation strategies for a Turkish speech recognition system. It utilizes the speech features of the 395 common Turkish mono-syllables as the major training and recognition methodology. A training database of 12 utterances per mono-syllable is established by applying Turkish pronunciation rules. These 12 utterances are collected by reading the same mono-syllables for 6 rounds, twice per round with different tones. The first pronounced pattern has the high pitch of tone 1, while the second has the falling pitch of tone 4. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On an AMD 2.8 GHz Athlon X2 2400 personal computer running Ubuntu 9.04, a correct phrase recognition rate of 87.29% can be reached using phonotactical rules for a 3,644 vocabulary Turkish phrase database. The average computation time for each system is less than 1.5 seconds, and the training time for the systems is about two hours.
Keywords: Turkish speech recognition system; Hidden Markov model; Linear predicted cepstral coefficients; Mel-frequency cepstral coefficients; phonotactics
14	A Design of Recognition Rate Improving Strategy For English Speech Recognition System Hung, Ming-Chang 27 August 2011 Britain established the status of maritime hegemony in 1588. The English language spread to North America, India, Africa and Australia along with British colonial activities. After the end of World War I in 1918, the U.S. became the most powerful nation in the world economy, and at the same time the world financial center shifted from London to New York. After World War II ended in 1945, the U.S. further played an indispensable role in every aspect of international politics, economy and technology. The United Nations, founded on October 24, 1945, adopted English, Chinese, French, Spanish, Arabic and Russian as its six working languages. These historical events facilitated a succession of language expansions and made English the most widely used international language. Besides its political, economic and technological superiority, Britain owns the largest comprehensive museum in the world, the British Museum. Located in London and established in 1753, the museum holds more than 13 million cultural and archaeological relics collected from around the world. Britain's cultural resources are remarkably rich. It is our objective to build a language system that can help us to learn English more effectively and to widen our vision of living at the same time. This thesis investigates the recognition rate improvement strategies for an English speech recognition system. It utilizes the speech features of the 989 common English mono-syllables as the major training and recognition methodology. A training database is established by reading each mono-syllable for 14 rounds. Each of the 989 mono-syllables is read consecutively with two different tones in alternate rounds. The odd-numbered rounds have the high pitch of tone 1, while the even-numbered rounds have the falling pitch of tone 4. The pitch period frame method is applied to enhance the accuracy of end point detection. Mel-frequency cepstral coefficients and linear predictive cepstral coefficients are used as the two feature models, and a hidden Markov model is used as the recognition model. The number of HMM states is adjusted to 10, and the phonotactical rule is used to improve the recognition rate. On a notebook computer with a Core™ i5 CPU M450 at a 2.4 GHz clock rate running Fedora 14, a 92.94% correct phrase recognition rate can be reached for a 6,812 English phrase database. The average computation time for each phrase is within 1.5 seconds.
Keywords: Mel-frequency cepstral coefficients; Phonotactics; Hidden Markov model; Linear predictive cepstral coefficients; English speech recognition system
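The abstracts above name linear predictive cepstral coefficients as one of the two feature models without saying how they are computed. One common route, shown here as a minimal sketch and not as the theses' own implementation, is the standard recursion that converts linear prediction coefficients into cepstral coefficients. The function name `lpc_to_cepstrum`, the sign convention H(z) = 1 / (1 - Σ a_k z^-k), and the omission of the gain term are all assumptions of this example.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients a_1..a_p into cepstral coefficients c_1..c_n.

    Assumes the all-pole model H(z) = 1 / (1 - sum_k a_k z^-k) with unit
    gain, so the gain-dependent term c_0 is not computed.
    `a` is a list where a[k - 1] holds a_k.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        # Direct term a_n exists only while n does not exceed the LPC order.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k / n) * c_k * a_{n-k},
        # restricted to indices where a_{n-k} is defined (1 <= n-k <= p).
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c

# First-order check: for a single coefficient a_1 = 0.5 the cepstrum of
# 1 / (1 - 0.5 z^-1) is known in closed form as c_n = 0.5**n / n.
ceps = lpc_to_cepstrum([0.5], 4)
print(ceps)
```

The first-order case is convenient for checking because its closed form follows directly from the series expansion of -ln(1 - 0.5 z^-1).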
15	A Design of Trilingual Speech Recognition System for Chinese, Taiwanese and Cantonese Zheng, Po-Xin 10 September 2012 Mandarin Chinese, Taiwanese and Cantonese all belong to the Chinese language family. According to statistics from the Summer Institute of Linguistics, USA, the Chinese languages are spoken by a population of over 1.2 billion, ranked number one in the world. The regions where these three languages are spoken have been playing an important role in the global economy. For example, Hong Kong and Taiwan both have flourishing harbors for international trade. Furthermore, Mandarin Chinese, Taiwanese and Cantonese are the most influential among the seven Chinese dialects. Mandarin Chinese was admitted as a language by the United Nations in the early years, while Cantonese was accepted in 2006. Cantonese is spoken in many Western countries. It is the fourth language in Australia as well as the third language in Canada and America. From the phonetic point of view, these three languages are all tonal languages, in which words or phrases uttered with different pitch or duration have distinct lexical meanings. This thesis investigates the design and implementation strategies for a speech recognition system for Chinese, Taiwanese and Cantonese. Based on their pronunciation rules and tonal properties, common mono-syllables for each language are selected and utilized as the major speech training and recognition methodology. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On an AMD Athlon XP 2800+ personal computer running Ubuntu 9.04, correct recognition rates of 88.03%, 86.00% and 86.79% can be reached using phonotactical rules for the 82,000 Chinese, 5,129 Taiwanese and 3,051 Cantonese phrase databases respectively. Furthermore, a trilingual language-speech recognition system for 300 common words, composed of 100 words from each language, is developed. A 97.66% correct language-phrase recognition rate can be obtained.
Keywords: Phonotactics; Hidden Markov model; Linear predicted cepstral coefficients; Mel-frequency cepstral coefficients; Speech recognition
16	A Design of Trilingual Speech Recognition System for Chinese, Italian and Farsi Jiang, Wei-Sheng 10 September 2012 China, Italy and Iran are seemingly quite different in language, history, culture and economy. However, there have been mutual interactions among these three countries in past ages. In the fourth century, the Chinese Northern Wei Dynasty established close relations with the Persian Empire, located in today's Iran. The Persian language is called Farsi in its native name. Silver bowls unearthed in China in recent years are similar in appearance and material to the Sassanid-Persian silverware of Iran. Archaeologists have found that ancient China and Iran were close international trading partners. In the thirteenth century, Marco Polo, an Italian traveler and merchant, visited China during the Yuan Dynasty and wrote a marvelous book, "The Travels of Marco Polo". Fantastic experiences in China were depicted in this journal, and these initiated Sino-Italian relations in the early days. In modern China, Armani suits and Ferrari supercars have become objects of passion for things Italian, which may represent the achievement of Asian-European cultural exchange. Therefore, it is our objective to design a trilingual speech recognition system to help us to learn the Chinese, Italian and Farsi languages. Linear predicted cepstral coefficients and Mel-frequency cepstral coefficients are used in this system as the two syllable feature models, and a hidden Markov model with phonotactics is used as the recognition model. For the Chinese system, a database of 2,699 two-syllable words is used as the training corpus. For the Italian and Farsi systems, a database of 10 utterances per mono-syllable is established by applying their pronunciation rules. These 10 utterances are collected by reading the same mono-syllables for 5 rounds, twice per round with tone 1 and tone 4. Correct recognition rates of 87.54%, 87.48%, and 90.33% can be reached for the 82,000 Chinese, 27,900 Italian, and 4,000 Farsi phrase databases respectively. The computation time for each system is within 1.5 seconds. Furthermore, a trilingual language-speech recognition system for 300 common words, composed of 100 words from each language, is developed. A 98.67% correct language-phrase recognition rate can be obtained with a computation time of about 2 seconds.
Keywords: Speech recognition; Linear predicted cepstral coefficients; Hidden Markov model; Mel-frequency cepstral coefficients; Phonotactics
17	A Design of Trilingual Speech Recognition System for Chinese, Portuguese and Hindi Wang, Yu-an 10 September 2012 The BRICS countries, Brazil, Russia, India, China and South Africa, have been making a significant contribution to global economic growth in the past few years. China possesses not only the largest population, but also the most splendid history in the world. In recent years, rapid development in all respects, including enhanced economic trade with Taiwan, has placed China among the superpowers. Brazil is the largest Portuguese-speaking country in the world, where the world-class manufacturer Foxconn Technology decided to build an Apple iPad/iPhone factory in 2011. India has been flourishing in the software, telecommunications and aviation industries since the last decade. Offshore outsourcing consulting has become popular owing to the cost-reduction policies of Western companies. The Chinese-, Portuguese- and Hindi-speaking populations total over 1.573 billion and account for over 22% of the world population. Therefore, it is our objective to establish a trilingual speech recognition system to help verbal communication and cultural understanding among languages. This thesis investigates the design and implementation strategies for a trilingual speech recognition system for Chinese, Portuguese and Hindi. Based on their pronunciation rules, the 404 Chinese, 515 Portuguese and 244 Hindi common mono-syllables are selected and utilized as the major speech training and recognition methodology. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On an AMD 2.2 GHz Athlon XP 2800+ personal computer running Ubuntu 9.04, correct phrase recognition rates of 87.69%, 85.14% and 86.74% can be reached using phonotactical rules for the 82,000 Chinese, 30,000 Portuguese and 3,900 Hindi phrase databases respectively. Furthermore, a trilingual language-speech recognition system for 300 common words, composed of 100 words from each language, is developed. A 98% correct language-phrase recognition rate can be reached. The average computation time for each system is within 2 seconds.
Keywords: Linear predicted cepstral coefficients; Hidden Markov model; Phonotactics; Mel-frequency cepstral coefficients; Speech recognition
18	A Design of Trilingual Speech Recognition System for Chinese, Arabic and Dutch Tu, Ming-hui 10 September 2012 Chinese and Arabic are both among the six official languages of the United Nations. The Chinese-speaking population is over 1.2 billion, ranked number one in the world. Arabic, the language used in the Arab world, has a history of more than 2,800 years. The religion, culture and oil economy of the Arab world have had far-reaching effects around the globe. The worldwide energy supply relies greatly on petroleum from the Arab world. The Netherlands, whose official language is Dutch, has been an international trading power since ancient times and has become an industrial giant today. Studying abroad in Europe has recently become more popular, and many well-known Dutch universities offer opportunities to foreign students. Therefore, it is our objective to design a trilingual speech recognition system to help us learn Chinese, Arabic and Dutch, as well as appreciate their profound history and beautiful culture. This thesis investigates the design and implementation strategies for a Chinese, Arabic and Dutch speech recognition system. A database of 2,699 recorded two-syllable words is utilized as the Chinese training corpus. For the Arabic and Dutch systems, 396 and 205 common mono-syllables are selected respectively as the major training and recognition methodology. Each mono-syllable is uttered twice, with tone 1 and tone 4, and ten training patterns are used for system implementation. Mel-frequency cepstral coefficients and linear predicted cepstral coefficients are applied as the two syllable feature models, and a hidden Markov model with phonotactics is applied as the recognition model. Correct recognition rates of 90.17%, 84.65%, and 86.69% can be reached for the 82,000 Chinese, 31,000 Arabic, and 3,600 Dutch phrase databases respectively. Furthermore, a trilingual language-speech recognition system for 300 common words, composed of 100 words from each language, is developed. A 98.67% correct language-phrase recognition rate can be obtained. The computation time for each system is about 2 seconds.
Keywords: Linear predicted cepstral coefficients; Mel-frequency cepstral coefficients; Hidden Markov model; Phonotactics; Speech recognition
19	A Design of Trilingual Speech Recognition System for Chinese, Hakka and Swedish Wu, Chih-Han 10 September 2012 According to the statistics of the Summer Institute of Linguistics, USA, there are about 7,000 languages in the world. Chinese, Hakka and Swedish are all among the 100 most popular languages. Chinese is spoken in Taiwan, Mainland China, Hong Kong and Macau. Hakka is the second most popular dialect in Taiwan; its speaking population is smaller only than that of Taiwanese. The ancestors of the Hakka are Han people from Honan, China. Hakka culture has been cultivated by enormous migrations since the fourth century, and has been transformed to represent that tradition. Taiwan and Sweden are developed, free and democratic countries with similar living standards. The ancestors of the Swedes are the Germanic peoples of Northern Europe. Swedish has also evolved and been transformed by massive migrations since the ninth century, sharing an analogous evolutionary route with Chinese and Hakka. Therefore, it is our objective to establish a trilingual speech recognition system to help verbal communication among languages in the global economic arena. This thesis investigates the design and implementation strategies for a trilingual speech recognition system for Chinese, Hakka and Swedish. Based on their pronunciation rules, the 404 Chinese, 204 Hakka and 369 Swedish common mono-syllables are selected as the major speech training and recognition methodology. A database of 2,699 two-syllable words is recorded as the Chinese training corpus. Training strategies of five rounds with four tones and six rounds with two tones are used for Hakka and Swedish respectively. Correct rates of 92.29%, 90.70% and 89.09% can be reached for the 82,000 Chinese, 3,900 Hakka and 3,750 Swedish phrase databases respectively. In addition, a trilingual language-speech recognition system for 300 common words, composed of 100 words from each language, is developed. A 98.67% correct language-phrase recognition rate can be obtained. The average computation time for each system is within 2 seconds.
Keywords: Linear predicted cepstral coefficients; Phonotactics; Hidden Markov model; Speech recognition; Mel-frequency cepstral coefficients
20	A Design of Trilingual Speech Recognition System for Chinese, Turkish and Tamil Lin, Wei-Ting 10 September 2012 In this thesis, both Turkish and Tamil, a language spoken in southern India and Sri Lanka, are studied in addition to Mandarin Chinese. It is hoped that the history, culture, and economy behind each language can be learned, savored and appreciated during the learning process. During the Han and Tang Dynasties of ancient China, the "Silk Road" played a magnificent role as the international trading corridor connecting China in the east, Turkey in the west and India in the south. In the modern era, Turkey and India are both among the most important cotton-exporting countries. Moreover, China, Turkey and India have been showing their potential as newly emerging markets in the world. Therefore, a trilingual speech recognition system is developed and implemented to help us to learn Chinese, Turkish and Tamil, as well as to enhance our understanding of their history and culture. In this trilingual system, linear predicted cepstral coefficients and Mel-frequency cepstral coefficients are used as the two syllable feature models, and a hidden Markov model with phonotactics is used as the recognition model. For the Chinese system, a database of 2,699 two-syllable words is used as the training corpus. For the Turkish and Tamil systems, a database of 10 utterances per mono-syllable is established by applying their pronunciation rules. These 10 utterances are collected by reading the same mono-syllables for 5 rounds, twice per round with tone 1 and tone 4. Correct rates of 88.30%, 84.21%, and 88.74% can be reached for the 82,000 Chinese, 30,795 Turkish, and 3,500 Tamil phrase databases respectively. The computation time for each system is within 1.5 seconds. Furthermore, a trilingual language-speech recognition system for 300 common words, composed of 100 words from each language, is developed. A 98% correct language-phrase recognition rate can be reached with a computation time of less than 2 seconds.
Keywords: Linear predicted cepstral coefficients; Hidden Markov model; Phonotactics; Mel-frequency cepstral coefficients; Speech recognition system

Search results