11 |
A Design of Trilingual Speech Recognition System for Chinese, English and Vietnamese
Tzeng, Yi-Ying 10 September 2012
History, culture and economy constitute the foundation of language. Mandarin Chinese, our native language, is spoken by over 1.2 billion people, the largest speaker population in the world. In recent years, an emerging China has not only commanded market and labor forces but has also developed the Chinese cultural sphere in Asia. British history and American politics made English the most influential language of the 20th century. Vietnam has been under the profound influence of Chinese culture, and its reformed and opened economy has attracted tremendous foreign investment in the past decade, including investment from Taiwan. Our objective is to establish a trilingual system for travel, daily living and speech learning.
This thesis investigates the design and implementation strategies for a trilingual speech recognition system for Chinese, English and Vietnamese. Training and recognition are based on the speech features of 404 Chinese, 925 English and 154 Vietnamese mono-syllables. Mel-frequency cepstral coefficients and linear prediction cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On an AMD XP 2800+ personal computer running Ubuntu 9.04, correct recognition rates of 88.16%, 82.74% and 87.45% are reached using phonotactic rules for the 82,000 Chinese, 30,795 English and 3,300 Vietnamese phrase databases, respectively. The computation for each system completes within 2 seconds. Furthermore, a trilingual language-speech recognition system is developed for 300 common words, 100 from each language. It attains a 98% correct language-phrase recognition rate with a computation time of less than 2 seconds.
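As a rough illustration of the Mel-frequency cepstral feature mentioned in this abstract, the sketch below shows the two steps that give the feature its name: spacing filter bands evenly on the mel scale, and taking a discrete cosine transform of the log band energies. It is a minimal sketch, not the thesis's implementation. The 2595*log10 mel formula is one common variant, and all function names and parameter values here are assumptions chosen for illustration.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to mels (one common variant of the mel formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_low, f_high):
    """Centre frequencies (Hz) of n_filters bands equally spaced in mel."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


def cepstrum_from_log_energies(log_energies, n_coeffs):
    """Unnormalised DCT-II of log filterbank energies (coefficients 0..n_coeffs-1)."""
    n = len(log_energies)
    return [
        sum(e * math.cos(math.pi * k * (i + 0.5) / n)
            for i, e in enumerate(log_energies))
        for k in range(n_coeffs)
    ]


# Hypothetical usage: 5 bands between 0 and 4000 Hz, then 4 cepstral
# coefficients from made-up log energies of an 8-band filterbank.
centers = mel_center_frequencies(5, 0.0, 4000.0)
coeffs = cepstrum_from_log_energies([2.0] * 8, 4)
```

A flat set of log energies yields a non-zero coefficient 0 only, which is why the higher coefficients are read as describing the shape of the spectral envelope rather than its overall level.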
|
12 |
A Design of Trilingual Speech Recognition System for Chinese, Italian and Farsi
Jiang, Wei-Sheng 10 September 2012
China, Italy and Iran seem quite different in language, history, culture and economy. However, the three countries have interacted with one another throughout history. In the fourth century, the Chinese Northern Wei Dynasty established close relations with the Persian Empire, located in present-day Iran. The Persian language is called Farsi by its native speakers. Silver bowls unearthed in China in recent years resemble the Sassanid-Persian silverware of Iran in appearance and material, and archaeologists have found that ancient China and Iran were close trading partners. In the thirteenth century, Marco Polo, an Italian traveler and merchant, visited China under the Yuan Dynasty and wrote the marvelous book "The Travels of Marco Polo". The fantastic experiences in China depicted in this journal initiated early Sino-Italian relations. In modern China, Armani suits and Ferrari sports cars have become objects of passion, which may represent the achievement of Asian-European cultural exchange. Therefore, it is our objective to design a trilingual speech recognition system to help us learn the Chinese, Italian and Farsi languages.
Linear prediction cepstral coefficients and Mel-frequency cepstral coefficients are used in this system as the two syllable feature models, and a hidden Markov model together with phonotactic rules serves as the recognition model. For the Chinese system, a database of 2,699 two-syllable words is used as the training corpus. For the Italian and Farsi systems, a database of 10 utterances per mono-syllable is established by applying their pronunciation rules. These 10 utterances are collected by reading each mono-syllable in 5 rounds, twice per round, once with tone 1 and once with tone 4. Correct recognition rates of 87.54%, 87.48% and 90.33% are reached for the 82,000 Chinese, 27,900 Italian and 4,000 Farsi phrase databases, respectively. The computation time for each system is within 1.5 seconds. Furthermore, a trilingual language-speech recognition system is developed for 300 common words, 100 from each language. It attains a 98.67% correct language-phrase recognition rate with a computation time of about 2 seconds.
|
13 |
A Design of Trilingual Speech Recognition System for Chinese, Portuguese and Hindi
Wang, Yu-an 10 September 2012
The BRICS countries (Brazil, Russia, India, China and South Africa) have contributed significantly to global economic growth in the past few years. China possesses not only the largest population but also one of the most splendid histories in the world. In recent years, its rapid development in all respects, including enhanced economic trade with Taiwan, has placed China among the superpowers. Brazil is the largest Portuguese-speaking country in the world, and in 2011 the world-class manufacturer Foxconn Technology decided to build an Apple iPad/iPhone factory there. India has been flourishing in the software, telecommunications and aviation industries over the last decade, and offshore outsourcing consulting has become popular there because of the cost-cutting policies of Western companies. The Chinese-, Portuguese- and Hindi-speaking populations together exceed 1.573 billion and account for over 22% of the world population. Therefore, it is our objective to establish a trilingual speech recognition system to help verbal communication and cultural understanding among these languages.
This thesis investigates the design and implementation strategies for a trilingual speech recognition system for Chinese, Portuguese and Hindi. Based on their pronunciation rules, 404 Chinese, 515 Portuguese and 244 Hindi common mono-syllables are selected as the basis for speech training and recognition. Mel-frequency cepstral coefficients and linear prediction cepstral coefficients are used as the two syllable feature models, and a hidden Markov model is used as the recognition model. On a 2.2 GHz AMD Athlon XP 2800+ personal computer running Ubuntu 9.04, correct phrase recognition rates of 87.69%, 85.14% and 86.74% are reached using phonotactic rules for the 82,000 Chinese, 30,000 Portuguese and 3,900 Hindi phrase databases, respectively. Furthermore, a trilingual language-speech recognition system is developed for 300 common words, 100 from each language. It attains a 98% correct language-phrase recognition rate. The average computation time for each system is within 2 seconds.
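To make the role of the hidden Markov model as a recognition model more concrete, the following sketch scores an observation sequence against several candidate models with the forward algorithm and picks the best-scoring one. It is only an illustration under simplifying assumptions: observations are discrete symbols, whereas a cepstral front end would normally feed continuous feature vectors, and the two toy models, their labels and their probabilities are invented for this example.

```python
def forward_likelihood(pi, A, B, obs):
    """P(obs | model) for a discrete HMM via the forward algorithm.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][o] : probability of emitting symbol o in state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)


def recognize(models, obs):
    """Return the label of the model that gives obs the highest likelihood."""
    return max(models, key=lambda label: forward_likelihood(*models[label], obs))


# Two hypothetical left-to-right syllable models over the symbols {0, 1}.
A = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "syllable_a": ([1.0, 0.0], A, [[0.9, 0.1], [0.1, 0.9]]),
    "syllable_b": ([1.0, 0.0], A, [[0.1, 0.9], [0.9, 0.1]]),
}
best = recognize(models, [0, 1])
```

For long observation sequences the products underflow, so practical systems work with scaled or log-domain probabilities; that refinement is omitted here for brevity.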
|
14 |
A Design of Trilingual Speech Recognition System for Chinese, Arabic and Dutch
Tu, Ming-hui 10 September 2012
Chinese and Arabic are two of the six official languages of the United Nations. Chinese has over 1.2 billion speakers, the largest number in the world. Arabic, the language of the Arab World, has a history of more than 2,800 years. The religion, culture and oil economy of the Arab World have had far-reaching effects around the globe, and the worldwide energy supply relies heavily on its petroleum. The Netherlands, whose official language is Dutch, has long been an international trading power and has become an industrial giant today. Studying abroad in Europe has recently become more popular, and many famous Dutch universities offer opportunities for foreign students. Therefore, it is our objective to design a trilingual speech recognition system to help us learn Chinese, Arabic and Dutch, as well as appreciate their profound history and beautiful culture.
This thesis investigates the design and implementation strategies for a Chinese, Arabic and Dutch speech recognition system. A database of 2,699 recorded two-syllable words is used as the Chinese training corpus. For the Arabic and Dutch systems, 396 and 205 common mono-syllables, respectively, are selected as the basis for training and recognition. Each mono-syllable is uttered twice, with tone 1 and tone 4, and ten training patterns are used for system implementation. Mel-frequency cepstral coefficients and linear prediction cepstral coefficients are applied as the two syllable feature models, and a hidden Markov model together with phonotactic rules serves as the recognition model. Correct recognition rates of 90.17%, 84.65% and 86.69% are reached for the 82,000 Chinese, 31,000 Arabic and 3,600 Dutch phrase databases, respectively. Furthermore, a trilingual language-speech recognition system is developed for 300 common words, 100 from each language. It attains a 98.67% correct language-phrase recognition rate. The computation time for each system is about 2 seconds.
|
15 |
A Design of Trilingual Speech Recognition System for Chinese, Hakka and Swedish
Wu, Chih-Han 10 September 2012
According to the statistics of the Summer Institute of Linguistics, USA, there are about 7,000 languages in the world. Chinese, Hakka and Swedish are all among the 100 most widely spoken. Chinese is spoken in Taiwan, Mainland China, Hong Kong and Macau. Hakka is the second most widely spoken dialect in Taiwan, behind only Taiwanese. The ancestors of the Hakka were Han people from Honan, China. Hakka culture has been shaped by enormous migrations since the fourth century and has been transformed to represent that tradition. Taiwan and Sweden are developed, free and democratic countries with similar living standards. The ancestors of the Swedes were Germanic peoples of Northern Europe. Swedish has likewise evolved and been transformed by massive migrations since the ninth century, following an evolutionary route analogous to that of Chinese and Hakka. Therefore, it is our objective to establish a trilingual speech recognition system to help verbal communication among these languages in the global economic arena.
This thesis investigates the design and implementation strategies for a trilingual speech recognition system for Chinese, Hakka and Swedish. Based on their pronunciation rules, 404 Chinese, 204 Hakka and 369 Swedish common mono-syllables are selected as the basis for speech training and recognition. A database of 2,699 two-syllable words is recorded as the Chinese training corpus. Training strategies of five rounds with four tones and six rounds with two tones are used for Hakka and Swedish, respectively. Correct rates of 92.29%, 90.70% and 89.09% are reached for the 82,000 Chinese, 3,900 Hakka and 3,750 Swedish phrase databases, respectively. In addition, a trilingual language-speech recognition system is developed for 300 common words, 100 from each language. It attains a 98.67% correct language-phrase recognition rate. The average computation time for each system is within 2 seconds.
|
16 |
A Design of Trilingual Speech Recognition System for Chinese, Turkish and Tamil
Lin, Wei-Ting 10 September 2012
In this thesis, Turkish and Tamil, a language spoken in southern India and Sri Lanka, are studied in addition to Mandarin Chinese. It is hoped that learners will become acquainted with and appreciate the history, culture and economy behind each language during the learning process. During the ancient Chinese Han and Tang Dynasties, the "Silk Road" played a magnificent role as the international trading corridor connecting China in the east, Turkey in the west and India in the south. In the modern era, Turkey and India are both among the most important cotton-exporting countries. Moreover, China, Turkey and India have been showing their potential as newly emerging markets in the world. Therefore, a trilingual speech recognition system is developed and implemented to help us learn Chinese, Turkish and Tamil, as well as to enhance our understanding of their history and culture.
In this trilingual system, linear prediction cepstral coefficients and Mel-frequency cepstral coefficients are used as the two syllable feature models, and a hidden Markov model together with phonotactic rules serves as the recognition model. For the Chinese system, a database of 2,699 two-syllable words is used as the training corpus. For the Turkish and Tamil systems, a database of 10 utterances per mono-syllable is established by applying their pronunciation rules. These 10 utterances are collected by reading each mono-syllable in 5 rounds, twice per round, once with tone 1 and once with tone 4. Correct rates of 88.30%, 84.21% and 88.74% are reached for the 82,000 Chinese, 30,795 Turkish and 3,500 Tamil phrase databases, respectively. The computation time for each system is within 1.5 seconds. Furthermore, a trilingual language-speech recognition system is developed for 300 common words, 100 from each language. It attains a 98% correct language-phrase recognition rate with a computation time of less than 2 seconds.
|
17 |
Security in Voice Authentication
Yang, Chenguang 27 March 2014
We evaluate the security of human voice password databases from an information-theoretic point of view. More specifically, we provide a theoretical estimate of the amount of entropy in human voice when it is processed using conventional GMM-UBM technologies with MFCCs as the acoustic features. The theoretical estimate gives rise to a methodology for analyzing the security level of a corpus of human voice. That is, given a database containing speech signals, we provide a method for estimating the relative entropy (Kullback-Leibler divergence) of the database, thereby establishing the security level of the speaker verification system. To demonstrate this, we analyze the YOHO database, a corpus of voice samples collected from 138 speakers, and show that the amount of entropy extracted is less than 14 bits. We also present a practical attack that succeeds in impersonating the voice of any speaker within the corpus with a 98% success probability in as few as 9 trials. The attack still succeeds at a rate of 62.50% if only 4 attempts are permitted. Further, based on the same attack rationale, we mount an attack on the ALIZE speaker verification system. We show through experimentation that the attacker can impersonate any user in a database of 69 people with about a 25% success rate in only 5 trials. The success rate exceeds 50% when the number of allowed authentication attempts is increased to 20. Finally, when the practical attack is cast in terms of an entropy metric, we find that the theoretical entropy estimate almost perfectly predicts the success rate of the practical attack, giving further credence to the theoretical model and the associated entropy estimation technique.
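The relative entropy mentioned in this abstract can be illustrated in its simplest closed-form case. The sketch below computes the Kullback-Leibler divergence, in bits, between two Gaussians with diagonal covariance. This is a deliberate simplification and not the thesis's estimator: a GMM-UBM system models speakers with Gaussian mixtures, for which no closed form exists, so a single Gaussian per speaker and per background model is assumed here. The function name and the example values are hypothetical.

```python
import math


def kl_gaussian_bits(mu_p, var_p, mu_q, var_q):
    """KL(p || q) in bits for diagonal-covariance Gaussians.

    Each argument is a list with one entry per feature dimension;
    var_p and var_q hold variances (not standard deviations).
    """
    nats = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        nats += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return nats / math.log(2.0)


# Hypothetical one-dimensional example: a "speaker" model whose mean is
# one standard deviation away from a "background" model.
bits = kl_gaussian_bits([0.0], [1.0], [1.0], [1.0])
```

Under this reading, a larger divergence between a speaker's model and the background model means the voice carries more distinguishing information, which is the intuition behind treating the divergence as a security level.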
|
18 |
Perceiving Emotion in Sounds: Does Timbre Play a Role?
Bowman, Casady December 2011
Acoustic features of sound such as pitch, loudness, perceived duration and timbre have been shown to be related to emotion; however, an account of the connection between perceived emotions and their timbres is lacking. This study investigates the relationship between acoustic features of sound and emotion with regard to timbre. We investigated whether particular acoustic components of sound can predict timbre and particular categories of emotion, and how these attributes are related. Two behavioral experiments related perceived emotion ratings to synthetically created sounds and to International Affective Digitized Sounds (IADS; Bradley & Lang, 2007). In addition, two timbre experiments identified acoustic components of the synthetically created sounds and the IADS. Regression analyses uncovered some relationships between emotion, timbre and acoustic features of sound. Results indicate that emotion is perceived differently for synthetic instrumental sounds and IADS. Mel-frequency cepstral coefficients were a strong predictor of perceived emotion for instrumental sounds; however, this was not the case for the IADS. This difference suggests that there is a strong relationship between emotion and timbre for instrumental sounds, perhaps in part because of their relationship to speech and the way these different sounds are processed.
|
19 |
A Design of Korean Speech Recognition System
Wu, Bing-Yang 24 August 2010
This thesis investigates the design and implementation strategies for a Korean speech recognition system. Training and recognition are based on the speech features of the common Korean mono-syllables. A training database of 10 utterances per mono-syllable is established by applying Korean pronunciation rules. These 10 utterances are collected by reading each mono-syllable in 5 rounds, twice per round, with different tones. The first pattern is pronounced with the high pitch of tone 1, while the second has the falling pitch of tone 4. Mel-frequency cepstral coefficients and linear predictive cepstral coefficients are used as the two feature models, and a hidden Markov model is used as the recognition model. On a 2.4 GHz Pentium personal computer running Ubuntu 9.04, a correct phrase recognition rate of 92.25% is reached for a database of 4,865 Korean phrases. The average computation time for each phrase is about 1.5 seconds.
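The linear predictive cepstral feature named in this abstract is commonly obtained from linear prediction coefficients by a short recursion rather than by an explicit Fourier transform. The sketch below shows that standard recursion. It assumes the all-pole convention H(z) = 1 / (1 - sum_k a_k z^-k) with unit gain, which may differ in sign from other texts, and the function name and sample values are chosen only for illustration. How the thesis computes its coefficients is not stated in the abstract.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps of H(z) = 1 / (1 - sum_k a_k z^-k).

    a[0..p-1] holds the predictor coefficients a_1..a_p.
    Recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k},
    where a_n is taken as 0 for n > p.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # Only terms with 1 <= n - k <= p contribute.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


# Hypothetical first-order predictor: the cepstrum is a^n / n in closed form.
ceps = lpc_to_cepstrum([0.5], 3)
```

The first-order case is a convenient sanity check because the logarithm of 1 / (1 - a z^-1) expands directly into the series with terms a^n / n.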
|
20 |
Prediction and Estimation of Random Fields
Kohli, Priya August 2012
For a stationary two-dimensional random field, we use the classical Kolmogorov-Wiener theory to develop a prediction methodology that requires minimal assumptions on the dependence structure of the random field. We also provide solutions for several non-standard prediction problems that deal with the "modified past," in which a finite number of observations are added to the past. These non-standard prediction problems are motivated by network site selection in environmental and geostatistical applications. Unlike the time series situation, the prediction results for random fields seem to be expressible only in terms of the moving average parameters, and attempts to express them in terms of the autoregressive parameters lead to a new and mysterious projection operator which captures the nature of edge effects. We put forward an approach for estimating the predictor coefficients by carrying out an extension of the exponential models. Through simulation studies and a real data example, we demonstrate the impressive performance of our prediction method. To the best of our knowledge, the proposed method is the first to deliver a unified framework for forecasting random fields in both the time and spectral domains without making a subjective choice of the covariance structure.
Finally, we focus on the estimation of the Hurst parameter for long-range dependent stationary random fields, which draws its motivation from applications in environmental and atmospheric processes. Current methods for estimating the Hurst parameter include parametric models, such as fractional autoregressive integrated moving average models, and semiparametric estimators, which are either inefficient or inconsistent. We propose a novel semiparametric estimator based on the fractional exponential spectrum. We develop three data-driven methods that automatically select the optimal model order for the fractional exponential models. Extensive simulation studies and an analysis of Mercer and Hall's wheat data are used to illustrate the performance of the proposed estimator and the model order selection criteria. The results show that our estimator outperforms existing estimators, including the GPH (Geweke and Porter-Hudak) estimator. We show that the proposed estimator is consistent, works for different definitions of long-range dependent random fields, is computationally simple, and does not suffer from model misspecification or poor efficiency.
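For readers unfamiliar with the GPH estimator used as a baseline in this abstract, the sketch below shows its classical one-dimensional (time series) form: a least-squares regression of the log periodogram on -log(4 sin^2(lambda/2)) over the lowest m Fourier frequencies, whose slope estimates the memory parameter d. This is not the thesis's proposed estimator and does not handle random fields. The function names and the choice of m are assumptions. For time series, the Hurst parameter is usually related to d by H = d + 1/2.

```python
import cmath
import math


def periodogram(x, j):
    """Periodogram of x at the Fourier frequency 2*pi*j/n."""
    n = len(x)
    lam = 2.0 * math.pi * j / n
    s = sum(v * cmath.exp(-1j * lam * t) for t, v in enumerate(x))
    return abs(s) ** 2 / (2.0 * math.pi * n)


def gph_slope(freqs, spec):
    """OLS slope of log(spec) on -log(4 sin^2(freq / 2)); this slope is d-hat."""
    xs = [-math.log(4.0 * math.sin(f / 2.0) ** 2) for f in freqs]
    ys = [math.log(s) for s in spec]
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den


def gph_estimate(x, m):
    """GPH estimate of d from series x using the first m Fourier frequencies."""
    n = len(x)
    freqs = [2.0 * math.pi * j / n for j in range(1, m + 1)]
    spec = [periodogram(x, j) for j in range(1, m + 1)]
    return gph_slope(freqs, spec)


# Sanity check on an exact power-law spectrum with d = 0.3 (no noise):
freqs = [2.0 * math.pi * j / 256 for j in range(1, 17)]
spec = [(4.0 * math.sin(f / 2.0) ** 2) ** (-0.3) for f in freqs]
d_hat = gph_slope(freqs, spec)
```

On real data the estimate is sensitive to the bandwidth m, which is one motivation for data-driven order selection of the kind the abstract describes.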
|