Return to search

A Design of Trilingual Speech Recognition System for Chinese, Italian and Farsi

China, Italy and Iran are seemingly quite different in language, history, culture and economy. However, there have been existed mutual interactions among these three countries during the past age. In the fourth century, the Chinese Northern Wei Dynasty established close relation with the Persian Empire, located in Iran today. Persian language is also called Farsi in her native name. The unearthed silver bowls from China in the recent years showed similar appearance and material with the Sassanid-Persian silverware of Iran. Archaeologists found that ancient China and Iran used to be close international trading partners. In the thirteenth century, Marco-Polo, an Italian travel adventurer and merchant, visited Chinese Yuan Dynasty, and wrote a marvelous book ¡§The Travels of Marco-Polo¡¨. Fantastic experiences in China were depicted in this journal, and these initiated the Sino-Italian relation in the early days. Armani suits and Ferrari super racers become the oriental passion to the Italy in the Modern China, and this may represent the achievement of Asian-European culture exchange. Therefore, it is our objective to design a trilingual speech recognition system to help us to learn Chinese, Italian and Farsi languages.
Linear predicted cepstral coefficients, Mel-frequency cepstral coefficients, hidden Markov model and phonotactics are used in this system as the two syllable feature models and the recognition model respectively. For the Chinese system, a 2,699 two-syllable words database is used as the training corpus. For the Italian and Farsi systems, a database of 10 utterances per mono-syllable is established by applying their pronunciation rules. These 10 utterances are collected through reading 5 rounds of the same mono-syllables twice with tone 1 and tone 4. The correct recognition rates of 87.54%, 87.48%, and 90.33% can be reached for the 82,000 Chinese, 27,900 Italian, and 4,000 Farsi phrase databases respectively. The computation time for each system is within 1.5 seconds. Furthermore, a trilingual language-speech recognition system for 300 common words, composed of 100 words from each language, is developed. A 98.67 % correct language-phrase recognition rate can be obtained with the computation time about 2 seconds.

Identiferoai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0910112-155203
Date10 September 2012
CreatorsJiang, Wei-Sheng
ContributorsChii-Maw Uang, Chih-Chien Chen, Sheau-Shong Bor
PublisherNSYSU
Source SetsNSYSU Electronic Thesis and Dissertation Archive
LanguageCholon
Detected LanguageEnglish
Typetext
Formatapplication/pdf
Sourcehttp://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0910112-155203
Rightsuser_define, Copyright information available at source archive

Page generated in 0.0018 seconds