A Design of Recognition Rate Improving Strategy For English Speech Recognition System

Britain established its maritime hegemony in 1588. Through British colonial activity, the English language spread to North America, India, Africa, and Australia. After World War I ended in 1918, the U.S. became the most powerful nation in the world economy, and the world's financial center shifted from London to New York. After World War II ended in 1945, the U.S. came to play an indispensable role in every aspect of international politics, economics, and technology. The United Nations, founded on October 24, 1945, adopted English, Chinese, French, Spanish, Arabic, and Russian as its six working languages. These historical events drove the successive expansion of the language and made English the most widely used international language. Besides its political, economic, and technological strengths, Britain is home to the British Museum, the largest comprehensive museum in the world. The museum, located in London and founded in 1753, holds more than 13 million archaeological and cultural relics from around the world, so Britain's cultural resources are remarkably rich. Our objective is to build a language system that helps us learn English more effectively while broadening our horizons.
This thesis investigates strategies for improving the recognition rate of an English speech recognition system. The speech features of 989 common English monosyllables form the basis for training and recognition. A training database is built by reading each monosyllable for 14 rounds, alternating between two tones: odd-numbered rounds are read with the high pitch of tone 1, and even-numbered rounds with the falling pitch of tone 4. The pitch period frame method is applied to improve the accuracy of endpoint detection. Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC) serve as the two feature models, and a hidden Markov model (HMM) serves as the recognition model. The number of HMM states is set to 10, and a phonotactic rule is used to further improve the recognition rate. On a notebook computer with a Core™ i5 M450 CPU clocked at 2.4 GHz, running Fedora 14, the system reaches a correct phrase recognition rate of 92.94% on a database of 6,812 English phrases. The average computation time per phrase is within 1.5 seconds.
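
The recording schedule described above can be sketched in a few lines of Python. This is only an illustration of the stated numbers (989 monosyllables, 14 rounds, tone 1 on odd rounds, tone 4 on even rounds), not the thesis's actual tooling. The names `tone_for_round` and `build_schedule` are hypothetical, and integer indices stand in for the monosyllable list, which the abstract does not give.

```python
from collections import Counter

NUM_SYLLABLES = 989  # common English monosyllables, as stated in the abstract
NUM_ROUNDS = 14      # reading rounds per monosyllable, as stated in the abstract


def tone_for_round(round_number: int) -> int:
    """Return the tone for a 1-based round: tone 1 (high) on odd rounds, tone 4 (falling) on even rounds."""
    return 1 if round_number % 2 == 1 else 4


def build_schedule(num_syllables: int = NUM_SYLLABLES, num_rounds: int = NUM_ROUNDS):
    """List every (round, syllable_index, tone) utterance in reading order.

    Syllable indices are placeholders (an assumption); the real syllable
    inventory is not part of the abstract.
    """
    return [
        (round_number, syllable_index, tone_for_round(round_number))
        for round_number in range(1, num_rounds + 1)
        for syllable_index in range(num_syllables)
    ]


schedule = build_schedule()
tone_counts = Counter(tone for _, _, tone in schedule)

print(len(schedule))                   # total utterances: 989 * 14
print(tone_counts[1], tone_counts[4])  # utterances recorded per tone
```

Under these assumptions the database holds 13,846 utterances, split evenly into 6,923 per tone, so each monosyllable contributes seven tone-1 and seven tone-4 samples.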

Identifier: oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0827111-202317
Date: 27 August 2011
Creators: Hung, Ming-Chang
Contributors: Chii-Maw Uang, Tsung Lee, Xiao-Song Bo, Er-Hui Lu, Chih-Chien Chen
Publisher: NSYSU
Source Sets: NSYSU Electronic Thesis and Dissertation Archive
Language: Cholon
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0827111-202317
Rights: user_define, Copyright information available at source archive