Global ETD Search

1	A Design of Speech Inputting System for Chinese Resumes Ciou, Jhao-dong 06 September 2007 In this thesis, a hidden Markov model, a maximum likelihood ratio, and a lexicon search strategy are used to build a Chinese resume input system. The resume contains five items: name introduction, gender, birth date, birthplace, and education. The system is developed on a PC with an Intel Pentium 1.6 GHz CPU running Red Hat Linux 9.0. For the speaker-dependent case, a resume can be completed within 45 seconds on average. Hidden Markov model Mel-frequency cepstrum coefficients
2	A Design of French Speech Recognition System Li, Chun-Ching 24 August 2010 This thesis investigates design and implementation strategies for a French speech recognition system. It uses the speech features of 425 common French mono-syllables as the basis for training and recognition. A training database is established by reading each mono-syllable 12 times in 6 rounds. Every mono-syllable is read twice in succession with different tones: the first pronunciation uses the high pitch of tone 1, while the second uses the falling pitch of tone 4. Mel-frequency cepstrum coefficients and linear predictive cepstrum coefficients are used as the two feature models, and a hidden Markov model is used as the recognition model. On a personal computer with an AMD Athlon XP 2800+ CPU (2.2 GHz clock rate) running Ubuntu 9.04, a correct phrase recognition rate of 86% is reached on a database of 3850 French phrases. The average computation time for each phrase is about 1.5 seconds. Linear predictive cepstrum coefficients Mel-frequency cepstrum coefficients Hidden Markov model
3	A Design of Mandarin Speech Recognition System for Addresses Chang, Ching-Yung 06 September 2004 A Mandarin speech recognition system for addresses based on MFCC, a hidden Markov model (HMM), and the Viterbi algorithm is proposed in this thesis. An HMM is a doubly stochastic process that describes pronunciation by recording state transitions according to the time-varying properties of the speech signal. To simplify the system design and reduce the computational cost, the mono-syllable structure of Mandarin is exploited by combining a mono-syllable recognizer with the HMM. For the speaker-dependent case, a Mandarin address can be entered within 60 seconds, and a 98% correct identification rate can be achieved in the laboratory environment. Mel-frequency cepstrum coefficients Hidden Markov model (HMM) phrase recognition end-point detection
4	A System Design of Chinese Resume by Speech Construction Chen, Yue-sheng 28 August 2006 A speech-driven system for constructing Chinese resumes is developed using a novel segmentation mechanism and the classical hidden Markov model. The recognition system is based on both mono-syllable HMMs and speech-text alignment schemes. Experimental results indicate that the amount of training material used for feature extraction can be greatly reduced, and that the text content of the recorded training speech can differ from that of the recognition tasks. Each phrase in the resume can be identified within one second, which is approximately the same as what the previous year's graduate achieved. Furthermore, the user interface of the resume system has been redesigned and polished with the GTK toolkit to enable event-driven X-window operations. Speech-text alignment Hidden Markov model (HMM)
5	A Design of Speech Recognition System for Chinese Names of Historical Figures Around the World Lin, Wei-Ci 07 September 2006 A speech recognition system for the Chinese names of historical figures around the world is proposed in this thesis. A speech database of approximately forty-six thousand Chinese names is collected and recorded twice for system evaluation. The system applies Mel-frequency cepstrum coefficients, mono-syllable HMMs, and a speech-text alignment scheme to perform initial candidate selection. A Mandarin pitch identification mechanism is then applied to increase the correct rate and obtain the final answer. The experimental results indicate that a 90% correct identification rate can be achieved when the first recording session is used for training and the second for testing. For the speaker-dependent case, the correct name can be recognized within 1.5 seconds on a PC with an Intel Celeron 2.4 GHz CPU running Red Hat Linux 9.0. Hidden Markov model (HMM) Endpoint detection
6	A Design of Multi-Session, Text Independent, TV-Recorded Audio-Video Database for Speaker Recognition Wang, Long-Cheng 07 September 2006 A four-session, text-independent, TV-recorded audio-video database for speaker recognition is collected in this thesis. The speaker data is used to verify the applicability of a design methodology based on Mel-frequency cepstrum coefficients and a Gaussian mixture model. Both single-session and multi-session problems are discussed. Experimental results indicate that a 90% correct rate can be achieved for a single-session 3000-speaker corpus, while only a 67% correct rate can be obtained for a two-session 800-speaker dataset. The performance of a multi-session speaker recognition system is greatly reduced by variability in the recording environment, the speakers' mood during recording, and other unknown factors. Improving system performance under multi-session conditions remains a challenging task for future work, and the establishment of such a large-scale multi-session speaker database plays an indispensable role in it. Speaker recognition Text independent Vector quantization Gaussian mixture model Mel-frequency cepstrum coefficients
7	A design of text-independent medium-size speaker recognition system Zheng, Shun-De 13 September 2002 This paper presents text-independent speaker identification results for medium-size speaker populations of up to 400 speakers, using TV speech and the TIMIT database. A system based on Gaussian mixture speaker models is used for speaker identification, and experiments are conducted on the TV database and the TIMIT database. The TV database results show medium-size population performance under TV conditions. These are believed to be the first speaker identification experiments on the complete 400-speaker TV database and the largest text-independent speaker identification task reported to date. Identification accuracies of 94.5% on the TV database and 98.5% on the TIMIT database are obtained. Gaussian mixture model Vector quantization Speaker recognition Mel-frequency cepstrum coefficients
8	A Design and Applications of Mandarin Keyword Spotting System Hou, Cheng-Kuan 11 August 2003 A Mandarin keyword spotting system based on MFCC, a discrete-time HMM, and the Viterbi algorithm with DTW is proposed in this thesis. Combined with a dialogue system, this keyword spotting platform is further refined into a prototype of a natural-speech patient registration system for Kaohsiung Veterans General Hospital. After the computer-dialogue attendant asks for the ID number during registration, the user can complete all relevant tasks in one sentence. Functions for searching for clinical doctors and for making and canceling registrations are all built into the system. In a laboratory environment, the correct rate of this speaker-independent patient registration system can reach 97%, and the entire registration process can be completed within 75 seconds. Mel-frequency cepstrum coefficients phrase recognition Dynamic Time Warping Keyword spotting Hidden Markov model
9	A Design of Japanese Speech Recognition System Chen, Meng-yang 24 August 2009 This thesis investigates design and implementation strategies for a Japanese speech recognition system. It uses the speech features of 188 common Japanese mono-syllables as the basis for training and recognition. A training database of 10 utterances per mono-syllable is established by applying Japanese pronunciation rules. These 10 utterances are collected by reading the 188 mono-syllables in 5 rounds, with every mono-syllable read twice in succession in each round. Mel-frequency cepstrum coefficients and linear predicted cepstrum coefficients are used as the two feature models, and a hidden Markov model is used as the recognition model. On a personal computer with a 2.4 GHz Pentium CPU running Ubuntu 8.04, a correct phrase recognition rate of 87% is reached on a database of 34,000 Japanese phrases. The average computation time for each phrase is about 1.5 seconds. Linear predicted cepstrum coefficients Hidden Markov model Speech recognition Mel-frequency cepstrum coefficients
10	A Design of Taiwanese Speech Recognition System Jhu, Hao-fu 24 August 2009 This thesis investigates design and implementation strategies for a Taiwanese speech recognition system. It adopts a 4 plus 1 (five times) recording strategy, where the first four recordings are used for speech feature training and the last recording for speech recognition simulation. Mel-frequency cepstrum coefficients are used as the feature model and a hidden Markov model as the recognition model. On a personal computer with an Intel Celeron 2.4 GHz CPU running Red Hat Linux 9.0, a correct phrase recognition rate of 90% is reached on a database of 4200 Taiwanese phrases. Speech recognition Mel-frequency cepstrum coefficients Gaussian distribution Hidden Markov model
