1 |
A design of speech recognition system for one hundred thousand Chinese names
Tu, Chiu-chuan 06 September 2007
The objective of this thesis is to design and implement a speech recognition system for one hundred thousand Chinese names. Mel-frequency cepstral coefficients, hidden Markov models, and a lexicon search strategy are used to select the name candidates. A Mandarin intonation technique is also incorporated into the system to increase the final recognition accuracy.
The experimental results indicate that, for the speaker-dependent case, a correct rate of 85% can be achieved with the proposed intonation classification scheme and the balanced monosyllable training database. This rate is 8% higher than that of the previous method, which did not use these two techniques. Under Red Hat Linux 9.0, a Mandarin name can be recognized within 2 seconds on a computer with an Intel Celeron 2.4 GHz CPU.
|
2 |
A Design of Speech Recognition System for Chinese Names
Chen, Yu-Te 11 August 2003
A speech recognition system for Chinese names is designed in this thesis. Identifying the surname first, which exploits a distinctive feature of Chinese names, greatly improves the classification accuracy and computation time of the system.
This research is based primarily on the hidden Markov model (HMM), a technique widely used in speech recognition. An HMM is a doubly stochastic process that describes pronunciation by recording the state transitions according to the time-varying properties of the speech signal. The results of the HMM are compared with those of the segmental probability model (SPM) to determine which is the better option for recognizing base syllables. Under the condition of equal segments, SPM not only suits the Mandarin base-syllable structure but also simplifies the system, since it does not need to find the best transformation of the utterance.
A speaker-independent recognition system for 3,000 Chinese names has been implemented, based on a Mandarin microphone database recorded in a laboratory environment.
|
3 |
Development of Sound Database for Fishes in Taiwan by Relational Model
Liou, Yu-lin 31 August 2009
The sound database for marine fishes in Taiwan is intended not only to preserve data but also to provide a common platform for data sharing, improving the efficiency of studies of fish behavior, automatic recognition, localization, and tracking. To report sound quality to users in terms of signal-to-noise ratio, each fish sound recording is analyzed before uploading. Because most available data were recorded either in the field or in a fish tank, fish sounds were extracted with two different automatic detection methods. For field recordings, Time Endpoint Detection was applied using 0.5-s time frames with 50% overlap. The energy of each frame was computed as the sum of squared amplitudes, and the median of the energy plus one standard deviation was used as the threshold for extracting fish sounds. For tank recordings, Frequency Endpoint Detection was applied, also using 0.5-s time frames with 50% overlap. Each frame was transformed into a spectrum, and the energy ratio of each frequency was calculated from the spectrum. The information entropy was then obtained from the energy ratios, and the detection threshold was set one standard deviation above the median of the information entropy. For both detection methods, sound quality was expressed as the signal-to-noise ratio, defined as the average power of the signal divided by the average power of the background noise. The fish sound database is a 3-tier system developed with PHP and MySQL. To reduce storage size and maintain data integrity, the relational model was applied. First, the recording data were represented conceptually as an Entity-Relationship Diagram (ERD). Second, the ERD was transformed into relational schemas. Third, the schemas were normalized to first, second, and third normal form.
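The time-domain detection and the signal-to-noise calculation described above can be illustrated with a minimal Python sketch. It is not the thesis code: the function names, the 100 Hz sample rate, the synthetic recording, the choice of the population standard deviation, and the conversion of the power ratio to decibels are all assumptions made for illustration.

```python
import math
import statistics

def frame_energies(samples, rate, frame_s=0.5, overlap=0.5):
    """Energy (sum of squared amplitudes) of each overlapping time frame."""
    size = int(frame_s * rate)                 # samples per 0.5-s frame
    hop = max(1, int(size * (1 - overlap)))    # 50 % overlap -> hop of half a frame
    return [
        sum(x * x for x in samples[start:start + size])
        for start in range(0, len(samples) - size + 1, hop)
    ]

def detect_frames(energies):
    """Flag frames whose energy exceeds median + one standard deviation."""
    threshold = statistics.median(energies) + statistics.pstdev(energies)
    return [e > threshold for e in energies]

def snr_db(energies, flags):
    """Average power of flagged frames over average power of the rest, in dB.

    Frames have equal length, so the ratio of mean energies equals the
    ratio of mean powers. Assumes at least one frame in each group and a
    non-silent background (the dB scale is an assumption of this sketch).
    """
    signal = [e for e, f in zip(energies, flags) if f]
    noise = [e for e, f in zip(energies, flags) if not f]
    return 10 * math.log10(statistics.mean(signal) / statistics.mean(noise))

# Synthetic 10-s recording (hypothetical): a faint 5 Hz background with a
# loud burst between 4 s and 5 s standing in for a fish sound.
RATE = 100
samples = [
    (1.0 if 400 <= n < 500 else 0.01) * math.sin(2 * math.pi * 5 * n / RATE)
    for n in range(10 * RATE)
]
energies = frame_energies(samples, RATE)
flags = detect_frames(energies)
detected = [i for i, f in enumerate(flags) if f]
print("frames flagged as fish sound:", detected)
print("estimated SNR (dB):", round(snr_db(energies, flags), 1))
```

Because the threshold is derived from the recording's own median and spread, it adapts to the background level without a hand-tuned constant, at the cost of assuming that sound events occupy a minority of the frames.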
To improve users' efficiency, the sound database provides three interfaces: one for data uploading; one for data searching by keyword (creature name, recording area, and recording time); and one for data comparison by recording number.
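The frequency-domain detector described in this abstract for tank recordings can be sketched in the same spirit. The abstract does not say on which side of the threshold a frame counts as a detection, so this sketch assumes that frames with entropy above the threshold are flagged. The function names, the 64 Hz sample rate, the naive DFT, and the synthetic hum-plus-click signal are likewise hypothetical.

```python
import cmath
import math
import statistics

def split_frames(samples, size, hop):
    """Overlapping frames of `size` samples, advancing by `hop` samples."""
    return [samples[s:s + size] for s in range(0, len(samples) - size + 1, hop)]

def spectral_entropy(frame):
    """Shannon entropy (bits) of the per-frequency energy ratios of one frame."""
    n = len(frame)
    # Naive DFT power for bins 0..n/2; adequate for a sketch, slow for real audio.
    power = [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]
    total = sum(power)
    if total == 0:
        return 0.0                       # silent frame: no energy to distribute
    ratios = [p / total for p in power]  # energy ratio of each frequency
    return -sum(r * math.log2(r) for r in ratios if r > 0)

def detect_by_entropy(entropies):
    """Indices of frames whose entropy exceeds median + one standard deviation.

    Flagging the frames *above* the threshold is an assumption of this sketch.
    """
    threshold = statistics.median(entropies) + statistics.pstdev(entropies)
    return [i for i, h in enumerate(entropies) if h > threshold]

# Synthetic tank recording (hypothetical): a steady 8 Hz hum with one
# broadband click at sample 330.
RATE = 64
SIZE = RATE // 2          # 0.5-s frame
HOP = SIZE // 2           # 50 % overlap
samples = [math.sin(2 * math.pi * 8 * n / RATE) for n in range(10 * RATE)]
samples[330] += 5.0
entropies = [spectral_entropy(f) for f in split_frames(samples, SIZE, HOP)]
detected = detect_by_entropy(entropies)
print("frames flagged as fish sound:", detected)
```

A steady hum concentrates its energy in one frequency bin and so has near-zero entropy, while a click spreads energy across all bins, which is why entropy can separate the two even when their total energies are similar.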
|
4 |
Speech Endpoint Detection: An Image Segmentation Approach
Faris, Nesma January 2013
Speech Endpoint Detection, also known as Speech Segmentation, is an unsolved problem in speech processing that affects numerous applications, including robust speech recognition. The task is not as trivial as it appears, and most existing algorithms degrade at low signal-to-noise ratios (SNRs). Most previous research has focused on developing robust algorithms, with special attention paid to the derivation and study of noise-robust features and decision rules. This research tackles the endpoint detection problem in a different way and proposes a novel speech endpoint detection algorithm derived from the Chan-Vese algorithm for image segmentation. The proposed algorithm can fuse multiple features extracted from the speech signal to enhance detection accuracy. Its performance has been evaluated and compared with two widely used speech detection algorithms under various noise environments, with SNR levels ranging from 0 dB to 30 dB. The proposed algorithm has also been applied to different types of American English phonemes. The experiments show that, even under severe noise contamination, the proposed algorithm is more efficient than the reference algorithms.
|
5 |
A Hybrid Design of Speech Recognition System for Chinese Names
Hsu, Po-Min 06 September 2004
A speech recognition system for Chinese names based on the Karhunen-Loève transform (KLT), MFCC, the hidden Markov model (HMM), and the Viterbi algorithm is proposed in this thesis. KLT is the optimal transform for data reduction in the sense of minimum mean-square error and maximal energy packing. HMM is a stochastic approach that characterizes much of the variability in the speech signal by recording the state transitions. For the speaker-dependent case, a correct identification rate of 93.97% can be achieved within 3 seconds in a laboratory environment.
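The role of the Viterbi algorithm mentioned above, finding the most likely sequence of HMM states for an observed sequence, can be illustrated with a short Python sketch. This is a generic textbook formulation, not the system from the thesis: the two states, the three discrete observation symbols, and all probabilities are invented toy values, whereas the thesis works with MFCC-based features.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, log_probability) for a discrete-observation HMM.

    Works in log space to avoid underflow on long sequences. Assumes every
    probability that is looked up is non-zero.
    """
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

# Toy model (all values hypothetical).
states = ["s1", "s2"]
start_p = {"s1": 0.6, "s2": 0.4}
trans_p = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit_p = {
    "s1": {"a": 0.5, "b": 0.4, "c": 0.1},
    "s2": {"a": 0.1, "b": 0.3, "c": 0.6},
}
path, log_p = viterbi(["a", "b", "c"], states, start_p, trans_p, emit_p)
print(path, math.exp(log_p))
```

In a name recognizer, each candidate's HMM would be scored this way against the same utterance, and the candidate with the highest path probability would be selected.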
|
6 |
A Design of Speech Recognition System for Chinese Names of Historical Figures Around the World
Lin, Wei-Ci 07 September 2006
A speech recognition system for Chinese names of historical figures around the world is proposed in this thesis. A speech database of approximately forty-six thousand Chinese names was collected and recorded twice for system evaluation. The system applies Mel-frequency cepstral coefficients, monosyllable HMMs, and a speech-text alignment scheme to perform initial candidate selection. A Mandarin pitch identification mechanism is then applied to increase the correct rate and obtain the final answer. The experimental results indicate that a 90% correct identification rate can be achieved when the first recording session is used for training and the second for testing. For the speaker-dependent case, the correct name can be recognized within 1.5 seconds on a PC with an Intel Celeron 2.4 GHz CPU running the Red Hat Linux 9.0 operating system.
|