This thesis implements a speech keyword retrieval and recognition system on two platforms, a DSP and a PC, both built on the same basic algorithm. The system does not require training of speech models, and neither the keywords nor the descriptive sentences are limited in number of words or restricted to a particular language.
Before the speech features are computed, the speech signal must be pre-processed. Pre-processing consists of DC bias removal, segmentation, Rabiner and Sambur endpoint detection, pre-emphasis, and windowing. For the speech features, the system uses 12 Mel-frequency cepstral coefficients (MFCCs) and 12 delta coefficients, which together form a 24-dimensional feature vector. The core of the system is pattern comparison: it combines dynamic time warping with the one-pass algorithm to improve the search for the optimal match. To make the DSP implementation feasible, an optimum likelihood ratio threshold serves as the decision criterion for rejecting non-keywords, and all keywords share the same threshold. This improves on the original method, which sets the threshold using the least differential, by reducing the RAM requirement.
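The pre-processing and pattern-comparison steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the pre-emphasis coefficient 0.97, the frame length and hop size, the Hamming window, and the Euclidean local distance are all assumptions chosen for illustration. Endpoint detection, MFCC extraction, and the one-pass algorithm are omitted, and a plain threshold on the normalized DTW cost stands in for the likelihood ratio threshold.

```python
import math


def remove_dc(signal):
    """Subtract the mean so the signal is centred on zero (DC bias removal)."""
    mean = sum(signal) / len(signal)
    return [s - mean for s in signal]


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is a commonly used value, assumed here.
    """
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_and_window(signal, frame_len=256, hop=128):
    """Split the signal into overlapping frames and apply a Hamming window.

    frame_len and hop are illustrative assumptions.
    """
    window = [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames


def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of feature vectors.

    Uses Euclidean local distance and the basic three-way step pattern,
    normalized by len(a) + len(b) so sequences of different length compare.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def is_keyword(template, candidate, threshold):
    """Accept the candidate only if its DTW cost is within one shared threshold.

    A simplified stand-in for the likelihood ratio test in the text.
    """
    return dtw_distance(template, candidate) <= threshold
```

In this sketch each keyword template and each candidate segment would be a list of 24-dimensional feature vectors, and a single `threshold` value is reused for every keyword, mirroring the shared-threshold design that keeps memory use low.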
In the experiments, both the DSP-based and the PC-based speech keyword retrieval and recognition systems achieved high recognition accuracy and efficiency.
Identifier | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0727104-134444 |
Date | 27 July 2004 |
Creators | Juang, Bo-Ya |
Contributors | Zai-Jun Yu, Yung-Chun Wu, I-Chih Kao, Chin-Ching Huang, Tzuen-lih Chern |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | Cholon |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0727104-134444 |
Rights | not_available, Copyright information available at source archive |