With the development of multimedia technology, speech recognition has become an increasingly active research topic in recent years. It has a wide range of applications, including recognizing the identity of speakers, a task that can be classified into identification and verification according to the decision mode. The main work of this thesis is to study the techniques and algorithms of speech recognition and to build a feasible system that simulates it. The research work and achievements are as follows. First, the author surveyed the field of speech recognition. The many existing algorithms can be divided into two categories: direct speech recognition, in which the method recognizes the words directly, and recognition based on a trained model. Second, a usable and reasonable algorithm was selected and studied. The author studied algorithms that extract a word's characteristic parameters based on MFCC (Mel-frequency cepstral coefficients) and that train on those parameters with a GMM (Gaussian mixture model). Third, the author wrote a MATLAB program that implements the speech recognition algorithm and makes use of the speech processing toolbox. The whole system comprises modules for signal processing, MFCC characteristic-parameter extraction, and GMM training. Fourth, the results were simulated and analyzed. The MATLAB system reads the WAV file, plays it, and then calculates the characteristic parameters automatically. In the last step, the content of the speech signal is identified.
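The MFCC step named above can be illustrated with a minimal sketch for a single frame. This is not the thesis's MATLAB code: it is a pure-Python illustration that assumes the common mel mapping 2595·log10(1 + f/700), a Hamming window, triangular filters, and a DCT-II, with hypothetical defaults of 20 filters and 12 coefficients. The function names are made up for this example.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel (assumed convention: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window a frame and return its one-sided power spectrum.

    Uses a plain O(N^2) DFT so the sketch needs no external libraries.
    """
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):       # rising slope
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):       # falling slope, peak of 1.0 at mid
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=20, n_ceps=12):
    """Return n_ceps cepstral coefficients for one frame (c0 is skipped)."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Floor the filter energies so the logarithm is always defined.
    log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
             for row in bank]
    # Unnormalised DCT-II of the log filterbank energies.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(1, n_ceps + 1)]
```

Calling `mfcc_frame(frame, 8000)` on a 256-sample frame returns a list of 12 coefficients. A full front end would also split the recording into overlapping frames and usually apply pre-emphasis, which this sketch leaves out.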
In this work, the author recorded speech from different people to test the system. The simulation results show that when the testing environment is quiet enough and the same speaker makes all 20 recordings, the performance of the algorithm approaches 100% for pairs of words with different syllables and with the same syllables. However, the result degrades when the test signal is corrupted by a certain level of noise. The simulation system does not produce good output when the reference and test signals are recorded by different speakers.
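One way to picture how a test recording is compared against trained reference models is maximum-likelihood scoring with diagonal-covariance GMMs. The sketch below is an assumption about the decision rule, since the abstract does not state it. Training (for example by expectation-maximization) is not shown, and the model layout `(weights, means, variances)` and all function names are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at vector x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of x under a GMM, computed with the log-sum-exp trick."""
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def average_score(frames, gmm):
    """Mean per-frame log-likelihood of a sequence of feature vectors."""
    weights, means, variances = gmm
    total = sum(gmm_log_likelihood(x, weights, means, variances)
                for x in frames)
    return total / len(frames)


def best_match(frames, models):
    """Return the name of the model with the highest average score."""
    return max(models, key=lambda name: average_score(frames, models[name]))
```

Here `frames` would be the MFCC vectors of the test signal and `models` a dictionary mapping each reference word to its trained GMM. Under this rule, noise or a different speaker shifts the feature vectors away from the trained means, which is consistent with the degradation reported above.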
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:hig-16950 |
Date | January 2014 |
Creators | Pan, Linlin |
Publisher | Högskolan i Gävle, Avdelningen för elektronik, matematik och naturvetenskap |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |