<p>Speech recognition is widely applied to
the translation of speech into the corresponding text, voice-driven commands,
human-machine interfaces and so on [1]-[8]. It has become increasingly pervasive
in modern life. To improve the accuracy of speech recognition, various
algorithms such as artificial neural networks and hidden Markov models
have been developed [1], [2].</p>
<p>In this thesis, speech recognition tasks
with various classifiers are investigated. The classifiers employed
include the support vector machine (SVM), k-nearest neighbors (KNN), random
forest (RF) and convolutional neural network (CNN). Two novel feature extraction
methods, sparse discrete wavelet decomposition (SDWD) and bandpass filtering
(BPF) based on the Mel filter banks [9], are developed and proposed. To
accommodate the different classification algorithms, both one-dimensional (1D)
and two-dimensional (2D) features are required. The 1D features are arrays of
power coefficients in frequency bands, which are used to train the SVM,
KNN and RF classifiers, while the 2D features capture both the frequency content
and its temporal variation. Specifically, a 2D feature consists of the power values
in the decomposed bands versus consecutive speech frames. Most importantly, the 2D
features with geometric transformation are adopted to train the CNN.</p>
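<p>As a rough illustration of the two feature shapes described above, the following is a minimal sketch only. It assumes a plain Haar wavelet decomposition as a stand-in, since the abstract does not specify the SDWD or BPF details; the function names (<code>haar_step</code>, <code>band_powers</code>, <code>features_2d</code>), the frame length, the hop size and the number of levels are all hypothetical choices, not the thesis method.</p>

```python
import math


def haar_step(x):
    """One Haar DWT level: return (approximation, detail) coefficients."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def band_powers(x, levels):
    """Mean power per sub-band, ordered [detail_1, ..., detail_L, approx_L].

    This plays the role of a 1D feature: one power value per frequency band.
    """
    powers = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        powers.append(sum(c * c for c in detail) / len(detail))
    powers.append(sum(c * c for c in approx) / len(approx))
    return powers


def features_2d(signal, frame_len, hop, levels):
    """Band powers of consecutive frames: one row per frame, one column per band."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [band_powers(signal[i:i + frame_len], levels) for i in starts]


# Toy input (assumed): a 440 Hz tone sampled at 8 kHz, 1024 samples long.
fs = 8000
signal = [math.sin(2 * math.pi * 440 * n / fs) for n in range(1024)]

feat_1d = band_powers(signal, levels=3)                          # 4 band powers
feat_2d = features_2d(signal, frame_len=256, hop=128, levels=3)  # frames x bands
```

<p>In this sketch, <code>feat_1d</code> would be the kind of vector passed to an SVM, KNN or RF classifier, while <code>feat_2d</code> is a small bands-versus-frames grid of the kind a CNN could take as an image-like input. The geometric transformation mentioned in the abstract is not reproduced here.</p>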
<p>The speech samples, from both male and female
speakers, come from a recorded data set as well as a standard data set. First, the
proposed feature extraction methods are applied to the recordings, which have
little noise and clear pronunciation. After many trials and experiments using
this data set, a high recognition accuracy is achieved. Then, the feature
extraction methods are further applied to the standard recordings, which have random
characteristics with ambient noise and unclear pronunciation. Extensive experimental
results validate the effectiveness of the proposed feature extraction techniques.</p>
Identifier | oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/8050565 |
Date | 11 June 2019 |
Creators | Jingzhao Dai (6642491) |
Source Sets | Purdue University |
Detected Language | English |
Type | Text, Thesis |
Rights | CC BY 4.0 |
Relation | https://figshare.com/articles/SPARSE_DISCRETE_WAVELET_DECOMPOSITION_AND_FILTER_BANK_TECHNIQUES_FOR_SPEECH_RECOGNITION/8050565 |