About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Accent Classification from Speech Samples by Use of Machine Learning

Carol Pedersen Unknown Date (has links)
“Accent” is the pattern of speech pronunciation by which one can identify a person’s linguistic, social or cultural background. It is an important source of inter-speaker variability and a particular problem for automated speech recognition. The aim of the study was to investigate a new computational approach to accent classification which did not require phonemic segmentation or the identification of phonemes as input, and which could therefore be used as a simple, effective accent classifier. Through a series of structured experiments this study investigated the effectiveness of Support Vector Machines (SVMs) for speech accent classification using time-based units rather than linguistically-informed ones, and compared it to the accuracy of other machine learning methods, as well as the ability of humans to classify speech according to accent. A corpus of read speech was collected in two accents of English (Arabic and “Indian”) and used as the main data source for the experiments. Mel-frequency cepstral coefficients were extracted from the speech samples and combined into larger units of 10 to 150 ms duration, which then formed the input data for the various machine learning systems. Support Vector Machines were found to classify the samples with up to 97.5% accuracy with very high precision and recall, using samples of between 1 and 4 seconds of speech. This compared favourably with a human listener study where subjects were able to distinguish between the two accent groups with an average of 92.5% accuracy in approximately 8 seconds. Repeating the SVM experiments on a different corpus resulted in a best classification accuracy of 84.6%. Experiments using a decision tree learner and a rule-based classifier on the original corpus gave a best accuracy of 95%, but results over the range of conditions were much more variable than those using the SVM. Rule extraction was performed in order to help explain the results and better inform the design of the system.
The new approach was therefore shown to be effective for accent classification, and a plan for its role within various other larger speech-related contexts was developed.
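The pooling step this abstract describes — per-frame MFCCs combined into larger fixed-duration, time-based units before classification — can be sketched in NumPy. This is a minimal illustration, not the thesis code; the frame rate and coefficient count are assumptions:

```python
import numpy as np

def frames_to_units(mfcc_frames, frames_per_unit):
    """Concatenate consecutive MFCC frames into larger fixed-duration units.

    mfcc_frames: (n_frames, n_coeffs) array, e.g. one row per 10 ms frame.
    frames_per_unit: e.g. 10 frames -> 100 ms units.
    Returns an (n_units, n_coeffs * frames_per_unit) feature matrix.
    """
    n_frames, n_coeffs = mfcc_frames.shape
    n_units = n_frames // frames_per_unit          # drop any trailing partial unit
    usable = mfcc_frames[:n_units * frames_per_unit]
    return usable.reshape(n_units, frames_per_unit * n_coeffs)

# Hypothetical example: 3 s of speech at 10 ms per frame = 300 frames of 13 MFCCs
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(300, 13))
units = frames_to_units(mfcc, frames_per_unit=10)  # 100 ms units
print(units.shape)  # (30, 130)
```

Each row of `units` could then be fed to an SVM or other classifier as a single training example.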
23

TEXT-INDEPENDENT ROBUST SPEAKER RECOGNITION IN THE PRESENCE OF NOISE USING PAC-MFCC AND SUB-BAND CLASSIFIERS

HARRY ARNOLD ANACLETO SILVA 06 September 2011 (has links)
This work proposes the PAC-MFCC feature combined with sub-band classifiers for the task of text-independent speaker identification in noise. The proposed scheme is compared with the MFCC (Mel-Frequency Cepstral Coefficients), PAC-MFCC (Phase Autocorrelation MFCC) without sub-band classifiers, SSCH (Subband Spectral Centroid Histograms) and TECC (Teager Energy Cepstrum Coefficients) features. The recognition task used the TIMIT database, which consists of 630 speakers, each speaking 10 utterances of approximately 3 seconds; 8 utterances per speaker were used for training and 2 for testing, giving a total of 1260 test utterances. The performance of these techniques was investigated using different types of noise from the Noisex-92 database at different signal-to-noise ratios.
The PAC-MFCC feature with sub-band classifiers was found to outperform the other techniques at signal-to-noise ratios below 10 dB.
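The sub-band idea — train one classifier per spectral sub-band and fuse their decisions — can be sketched as a simple majority vote. The fusion rule here is an illustrative assumption, not necessarily the one used in the thesis:

```python
import numpy as np

def subband_vote(band_scores):
    """Fuse per-sub-band speaker scores by majority vote.

    band_scores: (n_bands, n_speakers) array where band_scores[b, s] is the
    score classifier b assigns to speaker s (higher = more likely).
    Each band votes for its top-scoring speaker; the speaker with the most
    votes wins (ties broken by the lowest speaker index).
    """
    votes = np.argmax(band_scores, axis=1)  # one vote per band
    counts = np.bincount(votes, minlength=band_scores.shape[1])
    return int(np.argmax(counts))

# Hypothetical scores: 4 sub-band classifiers, 3 enrolled speakers
scores = np.array([[0.2, 0.7, 0.1],
                   [0.1, 0.6, 0.3],
                   [0.8, 0.1, 0.1],   # one noise-corrupted band disagrees
                   [0.3, 0.5, 0.2]])
print(subband_vote(scores))  # 1
```

The appeal at low SNR is visible even in this toy case: noise that corrupts one band outvotes cleaner bands in a full-band classifier, but here it costs only one vote.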
24

Improved MFCC Front End Using Spectral Maxima For Noisy Speech Recognition

Sujatha, J 11 1900 (has links) (PDF)
No description available.
25

DEVELOPMENT OF ALGORITHMS FOR GUNSHOT DETECTION

Hrabina, Martin January 2019 (has links)
This thesis deals with gunshot recognition and related problems. First, the task is introduced and broken down into smaller steps. An overview is then given of audio databases, significant publications, events and the current state of the art, together with a survey of possible applications of gunshot detection. The second part compares features using various metrics, together with a comparison of their recognition performance. A comparison of recognition algorithms follows, and new features usable for recognition are introduced. The thesis culminates in the design of a two-stage gunshot recognition system that monitors its surroundings in real time. The conclusion summarises the achieved results and outlines future work.
26

SPARSE DISCRETE WAVELET DECOMPOSITION AND FILTER BANK TECHNIQUES FOR SPEECH RECOGNITION

Jingzhao Dai (6642491) 11 June 2019 (has links)
Speech recognition is widely applied to speech-to-text translation, voice-driven commands, human-machine interfaces and so on [1]-[8], and has increasingly proliferated into everyday life. To improve its accuracy, various algorithms such as artificial neural networks and hidden Markov models have been developed [1], [2].

In this thesis, speech recognition with various classifiers is investigated. The classifiers employed are the support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF) and convolutional neural network (CNN). Two novel feature extraction methods are developed and proposed: sparse discrete wavelet decomposition (SDWD) and bandpass filtering (BPF) based on the Mel filter banks [9]. To suit the different classification algorithms, both one-dimensional (1D) and two-dimensional (2D) features are obtained. The 1D features are arrays of power coefficients in frequency bands, used to train the SVM, KNN and RF classifiers, while the 2D features span both the frequency domain and temporal variation: each consists of the power values in the decomposed bands across consecutive speech frames. The 2D features, after geometric transformation, are used to train the CNN.

Speech recordings of male and female speakers are drawn from a recorded data set as well as a standard data set. First, recordings with little noise and clear pronunciation are processed with the proposed feature extraction methods; after many trials and experiments on this dataset, high recognition accuracy is achieved. The feature extraction methods are then applied to the standard recordings, which have random characteristics, ambient noise and unclear pronunciation. The experimental results validate the effectiveness of the proposed feature extraction techniques.
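The building block behind wavelet-decomposition features is the discrete wavelet transform, which splits a signal into low- and high-frequency bands. Below is a minimal sketch using a hand-rolled one-level Haar DWT cascaded into log band-power features; the thesis' sparse decomposition is more elaborate, so treat this only as an illustration of the idea:

```python
import numpy as np

def haar_level1(x):
    """One level of the Haar discrete wavelet transform.

    Splits an even-length signal into approximation (low-band) and
    detail (high-band) coefficients; being orthonormal, it preserves
    total signal energy across the two bands.
    """
    x = x.reshape(-1, 2)
    approx = (x[:, 0] + x[:, 1]) / np.sqrt(2.0)
    detail = (x[:, 0] - x[:, 1]) / np.sqrt(2.0)
    return approx, detail

def band_log_energies(x, n_levels=3):
    """1D feature vector of log band energies from a cascaded Haar DWT."""
    feats = []
    for _ in range(n_levels):
        x, detail = haar_level1(x)
        feats.append(np.log(np.sum(detail ** 2) + 1e-12))
    feats.append(np.log(np.sum(x ** 2) + 1e-12))   # final approximation band
    return np.array(feats)

rng = np.random.default_rng(1)
frame = rng.normal(size=512)            # hypothetical 512-sample speech frame
print(band_log_energies(frame).shape)   # (4,)
```

A vector like this per frame gives the 1D features; stacking such vectors over consecutive frames gives a 2D band-versus-time map of the kind described for the CNN input.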
27

Channel Compensation for Speaker Recognition Systems

Neville, Katrina Lee, katrina.neville@rmit.edu.au January 2007 (has links)
This thesis addresses how best to remedy different types of channel distortion on speech that is to be used in automatic speaker recognition and verification systems. In automatic speaker recognition, a person's voice is analysed by a machine and the person's identity is determined by comparing speech features to a known set of speech features; in automatic speaker verification, a person claims an identity and the machine determines whether that claimed identity is correct or the person is an impostor. Channel distortion occurs whenever information is sent electronically through any type of channel, whether a basic wired telephone channel or a wireless channel. The distortions that can corrupt the information include time-variant or time-invariant filtering and the addition of 'thermal noise'; both can cause varying degrees of error in the information being received and analysed. The experiments presented in this thesis investigate the effects of channel distortion on average speaker recognition rates and test the effectiveness of various channel compensation algorithms designed to mitigate those effects. The speaker recognition system was represented by a basic recognition algorithm consisting of speech analysis, extraction of feature vectors in the form of Mel-cepstral coefficients, and a classification stage based on the minimum-distance rule.
Two types of channel distortion were investigated:

• Convolutional (or lowpass filtering) effects
• Addition of white Gaussian noise

Three different methods of channel compensation were tested:

• Cepstral Mean Subtraction (CMS)
• RelAtive SpecTrAl (RASTA) Processing
• Constant Modulus Algorithm (CMA)

The results showed that, for both CMS and RASTA processing, filtering at low cutoff frequencies (3 or 4 kHz) improved the average speaker recognition rates compared to speech with no compensation, with RASTA processing giving larger improvements than CMS. Neither the CMS nor the RASTA method improved the accuracy of the speaker recognition system at cutoff frequencies of 5 kHz, 6 kHz or 7 kHz. For noisy speech, all of the methods analysed were able to compensate at high SNRs of 40 dB and 30 dB, but only RASTA processing improved the average recognition rate for speech corrupted with a high level of noise (SNRs of 20 dB and 10 dB).
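Of the compensation methods listed, Cepstral Mean Subtraction is simple enough to sketch directly: a time-invariant channel multiplies the spectrum, so it adds a constant offset in the cepstral domain, and subtracting each coefficient's mean over the utterance removes that offset. A minimal NumPy sketch (the feature shapes are assumptions):

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Cepstral Mean Subtraction (CMS).

    cepstra: (n_frames, n_coeffs) array of cepstral coefficients for one
    utterance. Subtracting the per-coefficient mean over time removes any
    constant cepstral bias introduced by a time-invariant channel.
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Check on synthetic data: a fixed channel adds the same cepstral offset to
# every frame, and CMS removes it exactly.
rng = np.random.default_rng(2)
clean = rng.normal(size=(200, 13))       # hypothetical clean cepstra
channel = rng.normal(size=(1, 13))       # fixed filter -> constant cepstral bias
assert np.allclose(cepstral_mean_subtraction(clean + channel),
                   cepstral_mean_subtraction(clean))
```

This also makes the reported results plausible: CMS can only remove convolutional (filtering) distortion, which is why additive noise at low SNR calls for a method like RASTA instead.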
28

Semantic Classification And Retrieval System For Environmental Sounds

Okuyucu, Cigdem 01 October 2012 (has links) (PDF)
The growth of multimedia content in recent years has motivated research in audio classification and content retrieval. In this thesis, a general environmental audio classification and retrieval approach is proposed in which higher-level semantic classes (outdoor, nature, meeting and violence) are obtained from lower-level acoustic classes (emergency alarm, car horn, gun-shot, explosion, automobile, motorcycle, helicopter, wind, water, rain, applause, crowd and laughter). To classify an audio sample into acoustic classes, MPEG-7 audio features, the Mel Frequency Cepstral Coefficients (MFCC) feature and the Zero Crossing Rate (ZCR) feature are used with Hidden Markov Model (HMM) and Support Vector Machine (SVM) classifiers. Additionally, a new classification method using a Genetic Algorithm (GA) is proposed for classification of the semantic classes. Query by Example (QBE) and keyword-based query capabilities are implemented for content retrieval.
29

Online detection of simple voice commands in an audio signal

Zezula, Miroslav January 2011 (has links)
This thesis describes the development of a voice module that recognizes simple speech commands by comparing the input sound with recorded templates. The first part contains a description of the algorithm used and a verification of its functionality; the algorithm is based on Mel-frequency cepstral coefficients and dynamic time warping. The hardware of the voice module is then designed around the 56F805 signal controller from Freescale, with the microphone signal conditioned by operational amplifiers and a digital filter. The third part deals with the development of software for the controller and describes a fixed-point implementation of the algorithm that respects the controller's limited capabilities. A final test proves the usability of the voice module in low-noise environments.
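The template matching described — MFCC sequences compared by dynamic time warping — can be sketched with the classic DTW recurrence. This is a plain floating-point NumPy illustration, not the thesis' fixed-point controller implementation:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    a: (n, d) and b: (m, d) arrays, e.g. MFCC frames of an input command
    and a stored template. Returns the cost of the cheapest monotonic
    alignment, using Euclidean frame distances and the usual three steps
    (insertion, deletion, match).
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# A template should match a time-stretched copy of itself far better than a
# genuinely different sequence -- that tolerance to speaking rate is the point.
t = np.sin(np.linspace(0, 3, 50)).reshape(-1, 1)     # hypothetical 1-D features
stretched = np.repeat(t, 2, axis=0)                  # same command, twice as slow
print(dtw_distance(t, stretched) < dtw_distance(t, -t))  # True
```

A command recognizer then simply computes this distance against every stored template and picks the closest one, rejecting the input if the best distance exceeds a threshold.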
30

Biometric Multi-modal User Authentication System based on Ensemble Classifier

Assaad, Firas Souhail January 2014 (has links)
No description available.
