21. Classification of Affective Emotion in Musical Themes: How to understand the emotional content of the soundtracks of movies? (Diaz Banet, Paula, January 2021)
Music is created by composers to arouse different emotions and feelings in the listener and, in the case of soundtracks, to support the storytelling of scenes. The goal of this project is to find the best method for evaluating the emotional content of soundtracks. This emotional content can be measured quantitatively thanks to Russell's model of valence, arousal and dominance (VAD), which converts mood labels into numbers. To conduct the analysis, MFCC and VGGish features were extracted from the soundtracks and used as inputs to a CNN and an LSTM model, in order to study which one achieved the better prediction. A database of 6757 soundtracks with their corresponding VAD values was created to perform this analysis. Ultimately, the results of the experiments will help the start-up Vionlabs better understand the content of movies and therefore make more accurate recommendations about what users want to consume on Video on Demand platforms according to their emotions or moods.
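As a hedged illustration of the pipeline just described, the sketch below extracts MFCCs from an audio file and passes them through a small CNN that regresses the three VAD dimensions. It assumes librosa and PyTorch; the file name, clip length and layer sizes are placeholders, not the thesis configuration.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn

# Load a soundtrack excerpt and extract a fixed-size MFCC "image"
# (13 coefficients x time frames). Path and sizes are illustrative.
y, sr = librosa.load("soundtrack.wav", sr=22050, duration=30.0)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)            # (13, T)
x = torch.tensor(mfcc[None, None, :, :128], dtype=torch.float32)

# A small CNN that regresses the three Russell dimensions:
# valence, arousal and dominance (VAD).
class VADRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 3)   # -> (valence, arousal, dominance)

    def forward(self, x):
        return self.head(self.conv(x).flatten(1))

model = VADRegressor()
print(model(x))   # one VAD prediction per input clip
```

Training would then minimize a regression loss (for example mean squared error) against the annotated VAD values; the LSTM variant would instead consume the MFCC frames as a time sequence.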
22. Application of LabVIEW and myRIO to voice controlled home automation (Lindstål, Tim; Marklund, Daniel, January 2019)
The aim of this project is to use NI myRIO and LabVIEW for voice controlled home automation. The NI myRIO is an embedded device which has a Xilinx FPGA and a dual-core ARM Cortex-A9 processor as well as analog and digital input/output, and is programmed with LabVIEW, a graphical programming language. The voice control is implemented in two different systems. The first system is based on an Amazon Echo Dot for voice recognition, a commercial smart speaker developed by Amazon Lab126. The Echo Dot devices are connected via the Internet to the voice-controlled intelligent personal assistant service known as Alexa (developed by Amazon), which is capable of voice interaction, music playback, and controlling smart devices for home automation. In this first system, the thesis project focuses on the myRIO used for the wireless control of smart home devices; smart lamps, sensors, speakers and an LCD display were implemented. The second system focuses on the myRIO for speech recognition and was built on the myRIO with a microphone connected. The speech recognition was implemented using mel-frequency cepstral coefficients and dynamic time warping. A few commands could be recognized, including a wake word, "Bosse", as well as four other commands for controlling the colors of a smart lamp. The thesis project is shown to be successful, having demonstrated that the implementation of home automation using the NI myRIO with two voice-controlled systems can correctly control home devices such as smart lamps, sensors, speakers and an LCD display.
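A minimal sketch of the second system's recognition scheme, MFCC templates compared by dynamic time warping, written in Python rather than LabVIEW for illustration; the file names and the four color commands are assumptions, not the thesis's exact vocabulary.

```python
import numpy as np
import librosa

def mfcc_seq(path):
    """MFCC sequence for one utterance; frames as rows."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # (T, 13)

def dtw_distance(a, b):
    """Classic dynamic time warping between two MFCC sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Hypothetical reference templates, one recording per command.
templates = {name: mfcc_seq(f"{name}.wav")
             for name in ["bosse", "red", "green", "blue", "white"]}
query = mfcc_seq("unknown.wav")
best = min(templates, key=lambda k: dtw_distance(query, templates[k]))
print("recognized command:", best)
```

In practice a distance threshold on the best match would reject utterances that are not commands at all.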
23. A Design of Recognition Rate Improving Strategy for Japanese Speech Recognition System (Lin, Cheng-Hung, 24 August 2010)
This thesis investigates recognition rate improvement strategies for a Japanese speech recognition system. Both training data development and a consonant correction scheme are studied. For training data development, a database of 995 two-syllable Japanese words is established by phonetically balanced sieving. Furthermore, feature models for the 188 common Japanese monosyllables are derived through a mixed-position training scheme to increase the recognition rate. For consonant correction, a sub-syllable model is developed to enhance consonant recognition accuracy, and hence further improve the overall correct rate for whole Japanese phrases. Experimental results indicate that the average correct rate for a Japanese phrase recognition system with 34 thousand phrases can be improved from 86.91% to 92.38%.
24. Optimizing text-independent speaker recognition using an LSTM neural network (Larsson, Joel, January 2014)
In this paper a novel speaker recognition system is introduced. With the advances in computer science, automated speaker recognition has become increasingly popular as an aid in crime investigations and authorization processes. Here, a recurrent neural network approach is used to learn to identify ten speakers within a set of 21 audio books. Audio signals are processed via spectral analysis into mel-frequency cepstral coefficients that serve as speaker-specific features, which are input to the neural network. The Long Short-Term Memory algorithm is examined for the first time within this area, with interesting results. Experiments are made to find the optimal network model for the problem. These show that the network learns to identify the speakers well, text-independently, when the recording situation is the same. However, the system has problems recognizing speakers from different recordings, which is probably due to the noise sensitivity of the speech processing algorithm in use.
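A minimal sketch of such an MFCC-fed LSTM classifier, assuming PyTorch; the random tensors stand in for real MFCC sequences from the audio books, and the layer sizes are illustrative rather than the optimized configuration the thesis searches for.

```python
import torch
import torch.nn as nn

# Toy stand-in for MFCC sequences: a batch of 8 utterances,
# each 200 frames x 13 coefficients (real features would come
# from spectral analysis of the audio-book recordings).
features = torch.randn(8, 200, 13)
speaker_ids = torch.randint(0, 10, (8,))   # ten target speakers

class SpeakerLSTM(nn.Module):
    def __init__(self, n_mfcc=13, hidden=64, n_speakers=10):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_speakers)

    def forward(self, x):
        _, (h, _) = self.lstm(x)       # h: (1, batch, hidden)
        return self.out(h[-1])         # one logit vector per utterance

model = SpeakerLSTM()
loss = nn.CrossEntropyLoss()(model(features), speaker_ids)
loss.backward()                        # one training step of many
```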
25. Accent Classification from Speech Samples by Use of Machine Learning (Carol Pedersen, date unknown)
“Accent” is the pattern of speech pronunciation by which one can identify a person’s linguistic, social or cultural background. It is an important source of inter-speaker variability and a particular problem for automated speech recognition. The aim of the study was to investigate a new computational approach to accent classification which did not require phonemic segmentation or the identification of phonemes as input, and which could therefore be used as a simple, effective accent classifier. Through a series of structured experiments this study investigated the effectiveness of Support Vector Machines (SVMs) for speech accent classification using time-based units rather than linguistically informed ones, and compared it to the accuracy of other machine learning methods, as well as the ability of humans to classify speech according to accent. A corpus of read speech was collected in two accents of English (Arabic and “Indian”) and used as the main data source for the experiments. Mel-frequency cepstral coefficients were extracted from the speech samples and combined into larger units of 10 to 150 ms duration, which then formed the input data for the various machine learning systems. Support Vector Machines were found to classify the samples with up to 97.5% accuracy, with very high precision and recall, using samples of between 1 and 4 seconds of speech. This compared favourably with a human listener study where subjects were able to distinguish between the two accent groups with an average of 92.5% accuracy in approximately 8 seconds. Repeating the SVM experiments on a different corpus resulted in a best classification accuracy of 84.6%. Experiments using a decision tree learner and a rule-based classifier on the original corpus gave a best accuracy of 95%, but results over the range of conditions were much more variable than those using the SVM. Rule extraction was performed in order to help explain the results and better inform the design of the system. The new approach was therefore shown to be effective for accent classification, and a plan for its role within various other larger speech-related contexts was developed.
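A toy version of the core experiment, assuming scikit-learn: consecutive MFCC frames are stacked into one fixed-duration unit per sample and classified by an SVM into two accent groups. The random matrix stands in for real corpus features.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Placeholder data: each row is several consecutive MFCC frames
# stacked into one time-based unit (e.g. 10 frames x 13 coefficients,
# roughly 100 ms of speech); labels: 0 = Arabic, 1 = "Indian" accent.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10 * 13))
y = rng.integers(0, 2, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)   # train the SVM
print("accuracy:", clf.score(X_te, y_te))        # held-out accuracy
```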
27. Independent Text Robust Speaker Recognition in the Presence of Noise Using PAC-MFCC and Sub-Band Classifiers (Harry Arnold Anacleto Silva, 06 September 2011)
This work proposes the use of the PAC-MFCC feature with sub-band classifiers for the task of text-independent speaker identification in noise. The proposed scheme is compared with the features MFCC (Mel-Frequency Cepstral Coefficients), PAC-MFCC (Phase Autocorrelation MFCC) without sub-band classifiers, SSCH (Subband Spectral Centroid Histograms) and TECC (Teager Energy Cepstrum Coefficients). In this recognition task, we used the TIMIT database, which consists of 630 speakers, each of whom speaks 10 utterances of approximately 3 seconds each; eight utterances were used for training and two for testing, giving a total of 1260 test utterances for the recognition. We investigated the performance of these techniques using different types of noise from the Noisex 92 database at different signal-to-noise ratios. It was found that the PAC-MFCC feature with sub-band classifiers achieves a better accuracy rate than the other techniques at low signal-to-noise ratios (less than 10 dB).
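PAC-MFCC itself changes the cepstral front end, which is not reproduced here, but the sub-band classifier idea can be sketched generically: train one model per speaker and spectral sub-band, then fuse the per-band scores. Below is a toy version with Gaussian mixture models in scikit-learn, using synthetic features and an assumed two-band split.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder: per-frame cepstral features split into two spectral
# sub-bands (columns 0..5 and 6..12 of a 13-dim feature vector).
rng = np.random.default_rng(1)
train = {spk: rng.normal(loc=spk, size=(500, 13)) for spk in range(3)}
test = rng.normal(loc=1, size=(200, 13))     # frames from speaker 1
bands = [slice(0, 6), slice(6, 13)]

# One GMM per (speaker, sub-band); scores are fused by summing the
# average frame log-likelihoods across bands.
models = {spk: [GaussianMixture(4, random_state=0).fit(X[:, b])
                for b in bands]
          for spk, X in train.items()}
scores = {spk: sum(g.score(test[:, b]) for g, b in zip(gs, bands))
          for spk, gs in models.items()}
print("identified speaker:", max(scores, key=scores.get))
```

The appeal of the sub-band split is that noise confined to part of the spectrum corrupts only some of the band scores, so the fused decision degrades more gracefully than a single full-band classifier.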
28. Improved MFCC Front End Using Spectral Maxima for Noisy Speech Recognition (Sujatha, J, 11 1900)
No description available.
29. Development of Algorithms for Gunshot Detection (Hrabina, Martin, January 2019)
This thesis deals with gunshot recognition and associated problems. First, the whole task is introduced and broken down into smaller steps. An overview of sound databases, significant publications, events and the current state of the art is then given, together with a survey of possible applications of gunshot detection. The second part consists of comparing features using various metrics, together with a comparison of their recognition performance. A comparison of recognition algorithms follows, and new features usable for recognition are presented. The thesis culminates in the design of a two-stage gunshot recognition system that monitors its surroundings in real time. The conclusion summarizes the achieved results and outlines further work.
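As a hedged illustration of the two-stage design mentioned above, the sketch below gates a stream of audio frames with a cheap energy test and hands only loud, impulsive frames to a second-stage classifier; the thresholds and the placeholder stage-2 rule are assumptions, not the thesis's algorithms.

```python
import numpy as np

FRAME = 1024            # samples per analysis frame (assumed)
ENERGY_GATE = 5e-4      # stage-1 threshold, tuned on background noise

def frame_energy(frame):
    return float(np.mean(frame ** 2))

def stage2_is_gunshot(frame):
    # Placeholder: a real second stage would extract features
    # (e.g. MFCCs) and run a trained classifier on this frame.
    return float(frame.max()) > 0.9

def monitor(stream):
    """Two-stage detector: the cheap energy gate wakes up the
    expensive classifier only for loud, impulsive frames."""
    for i, frame in enumerate(stream):
        if frame_energy(frame) > ENERGY_GATE:    # stage 1
            if stage2_is_gunshot(frame):         # stage 2
                yield i                          # frame index of a hit

# Simulated stream: quiet background noise with one loud impulse.
rng = np.random.default_rng(0)
stream = [rng.normal(0.0, 0.01, FRAME) for _ in range(10)]
stream[5][100] = 1.0
print("gunshot frames:", list(monitor(stream)))  # -> [5]
```

The point of the split is computational: the first stage is cheap enough to run continuously in real time, while the heavier classifier runs only on the rare frames that pass the gate.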
30. Sparse Discrete Wavelet Decomposition and Filter Bank Techniques for Speech Recognition (Jingzhao Dai, 11 June 2019)
Speech recognition is widely applied to translation from speech to text, voice-driven commands, human-machine interfaces and so on [1]-[8]. It has become increasingly prevalent in modern life. To improve the accuracy of speech recognition, various algorithms such as artificial neural networks and hidden Markov models have been developed [1], [2].

In this thesis work, the task of speech recognition with various classifiers is investigated. The classifiers employed include the support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF) and convolutional neural network (CNN). Two novel feature extraction methods, sparse discrete wavelet decomposition (SDWD) and bandpass filtering (BPF) based on the Mel filter banks [9], are developed and proposed. To accommodate the diversity of classification algorithms, both one-dimensional (1D) and two-dimensional (2D) features need to be obtained. The 1D features are arrays of power coefficients in frequency bands, which are dedicated to training the SVM, KNN and RF classifiers, while the 2D features capture both the frequency domain and temporal variations; in fact, a 2D feature consists of the power values in the decomposed bands versus consecutive speech frames. Most importantly, the 2D features with geometric transformations are adopted to train the CNN.

The speech recordings, including male and female speakers, come from a recorded data set as well as a standard data set. Firstly, the recordings with little noise and clear pronunciation are processed with the proposed feature extraction methods. After many trials and experiments using this data set, a high recognition accuracy is achieved. Then, these feature extraction methods are further applied to the standard recordings, which have more varied characteristics, with ambient noise and unclear pronunciation. Many experimental results validate the effectiveness of the proposed feature extraction techniques.
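A small sketch of wavelet band-energy features of the 1D kind described above, assuming PyWavelets and scikit-learn; the synthetic signals and the band definition are illustrative and do not reproduce the thesis's exact SDWD method.

```python
import numpy as np
import pywt
from sklearn.ensemble import RandomForestClassifier

def dwt_band_energies(signal, wavelet="db4", level=5):
    """1D feature vector: log energy of each discrete wavelet
    decomposition band (approximation + detail coefficients)."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.log([np.sum(c ** 2) + 1e-12 for c in coeffs])

# Placeholder utterances: real features would be computed from
# the recorded and standard speech data sets.
rng = np.random.default_rng(2)
X = np.stack([dwt_band_energies(rng.normal(size=8000))
              for _ in range(100)])
y = rng.integers(0, 2, size=100)        # two word classes

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("train accuracy:", clf.score(X, y))
```

The same band-energy vectors could be fed to the SVM or KNN classifiers unchanged, while stacking them over consecutive frames would give the 2D input the CNN expects.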