41

Extratores de características acústicas inspirados no sistema periférico auditivo / Acoustic features extractors inspired in the peripheral auditory system

Almeida, Christiane Raulino 08 October 2014 (has links)
Extracting information from acoustic signals is a common task in signal processing and pattern recognition. Broadly speaking, the processing system first obtains a low-dimensional representation of the acoustic signal, produced by computational methods called feature extractors. This representation aims to present the speech sound in a form more convenient for extracting the information contained in the signal. With this initial task in mind, this work presents a detailed study of three classic feature extraction methods: the Mel-Frequency Cepstrum Coefficients (MFCC), the Ensemble Interval Histogram (EIH), and the Zero-Crossing with Peak Amplitudes (ZCPA). As part of the literature review, the human peripheral auditory system was also studied, since the EIH and ZCPA methods are based on models of human hearing. Moreover, a new extraction method based on the detection of level crossings was developed, referred to here as Elementary Acoustic Events (EAE). To compare the reviewed and developed methods, two kinds of experiments were carried out. First, experiments with additive noise and convolutive channel effects were performed to analyze the robustness of the methods. Then, isolated-word recognition experiments were run using Dynamic Time Warping (DTW) alignment. The results suggest that, for the proposed experiments, the developed method is more robust than the classical methods implemented.
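As an illustration of the DTW-based isolated-word recognition used in the comparison stage, the following minimal NumPy sketch computes a DTW distance between two feature sequences and picks the closest word template. The feature dimensions and the template dictionary are assumptions for illustration only, not the thesis's actual setup.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two feature sequences.

    a, b: arrays of shape (T_a, d) and (T_b, d), e.g. per-frame feature vectors.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

def recognize(query, templates):
    """Isolated-word recognition: templates maps a word label to a feature
    sequence; the label with the smallest DTW distance wins."""
    return min(templates, key=lambda w: dtw_distance(query, templates[w]))
```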
42

Feature Fusion Deep Learning Method for Video and Audio Based Emotion Recognition

Yanan Song (11825003) 20 December 2021 (has links)
In this thesis, we propose a deep learning based emotion recognition system intended to improve the classification success rate. We first use transfer learning to extract visual features and Mel-Frequency Cepstral Coefficients (MFCC) to extract audio features, and then apply recurrent neural networks (RNN) with an attention mechanism to process the sequential inputs. The outputs of both channels are then fused in a concatenation layer, which is processed with batch normalization to reduce internal covariate shift. Finally, the classification result is produced by a softmax layer. In our experiments, the video and audio subsystems achieve 78% and 77% accuracy respectively, while the feature fusion system combining video and audio achieves 92% accuracy on the RAVDESS dataset for eight emotion classes. Our proposed feature fusion system outperforms conventional methods in terms of classification performance.
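To make the fusion architecture concrete, here is a minimal Keras sketch of the two-branch design described above: an RNN over each modality, concatenation, batch normalization, and a softmax output for eight classes. The input shapes, layer sizes, and the omission of the attention mechanism are simplifying assumptions, not the thesis's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

video_in = layers.Input(shape=(30, 2048))   # e.g. 30 frames of CNN features (assumed shape)
audio_in = layers.Input(shape=(100, 40))    # e.g. 100 frames of MFCC features (assumed shape)

v = layers.LSTM(128)(video_in)              # recurrent encoding of the video stream
a = layers.LSTM(128)(audio_in)              # recurrent encoding of the audio stream

fused = layers.Concatenate()([v, a])        # feature fusion by concatenation
fused = layers.BatchNormalization()(fused)  # reduce internal covariate shift
out = layers.Dense(8, activation="softmax")(fused)  # eight emotion classes

model = Model([video_in, audio_in], out)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```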
43

Počítačová analýza sportovních zápasů / Computer analysis of sport matches

Židlík, Pavel January 2009 (has links)
This work deals with fast analysis of a football match from the audio part of the recording, with the possibility of applying some of the methods to matches in other sports as well. The first goal was the detection of the referee's whistle, which has a specific frequency in its spectrum lying outside the usual range of speech. After detecting this harmonic frequency, attention turned to determining the meaning of the whistle. A referee assisted with this task: he described the different types of whistle signals used and provided reference samples for their classification. A neural network trained with backpropagation was used to determine the meaning of each whistle. Another cue for detecting important moments of the match is the fundamental tone of the commentator's voice: when the commentator is genuinely excited, the pitch of his speech rises with every important action in the game. An analysis of this rise in the commentator's fundamental tone was also carried out in this work. The national anthems of the competing teams also mark a significant moment of the match, so anthem detection became another subject of analysis. MFCC features were used to describe the audio signal, yielding 20 coefficients per frame, which served as input to a classifier based on a neural network with backpropagation. For convenient use of these methods, a graphical user interface was created, offering a well-arranged view of the results and the possibility of replaying a chosen section.
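A minimal sketch of the whistle detection idea: flag frames whose spectral energy is concentrated in a narrow band above typical speech frequencies. The band limits and threshold below are assumptions for illustration; the thesis classifies the detected whistles further with a backpropagation neural network.

```python
import numpy as np

def is_whistle_frame(frame, sr, f_lo=3500.0, f_hi=4500.0, ratio_thresh=0.6):
    """Return True if most of the frame's spectral energy lies in a narrow
    band above common speech frequencies (band limits are assumptions)."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    ratio = spectrum[band].sum() / (spectrum.sum() + 1e-12)
    return ratio > ratio_thresh
```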
44

Určení výšky osob z řečového projevu / Determining person's height from spoken utterance

Pelikán, Pavel January 2013 (has links)
This diploma thesis focuses on determining a person's height from a spoken utterance. The first part of the work surveys the current state of the art and the published studies; the knowledge gained from these studies was used in this thesis. The study with the best results for estimating speaker height was chosen, and the experiment described in that study was reproduced in this work. A system for estimating speaker height from the speech signal was created and successfully tested with several acoustic features on spoken utterances from the TIMIT database.
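A hedged sketch of how such a height estimation system might look: a simple regression from per-speaker acoustic feature vectors to height. The feature matrix and heights below are random stand-ins, since the abstract does not list the exact features used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder data: one row of acoustic features per speaker (e.g. formant or
# MFCC statistics) and known heights in cm, both stand-ins for TIMIT-derived data.
rng = np.random.default_rng(0)
X_train = rng.random((100, 10))            # assumed 10-dimensional feature vectors
y_train = 150 + 40 * rng.random(100)       # assumed heights between 150 and 190 cm

model = LinearRegression().fit(X_train, y_train)
predicted_height = model.predict(X_train[:1])   # height estimate for one utterance
```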
45

Rozpoznávání řeči s pomocí nástroje Sphinx-4 / Speech recognition using Sphinx-4

Kryške, Lukáš January 2014 (has links)
This diploma thesis aims to find an effective method for continuous speech recognition; more precisely, it uses speech-to-text recognition for the keyword spotting discipline. This solution is applicable to phone call analysis and similar applications. Most of the thesis describes and works with the speech recognition framework Sphinx-4, which uses Hidden Markov Models (HMM) to define the acoustic models of a language. It is explained how these models can be trained for a new language or a new dialect. Finally, the implementation of keyword spotting in the Java language is described in detail.
46

Algoritmy rozpoznávání řeči na FPGA/DSP / Speech Recognition Algorithms in FPGA/DSP

Urbiš, Oldřich January 2008 (has links)
This master's thesis deals with the design of speech recognition algorithms with consideration of the target technology, a platform combining a digital signal processor and a field programmable gate array. The speech recognition algorithms include: extraction of Mel-frequency cepstral coefficient features, hidden Markov models, and their evaluation by the Viterbi algorithm.
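Since the thesis evaluates hidden Markov models with the Viterbi algorithm, a compact NumPy implementation of Viterbi decoding in the log domain is sketched below. The observation log-likelihoods would come from the MFCC front end; their exact form here is an assumption.

```python
import numpy as np

def viterbi(obs_loglik, log_trans, log_init):
    """Most likely HMM state path.

    obs_loglik: (T, N) log-likelihoods of each observation under each state
                (e.g. from Gaussian models of MFCC vectors).
    log_trans:  (N, N) log transition probabilities.
    log_init:   (N,)   log initial state probabilities.
    """
    T, N = obs_loglik.shape
    delta = np.zeros((T, N))          # best log-score ending in each state
    psi = np.zeros((T, N), dtype=int) # backpointers
    delta[0] = log_init + obs_loglik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # (N, N): previous -> current
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(N)] + obs_loglik[t]
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):    # backtrack the best path
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[-1].max()
```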
47

Optimizing text-independent speaker recognition using an LSTM neural network

Larsson, Joel January 2014 (has links)
In this paper a novel speaker recognition system is introduced. With advances in computer science, automated speaker recognition has become increasingly popular as an aid in crime investigations and authorization processes. Here, a recurrent neural network approach is used to learn to identify ten speakers within a set of 21 audio books. Audio signals are processed via spectral analysis into Mel-Frequency Cepstral Coefficients that serve as speaker-specific features and are input to the neural network. The Long Short-Term Memory algorithm is examined for the first time in this area, with interesting results. Experiments are made to find the optimal network model for the problem. These show that the network learns to identify the speakers well, text-independently, when the recording conditions are the same. However, the system has difficulty recognizing speakers across different recordings, which is probably due to the noise sensitivity of the speech processing algorithm in use.
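A minimal sketch of the MFCC front end described above, using librosa to turn an audio excerpt into a per-frame feature sequence suitable as LSTM input. The file name, sampling rate, and choice of 13 coefficients are assumptions for illustration.

```python
import librosa
import numpy as np

# Load an audio book excerpt (path is a placeholder) and compute MFCCs.
y, sr = librosa.load("speaker_excerpt.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape: (13, n_frames)

# Sequence of per-frame feature vectors, ready to feed an LSTM classifier.
features = mfcc.T                                     # shape: (n_frames, 13)
```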
48

Improved MFCC Front End Using Spectral Maxima For Noisy Speech Recognition

Sujatha, J 11 1900 (has links) (PDF)
No description available.
49

Řízení a měření sportovních drilů hlasem/zvuky / Controlling and Measuring Sport Drills by Voice/Sound

Odehnal, Jiří January 2019 (has links)
This master's thesis deals with the design and development of a mobile application for the Android platform. The aim of the work is to implement a simple and user-friendly interface that supports and assists the user in training and sport exercises. The thesis also includes the implementation of sound detection to support the user during exercises, along with voice instructions given by the application. In practice, the application should make training exercises more comfortable by freeing the user from having to keep the mobile device in hand.
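A hedged sketch of one simple way such sound detection could work: flagging loud events with a short-time RMS energy threshold. The frame length and threshold are assumptions; the abstract does not specify the detection method used.

```python
import numpy as np

def detect_sound_events(signal, sr, frame_ms=20, threshold=0.05):
    """Return start times (in seconds) of loud sound events, e.g. a clap or a
    whistle, using a short-time RMS energy threshold (threshold is assumed)."""
    frame_len = int(sr * frame_ms / 1000)
    events = []
    prev_loud = False
    for start in range(0, len(signal) - frame_len, frame_len):
        frame = signal[start:start + frame_len]
        loud = np.sqrt(np.mean(frame ** 2)) > threshold   # RMS energy check
        if loud and not prev_loud:                        # rising edge = new event
            events.append(start / sr)
        prev_loud = loud
    return events
```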
50

Rozpoznání emočního stavu člověka z řeči / Automatic vocal-oriented recognition of human emotions

Houdek, Miroslav January 2009 (has links)
This master's thesis concerns recognition of emotional states and gender based on analysis of the speech signal. Various prosodic and cepstral features were used to describe the speech signal, and non-invasive methods for glottal pulse estimation are described in the text. The described speech features were implemented in MATLAB. For their classification we used the GMM classifier, which models the feature space with Gaussian probability distributions. Furthermore, we built a system for recognizing the emotional state of the speaker and a system for gender recognition from speech. We tested the performance of the created systems with several features on speech signal segments of various lengths and compared the results. In the last part we examined the influence of speaker identity and gender on the success of emotional state recognition.
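To illustrate the GMM classification scheme, the following scikit-learn sketch fits one Gaussian mixture per emotion class and assigns a test utterance to the class with the highest log-likelihood. The number of mixture components and the Python phrasing (the thesis uses MATLAB) are assumptions for illustration.

```python
from sklearn.mixture import GaussianMixture

def train_gmms(features_by_class, n_components=8):
    """features_by_class: dict mapping class label -> (n_frames, d) training
    feature matrix. Fit one GMM per emotion (or gender) class."""
    return {label: GaussianMixture(n_components=n_components).fit(X)
            for label, X in features_by_class.items()}

def classify(gmms, utterance_features):
    """utterance_features: (n_frames, d) feature matrix of one utterance.
    Return the class whose GMM gives the highest average log-likelihood."""
    return max(gmms, key=lambda label: gmms[label].score(utterance_features))
```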
