41

Análise cepstral baseada em diferentes famílias transformada wavelet / Cepstral analysis based on different family of wavelet transform

Fabrício Lopes Sanchez 02 December 2008 (has links)
This work presents a comparative study of different wavelet transform families applied to cepstral analysis of digital human speech signals, with the specific goal of determining their pitch period, and ultimately proposes a differential algorithm for this task that takes into account important computational aspects such as performance, algorithm complexity and target platform, among others. The results obtained with the new wavelet-based technique are compared with the traditional Fourier-transform-based approach. The implementation was written in ANSI-standard C++ and tested on Windows XP Professional SP3, Windows Vista Business SP1, Mac OS X Leopard and Linux Mandriva 10.
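As a point of reference for the method summarized above, the following is a minimal sketch of the conventional Fourier-based cepstral pitch estimator that serves as the baseline in this kind of work; the thesis's wavelet-based variant (replacing the Fourier step with decompositions from different wavelet families, e.g. via PyWavelets) is not reproduced here, and all parameter values are illustrative assumptions.

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Estimate the pitch period (in seconds) of one voiced frame via the real cepstrum."""
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)               # real cepstrum
    qmin, qmax = int(fs / fmax), int(fs / fmin)    # plausible pitch quefrencies
    peak = qmin + np.argmax(cepstrum[qmin:qmax])
    return peak / fs

# Example: an impulse train with a 5 ms period (200 Hz) sampled at 16 kHz.
fs = 16000
frame = np.zeros(640)
frame[::80] = 1.0
print(cepstral_pitch(frame, fs))   # expected: ~0.005 s
```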
42

[en] INDEPENDENT TEXT ROBUST SPEAKER RECOGNITION IN THE PRESENCE OF NOISE USING PAC-MFCC AND SUB BAND CLASSIFIERS / [pt] RECONHECIMENTO DE LOCUTOR INDEPENDENTE DO TEXTO EM PRESENÇA DE RUÍDO USANDO PAC-MFCC E CLASSIFICADORES EM SUB-BANDAS

HARRY ARNOLD ANACLETO SILVA 06 September 2011 (has links)
This work proposes the PAC-MFCC (Phase Autocorrelation MFCC) feature combined with sub-band classifiers for text-independent speaker identification in noise. The proposed system is compared with the MFCC (Mel-Frequency Cepstral Coefficients) feature, PAC-MFCC without sub-band classifiers, SSCH (Subband Spectral Centroid Histograms) and TECC (Teager Energy Cepstrum Coefficients). The experiments use the TIMIT database, which contains 630 speakers, each uttering 10 sentences of approximately 3 seconds; 8 sentences per speaker were used for training and 2 for testing, giving a total of 1260 test utterances. The performance of the different systems was investigated with several noise types from the Noisex-92 database at different signal-to-noise ratios. The PAC-MFCC feature with sub-band classifiers achieved the best recognition rate among the compared techniques when the signal-to-noise ratio is below 10 dB.
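The PAC-MFCC extraction itself is not reproduced here. As a hedged illustration of the sub-band classifier idea only, the sketch below splits a generic cepstral feature matrix into sub-bands, fits one Gaussian mixture model per speaker per sub-band, and fuses the per-band log-likelihoods; the function names, number of sub-bands and mixture sizes are assumptions, not the thesis's actual configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_subband_models(train_feats, n_subbands=4, n_mix=8):
    """train_feats: dict speaker_id -> (n_frames, n_coeffs) feature matrix.
    Fits one GMM per speaker per sub-band (coefficient axis split into bands)."""
    models = {}
    for spk, feats in train_feats.items():
        bands = np.array_split(feats, n_subbands, axis=1)
        models[spk] = [GaussianMixture(n_mix, covariance_type="diag",
                                       random_state=0).fit(b) for b in bands]
    return models

def identify(models, test_feats, n_subbands=4):
    """Sum the per-sub-band average log-likelihoods and return the best speaker."""
    bands = np.array_split(test_feats, n_subbands, axis=1)
    scores = {spk: sum(g.score(b) for g, b in zip(gmms, bands))
              for spk, gmms in models.items()}
    return max(scores, key=scores.get)

# Toy usage with synthetic "speakers":
rng = np.random.default_rng(0)
train = {"spk1": rng.normal(0, 1, (200, 12)), "spk2": rng.normal(3, 1, (200, 12))}
print(identify(train_subband_models(train), rng.normal(3, 1, (50, 12))))  # -> spk2
```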
43

Reconhecimento de comandos de voz por redes neurais / Voice command recognition by neural networks

Rodrigo Jorge Alvarenga 02 June 2012 (has links)
Speech recognition systems are widely used in industry, in the improvement of human operations and procedures, and in the entertainment and recreation sector. The specific objective of this work was to design and develop a voice recognition system capable of identifying voice commands regardless of the speaker. The main purpose of the system is to control robot movements, with applications in industry and in assisting people with physical disabilities. The decision-making approach uses a neural network trained with distinctive features of the speech signals of 16 speakers. The command samples were collected according to a convenience criterion (age and sex) in order to ensure greater discrimination between voice characteristics and thus achieve generalization of the neural network. Preprocessing consisted of determining the endpoints of each command utterance and applying adaptive Wiener filtering. Each speech command was segmented into 200 windows with 25% overlap. The features used were the zero-crossing rate, the short-term energy and the mel-frequency cepstral coefficients; the first two linear predictive coding coefficients and the prediction error were also tested. The classifier was a multilayer perceptron neural network trained with the backpropagation algorithm. Several experiments were carried out to choose thresholds, practical values, features and neural network configurations. The results were considered very good, reaching an accuracy of 89.16% under worst-case command sampling conditions.
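As a rough sketch of the front end described above (framing with 25% overlap, zero-crossing rate, short-term energy and MFCCs), the code below uses librosa for convenience; the exact window count, coefficient orders and MLP topology used in the thesis are not reproduced, and the helper name and parameter defaults are assumptions.

```python
import numpy as np
import librosa   # assumed available; any MFCC implementation would do

def command_features(signal, sr, n_frames=200, overlap=0.25, n_mfcc=12):
    """Split a voice command into about n_frames windows with 25% overlap and
    stack zero-crossing rate, short-term energy and MFCCs per frame."""
    frame_len = max(2, int(len(signal) / (1 + (1 - overlap) * (n_frames - 1))))
    hop = max(1, int(frame_len * (1 - overlap)))
    zcr = librosa.feature.zero_crossing_rate(signal, frame_length=frame_len, hop_length=hop)
    rms = librosa.feature.rms(y=signal, frame_length=frame_len, hop_length=hop)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc,
                                n_fft=frame_len, hop_length=hop, n_mels=26)
    feats = np.vstack([zcr, rms ** 2, mfcc])   # energy approximated by squared RMS
    return feats[:, :n_frames].T               # shape: (n_frames, 2 + n_mfcc)

# The per-command matrices, flattened, can then be used to train a multilayer
# perceptron, e.g. sklearn.neural_network.MLPClassifier (fit via backpropagation).
```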
44

VÝVOJ ALGORITMŮ PRO ROZPOZNÁVÁNÍ VÝSTŘELŮ / DEVELOPMENT OF ALGORITHMS FOR GUNSHOT DETECTION

Hrabina, Martin January 2019 (has links)
This thesis deals with gunshot recognition and the problems associated with it. First, the task is introduced and broken down into smaller steps. An overview of audio databases, important publications, related events and the current state of the art follows, together with a survey of possible applications of gunshot detection. The second part compares features using various metrics, along with a comparison of their recognition performance. A comparison of recognition algorithms follows, and new features usable for recognition are introduced. The work culminates in the design of a two-stage gunshot recognition system that monitors its surroundings in real time. The conclusion summarizes the achieved results and outlines further work.
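The abstract above does not spell out the two stages, so the sketch below is purely illustrative: a cheap per-frame energy gate as a hypothetical first stage, forwarding candidate frames to a heavier second-stage classifier (for example an SVM over spectral features). The thresholds and function names are assumptions, not the thesis's design.

```python
import numpy as np

def energy_gate(frame, threshold_db=-30.0):
    """Hypothetical stage 1: pass only loud, impulsive frames onward."""
    rms = np.sqrt(np.mean(frame ** 2)) + 1e-12
    return 20 * np.log10(rms) > threshold_db

def two_stage_detect(stream_frames, classify):
    """Run the gate on every frame of a monitored stream; only gated frames
    reach `classify`, a callable returning True for gunshot-like content."""
    return [i for i, frame in enumerate(stream_frames)
            if energy_gate(frame) and classify(frame)]

# Toy usage: frames of silence plus one loud burst, with a trivial classifier.
frames = [np.zeros(512) for _ in range(9)]
frames[4] = np.random.randn(512)
print(two_stage_detect(frames, classify=lambda f: True))   # -> [4]
```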
45

Adaptive Voice Control System using AI

Steen, Jasmine, Wilroth, Markus January 2021 (has links)
Controlling external actions with the voice is something humans have tried to do for a long time. There are many ways to implement a voice control system, and many of these applications require an internet connection, which limits the application area; commercially available voice controllers have also stagnated because of the cost of developing and maintaining them. In this project, an artifact was created to serve as an easy-to-use, generic voice controller tool that lets the user easily create different voice commands that can be implemented in many different applications and platforms. The user needs no prior understanding or experience of voice control in order to use and implement the voice controller.
46

Detection and localization of cough from audio samples for cough-based COVID-19 detection / Detektion och lokalisering av hosta från ljudprover för hostbaserad COVID-19-upptäckt

Krishnamurthy, Deepa January 2021 (has links)
Since February 2020 the world has been in the COVID-19 pandemic [1]. Researchers around the globe are working to develop fast, reliable, non-invasive testing methods, and one key direction of research is to use coughs and their corresponding vocal biomarkers for the diagnosis of COVID-19. In this thesis, we propose a fast, real-time cough detection pipeline that can be used to detect and localize coughs in audio samples. The core of the pipeline uses the yolo-v3 model [2] from the computer vision domain to localize coughs in audio spectrograms by treating them as objects; the detections are then transformed to localize the boundaries of cough utterances in the input signal. The system is evaluated on cough detection with the CoughVid dataset [3]. Furthermore, the pipeline is compared with other existing models, such as tinyyolo-v3, to test for better localization and classification. The average precision (AP@0.5) of the yolo-v3 and tinyyolo-v3 models is 0.67 and 0.78 respectively; based on these AP values tinyyolo-v3 performs better than yolo-v3 by at least 10%, and thanks to its computational advantage its inference time was also found to be 2.4 times faster than the yolo-v3 model in our experiments. This work is considered novel and significant for the detection and localization of coughs in an audio stream. Finally, MFCC features are extracted from the resulting cough events, and classifiers were trained to predict whether a cough indicates COVID-19 or not. The performance of different classifiers was compared, and the random forest outperformed the other models with a precision of 83.04%. The results suggest that the classifier is promising; however, in future work the model has to be trained on a clinically approved dataset and tested for reliability before it can be used in a clinical setting.
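One concrete step in the pipeline above is mapping a detector's bounding box on the spectrogram image back to cough start and end times. A minimal sketch of that coordinate conversion follows; the image width, hop length and sample rate used here are assumed example values, not the thesis's settings.

```python
def box_to_time(x_min_px, x_max_px, img_width, n_frames, hop_length, sr):
    """Convert a box's horizontal pixel extent on a spectrogram image into
    (start, end) times in seconds: pixel column -> frame index -> time."""
    to_seconds = lambda px: (px / img_width) * n_frames * hop_length / sr
    return to_seconds(x_min_px), to_seconds(x_max_px)

# Example: a 416-px-wide detector input rendered from a spectrogram with 1000
# frames computed with hop_length=512 at sr=22050 Hz -> roughly (5.6 s, 10.0 s).
print(box_to_time(100, 180, img_width=416, n_frames=1000, hop_length=512, sr=22050))
```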
47

SPARSE DISCRETE WAVELET DECOMPOSITION AND FILTER BANK TECHNIQUES FOR SPEECH RECOGNITION

Jingzhao Dai (6642491) 11 June 2019 (has links)
Speech recognition is widely applied to speech-to-text translation, voice-driven commands, human-machine interfaces and so on [1]-[8], and has become increasingly pervasive in modern life. To improve the accuracy of speech recognition, various algorithms such as artificial neural networks and hidden Markov models have been developed [1], [2].

In this thesis, speech recognition with various classifiers is investigated. The classifiers employed include the support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF) and convolutional neural network (CNN). Two novel feature extraction methods are developed and proposed: sparse discrete wavelet decomposition (SDWD) and bandpass filtering (BPF) based on the Mel filter banks [9]. To suit the different classification algorithms, both one-dimensional (1D) and two-dimensional (2D) features are obtained. The 1D features are arrays of power coefficients in frequency bands, used to train the SVM, KNN and RF classifiers, while the 2D features capture both the frequency domain and temporal variation: each 2D feature consists of the power values in the decomposed bands versus consecutive speech frames. The 2D features, with geometric transformations, are used to train the CNN.

Speech recordings of both male and female speakers are taken from a recorded dataset as well as a standard dataset. First, recordings with little noise and clear pronunciation are processed with the proposed feature extraction methods; after many trials and experiments on this dataset, a high recognition accuracy is achieved. The feature extraction methods are then applied to the standard recordings, which have more variable characteristics, with ambient noise and unclear pronunciation. The experimental results validate the effectiveness of the proposed feature extraction techniques.
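As a hedged illustration of the kind of band-power features described above (not the thesis's sparse decomposition itself), the sketch below builds a 1D vector of per-band powers from a plain discrete wavelet decomposition of one frame, and a 2D band-versus-frame map of the sort that can be fed to a CNN; the wavelet choice, frame sizes and decomposition level are assumptions.

```python
import numpy as np
import pywt   # PyWavelets, assumed available

def band_powers_1d(frame, wavelet="db4", level=5):
    """1D feature: average power in each band of a discrete wavelet decomposition."""
    coeffs = pywt.wavedec(frame, wavelet, level=level)
    return np.array([np.mean(c ** 2) for c in coeffs])

def band_powers_2d(signal, frame_len=400, hop=160, **kw):
    """2D feature: band power versus consecutive frames (bands x frames)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.stack([band_powers_1d(f, **kw) for f in frames], axis=1)

# Toy usage on one second of noise at 16 kHz: 6 bands x ~98 frames.
x = np.random.randn(16000)
print(band_powers_2d(x).shape)
```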
48

An IoT Solution for Urban Noise Identification in Smart Cities : Noise Measurement and Classification

Alsouda, Yasser January 2019 (has links)
Noise is defined as any undesired sound. Urban noise and its effect on citizens are a significant environmental problem, and the increasing level of noise has become a critical problem in some cities. Fortunately, noise pollution can be mitigated by better planning of urban areas or controlled by administrative regulations; however, the execution of such actions requires well-established systems for noise monitoring. In this thesis, we present a solution for noise measurement and classification using a low-power and inexpensive IoT unit. To measure the noise level, we implement an algorithm for calculating the sound pressure level in dB, achieving a measurement error of less than 1 dB. Our machine-learning-based method for noise classification uses Mel-frequency cepstral coefficients for audio feature extraction and four supervised classification algorithms (support vector machine, k-nearest neighbors, bootstrap aggregating, and random forest). We evaluate our approach experimentally on a dataset of about 3000 sound samples grouped into eight sound classes (such as car horn, jackhammer, or street music). We explore the parameter space of the four algorithms to estimate the optimal parameter values for classifying the sound samples in the dataset under study, and achieve noise classification accuracy in the range of 88%–94%.
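The sound-level part of the approach reduces to an RMS-to-decibel computation; the sketch below shows that step under the assumption that a per-device calibration offset converts the relative (dBFS) figure into an absolute sound pressure level estimate. The function name and offset handling are illustrative, not the thesis's implementation.

```python
import numpy as np

def sound_level_db(samples, calibration_offset_db=0.0):
    """Equivalent level of a block of samples in dB: 20*log10(RMS) plus a
    per-microphone calibration offset (0 dB gives a relative dBFS value)."""
    rms = np.sqrt(np.mean(np.square(samples, dtype=np.float64)))
    return 20.0 * np.log10(rms + 1e-12) + calibration_offset_db

# Example: a full-scale 1 kHz sine measures about -3 dBFS before calibration.
t = np.linspace(0, 1, 48000, endpoint=False)
print(sound_level_db(np.sin(2 * np.pi * 1000 * t)))
```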
49

Semantic Classification And Retrieval System For Environmental Sounds

Okuyucu, Cigdem 01 October 2012 (has links)
The growth of multimedia content in recent years has motivated research on audio classification and content retrieval. In this thesis, a general environmental audio classification and retrieval approach is proposed in which higher-level semantic classes (outdoor, nature, meeting and violence) are obtained from lower-level acoustic classes (emergency alarm, car horn, gun-shot, explosion, automobile, motorcycle, helicopter, wind, water, rain, applause, crowd and laughter). To classify an audio sample into acoustic classes, MPEG-7 audio features, the Mel Frequency Cepstral Coefficients (MFCC) feature and the Zero Crossing Rate (ZCR) feature are used with Hidden Markov Model (HMM) and Support Vector Machine (SVM) classifiers. Additionally, a new classification method using a Genetic Algorithm (GA) is proposed for the classification of semantic classes. Query by Example (QBE) and keyword-based query capabilities are implemented for content retrieval.
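The step from acoustic to semantic classes can be pictured as a grouping of the thirteen acoustic labels into the four semantic ones; the sketch below uses a hypothetical grouping for illustration only, since the thesis derives the semantic decision from HMM/SVM outputs and a genetic-algorithm classifier rather than a fixed lookup table.

```python
# Hypothetical grouping, for illustration only.
ACOUSTIC_TO_SEMANTIC = {
    "emergency alarm": "outdoor", "car horn": "outdoor", "automobile": "outdoor",
    "motorcycle": "outdoor", "helicopter": "outdoor",
    "wind": "nature", "water": "nature", "rain": "nature",
    "applause": "meeting", "crowd": "meeting", "laughter": "meeting",
    "gun-shot": "violence", "explosion": "violence",
}

def semantic_label(acoustic_predictions):
    """Majority vote over per-segment acoustic predictions -> semantic class."""
    votes = [ACOUSTIC_TO_SEMANTIC[a] for a in acoustic_predictions]
    return max(set(votes), key=votes.count)

print(semantic_label(["gun-shot", "explosion", "crowd"]))   # -> violence
```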
50

[en] SPEECH RECOGNITION IN NOISE ENVIRONMENT / [es] RECONOCIMIENTO DE VOZ EN PRESCENCIA DE RUIDO / [pt] RECONHECIMENTO DE VOZ EM PRESENÇA DE RUÍDO

DEBORA ANDREA DE OLIVEIRA SANTOS 02 October 2001 (has links)
This work presents a comparative study of three techniques for improving speech recognition rates in adverse environments: Cepstral Mean Normalization (CMN), Spectral Subtraction and Maximum Likelihood Linear Regression (MLLR), applied both separately and in pairs. The tests are carried out on a simple system: recognition of isolated words (the digits zero to nine, and the word "half"), speaker-dependent mode, continuous hidden Markov models, and feature vectors with twelve cepstral coefficients derived from linear predictive analysis. Three types of noise are considered (white Gaussian, babble and factory noise) at nine different signal-to-noise ratios. The experimental results show that using the robust recognition techniques separately is, in general, worthwhile: across the signal-to-noise ratios tested, when the recognition rates do not improve, they remain the same as those obtained when no robustness method is applied. Comparing the isolated and simultaneous applications of the techniques shows that the simultaneous application is not always attractive, depending on the pair employed. Results obtained with noise-matched models are also presented; although they are undeniably better, their use is impractical. Of the techniques implemented, MLLR gives results closest to those of the noise-matched models, followed by CMN and, last, Spectral Subtraction; the latter two, although outperformed by the first in recognition accuracy, have the advantage of simplicity and generality. Regarding the techniques used in combination, the pair Spectral Subtraction and MLLR has the best performance, since it improves on the isolated use of both methods, which does not always happen with other combinations of the individual techniques.
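Two of the three techniques compared above have very compact textbook forms, sketched below under the usual definitions (per-utterance cepstral mean removal and magnitude-domain spectral subtraction with a noise floor); MLLR, which re-estimates HMM means through regression matrices, is not sketched. Array layouts and the floor factor are assumptions.

```python
import numpy as np

def cepstral_mean_normalization(cepstra):
    """CMN: subtract the utterance-level mean from every cepstral vector
    (rows = frames, columns = cepstral coefficients)."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def spectral_subtraction(magnitude, noise_estimate, floor=0.01):
    """Basic magnitude-domain spectral subtraction with a spectral floor;
    noise_estimate is typically the mean magnitude of leading noise-only frames."""
    cleaned = magnitude - noise_estimate
    return np.maximum(cleaned, floor * noise_estimate)
```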
