  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Application of LabVIEW and myRIO to voice controlled home automation

Lindstål, Tim, Marklund, Daniel January 2019 (has links)
The aim of this project is to use the NI myRIO and LabVIEW for voice-controlled home automation. The NI myRIO is an embedded device with a Xilinx FPGA and a dual-core ARM Cortex-A9 processor as well as analog and digital input/output, and is programmed in LabVIEW, a graphical programming language. The voice control is implemented in two different systems. The first system is based on an Amazon Echo Dot for voice recognition, a commercial smart speaker developed by Amazon Lab126. Echo Dot devices connect via the Internet to Alexa (developed by Amazon), a voice-controlled intelligent personal assistant service capable of voice interaction, music playback, and controlling smart devices for home automation. In this system, the present thesis project focuses on the myRIO as a wireless controller of smart home devices; smart lamps, sensors, speakers and an LCD display were implemented. The second system focuses on the myRIO for speech recognition and was built on the myRIO with a microphone connected. The speech recognition was implemented using mel-frequency cepstral coefficients and dynamic time warping. A few commands could be recognized, including the wake word "Bosse" as well as four other commands for controlling the colors of a smart lamp. The thesis project proved successful, demonstrating that both voice-controlled home automation systems implemented on the NI myRIO can correctly control home devices such as smart lamps, sensors, speakers and an LCD display.
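As an illustration of the two techniques this abstract names, the following sketch matches a microphone capture against a wake-word template using MFCCs and dynamic time warping. It assumes Python with numpy and librosa; the file names and the decision threshold are hypothetical, and this is not the thesis's LabVIEW/myRIO implementation.

```python
import numpy as np
import librosa

def mfcc_features(path, sr=16000, n_mfcc=13):
    """Load audio and return an (n_frames, n_mfcc) MFCC matrix."""
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) DTW with Euclidean local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)   # length-normalized distance

template = mfcc_features("bosse_template.wav")    # hypothetical recording
candidate = mfcc_features("mic_capture.wav")      # hypothetical capture
if dtw_distance(template, candidate) < 25.0:      # threshold is a guess
    print("wake word 'Bosse' detected")
```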
32

A Design of Recognition Rate Improving Strategy for Japanese Speech Recognition System

Lin, Cheng-Hung 24 August 2010 (has links)
This thesis investigates recognition-rate improvement strategies for a Japanese speech recognition system. Both training-data development and a consonant-correction scheme are studied. For training-data development, a database of 995 two-syllable Japanese words is established by phonetically balanced sieving. Furthermore, feature models for the 188 common Japanese monosyllables are derived through a mixed-position training scheme to increase the recognition rate. For consonant correction, a sub-syllable model is developed to enhance consonant recognition accuracy, and hence further improve the overall correct rate for whole Japanese phrases. Experimental results indicate that the average correct rate of the Japanese phrase recognition system, with a vocabulary of 34,000 phrases, can be improved from 86.91% to 92.38%.
33

Cepstral Deconvolution Method For Measurement Of Absorption And Scattering Coefficients Of Materials

Aslan, Gokhan 01 January 2007 (has links)
Several methods have been developed to measure the absorption and scattering coefficients of materials. In this study, a new method based on a cepstral deconvolution technique is proposed. A reverberation room method recently standardized by ISO (ISO 17497-1) is taken as the reference for measurements. Several measurements were conducted in a physically scaled reverberation room, and the results were evaluated according to the two methods, namely the method given in the standard and the cepstral deconvolution method. The two methods differ in how they estimate the specular part of the room impulse response, which is essential for determining scattering coefficients. In the standard method, the specular part is found by synchronous averaging of impulse responses; the cepstral deconvolution method instead uses cepstral analysis to obtain the specular part. Results obtained by the two approaches were compared for five different test materials. Both methods gave almost the same absorption coefficients; on the other hand, the cepstral deconvolution method yielded lower scattering coefficients than the ISO method.
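For readers unfamiliar with the technique, here is a minimal sketch of how cepstral liftering can isolate the smooth, specular-dominated part of an impulse response. It assumes Python with numpy; the quefrency cutoff is an arbitrary parameter, and the thesis's exact processing chain is not reproduced.

```python
import numpy as np

def specular_estimate(h, cutoff):
    """Return a cepstrally smoothed (low-quefrency) version of impulse response h."""
    spectrum = np.fft.rfft(h)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=len(h))
    lifter = np.zeros(len(h))
    lifter[:cutoff] = 1.0                        # keep low quefrencies
    lifter[-(cutoff - 1):] = 1.0                 # and their symmetric mirror
    smooth_log_mag = np.fft.rfft(cepstrum * lifter)
    # back to a time signal with the smoothed magnitude spectrum
    return np.fft.irfft(np.exp(smooth_log_mag), n=len(h))

# e.g. on a measured room impulse response h (1-D numpy array):
# specular = specular_estimate(h, cutoff=64)     # cutoff in samples, a guess
```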
34

Single-trial classification of an EEG-based brain computer interface using the wavelet packet decomposition and cepstral analysis

Lodder, Shaun 12 1900 (has links)
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2009. / A brain-computer interface (BCI) monitors brain activity using signals such as EEG, ECoG, and MEG, and attempts to bridge the gap between thoughts and actions by providing control of physical devices ranging from wheelchairs to computers. A crucial process in a BCI system is feature extraction, and many studies have been undertaken to find relevant information in a set of input signals. This thesis investigated feature extraction from EEG signals using two different approaches: wavelet packet decomposition was used to extract information from the signals in the frequency domain, and cepstral analysis was used to search for relevant information in the cepstral domain. A BCI was implemented to evaluate the two approaches, and three classification techniques contributed to establishing the effectiveness of each feature type. Data containing two-class motor imagery was used for testing, and the BCI was compared with some of the other systems currently available. Results indicate that both approaches were effective in producing separable features and, with further work, can be used for the classification of trials in a paradigm exploiting motor imagery as a means of control.
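A minimal sketch of the wavelet-packet feature extraction step, assuming Python with the PyWavelets package; the wavelet choice ('db4') and decomposition depth are illustrative guesses, not the thesis's settings.

```python
import numpy as np
import pywt

def wavelet_packet_features(epoch, wavelet="db4", level=4):
    """Return log-energy per terminal wavelet-packet node for one EEG epoch."""
    wp = pywt.WaveletPacket(data=epoch, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    nodes = wp.get_level(level, order="freq")    # frequency-ordered leaves
    energies = np.array([np.sum(node.data ** 2) for node in nodes])
    return np.log(energies + 1e-12)              # log compresses the range

# Usage on a fake 2-second epoch sampled at 250 Hz:
epoch = np.random.randn(500)
features = wavelet_packet_features(epoch)        # 2**4 = 16 feature values
```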
35

Inversion acoustique articulatoire à partir de coefficients cepstraux / Acoustic-to-articulatory inversion from cepstral coefficients

Busset, Julie 25 March 2013 (has links)
The acoustic-to-articulatory inversion of speech consists in recovering the shape of the vocal tract from the speech signal. The problem is tackled with an analysis-by-synthesis method relying on a physical model of speech production controlled by a small number of parameters describing the vocal tract shape: the jaw opening, the shape and position of the tongue, and the positions of the lips and larynx. In order to approximate the geometry of our speaker, the articulatory model is built from articulatory contours extracted from cineradiographic images showing a sagittal view of the vocal tract. This articulatory synthesizer allows us to create a table of pairs associating an articulatory vector with the corresponding acoustic vector. Formants (the resonance frequencies of the vocal tract) are not used as the acoustic vector because their extraction is not always reliable, causing errors during inversion; cepstral coefficients are used instead. Moreover, the source effect and the mismatch between the speaker's vocal tract and the articulatory model are taken into account explicitly by comparing the natural spectra with those produced by the synthesizer, since both signals are available.
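The table-lookup stage of such an analysis-by-synthesis scheme can be sketched as a nearest-neighbour search in cepstral space. The tables below are filled with random data purely for illustration; in the actual method they would be generated by the articulatory synthesizer, and the parameter and vector sizes here are guesses.

```python
import numpy as np

rng = np.random.default_rng(0)
articulatory_table = rng.uniform(-3, 3, size=(10000, 7))   # 7 shape parameters
cepstral_table = rng.normal(size=(10000, 12))              # paired cepstra

def invert_frame(observed_cepstrum, k=5):
    """Return the k articulatory candidates closest in cepstral distance."""
    d = np.linalg.norm(cepstral_table - observed_cepstrum, axis=1)
    best = np.argsort(d)[:k]            # k nearest cepstral neighbours
    return articulatory_table[best]     # candidates for later smoothing/tracking

candidates = invert_frame(rng.normal(size=12))
```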
36

Análise cepstral baseada em diferentes famílias transformada wavelet / Cepstral analysis based on different family of wavelet transform

Sanchez, Fabrício Lopes 02 December 2008 (has links)
This work presents a comparative study of different wavelet transform families applied to the cepstral analysis of digital human speech signals, with the specific aim of determining their pitch period; it concludes by proposing a differential algorithm for this task that takes into account aspects important from a computational point of view, such as performance, algorithmic complexity, and target platform, among others. The results obtained with the new wavelet-based technique are also presented and compared with the traditional Fourier-based approach. The implementation was written in ANSI-standard C++ and tested on Windows XP Professional SP3, Windows Vista Business SP1, Mac OS X Leopard, and Linux Mandriva 10.
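The traditional Fourier-based baseline this thesis compares against is classical cepstral pitch estimation, sketched below in Python with numpy. The frame length, search band, and sampling rate are illustrative, not the thesis's settings.

```python
import numpy as np

def cepstral_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate pitch (Hz) of a voiced frame from its real cepstrum."""
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)
    # pitch shows up as a peak at quefrency = sr/f0 (in samples)
    qmin, qmax = int(sr / fmax), int(sr / fmin)
    peak = qmin + np.argmax(cepstrum[qmin:qmax])
    return sr / peak

sr = 16000
t = np.arange(1024) / sr
# crude "voiced" test frame: 120 Hz fundamental plus decaying harmonics
frame = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 8))
print(cepstral_pitch(frame, sr))   # ≈ 120
```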
37

Spectral Estimation by Geometric, Topological and Optimization Methods

Enqvist, Per January 2001 (has links)
38

Optimizing text-independent speaker recognition using an LSTM neural network

Larsson, Joel January 2014 (has links)
In this paper a novel speaker recognition system is introduced. With the advances in computer science, automated speaker recognition has become increasingly popular as an aid in crime investigations and authorization processes. Here, a recurrent neural network approach is used to learn to identify ten speakers within a set of 21 audiobooks. Audio signals are processed via spectral analysis into mel-frequency cepstral coefficients, which serve as speaker-specific features and are input to the neural network. The Long Short-Term Memory algorithm is examined for the first time within this area, with interesting results. Experiments are performed to find the optimal network model for the problem. These show that the network learns to identify the speakers well, text-independently, when the recording conditions are the same. However, the system has difficulty recognizing speakers across different recordings, probably due to the noise sensitivity of the speech-processing algorithm in use.
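A minimal sketch of an LSTM classifier over MFCC sequences, assuming PyTorch; the layer sizes, 13-coefficient input, and sequence length are illustrative guesses rather than the paper's tuned configuration.

```python
import torch
import torch.nn as nn

class SpeakerLSTM(nn.Module):
    def __init__(self, n_mfcc=13, hidden=64, n_speakers=10):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_mfcc, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, n_speakers)

    def forward(self, x):                  # x: (batch, frames, n_mfcc)
        _, (h_n, _) = self.lstm(x)         # h_n: (1, batch, hidden)
        return self.head(h_n[-1])          # logits, one per speaker

model = SpeakerLSTM()
logits = model(torch.randn(8, 200, 13))    # 8 clips of 200 MFCC frames
pred = logits.argmax(dim=1)                # predicted speaker indices
```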
39

Accent Classification from Speech Samples by Use of Machine Learning

Carol Pedersen Unknown Date (has links)
“Accent” is the pattern of speech pronunciation by which one can identify a person’s linguistic, social or cultural background. It is an important source of inter-speaker variability and a particular problem for automated speech recognition. The aim of the study was to investigate a new computational approach to accent classification that did not require phonemic segmentation or the identification of phonemes as input, and which could therefore be used as a simple, effective accent classifier. Through a series of structured experiments, this study investigated the effectiveness of Support Vector Machines (SVMs) for speech accent classification using time-based units rather than linguistically informed ones, and compared it to the accuracy of other machine learning methods, as well as to the ability of humans to classify speech according to accent. A corpus of read speech was collected in two accents of English (Arabic and “Indian”) and used as the main data source for the experiments. Mel-frequency cepstral coefficients were extracted from the speech samples and combined into larger units of 10 to 150 ms duration, which then formed the input data for the various machine learning systems. Support Vector Machines were found to classify the samples with up to 97.5% accuracy, with very high precision and recall, using samples of between 1 and 4 seconds of speech. This compared favourably with a human listener study in which subjects were able to distinguish between the two accent groups with an average of 92.5% accuracy in approximately 8 seconds. Repeating the SVM experiments on a different corpus resulted in a best classification accuracy of 84.6%. Experiments using a decision tree learner and a rule-based classifier on the original corpus gave a best accuracy of 95%, but results over the range of conditions were much more variable than those using the SVM. Rule extraction was performed to help explain the results and better inform the design of the system. The new approach was therefore shown to be effective for accent classification, and a plan for its role within various other larger speech-related contexts was developed.
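A minimal sketch of the SVM classification stage, assuming scikit-learn; the pooling of MFCC frames into time-based units and the synthetic data are purely illustrative, not the study's exact pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def pool_units(mfcc, frames_per_unit=10):
    """Average consecutive MFCC frames into larger time-based units."""
    n = len(mfcc) // frames_per_unit * frames_per_unit
    return mfcc[:n].reshape(-1, frames_per_unit, mfcc.shape[1]).mean(axis=1)

# Fake corpus: 200 clips, each 120 frames of 13 MFCCs, two accent labels
# (0 = Arabic-accented, 1 = "Indian"-accented).
rng = np.random.default_rng(1)
clips = rng.normal(size=(200, 120, 13))
y = rng.integers(0, 2, size=200)
# One clip-level feature vector: mean over the pooled time-based units.
X = np.array([pool_units(clip).mean(axis=0) for clip in clips])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))   # chance-level here, since the data is random
```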
