1 |
A Design of Mandarin Speech Recognition System for Addresses in Taiwan, Hong Kong and China
Wang, San-ming, 06 September 2007
The objective of this thesis is to design and implement a speech input system for addresses in Taiwan, mainland China and Hong Kong. The completed system can identify full census and postal addresses in Taiwan and full postal addresses in Peking, Shanghai, Tien-Jin and Chungchin in China. For Hong Kong, a partial address system is implemented, covering region and street names as well as schools, hotels and other public locations.
In this thesis, Mel frequency cepstrum coefficients, a hidden Markov model and a lexicon search strategy are applied to choose the initial address candidates; a Mandarin intonation classification technique is then used to increase the final correct rate. In the speaker-dependent case, a 90% correct rate is reached on a computer with an Intel Celeron 2.4 GHz CPU running Red Hat Linux 9.0. The complete address-input task finishes within 3 seconds.
|
2 |
A Design of Recognition Rate Improving Strategy for Speech Recognition System - A Case Study on Mandarin Name and Phrase Recognition System
Chen, Ru-Ping, 30 August 2008
The objective of this thesis is to design and implement a speech recognition system for Mandarin names and phrases. This system uses Mel frequency cepstral coefficients, a hidden Markov model and a lexicon search strategy to select the phrase candidates. The experimental results indicate that, for the speaker-dependent case, a strategy incorporating overlapping frames and hybrid training improves the recognition rate by 4%, 5%, 4% and 2% for the Mandarin name, two-word, three-word and four-word phrase recognition systems, respectively. Under the Red Hat Linux 9.0 operating system, any Mandarin name or phrase can be recognized within 2 seconds on a computer with an Intel Celeron 2.4 GHz CPU.
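The overlapping-frame idea mentioned in this abstract can be illustrated with a minimal sketch. The function name, frame length and hop size below are illustrative assumptions; the abstract does not state the values the thesis used.

```python
def split_into_frames(samples, frame_len, hop):
    """Split a sample sequence into frames that overlap by frame_len - hop samples.

    frame_len and hop are hypothetical parameters, not values from the thesis.
    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop
    return frames


# Example: 10 samples, frames of 4 samples with a hop of 2 (50% overlap).
frames = split_into_frames(list(range(10)), frame_len=4, hop=2)
```

With a hop smaller than the frame length, each sample contributes to more than one frame, which yields more feature vectors per utterance at the price of extra computation.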
|
3 |
Speech and music discrimination using short-time features
Mubarak, Omer Mohsin, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW, January 2006
This thesis addresses the problem of classifying an audio stream as either speech or music, an issue that is receiving increasing attention due to its wide range of applications. Various techniques have been presented in the last decade to discriminate between speech and music. However, their accuracy is still not sufficient, since music can refer to a very broad class of signals due to the large number of musical instruments found in audio data. Performance can be further compromised in noisy conditions, which are unavoidable in some practical situations. This thesis presents an analysis of the feature extraction techniques and classifiers currently in use, followed by the proposal and evaluation of new features for improved classification. These include two novel cepstral features, delta cepstral energy and power spectrum deviation, along with amplitude and frequency modulation features. The modified group delay feature, initially proposed for speech recognition, is also investigated for speech and music discrimination. Experiments were performed using different sets of features, which were compared among themselves and with conventional MFCCs using error rate criteria and Detection Error Trade-off curves. It is shown that the proposed cepstral and modulation features increase the accuracy of the conventional MFCC-based system. However, the modified group delay feature, which has been shown to improve accuracy for speech classification problems, does not contribute much to speech and music discrimination. Among the configurations presented here, the optimum one, both modulation features combined with MFCCs, resulted in an overall error rate of 6.57%, compared with 7.43% for MFCCs alone.
|
4 |
Research and simulation on speech recognition by Matlab
Pan, Linlin, January 2014
With the development of multimedia technology, speech recognition has become an increasingly active research topic in recent years. It has a wide range of applications, including recognizing the identity of speakers, which can be divided into identification and verification according to the decision mode. The main work of this thesis is to study the techniques and algorithms of speech recognition and to build a feasible system that simulates it. The research work and achievements are as follows.
First: the author surveyed the field of speech recognition. The many existing algorithms can be divided into two categories: direct recognition, in which words are recognized directly, and recognition based on a trained model.
Second: a usable and reasonable algorithm was selected and studied. The author studied algorithms that extract the characteristic parameters of a word based on MFCC (Mel Frequency Cepstrum Coefficients) and train on those parameters with a GMM (Gaussian mixture model).
Third: the author wrote a MATLAB program that implements the speech recognition algorithm, making use of the speech processing toolbox. The whole system comprises modules for signal processing, MFCC characteristic parameter extraction and GMM training.
Fourth: simulation and analysis of the results. The MATLAB system reads the wav file, plays it, and then calculates the characteristic parameters automatically. The content of the speech signal is identified in the last step.
In this work, the author recorded speech from different people to test the system. The simulation results show that when the testing environment is quiet enough and the same speaker records 20 times, the accuracy of the algorithm approaches 100% for pairs of words with different and with the same syllables. The result degrades when the test signal contains a certain level of noise. The simulated system does not produce good output when the reference and test signals are recorded by different speakers.
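Recognition based on a trained GMM, as described in this abstract, amounts to scoring the feature frames of an utterance against each word's model and picking the best score. The following is a minimal sketch in Python rather than the MATLAB used in the thesis; the diagonal covariances, the function names and all numeric values are assumptions made for illustration, and the models are set by hand instead of being trained.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of a sequence of feature frames under one GMM."""
    total = 0.0
    for x in frames:
        logs = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # log-sum-exp over mixture components for numerical stability
        peak = max(logs)
        total += peak + math.log(sum(math.exp(value - peak) for value in logs))
    return total


def classify(frames, models):
    """Return the label whose GMM gives the highest log-likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, *models[label]))


# Toy two-dimensional "features" and two hand-set models (all values hypothetical).
models = {
    "word_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "word_b": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
test_frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
best = classify(test_frames, models)
```

In a real system the frames would be MFCC vectors and each model's weights, means and variances would be estimated from training recordings, for example with the EM algorithm.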
|
5 |
A Design of Speech Recognition System for Two-word, Three-word and Four-word Mandarin Phrases
Wu, Jung-chun, 06 September 2007
In this thesis, a speech recognition system for two-word, three-word and four-word Mandarin phrases is studied and implemented. This system uses a hidden Markov model, a lexicon search strategy and tone recognition to select the initial phrase candidates and make the final decision. Experimental results indicate that, using about one third of the total phrase population, correct rates of 80%, 92% and 97% can be achieved for the 70,000 two-word, 24,000 three-word and 22,000 four-word phrase recognition problems, respectively. Any spoken phrase can be found within 1 second on a PC with an Intel Celeron 2.4 GHz CPU running Red Hat Linux 9.0.
|
6 |
A design of speech recognition system for one hundred thousand Chinese names
Tu, Chiu-chuan, 06 September 2007
The objective of this thesis is to design and implement a speech recognition system for one hundred thousand Chinese names. Mel frequency cepstrum coefficients, a hidden Markov model and a lexicon search strategy are used to choose the name candidates. Furthermore, a Mandarin intonation technique is incorporated into the system to increase the final recognition accuracy.
The experimental results indicate that, for the speaker-dependent case, an 85% correct rate can be achieved with the proposed intonation classification scheme and the balanced monosyllable training database. This is an increase of 8% over the previous method, which used neither technique. Under Red Hat Linux 9.0, a Mandarin name can be recognized within 2 seconds on a computer with an Intel Celeron 2.4 GHz CPU.
|
7 |
Šnekos atpažinimas / Speech Recognition
Dobrovolskis, Martynas, 14 June 2005
Voice recognition technologies appeared in a period of general device miniaturization, when all technologies were commonly integrated into a single unit. There is no space for buttons and displays anymore.
To build a good Lithuanian speech recognition system, a number of thorough studies must be carried out. Only after the most efficient speech recognition scheme has been selected can development proceed to software suited to present-day needs. The aim of this paper is to determine how efficient speech recognition can be when neural networks are used. MFCC and LPC coefficients were chosen as the parameters characterizing the phonemes. The paper attempts to determine which coefficients lead to the most efficient recognition of phonemes. The programs PRAAT and MatLab were used for testing.
A number of phoneme recognition experiments were carried out, and the results lead to the following conclusions:
1. When a neural network is used to recognize isolated sounds and the phonemes are characterized by MFCC or LPC coefficients, the recognition rate does not exceed 90 per cent. This is not enough for quality recognition of Lithuanian speech.
2. Separate phonemes are recognized better with MFCC coefficients than with LPC coefficients. The difference is about 15 per cent.
3. The advantage of LPC coefficients over MFCC is that the recognition rate curve is more even...
|
8 |
Alignement du chant par rapport à une référence audio en temps réel / Real-Time Alignment of Singing to an Audio Reference
Julien, Eric, January 2013
With a view to creating a karaoke system that modifies an a cappella sung performance in real time, it is necessary to locate the performer relative to a reference in order to determine the target of a voice modification algorithm. For such a system to work well, the alignment algorithm must exploit the specific characteristics of the voice as fully as possible, rely on information tied to the pronounced text rather than to the artistic aspects of the singing, run in real time, and offer the lowest possible latency. To meet these objectives, an alignment system based on Dynamic Time Warping (DTW) was developed. A simple real-time adaptation of the ordinary DTW algorithm that meets the stated objectives is proposed and compared with other approaches found in the literature. This adaptation gave better results than the other techniques tested. A comparative study of three types of spectral analysis commonly used in automatic speech recognition systems was carried out in the specific context of a singing-voice alignment algorithm. The coefficients evaluated are Mel-Frequency Cepstrum Coefficients (MFCC), Warped Discrete Cosine Transform Coefficients (WDCTC) and the coefficients of Perceptual Linear Prediction (PLP) analysis. The results indicate better performance for PLP analysis. Applying a piecewise-linear transformation function to the instantaneous cost matrices makes the alignment more easily distinguishable in the computed cumulative cost matrices. The parameters of the transformation function can be obtained by closed-loop optimization using direct pattern search. An objective function that avoids discontinuities in the root-mean-square error of the alignment is developed.
Several cost matrices can be combined by taking a weighted sum of the transformed instantaneous cost matrices of each parameter considered. The weighting is also obtained by optimization. Several combinations are compared: the best results are obtained by combining PLP analysis with the energy level and the derivatives of both. The mean deviation from the reference alignment is on the order of 50 ms, with a standard deviation of about 75 ms for the sequences tested. Directions for future work are given: improving the convergence of the algorithm for pairs of audio sequences that are difficult to align, obtaining better cost matrices by using other local constraints, integrating new parameters such as pitch, and using a database of segmented singing voice to optimize a distance measure.
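The combination of cost matrices by weighted sum, followed by DTW on the result, can be sketched as follows. This is the ordinary offline DTW with the three standard local steps, not the real-time adaptation proposed in the thesis; the function names, the two toy features and the weights 0.7 and 0.3 are assumptions made for illustration (the thesis obtains its weights by optimization).

```python
def combine_costs(cost_matrices, weights):
    """Weighted sum of same-shaped instantaneous cost matrices."""
    rows, cols = len(cost_matrices[0]), len(cost_matrices[0][0])
    return [
        [sum(w * c[i][j] for w, c in zip(weights, cost_matrices)) for j in range(cols)]
        for i in range(rows)
    ]


def dtw_path(cost):
    """Ordinary (offline) DTW: return the total cost and the backtracked path."""
    rows, cols = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * cols for _ in range(rows)]  # cumulative cost matrix
    acc[0][0] = cost[0][0]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = cost[i][j] + best_prev
    # Backtrack from the end of both sequences to the start.
    i, j = rows - 1, cols - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return acc[-1][-1], path


# Toy one-dimensional "features": the live sequence repeats one reference frame.
ref = [1.0, 2.0, 3.0, 4.0]
live = [1.0, 2.0, 2.0, 3.0, 4.0]
spectral = [[abs(r - l) for l in live] for r in ref]
energy = [[(r - l) ** 2 for l in live] for r in ref]
total, path = dtw_path(combine_costs([spectral, energy], [0.7, 0.3]))
```

Here the path pairs reference frame 1 with both live frames 1 and 2, which is how DTW absorbs a held note or a slower tempo in the sung performance.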
|
9 |
GluA2 - Glutamatergic Receptor Study: A Molecular Approach
Martins, Ana Caroline Vasconcelos, January 2017
Previous issue date: 2017-11-17
Glutamate receptors are the mediators of most excitatory neurotransmission processes in the central nervous system, and they are prominent targets for the treatment of several neurological disorders such as epilepsy, amyotrophic lateral sclerosis, Parkinson's disease and Alzheimer's disease. Hence an improved understanding of how glutamate and other ligands interact with the binding domain of these receptors can bring relevant insights to the development of new ligands. This work therefore aims to study the GluA2-ligand interaction using the structure of GluA2 co-crystallized with the ligands glutamate, AMPA, kainate and DNQX, applying a method based on Density Functional Theory combined with the molecular fractionation with conjugate caps scheme. To account for the fact that the dielectric constant of the GluA2 receptor is not homogeneous, a novel molecular approach was proposed and applied to study the interaction between GluA2 and the ligands glutamate, AMPA, kainate and DNQX. The results obtained with the inhomogeneous model were compared with those obtained using a uniform dielectric function for the GluA2 receptor and with data published in the literature, establishing a more detailed description of the amino acid residues relevant to the protein-ligand binding interaction. Molecular dynamics studies and protein DFT calculations usually consider a fixed value for the protein dielectric function. In this work, when ε = 1 is considered, many amino acid residues seem important, but when dielectric screening is taken into account they lose their relevance. The results for the total GluA2-ligand interaction energy and the total D1-ligand and D2-ligand interaction energies also shed some light on the differentiation between full and partial agonists, and between agonists and antagonists.
Additionally, the results allow a hypothesis on the correlation between the Glu705-ligand interaction energy and the ligand action, paving the way for the use of the inhomogeneous dielectric function to study glutamate receptors and other protein-ligand systems. Finally, the results also suggest that different ligands require different homogeneous dielectric constants to represent the GluA2-ligand system well, which makes a prior analysis with the inhomogeneous dielectric constant approach necessary.
|
10 |
Age and Gender Recognition for Speech Applications based on Support Vector Machines
Erokyar, Hasan, 30 October 2014
Automatic age and gender recognition for speech applications is important for a number of reasons. One is that it can improve human-machine interaction: for example, advertisements can be tailored to the age and gender of the person on the phone. It can also help identify suspects in criminal cases, or at least narrow down the number of suspects. Another use is adapting waiting-queue music, so that a different type of music is played according to the caller's age and gender. Such a system can also be used to gather statistics about age and gender for a specific population. Machine learning is a part of artificial intelligence that aims to learn from data. It has a long history, but because of limitations such as the cost of computation and inefficient algorithms, it was not applied to speech recognition tasks. Only in the last decade have researchers started to apply these algorithms to real-world tasks such as speech recognition, computer vision, finance, banking and robotics. In this thesis, age and gender were recognized using a popular machine learning algorithm, and the performance of the resulting systems was compared. The dataset included real-life examples, so that the system is adaptable to real-world applications. Digital signal processing techniques were used to remove noise and to extract features from the speech examples. The speech features used in this work were pitch frequency and cepstral representations.
The performance of an age and gender recognition system depends on the speech features used. The fundamental frequency was selected as the first feature. It is the main differentiating factor between male and female speakers, and it also differs between age groups. To obtain the fundamental frequency of a speaker, the harmonic-to-subharmonic ratio method was used: the speech was divided into frames, the fundamental frequency of each frame was calculated, and the mean over all speech frames was taken as the speaker's fundamental frequency. It turns out that fundamental frequency is a good discriminator not only of gender but also of age group, because it differs across age groups. Mel Frequency Cepstral Coefficients (MFCC) are a good feature for speech recognition and were therefore selected as well. With MFCC, the age and gender recognition accuracies were satisfactory. As an alternative to MFCC, Shifted Delta Cepstral (SDC) coefficients were used as a speech feature. SDC is derived from MFCC, and its advantage is greater robustness to noisy data: it captures the essential information in noisy speech better. The experiments showed that SDC did not give better recognition rates, because the dataset did not contain much noise. Lastly, a combination of pitch and MFCC was used to obtain even better recognition rates. The final fused system has an overall recognition rate of 64.20% on the ELSDSR [32] speech corpus.
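The frame-then-average procedure for a speaker's fundamental frequency can be sketched as below. Note the swap: the thesis estimates per-frame pitch with a harmonic-to-subharmonic ratio method, whereas this sketch uses a plain autocorrelation peak as a simpler stand-in. The function names, the 50 to 400 Hz search range, the frame length and the hop are all illustrative assumptions.

```python
import math


def frame_f0(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate the F0 of one frame from its strongest autocorrelation peak.

    Autocorrelation is a stand-in here; it is not the method used in the thesis.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    # None signals that no positive autocorrelation peak was found.
    return sample_rate / best_lag if best_lag else None


def speaker_mean_f0(samples, sample_rate, frame_len, hop):
    """Average the per-frame F0 estimates over all frames that yield one."""
    estimates = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        f0 = frame_f0(samples[start:start + frame_len], sample_rate)
        if f0 is not None:
            estimates.append(f0)
    return sum(estimates) / len(estimates) if estimates else None


# Synthetic check signal: a pure 200 Hz tone sampled at 8 kHz (hypothetical values).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(1600)]
mean_f0 = speaker_mean_f0(tone, sample_rate, frame_len=400, hop=200)
```

On real speech, unvoiced and silent frames would need to be excluded before averaging, which this sketch only approximates by skipping frames without a positive autocorrelation peak.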
|