Global ETD Search

571	Vliv akustiky prostředí na úspěšnost rozpoznávače řeči / Impact of Environment Acoustics on Speech Recognition Accuracy Paliesek, Jakub January 2021. This diploma thesis deals with the impact of room acoustics on automatic speech recognition (ASR) accuracy. Experiments were evaluated on the LibriSpeech speech corpus and on ReverbDB, a database of impulse responses and noise. The ASR systems used were based on the Mini LibriSpeech recipe for Kaldi. First, it was examined how well an ASR system can learn to transcribe speech in selected environments when the same acoustic conditions are used for training and testing. Next, the ASR architecture was modified to achieve better robustness to new conditions, using methods for adaptation to room acoustics: r-vectors and i-vectors. It was shown that the recently proposed r-vector method is beneficial even when real impulse responses are used for data augmentation.
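Data augmentation with impulse responses, as mentioned in the abstract, amounts to convolving clean speech with a room impulse response (RIR) and adding noise at a chosen signal-to-noise ratio. The following is a minimal pure-Python sketch under assumed inputs (plain lists of samples at one sampling rate); it is not the Kaldi recipe or the ReverbDB tooling used in the thesis, and all function names are hypothetical.

```python
import math

def convolve(x, h):
    """Full linear convolution of signal x with room impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def add_noise_at_snr(signal, noise, snr_db):
    """Scale noise so that 10*log10(P_signal / P_noise) == snr_db, then add it."""
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[:len(signal)]
    p_sig = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]

def reverberate(clean, rir, noise, snr_db):
    """Simulate a far-field recording: convolve with the RIR, then add noise."""
    wet = convolve(clean, rir)[:len(clean)]  # truncate the reverberant tail to the input length
    return add_noise_at_snr(wet, noise, snr_db)

# Usage with synthetic stand-ins for speech, an RIR and noise (all hypothetical).
clean = [math.sin(0.1 * n) for n in range(200)]
noise = [math.sin(1.7 * n + 0.3) * ((n * 7919) % 13 - 6) for n in range(200)]
noisy = reverberate(clean, [1.0, 0.0, 0.3], noise, snr_db=10.0)
```

Measured RIRs are typically thousands of taps long, so a real pipeline would use FFT-based convolution; the direct double loop is kept here only for readability.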
572	Test-Retest Reliability in the Determination of the Speech Recognition Threshold Jacobs, Alyssa Montierth 28 March 2012. For many years, speech recognition threshold (SRT) testing has been used as an indicator of audiologic health. However, with changing methods and technology, test-retest reliability has not been extensively reviewed for newer digitally recorded spondaic words that meet a published criterion of listener familiarity. This study examined the test-retest reliability of 33 high-frequency-usage, psychometrically equated spondaic words. The method recommended by the American Speech-Language-Hearing Association (2-dB decrements) was used to measure the left and right SRTs of 40 participants using both male and female talker recordings. For each participant, four SRTs were obtained in the test condition and four in the retest condition. All SRT scores were analyzed: with the male talker recording, the average retest SRT was 1.4 dB better than the average test SRT, and with the female talker recording, the average retest SRT was 1.2 dB better than the average test SRT. The SRT scores also showed high validity when compared with each participant's pure-tone average (PTA). This study additionally found no significant interaction effect of using a male versus a female talker with digitally recorded, psychometrically equated spondaic words. speech recognition threshold test-retest reliability digitally recorded materials Communication Sciences and Disorders
573	Dekodér pro systém detekce klíčových slov / Decoder for key word detection system Krotký, Jan January 2009. The thesis presents the basic characteristics of human speech recognition, describes keyword detection systems, and then deals with the design of the individual decoder blocks, divided into three chapters. The first chapter describes the operations performed on the signal before it is divided into frames, together with the segmentation. The second chapter describes the calculation of short-term energy, the number of zero crossings, and the autocorrelation, linear prediction, and mel-frequency cepstral coefficients. The third chapter, on the design of the decoder block, describes dynamic time warping (DTW) and a method based on hidden Markov models. The final part of the thesis describes decoders that work with speech and proposes a simple decoder for isolated words, which was designed and tested on the basis of the preceding chapters.
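Two of the steps listed above, frame-level features (short-term energy, zero crossings) and DTW template matching for isolated words, can be sketched as follows. This is an illustrative pure-Python sketch, not the decoder designed in the thesis; the frame sizes, the Euclidean local distance, and all function names are assumptions.

```python
import math

def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames (no window applied, for brevity)."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]

def short_term_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)

def zero_crossings(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def dtw_distance(a, b):
    """Minimal cumulative distance over all monotonic alignments of sequences a and b."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean local distance between feature vectors
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def recognize(features, templates):
    """Isolated-word decision: label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))

# Usage: per-frame (energy, zero-crossing) feature vectors for a synthetic signal.
signal = [math.sin(0.3 * n) for n in range(64)]
feats = [(short_term_energy(f), zero_crossings(f)) for f in frames(signal, 16, 8)]
```

DTW lets a spoken word and a stored template differ in speaking rate: a template frame may be matched to several input frames (or vice versa) at no extra penalty beyond the local distances.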
574	Zpracování signálů pomocí skrytých Markovových modelů / Signal processing by hidden Markov models Hampl, Jindřich January 2010. One of the most common methods for isolated word recognition is based on hidden Markov models. A speech signal can be regarded as a sequence of successive segments, each with specific statistical parameters. A hidden Markov model is a statistical model with a finite number of states, which makes it useful for signals such as speech. The HTK toolkit is a set of software tools widely used for working with hidden Markov models.
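Isolated word recognition with HMMs, as described above, typically scores an utterance against one model per word and picks the most likely word. The following is a minimal sketch using the forward algorithm with discrete observation symbols; real systems (including HTK-based ones) use continuous, usually Gaussian mixture, emission densities, and the toy models and names here are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | HMM) computed with the forward algorithm.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits discrete symbol o
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o] for j in range(n)]
    return sum(alpha)

def recognize_word(obs, models):
    """Pick the word whose HMM assigns the observation sequence the highest likelihood."""
    return max(models, key=lambda word: forward_likelihood(obs, *models[word]))

models = {
    # hypothetical two-state left-to-right word model
    "a": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
    # hypothetical one-state word model that mostly emits symbol 1
    "b": ([1.0], [[1.0]], [[0.1, 0.9]]),
}
```

For realistic utterance lengths the products underflow, so practical implementations work with scaled or log-domain forward variables.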
575	Adaptace rozpoznávače řeči na datech bez přepisu / Unsupervised Adaptation of Speech Recognizer Švec, Ján January 2015. The goal of this thesis is to design and test techniques for unsupervised adaptation of speech recognizers on audio data without any textual transcripts. First, a training set is prepared and a baseline speech recognition system is trained. This system is used to transcribe unseen data. We experiment with selecting adaptation data based on a measure of transcript quality. The system is then re-trained on this new set, and its accuracy is evaluated. Finally, we experiment with the amount of adaptation data.
576	Far-Field Speech Recognition / Far-Field Speech Recognition Žmolíková, Kateřina January 2016. Speech recognition systems today achieve fairly high accuracy. However, when speech is captured by a distant microphone and is therefore corrupted by a large amount of noise and reverberation, recognition accuracy degrades considerably. This problem can be mitigated by using microphone arrays. This thesis deals with techniques that combine signals from multiple microphones so as to improve the quality of the resulting signal and thus the recognition accuracy. The thesis first summarizes the theory of speech recognition and presents the most widely used algorithms for microphone array processing. Then the results of applying two beamforming methods and a multichannel dereverberation method are demonstrated and analyzed. Finally, an alternative approach to beamforming using neural networks is tested.
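The abstract does not name its two beamforming methods, so as a generic illustration of combining microphone signals, here is delay-and-sum beamforming, the simplest baseline: estimate each channel's delay relative to a reference, align, and average. This is a hypothetical sketch with integer-sample delays found by a brute-force cross-correlation search; practical systems usually estimate fractional delays (for example with GCC-PHAT) and often use adaptive beamformers instead.

```python
def estimate_delay(ref, ch, max_lag):
    """Integer lag (in samples) maximizing the cross-correlation sum_t ref[t] * ch[t + lag]."""
    def xcorr(lag):
        return sum(ref[t] * ch[t + lag] for t in range(len(ref)) if 0 <= t + lag < len(ch))
    return max(range(-max_lag, max_lag + 1), key=xcorr)

def delay_and_sum(channels, delays):
    """Shift each channel back by its delay and average across microphones.
    delays[k] is how many samples channel k lags the reference channel."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for t in range(n):
            if 0 <= t + d < n:  # samples shifted outside the buffer are dropped
                out[t] += ch[t + d]
    return [v / len(channels) for v in out]

ref = [0.0, 1.0, 0.0, 0.0, 0.0]
mic2 = [0.0, 0.0, 0.0, 1.0, 0.0]  # same pulse arriving 2 samples later
delays = [estimate_delay(ref, ch, max_lag=3) for ch in (ref, mic2)]
enhanced = delay_and_sum([ref, mic2], delays)
```

After alignment the target speech adds coherently across microphones while uncorrelated noise partly averages out, which is the basic reason array processing can improve recognition accuracy.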
577	Réseaux de neurones récurrents pour le traitement automatique de la parole / Speech processing using recurrent neural networks Gelly, Grégory 22 September 2017. The field of automatic speech processing covers a very large number of tasks, including speech recognition, language identification, and speaker identification. This field has been studied since the middle of the twentieth century, but the last major technological breakthrough is relatively recent, dating from the early 2010s. It was then that hybrid systems using deep neural networks (DNNs) appeared, substantially improving the state of the art. Inspired by the performance gains brought by DNNs and by Alex Graves's work on recurrent neural networks (RNNs), we wanted to explore the capabilities of the latter. RNNs seemed to us better suited than DNNs to handling the temporal sequences of the speech signal. In this thesis, we focus particularly on Long Short-Term Memory (LSTM) RNNs, which overcome a number of difficulties encountered with standard RNNs. We augment this model and propose optimization procedures that improve performance in speech/non-speech segmentation and language identification. In particular, we introduce cost functions dedicated to each of the two tasks: a WER-like cost for speech/non-speech segmentation, aimed at reducing the error rate of a speech recognition system, and a so-called angular proximity cost function for multiclass classification problems such as spoken language identification. / Automatic speech processing has been an active field of research since the 1950s.
Within this field the main area of research is automatic speech recognition, but simpler tasks such as speech activity detection, language identification, or speaker identification are also of great interest to the community. The most recent breakthrough in speech processing came around 2010, when speech recognition systems using deep neural networks drastically improved the state of the art. Inspired by these gains and by the work of Alex Graves on recurrent neural networks (RNNs), we decided to explore the possibilities these models offer on realistic data for two different tasks: speech activity detection and spoken language identification. In this work, we look closely at a specific RNN model, the Long Short-Term Memory (LSTM), which mitigates many of the difficulties that can arise when training an RNN. We augment this model and introduce optimization methods that lead to significant performance gains for speech activity detection and language identification. More specifically, we introduce a WER-like loss function to train a speech activity detection system so as to minimize the word error rate of a downstream speech recognition system. We also introduce two different methods to successfully train a multiclass classifier based on neural networks for tasks such as language identification (LID). The first is based on a divide-and-conquer approach and the second on an angular proximity loss function. Both yield performance gains and also speed up the training process. Recurrent neural networks Speech recognition LSTM
578	Development and validation of a South African English smartphone-based speech-in-noise hearing test Engelbrecht, Jenni-Mari January 2017. Approximately 80% of the adult and elderly (≥65 years) population have not been assessed or treated for hearing loss, despite the effect hearing loss has on communication and quality of life (World Health Organization [WHO], 2013a). The South African health care system faces many challenges, of which access to ear and hearing health care is one of the major problems. This study aimed to develop and validate a smartphone-based digits-in-noise hearing test for South African English to improve access to hearing screening. The study also considered the effect of hearing loss and English speaking competency on the South African English digits-in-noise hearing test, to evaluate its suitability for use with native (N) and non-native (NN) speakers. Lastly, the study evaluated the applicability of the digits-in-noise test as part of the diagnostic audiometric test battery, as a clinical test of speech recognition ability in noise. During the development and validation phase, the sample consisted of 40 normal-hearing subjects with thresholds ≤15 dB across the frequency range (250–8000 Hertz [Hz]) and 186 subjects with normal hearing in both ears or in the better ear. Single digits (0–9) were spoken by a native English female speaker and recorded. Level corrections were applied to create a set of homogeneous digits with steep speech recognition functions. A smartphone application (app) was created that uses 120 digit triplets in noise as test material. An adaptive test procedure determined the speech reception threshold (SRT). Experiments were performed to determine headphone effects on the SRT and to establish normative data. The results showed steep speech recognition functions, with a slope of 20%/dB for digit triplets presented in noise using the smartphone app.
The results with five headphone types indicate that the smartphone-based hearing test is reliable and can be conducted using standard Android smartphone headphones or clinical headphones. A prospective cross-sectional cohort study of N and NN English adults with and without sensorineural hearing loss compared pure-tone air conduction thresholds with the SRT recorded with the smartphone digits-in-noise hearing test. A rating scale was used for NN English listeners' self-reported competence in speaking English. This study included 454 adult listeners (164 male, 290 female; age range 16–90 years), of whom 337 had a best-ear four-frequency pure-tone average (4FPTA; 0.5, 1, 2, and 4 kHz) of ≤25 dB hearing level (HL). A linear regression model identified three predictors of the digits-in-noise SRT: 4FPTA, age, and self-reported English speaking competence. The NN group with poor self-reported English speaking competence (≤5/10) performed significantly (p<0.01) worse on the digits-in-noise test than the N & NN (≥6/10) group. The screening characteristics of the test improved with separate cut-off values, depending on self-reported English speaking competence, for the N & NN (≥6/10) group and the NN (≤5/10) group. Logistic regression models that included age in the analysis showed a further improvement in sensitivity and specificity for both groups (area under the receiver operating characteristic curve [AUROC] .962 and .903, respectively). A descriptive study evaluated 109 adult subjects (43 male, 66 female) with and without sensorineural hearing loss by comparing pure-tone air conduction thresholds, the speech recognition monaural performance score intensity (SRS dB), and the digits-in-noise SRT. An additional nine adult hearing aid users (4 male, 5 female) were included as a subset to determine aided and unaided digits-in-noise SRTs. The digits-in-noise SRT was strongly associated with the best-ear 4FPTA (r=0.81) and maximum SRS dB (r=0.72).
The digits-in-noise test had high sensitivity and specificity for identifying abnormal pure-tone results (0.88 and 0.88, respectively) and abnormal SRS dB results (0.76 and 0.88, respectively). The mean signal-to-noise ratio (SNR) improvement in the aided condition demonstrated an overall benefit of 0.84 dB SNR. Significant individual variability between subjects was found in both the aided condition (-3.2 to -9.4 dB SNR) and the unaided condition (-2 to -9.4 dB SNR). This study demonstrated that a smartphone app makes it possible to use the English digits-in-noise hearing test as a national test for South Africans. The smartphone app can accommodate NN listeners by adjusting reference scores based on self-reported English speaking competence. Including age when determining the screening test result increases the accuracy of the screening test in normal-hearing listeners. These adjustments can ensure adequate test performance across N English and NN English listeners. Furthermore, the digits-in-noise SRT is strongly associated with the best-ear 4FPTA and maximum SRS dB and could therefore provide complementary information on speech recognition impairment in noise in a clinical audiometric setting. The digits-in-noise SRT can also demonstrate the benefit of hearing aid fittings. The test is quick to administer and provides information on the SNR loss. The digits-in-noise SRT could therefore serve as a valuable tool in counselling and in managing the expectations of persons with hearing loss who receive amplification. / Thesis (PhD)--University of Pretoria, 2017. / National Research Foundation (NRF) / Speech-Language Pathology and Audiology / PhD / Unrestricted Hearing test Smartphone Speech-in-noise Digits-in-noise Adult hearing screening Speech recognition in noise Hearing loss UCTD
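Two quantities from this abstract lend themselves to a short sketch: the best-ear 4FPTA (mean threshold at 0.5, 1, 2 and 4 kHz, taking the better ear) and an adaptive SRT track. The abstract does not specify the adaptive procedure, so the 1-up/1-down rule, 24 trials, 2-dB step and averaging of the last 20 presented SNRs below are illustrative assumptions, not the app's actual parameters; the function names are hypothetical.

```python
def four_frequency_pta(thresholds):
    """4FPTA: mean of air-conduction thresholds (dB HL) at 0.5, 1, 2 and 4 kHz."""
    return sum(thresholds[f] for f in (500, 1000, 2000, 4000)) / 4

def best_ear_pta(left, right):
    """Best-ear 4FPTA: the lower (better) of the two ears' averages."""
    return min(four_frequency_pta(left), four_frequency_pta(right))

def adaptive_srt(respond, n_trials=24, start_snr=0.0, step_db=2.0, n_average=20):
    """Hypothetical 1-up/1-down track for a digits-in-noise test.

    respond(snr) returns True when the whole digit triplet was repeated correctly.
    SNR goes down by step_db after a correct triplet and up after an error,
    converging on the 50%-correct point. The SRT is the mean of the last
    n_average presented SNRs. All parameter defaults are illustrative assumptions.
    """
    snr, presented = start_snr, []
    for _ in range(n_trials):
        presented.append(snr)
        snr += -step_db if respond(snr) else step_db
    tail = presented[-n_average:]
    return sum(tail) / len(tail)

# Simulated listener who repeats every triplet correctly above -10 dB SNR.
srt = adaptive_srt(lambda snr: snr > -10.0)
```

With this simulated listener the track oscillates between -8 and -10 dB SNR, so the averaged SRT lands midway, at -9 dB SNR.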
579	Primena retke reprezentacije na modelima Gausovih mešavina koji se koriste za automatsko prepoznavanje govora / An application of sparse representation in Gaussian mixture models used in speech recognition task Jakovljević Nikša 10 March 2014. This dissertation presents a model that approximates full covariance matrices in Gaussian mixture models (GMMs) with a reduced number of parameters and a reduced amount of computation needed for likelihood evaluation. In the proposed model, inverse covariance matrices are approximated using a sparse representation of their eigenvectors. In addition to the model itself, a parameter estimation algorithm based on the maximum likelihood criterion is presented. Experimental results on a speech recognition task showed that, at the same error level as a GMM with full covariances, the proposed model reduces the number of parameters by 45%. / This thesis proposes a model which approximates full covariance matrices in Gaussian mixture models with a reduced number of parameters and computations required for likelihood evaluations. In the proposed model, inverse covariance (precision) matrices are approximated using sparsely represented eigenvectors. A maximum likelihood algorithm for parameter estimation and its practical implementation are presented. Experimental results on a speech recognition task show that while keeping the word error rate close to the one obtained by GMMs with full covariance matrices, the proposed model can reduce the number of parameters by 45%.
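To see why sparse eigenvectors save computation, write a component's precision matrix as P = Σ_k λ_k v_k v_kᵀ. The quadratic form in the Gaussian log-likelihood then becomes Σ_k λ_k (v_kᵀ(x − μ))², whose cost scales with the number of nonzero eigenvector entries rather than with d². The sketch below is a hypothetical illustration of that evaluation only; it does not reproduce the thesis's maximum likelihood training. It assumes log det P is precomputed once per component (for exactly orthonormal v_k it equals Σ_k log λ_k, but a sparse approximation need not stay orthonormal).

```python
import math

def sparse_dot(v, x):
    """Dot product of a sparse vector {index: value} with a dense list."""
    return sum(val * x[i] for i, val in v.items())

def component_loglik(x, mean, eigvals, sparse_eigvecs, log_det_precision):
    """log N(x; mean, P^-1) with P approximated as sum_k eigvals[k] * v_k v_k^T.
    log_det_precision is assumed to be precomputed once per component, not per frame."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    quad = sum(lam * sparse_dot(v, diff) ** 2 for lam, v in zip(eigvals, sparse_eigvecs))
    d = len(x)
    return 0.5 * log_det_precision - 0.5 * d * math.log(2 * math.pi) - 0.5 * quad

def gmm_loglik(x, components):
    """log sum_m w_m N(x; ...), combined with log-sum-exp for numerical stability.
    components: list of (weight, (mean, eigvals, sparse_eigvecs, log_det_precision))."""
    terms = [math.log(w) + component_loglik(x, *params) for w, params in components]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Usage: one 2-D component with rotated (orthonormal) eigenvectors and eigenvalues 2 and 0.5.
s = 1 / math.sqrt(2)
comp = ([0.0, 0.0], [2.0, 0.5], [{0: s, 1: s}, {0: s, 1: -s}], math.log(2.0 * 0.5))
ll = gmm_loglik([1.0, 0.0], [(1.0, comp)])
```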
580	Intelligent chatbot assistant: A study of Natural Language Processing and Artificial Intelligence Lerjebo, Linus, Hägglund, Johannes January 2020. Research and development in Artificial Intelligence have surged in recent years, including in the medical field. Despite the new technology and tools available, medical staff are still under a heavy workload. The goal of this thesis is to analyze the possibilities of a chatbot whose purpose is to assist the medical staff and provide safety for patients by guaranteeing that they are being monitored. Using technologies such as Artificial Intelligence, Natural Language Processing, and Voice over Internet Protocol, the chatbot can communicate with the patient. It works as an assistant for the staff and passes the information from the calls on to the medical staff. With the answers provided during the call, the staff will not need to ask routine questions every time and can provide help more quickly. The chatbot is administered through a web application where administrators can initiate calls and add patients to the database. Artificial Intelligence Natural Language Processing Speech Recognition Dialogflow Chatbot Public switched telephone network Computer Systems
