381

Speech Recognition Error Prediction Approaches with Applications to Spoken Language Understanding

Serai, Prashant January 2021 (has links)
No description available.
382

Lietuvių šnekos atpažinimo akustinis modeliavimas / Acoustic modelling of Lithuanian speech recognition

Laurinčiukaitė, Sigita 26 June 2008 (has links)
This thesis is devoted to the acoustic modelling of Lithuanian speech recognition. Word-, syllable-, contextual-syllable-, phoneme- and contextual-phoneme-based speech recognition was investigated, for both isolated words and continuous speech. The most popular sub-word units in Lithuanian speech recognition have been the phoneme and the contextual phoneme, and research on other sub-word units has been lacking. This work compares the capacity of linguistic sub-word units to model speech and demonstrates that the investigation of sub-word units suggests alternatives to the phoneme and contextual phoneme. The dissertation proposes a methodology for mixed acoustic modelling of syllables and phonemes; a new sub-word unit, the pseudo-syllable; and technologies for the acoustic modelling of individual sub-word units (schemes, tools, recommendations). A speech corpus of isolated words was prepared and two versions of the LRN continuous-speech corpus were developed for the experiments. Investigation of isolated-word recognition with word-level acoustic models showed that the size of the training set and its content, in particular the number of speakers, influence speech recognition accuracy; recommendations for word-based acoustic modelling are given. Investigation of isolated-word recognition with acoustic models built for words, syllables and phonemes showed that the best results, with an accuracy of 98 ±1.8 %, were associated with syllable-type sub-word units. ... [to full text]
384

The Pursuit of Effective Artificial Tactile Speech Communication: Improvements and Cognitive Characteristics of a Phonemic-based Approach

Juan S Martinez (6622304) 26 April 2023 (has links)
Tactile speech communication allows individuals to understand speech through sensations transmitted via the sense of touch. Devices that enable tactile speech communication can be an effective means of transmitting important messages when the visual and/or auditory systems are overloaded or impaired, with applications in silent communication and for people with hearing and/or visual impairments. An effective artificial speech communication system must be learnable in a reasonable time, easily remembered, and able to transmit any word at rates suitable for speech communication. The pursuit of a system that fulfills these requirements is a complex task that requires work in different areas, and this thesis presents advancements in four of them. First is the matter of encoding speech information: here, a phonemic-based approach allowed participants to recognize tactile phonemes, words, phrases and full sentences. Second is the issue of training users of the system: to this end, this thesis investigated the phenomenon of incidental categorization of vibrotactile stimuli as the foundation of more natural methods for learning a tactile speech communication system. Third is the matter of the neural processing of tactile speech information: here, the functional characteristics of the phonemic-based approach were explored using EEG. Finally, there is the matter of implementing the system for consumer use: this work addresses practical considerations of delivering rich haptic effects with current wearable technologies, which inform the design of actuators used in tactile speech communication devices.
385

A Comparative Analysis of Whisper and VoxRex on Swedish Speech Data

Fredriksson, Max, Ramsay Veljanovska, Elise January 2024 (has links)
With the constant development of more advanced speech recognition models, determining which models are better in specific areas and for specific purposes becomes increasingly crucial, even more so for low-resource languages such as Swedish, which depend on the progress of models for the large international languages. Lagerlöf (2022) conducted a comparative analysis between Google's speech-to-text model and NLoS's VoxRex B, concluding that VoxRex was the best for Swedish audio. Since then, OpenAI has released its automatic speech recognition model Whisper, prompting a reassessment of the preferred choice for transcribing Swedish. In this comparative analysis using data from Swedish radio news segments, Whisper performs better than VoxRex in tests on the raw output, a result strongly influenced by Whisper's more proficient sentence construction. It is not possible to conclude which model is better at pure word prediction; however, the results favor VoxRex, which displays lower variability. So even though Whisper can predict full text better, the choice of model should be determined by the user's needs.
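Comparisons like this are conventionally scored with word error rate (WER), the edit distance between the reference transcript and the model's hypothesis divided by the reference length. A minimal self-contained sketch (the radio-news data and model outputs above are not reproduced here):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference
    length, computed via Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `wer("the cat sat", "the cat sat on")` is one insertion over three reference words, i.e. 1/3. The "variability" the abstract mentions would correspond to the spread of per-segment WER values across the test set.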
386

Graphical Models for Robust Speech Recognition in Adverse Environments

Rennie, Steven J. 01 August 2008 (has links)
Robust speech recognition in acoustic environments that contain multiple speech sources and/or complex non-stationary noise is a difficult problem, but one of great practical interest. The formalism of probabilistic graphical models constitutes a relatively new and very powerful tool for understanding and extending existing models, learning, and inference algorithms, and a bedrock for the creative, quasi-systematic development of new ones. In this thesis a collection of new graphical models and inference algorithms for robust speech recognition is presented. The problem of speech separation using multiple microphones is treated first. A family of variational algorithms for tractably combining multiple acoustic models of speech with observed sensor likelihoods is presented. The algorithms recover high-quality estimates of the speech sources even when there are more sources than microphones, and have improved upon the state of the art by over 10 dB of SNR gain. Next, the problem of background compensation in non-stationary acoustic environments is treated. A new dynamic noise adaptation (DNA) algorithm for robust noise compensation is presented, and shown to outperform several existing state-of-the-art front-end denoising systems on the new DNA + Aurora II and Aurora II-M extensions of the Aurora II task. Finally, the problem of recognizing speech in the presence of competing speech using a single microphone is treated. The Iroquois system for multi-talker speech separation and recognition is presented. The system won the 2006 Pascal International Speech Separation Challenge and, remarkably, achieved super-human recognition performance on a majority of test cases in the task. The result marks a significant first in automatic speech recognition, and a milestone in computing.
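The "SNR gain" figure quoted above is the standard way separation quality is reported: the SNR of the separated estimate minus the SNR of the unprocessed mixture, both measured against the true source. A small sketch of that metric (the thesis's own evaluation code is not reproduced; function names here are illustrative):

```python
import numpy as np

def snr_db(clean: np.ndarray, estimate: np.ndarray) -> float:
    """SNR in dB of an estimate against the true source: everything that is
    not the clean signal counts as residual noise."""
    residual = estimate - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))

def snr_gain(clean: np.ndarray, mixture: np.ndarray,
             separated: np.ndarray) -> float:
    """Separation quality: output SNR minus input SNR, in dB."""
    return snr_db(clean, separated) - snr_db(clean, mixture)
```

If a separator reduces the interfering signal's amplitude by a factor of 10 while leaving the target intact, its residual energy drops by a factor of 100, i.e. a 20 dB gain on this measure.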
387

A study on the integration of phonetic landmarks into large vocabulary continuous speech decoding / Une étude sur l'intégration de repères phonétiques dans le décodage de la parole continue à grand vocabulaire

Ziegler, Stefan 17 January 2014 (has links)
This thesis studies the integration of phonetic landmarks into standard statistical large-vocabulary continuous speech recognition (LVCSR). Landmarks are discrete time instants that indicate the presence of phonetic events in the speech signal. The goal is to develop landmark detectors that are motivated by phonetic knowledge in order to model selected phonetic classes more precisely than is possible with standard acoustic models. The thesis presents two landmark detection approaches that make use of segment-based information, and studies two different methods to integrate landmarks into decoding: landmark-based pruning and a weighted combination of landmark and acoustic scores. While both detection approaches improve speech recognition performance over the baseline, they do not outperform standard frame-based phonetic predictions. Since these results indicate that landmark-driven LVCSR requires the integration of very heterogeneous phonetic information, the thesis presents a third integration framework designed to integrate an arbitrary number of heterogeneous and asynchronous landmark streams into LVCSR. The results indicate that this framework is indeed able to improve the baseline system, provided that the landmarks contribute information complementary to the regular acoustic models.
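The two integration methods named in the abstract can be pictured in a few lines; this is an illustrative sketch of the general ideas (log-linear interpolation and hard pruning), not the thesis's exact formulation, and the function names and the weight parameter are assumptions:

```python
import numpy as np

def combined_score(acoustic_log_probs: np.ndarray,
                   landmark_log_probs: np.ndarray,
                   weight: float = 0.3) -> np.ndarray:
    """Weighted combination: interpolate per-frame, per-state log scores.
    With weight = 0 the decoder falls back to the plain acoustic model."""
    return (1.0 - weight) * acoustic_log_probs + weight * landmark_log_probs

def landmark_prune(frame_scores: np.ndarray,
                   allowed: np.ndarray) -> np.ndarray:
    """Landmark-based pruning: states incompatible with a detected landmark
    (allowed == False) receive -inf, so the search discards their hypotheses."""
    return np.where(allowed, frame_scores, -np.inf)
```

The pruning variant is harder: a missed or spurious landmark removes the correct path entirely, which is one reason a soft weighted combination can be the safer integration strategy.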
388

Perceptual features for speech recognition

Haque, Serajul January 2008 (has links)
Automatic speech recognition (ASR), the recognition of speech by a machine, is one of the most important research areas in the field of speech technology. However, in spite of focused research in this field over the past several decades, robust speech recognition with high reliability has not been achieved, as performance degrades in the presence of speaker variability, channel mismatch conditions, and noisy environments. The superb ability of the human auditory system has motivated researchers to include features of human perception in the speech recognition process. This dissertation investigates the roles of perceptual features of human hearing in automatic speech recognition in clean and noisy environments. Methods of simplified synaptic adaptation and two-tone suppression by companding are introduced via temporal processing of speech using a zero-crossing algorithm. It is observed that a high-frequency enhancement technique such as synaptic adaptation performs better in stationary Gaussian white noise, whereas a low-frequency enhancement technique such as two-tone suppression performs better in non-Gaussian, non-stationary noise. The effects of static compression on ASR parametrization are investigated as observed in psychoacoustic input/output (I/O) perception curves. A frequency-dependent asymmetric compression technique is proposed, applying higher compression in the higher-frequency regions than in the lower-frequency regions; the asymmetry avoids degrading the spectral contrast of the low-frequency formants with the added compression. A novel feature extraction method for ASR based on auditory processing in the cochlear nucleus is presented. The processing for synchrony detection, average discharge (mean-rate) processing, and two-tone suppression is segregated and handled separately at the feature extraction level, following the differential processing observed in the AVCN, PVCN and DCN, respectively, of the cochlear nucleus. It is further observed that improved ASR performance can be achieved by separating synchrony detection from synaptic processing. A time-frequency perceptual spectral subtraction method based on several psychoacoustic properties of human audition is developed and evaluated with an ASR front-end. An auditory masking threshold is determined based on these psychoacoustic effects. It is observed that, in speech recognition applications, spectral subtraction utilizing psychoacoustics may be used for improved performance in noisy conditions, and performance may be further improved if masking of noise by the tonal components is augmented by spectral subtraction in the masked region.
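The spectral subtraction that this front-end builds on has a compact textbook form: estimate the noise magnitude spectrum from frames assumed to be speech-free, subtract an over-estimate of it from every frame, and floor the result to avoid negative magnitudes. This is a minimal sketch of that baseline only; the perceptual masking-threshold refinement the dissertation adds is not reproduced, and the frame length and subtraction parameters are illustrative assumptions:

```python
import numpy as np

def spectral_subtraction(noisy: np.ndarray, noise_frames: int = 5,
                         frame_len: int = 256, alpha: float = 2.0,
                         beta: float = 0.01) -> np.ndarray:
    """Basic magnitude spectral subtraction.

    The noise spectrum is averaged over the first `noise_frames` frames
    (assumed speech-free), over-subtracted with factor `alpha`, and floored
    at `beta` times the noise level (the classic musical-noise trade-off)."""
    n_frames = len(noisy) // frame_len
    frames = noisy[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)
    noise_mag = mag[:noise_frames].mean(axis=0)          # noise estimate
    clean_mag = np.maximum(mag - alpha * noise_mag, beta * noise_mag)
    # Resynthesize with the noisy phase, as is standard for this method.
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    return clean.reshape(-1)
```

A real front-end would use overlapping windowed frames with overlap-add; non-overlapping frames keep the sketch short.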
389

Speech Signal Classification Using Support Vector Machines

Sood, Gaurav 07 1900 (has links)
Hidden Markov Models (HMMs) are undoubtedly the most widely employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties; some tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, the dependency on Hidden Markov Models remains a fact. During the last decade, a new tool appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: the Support Vector Machine (SVM). SVMs are effective discriminative classifiers with several outstanding characteristics: their solution is the one with maximum margin; they can deal with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed. In this work a novel approach based upon probabilistic kernels in support vector machines has been attempted for speech data classification. Classification accuracy with support vector classification depends upon the kernel function used, which in turn depends upon the data set at hand, and there is still no way to know a priori which kernel will give the best results. The kernel used in this work normalizes the time dimension inherent to speech signals by fitting a probability distribution over each utterance, which facilitates the use of support vector machines since they act on static data only. The divergence between the probability distributions fitted to individual speech utterances is used to form the kernel matrix. Vowel classification and isolated word recognition (digit recognition) have been attempted, and the results are compared with state-of-the-art systems.
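The kernel construction described above — fit a distribution per variable-length utterance, then turn pairwise divergences into kernel values — might be sketched as follows. The diagonal Gaussian model and the exp(-divergence) mapping are illustrative assumptions, not necessarily the thesis's exact construction:

```python
import numpy as np

def fit_gaussian(utterance: np.ndarray):
    """Fit a diagonal Gaussian to an utterance (frames x features), turning a
    variable-length sequence into a fixed-size object."""
    return utterance.mean(axis=0), utterance.var(axis=0) + 1e-6

def sym_kl(p, q) -> float:
    """Symmetric KL divergence between two diagonal Gaussians."""
    (m1, v1), (m2, v2) = p, q
    kl = lambda ma, va, mb, vb: 0.5 * np.sum(
        np.log(vb / va) + (va + (ma - mb) ** 2) / vb - 1.0)
    return kl(m1, v1, m2, v2) + kl(m2, v2, m1, v1)

def divergence_kernel(utterances, gamma: float = 0.1) -> np.ndarray:
    """Kernel matrix K[i, j] = exp(-gamma * D(u_i, u_j)): small divergence
    between the fitted distributions means high similarity."""
    params = [fit_gaussian(u) for u in utterances]
    n = len(params)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = np.exp(-gamma * sym_kl(params[i], params[j]))
    return K
```

Such a precomputed matrix can then be handed to any SVM trainer that accepts precomputed kernels, which is what makes a sequence classifier out of an inherently static one.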
390

Objective-driven discriminative training and adaptation based on an MCE criterion for speech recognition and detection

Shin, Sung-Hwan 13 January 2014 (has links)
Acoustic modeling in state-of-the-art speech recognition systems is commonly based on discriminative criteria. In contrast to conventional distribution-estimation paradigms such as maximum a posteriori (MAP) and maximum likelihood (ML), the most popular discriminative criteria, such as MCE and MPE, aim at direct minimization of the empirical error rate. As ASR applications have diversified, it has been increasingly recognized that realistic applications often require a model optimized for a task-specific goal or a particular scenario, beyond the general purposes of the current discriminative criteria. Such specific requirements cannot be handled directly by the current criteria, whose objective is to minimize the overall empirical error rate. In this thesis, we propose novel objective-driven discriminative training and adaptation frameworks, generalized from the minimum classification error (MCE) criterion, for various tasks and scenarios of speech recognition and detection. The proposed frameworks formulate new discriminative criteria that satisfy the varied requirements of recent ASR applications: each objective required by an application or a developer is embedded directly into the learning criterion, and the resulting objective-driven criterion is used to optimize an acoustic model toward that objective. Three task-specific requirements that recent ASR applications often impose in practice are addressed. First, the issue of individual error minimization is considered, and we propose a direct minimization algorithm for each error type of speech recognition. Second, a rapid-adaptation scenario is embedded into the formulation of discriminative linear transforms under the MCE criterion, and a regularized MCE criterion is proposed to efficiently improve the generalization capability of the MCE estimate in that scenario. Finally, the operating scenario that requires a system model optimized at a given specific operating point is discussed relative to conventional receiver operating characteristic (ROC) optimization, and a constrained discriminative training algorithm that can directly optimize a system model for any particular operating need is proposed. For each of the developed algorithms, we provide an analytical solution and an appropriate optimization procedure.
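The MCE criterion that this thesis generalizes has a compact standard form: a misclassification measure contrasts the correct class's discriminant score with a soft maximum over the competitors, and a sigmoid smooths the resulting 0/1 error into a differentiable loss. A sketch of that base criterion for a single sample (toy discriminant scores, not the thesis's acoustic models):

```python
import numpy as np

def mce_loss(scores: np.ndarray, label: int,
             eta: float = 2.0, gamma: float = 1.0) -> float:
    """Minimum classification error loss for one sample.

    d = -g_label + (1/eta) * log-average of exp(eta * competitor scores);
    d > 0 means misclassification. The sigmoid (slope gamma) smooths the
    step so the criterion can be minimized by gradient descent."""
    g_correct = scores[label]
    competitors = np.delete(scores, label)
    anti = np.log(np.mean(np.exp(eta * competitors))) / eta  # soft max
    d = -g_correct + anti
    return 1.0 / (1.0 + np.exp(-gamma * d))
```

The objective-driven frameworks described above can be read as replacing or reweighting this per-sample loss so that the quantity being driven toward zero matches the application's actual goal rather than the overall error rate.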
