About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

Robust speech recognition in noise using statistical signal mapping

Dahan, Jean-Guy January 1994 (has links)
112

Automatic identification and recognition of deaf speech

Abdelhamied, Kadry A. January 1986 (has links)
No description available.
113

Automatic labelling of Mandarin

陳達宗, Chan, Tat-chung. January 1996 (has links)
Published or final version / Computer Science / Master of Philosophy
114

Improved acoustic modelling for HMMs using linear transformations

Leggetter, Christopher John January 1995 (has links)
No description available.
115

A speech input modality for computer-aided drawing: user interface issues

Kay, Peter January 1995 (has links)
No description available.
116

Subspace Gaussian mixture models for automatic speech recognition

Lu, Liang January 2013 (has links)
In most state-of-the-art speech recognition systems, Gaussian mixture models (GMMs) are used to model the density of the emitting states in the hidden Markov models (HMMs). In a conventional system, the model parameters of each GMM are estimated directly and independently given the alignment. This results in a large number of model parameters to be estimated and, consequently, a large amount of training data is required to fit the model. In addition, the different sources of acoustic variability that affect the accuracy of a recogniser, such as pronunciation variation, accent, speaker characteristics and environmental noise, are only weakly modelled and factorized by adaptation techniques such as maximum likelihood linear regression (MLLR), maximum a posteriori (MAP) adaptation and vocal tract length normalisation (VTLN). In this thesis, we discuss an alternative acoustic modelling approach, the subspace Gaussian mixture model (SGMM), which is expected to handle these two issues better. In an SGMM, the model parameters are derived from low-dimensional model and speaker subspaces that can capture phonetic and speaker correlations. Given these subspaces, only a small number of state-dependent parameters are required to derive the corresponding GMMs. Hence, the total number of model parameters can be reduced, which allows acoustic modelling with a limited amount of training data. In addition, the SGMM-based acoustic model factorizes the phonetic and speaker factors, and within this framework other sources of acoustic variability may also be explored. In this thesis, we propose a regularised model estimation for SGMMs, which avoids overtraining when the training data is sparse. We also take advantage of the structure of SGMMs to explore cross-lingual acoustic modelling for low-resource speech recognition. Here, the model subspace is estimated from out-of-domain data and ported to the target language system; in this case, only the state-dependent parameters need to be estimated, which relaxes the requirement on the amount of training data. To improve the robustness of SGMMs against environmental noise, we propose to apply the joint uncertainty decoding (JUD) technique, which is shown to be efficient and effective. We report experimental results on the Wall Street Journal (WSJ) database and GlobalPhone corpora to evaluate the regularisation and cross-lingual modelling of SGMMs. Noise compensation using JUD for SGMM acoustic models is evaluated on the Aurora 4 database.
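The abstract's central idea, deriving each state's GMM parameters from shared low-dimensional subspaces, can be illustrated in a few lines of numpy. This is a minimal sketch, not the thesis implementation: the dimensions are assumed, random matrices stand in for trained subspace parameters, and the shared covariances of a full SGMM are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

D, S = 39, 50      # feature dimension and subspace dimension (illustrative)
I, J = 4, 3        # number of shared Gaussians and of HMM states (illustrative)

# Globally shared subspace parameters, estimated once from all training data.
M = rng.standard_normal((I, D, S))   # mean-projection matrices, one per Gaussian
w = rng.standard_normal((I, S))      # weight-projection vectors
N = rng.standard_normal((I, D, S))   # speaker-subspace projections

# The only per-state / per-speaker quantities are low-dimensional vectors.
v_state = rng.standard_normal((J, S))   # state vectors v_j
v_spk = rng.standard_normal(S)          # speaker vector v^(s)

def state_gmm(j):
    """Derive the GMM means and mixture weights of state j from the subspaces."""
    means = np.stack([M[i] @ v_state[j] + N[i] @ v_spk for i in range(I)])
    logits = w @ v_state[j]                  # unnormalised log mixture weights
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    return means, weights / weights.sum()

means, weights = state_gmm(0)
print(means.shape, weights.sum())   # (4, 39) 1.0
```

The point the sketch makes is the parameter saving: a new state costs only an S-dimensional vector rather than I full sets of D-dimensional GMM parameters.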
117

Full covariance modelling for speech recognition

Bell, Peter January 2010 (has links)
HMM-based systems for automatic speech recognition typically model the acoustic features using mixtures of multivariate Gaussians. In this thesis, we consider the problem of learning a suitable covariance matrix for each Gaussian. A variety of schemes have been proposed for controlling the number of covariance parameters per Gaussian, and studies have shown that, in general, the greater the number of parameters used in the models, the better the recognition performance. We therefore investigate systems with full covariance Gaussians. However, in this case the obvious choice of parameters, given by the sample covariance matrix, leads to matrices that are poorly conditioned and do not generalise well to unseen test data. The problem is particularly acute when the amount of training data is limited. We propose two solutions to this problem: first, we impose the requirement that each matrix should take the form of a Gaussian graphical model, and introduce a method for learning the parameters and the model structure simultaneously. Second, we explain how an alternative estimator, the shrinkage estimator, is preferable to the standard maximum likelihood estimator, and derive formulae for the optimal shrinkage intensity within the context of a Gaussian mixture model. We show how this relates to the use of a diagonal covariance smoothing prior. We compare the effectiveness of these techniques to standard methods on a phone recognition task where the quantity of training data is artificially constrained. We then investigate the performance of the shrinkage estimator on a large-vocabulary conversational telephone speech recognition task. Discriminative training techniques can be used to compensate for the invalidity of the model correctness assumption underpinning maximum likelihood estimation. On the large-vocabulary task, we use discriminative training of the full covariance models and diagonal priors to yield improved recognition performance.
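A minimal sketch of the shrinkage idea mentioned in the abstract: interpolate between the sample covariance and a diagonal target. The function name and the hand-picked lambda values are assumptions for illustration; the thesis derives the optimal shrinkage intensity in closed form.

```python
import numpy as np

def shrinkage_covariance(X, lam):
    """Shrink the sample covariance of X (n x d) towards its diagonal.

    lam = 0 returns the raw sample covariance, which is singular whenever
    n <= d; lam = 1 returns a purely diagonal model. Intermediate values
    trade estimator variance against bias, which helps with scarce data.
    """
    S = np.cov(X, rowvar=False)      # sample covariance (may be ill-conditioned)
    T = np.diag(np.diag(S))          # diagonal shrinkage target
    return lam * T + (1.0 - lam) * S

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 39))    # fewer frames than feature dimensions
for lam in (0.0, 0.3):
    Sigma = shrinkage_covariance(X, lam)
    print(lam, np.linalg.cond(Sigma))  # conditioning improves sharply with shrinkage
```

With 30 frames of 39-dimensional features the sample covariance is rank-deficient, so even a modest shrinkage intensity turns an unusable matrix into a well-conditioned one.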
118

Audio fingerprinting for speech reconstruction and recognition in noisy environments

Liu, Feng 13 April 2017 (has links)
Audio fingerprinting is a highly specific content-based audio retrieval technique. Given a short audio fragment as a query, an audio fingerprinting system can identify the particular file that contains the fragment in a library potentially consisting of millions of audio files. In this thesis, we investigate the feasibility of applying audio fingerprinting to speech recognition in noisy environments via speech reconstruction. To reconstruct noisy speech, the speech is first divided into small segments of equal length. Then, audio fingerprinting is used to find the most similar segment in a large dataset of clean speech files. If the similarity is above a threshold, the noisy segment is replaced with the clean segment. All the segments, after conditional replacement, are then concatenated to form the reconstructed speech, which is sent to a traditional speech recognition system. In the above procedure, a critical step is using audio fingerprinting to find the clean speech segment in a dataset. To test its performance, we build a landmark-based audio fingerprinting system. Experimental results show that this baseline system performs well in traditional applications, but its accuracy in this new application is not as good as we expected. Next, we propose three strategies to improve the system, resulting in better accuracy than the baseline system. Finally, we integrate the improved audio fingerprinting system into a traditional speech recognition system and evaluate the performance of the whole system. / Graduate
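The segment-replace-concatenate procedure the abstract describes is simple enough to sketch. The code below is an illustration under stated assumptions: `lookup` stands in for the landmark-based fingerprint search, and the toy cosine-similarity matcher is only a placeholder for it; names and signatures are not from the thesis.

```python
import numpy as np

def reconstruct(noisy, segment_len, lookup, threshold):
    """Conditional-replacement loop sketched from the abstract.

    `lookup(segment)` is assumed to return (best_clean_segment, similarity);
    segments whose best match falls below `threshold` are kept as-is.
    """
    pieces = []
    for start in range(0, len(noisy), segment_len):
        seg = noisy[start:start + segment_len]
        clean, sim = lookup(seg)
        pieces.append(clean if sim >= threshold else seg)
    return np.concatenate(pieces)

# Toy stand-in for the fingerprinting system: cosine similarity against a
# small "database" of equal-length clean segments (a real system would
# hash spectral landmarks instead of comparing raw samples).
def make_toy_lookup(clean_db):
    def lookup(seg):
        best, best_sim = seg, -1.0
        for ref in clean_db:
            n = min(len(ref), len(seg))
            a, b = seg[:n], ref[:n]
            sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
            if sim > best_sim:
                best, best_sim = ref[:n], sim
        return best, best_sim
    return lookup

rng = np.random.default_rng(0)
clean_db = [rng.standard_normal(400) for _ in range(10)]
noisy = rng.standard_normal(8000)
out = reconstruct(noisy, 400, make_toy_lookup(clean_db), threshold=0.5)
print(out.shape)   # (8000,): no toy segment matched, so the input passes through
```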
119

Development of a cloud platform for automatic speech recognition

Klejch, Ondřej January 2015 (has links)
This thesis presents CloudASR, a cloud platform for automatic speech recognition built on top of the Kaldi speech recognition toolkit. The platform supports both batch and online speech recognition modes, and it has an annotation interface for transcription of the submitted recordings. The key features of the platform are scalability, customizability and easy deployment. Benchmarks show that the platform achieves latency comparable to the Google Speech API and can achieve better accuracy on limited domains. Furthermore, the benchmarks show that the platform is able to handle more than 1000 parallel requests given enough computational resources.
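For a rough sense of what the batch mode involves, a client uploads a whole recording and receives a transcript. The sketch below is hypothetical throughout: the URL, path, parameters, and response shape are assumptions for illustration, not CloudASR's documented API.

```python
import requests  # third-party HTTP client

# Hypothetical batch-mode request to an ASR platform of this kind.
with open("utterance.wav", "rb") as f:
    resp = requests.post(
        "https://asr.example.org/v1/batch",   # illustrative endpoint, not CloudASR's
        params={"lang": "en-GB"},             # illustrative parameter
        data=f.read(),
        headers={"Content-Type": "audio/x-wav"},
    )
print(resp.json())  # assumed JSON body containing the transcript
```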
120

Voice recognition systems: assessment of implementation aboard U.S. naval ships

Wilson, Shawn C. 03 1900 (has links)
Approved for public release; distribution is unlimited. / Technological advances have had profound effects on the conduct of military operations in both peacetime and war. One advance that has had a great impact outside the military by reducing human intervention is Voice Recognition (VR) technology. This thesis examines the implementation of a Voice Recognition System as a ship-driving device and as a means of decreasing the occurrence of mishaps while reducing the fatigue of watchstanders on the bridge. Chapter I discusses the need for the United States Navy to investigate the implementation of a Voice Recognition System to help reduce the probability of mishaps. Chapter II explains voice recognition technology, how it works, and how the proposed system can be fielded aboard U.S. Navy ships. Chapter III examines the opinions of officers charged with the safe navigation of naval ships on the implementation of a Voice Recognition System. Chapter IV reviews these officers' concerns and justifies the implementation by answering them. The conclusion reiterates the advances in voice recognition and explains why a Voice Recognition System should be implemented on the bridges of U.S. Navy ships. / Lieutenant, United States Navy
