191 |
Considering design for automatic speech recognition in use
Kraal, Ben James, January 2006
Talking to a computer is hard. Large vocabulary automatic speech recognition
(ASR) systems are difficult to use and yet they are used by many
people in their daily work. This thesis addresses the question: How is
ASR used and made usable and useful in the workplace now?
To answer this question I went into two workplaces where ASR is
currently used and one where ASR could be used in the future. This field
work was done with designing in mind. ASR dictation systems are currently
used in the Australian Public Service (APS) by people who suffer
chronic workplace overuse injuries and in the Hansard department of Parliament
House (Hansard) by un-injured people.
Analysing the experiences of the users in the APS and at Hansard
showed that using an ASR system in the workplace follows a broad trajectory
that ends in the continued effort to maintain its usefulness. The
usefulness of the ASR systems is “performed into existence” by the users
with varying degrees of success. Both the APS and Hansard users
use ASR to allow work to be performed; ASR acts to bridge the gap
between otherwise incompatible ways of working.
This thesis also asks: How could ASR be used and made usable and
useful in workplaces in the future? To answer this question, I observed
the work of communicating sentences at the ACT Magistrates Court.
Communicating sentences is a process that is distributed in space and
time throughout the Court and embodied in a set of documents that have
a co-ordinating role. A design for an ASR system that supports the process
of communicating sentences while respecting existing work process
is described.
Moving from field work to design is problematic. This thesis performs
the process of moving from field work to design, as described above, and
reflects on the use of various analytic methods to distill insights from
field work data.
The contributions of this thesis are:
• the pragmatic use of existing social research methods and their antecedents
as a corpus of analyses to inspire new designs;
• a demonstration of the use of Actor-Network Theory in design both
as critique and as part of a design process;
• empirical field-work evidence of how large vocabulary ASR is used
in the workplace;
• a design showing how ASR could be introduced to the rich, complicated
environment of the ACT Magistrates Court; and
• a performance of the process of moving from field work to design.
|
192 |
Sequential organization in computational auditory scene analysis
Shao, Yang, January 2007
Thesis (Ph. D.)--Ohio State University, 2007. / Title from first page of PDF file. Includes bibliographical references (p. 156-168).
|
193 |
Intonation modelling for the Nguni languages
Govender, Natasha. January 2006
Thesis (M. Computer Science)--University of Pretoria, 2006. / Includes summary. Includes bibliographical references (leaves 46-48).
|
194 |
Speaker-machine interaction in automatic speech recognition
January 1970
Also issued as a Ph.D. thesis in the Dept. of Electrical Engineering, 1970. / Bibliography: p.109-112.
|
195 |
Nonlinear compensation and heterogeneous data modeling for robust speech recognition
Zhao, Yong, 21 February 2013
The goal of robust speech recognition is to maintain satisfactory recognition accuracy under mismatched operating conditions. This dissertation addresses the robustness issue from two directions.
In the first part of the dissertation, we propose the Gauss-Newton method as a unified approach to estimating noise parameters for use in prevalent nonlinear compensation models, such as vector Taylor series (VTS), data-driven parallel model combination (DPMC), and unscented transform (UT), for noise-robust speech recognition. While iterative estimation of noise means in a generalized EM framework is widely known, we demonstrate that such approaches are variants of the Gauss-Newton method. Furthermore, we propose a novel noise variance estimation algorithm that is consistent with the Gauss-Newton principle. The formulation of the Gauss-Newton method reduces the noise estimation problem to determining the Jacobians of the corrupted speech parameters. For sampling-based compensations, we present two methods, sample Jacobian average (SJA) and cross-covariance (XCOV), to evaluate these Jacobians. The Gauss-Newton method is closely related to another noise estimation approach, which views the model compensation from a generative perspective, giving rise to an EM-based algorithm analogous to the ML estimation for factor analysis (EM-FA). We demonstrate a close connection between these two approaches: both belong to the family of gradient-based methods, but with different convergence rates. Note that the convergence property can be crucial to noise estimation in many applications where model compensation may have to be carried out frequently in changing noisy environments to retain the desired performance. Furthermore, several techniques are explored to further improve the nonlinear compensation approaches. To remove the need for clean speech data when training acoustic models, we integrate nonlinear compensation with adaptive training. We also investigate fast VTS compensation to improve noise estimation efficiency, and combine VTS compensation with acoustic echo cancellation (AEC) to mitigate issues due to interfering background speech.
The proposed noise estimation algorithm is evaluated for various compensation models on three tasks. The first is to fit a GMM to artificially corrupted samples, the second is to perform speech recognition on the Aurora 2 database, and the third uses a speech corpus simulating a meeting of multiple competing speakers. The significant performance improvements confirm the efficacy of the Gauss-Newton method in estimating the noise parameters of the nonlinear compensation models.
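As a rough illustration of the Gauss-Newton idea summarized above, the sketch below estimates a single noise mean in a scalar log-spectral setting. It is not the dissertation's algorithm, which works on acoustic model parameters under VTS, DPMC, or UT compensation. The sketch assumes the common log-add mismatch function y = log(exp(x) + exp(n)) with no channel term, and assumes paired clean and noisy samples are available. The function names `mismatch`, `jacobian_n`, and `gauss_newton_noise_mean` are hypothetical.

```python
import math


def mismatch(x, n):
    """Assumed log-spectral mismatch function: log(exp(x) + exp(n))."""
    return x + math.log1p(math.exp(n - x))


def jacobian_n(x, n):
    """Derivative of the mismatch function with respect to the noise n."""
    return 1.0 / (1.0 + math.exp(x - n))


def gauss_newton_noise_mean(xs, ys, n0=0.0, iters=50, tol=1e-12):
    """Estimate a scalar noise level n by Gauss-Newton least squares.

    xs: clean log-spectral samples (assumed known in this toy setting)
    ys: corresponding noisy observations
    """
    n = n0
    for _ in range(iters):
        # Residuals between the observations and the current prediction.
        residuals = [y - mismatch(x, n) for x, y in zip(xs, ys)]
        # The Jacobian is a column vector here, so J^T J and J^T r are sums.
        jac = [jacobian_n(x, n) for x in xs]
        jtj = sum(j * j for j in jac)
        jtr = sum(j * r for j, r in zip(jac, residuals))
        step = jtr / jtj  # Gauss-Newton update: (J^T J)^{-1} J^T r
        n += step
        if abs(step) < tol:
            break
    return n
```

In this toy problem each update needs only the Jacobian of the corrupted observation with respect to the noise parameter, which echoes the abstract's remark that noise estimation reduces to determining Jacobians.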
The second part of the dissertation is devoted to developing more effective models that take full advantage of heterogeneous speech data, which are typically collected from thousands of speakers in various environments via different transducers. The proposed synchronous HMM, in contrast to conventional HMMs, introduces an additional layer of substates between the HMM state and the Gaussian component variables. The substates can register long-span non-phonetic attributes, such as gender, speaker identity, and environmental condition, which are collectively called speech scenes in this study. The hierarchical modeling scheme allows an accurate description of the probability distribution of speech units in different speech scenes. To address the data sparsity problem in estimating the parameters of multiple speech scene sub-models, a decision-based clustering algorithm is presented to determine the set of speech scenes and to tie the substate parameters, allowing us to achieve an excellent balance between modeling accuracy and robustness. In addition, by exploiting the synchronous relationship among the speech scene sub-models, we propose the multiplex Viterbi algorithm to efficiently decode the synchronous HMM within a search space of the same size as for the standard HMM. The multiplex Viterbi algorithm can also be generalized to decode an ensemble of isomorphic HMM sets, a problem that often arises in multi-model systems. Experiments on the Aurora 2 task show that the synchronous HMMs produce a significant improvement in recognition performance over the HMM baseline at the expense of a moderate increase in memory requirements and computational complexity.
|
196 |
Implementation of Embedded Mandarin Speech Recognition System in Travel Domain
Chen, Bo-han, 07 September 2009
We build a two-pass Mandarin automatic speech recognition (ASR) decoder on a mobile device (PDA). The first pass, which recognizes base syllables, is implemented with a discrete hidden Markov model (HMM) and a time-synchronous, tree-lexicon Viterbi search. The second pass, which combines the language model, the pronunciation lexicon, and the N-best syllable hypotheses from the first pass, is implemented with a weighted finite-state transducer (WFST). The best word sequence is obtained by running a shortest-path algorithm over the composition result. The system is restricted to the travel domain, and it decouples the application of the acoustic model and of the language model into independent recognition passes. We report real-time recognition performance on an ASUS P565 with an 800 MHz processor and 128 MB of RAM, running the Microsoft Windows Mobile 6 operating system.
The 26-hour TCC-300 speech corpus is used to train 151 acoustic models. Three minutes of speech, recorded by reading travel-domain transcriptions, is used as the test set for evaluating performance (syllable and character accuracy) and real-time factors on a PC and on the PDA. A bigram model covering 3,500 words from the BTEC corpus is used in the second pass.
In the first pass, the best syllable accuracy is 38.8%, given 30-best syllable hypotheses and using a continuous HMM with 26-dimensional features. With these syllable hypotheses and acoustic model, we obtain 27.6% character accuracy on the PC after the second pass.
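The first pass described above relies on Viterbi search over a discrete HMM. As a minimal sketch of that general technique only, the function below runs the textbook log-domain Viterbi recursion on a small discrete HMM. It does not reproduce the thesis's tree-lexicon organization, N-best output, or WFST second pass, and the function name `viterbi` and its dictionary-based model layout are assumptions made for illustration.

```python
def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state path and its log-probability.

    obs:       sequence of discrete observation symbols
    states:    list of state labels
    log_start: log_start[s] is the log initial probability of state s
    log_trans: log_trans[p][s] is the log transition probability p -> s
    log_emit:  log_emit[s][o] is the log probability of emitting o in s
    """
    # best[t][s]: best log-score of any path that ends in state s at time t.
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at time t
    for o in obs[1:]:
        prev = best[-1]
        scores, pointers = {}, {}
        for s in states:
            # Time-synchronous step: extend every state by one frame.
            p = max(states, key=lambda q: prev[q] + log_trans[q][s])
            scores[s] = prev[p] + log_trans[p][s] + log_emit[s][o]
            pointers[s] = p
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]
```

Working in the log domain avoids numerical underflow on long utterances, which is a common choice for decoders of this kind.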
|
197 |
Multilingual speech recognition
Uebler, Ulla. January 2000
Thesis (doctoral)--University of Erlangen-Nurnberg, 2000. / Cover title. Includes bibliographical references (p. 161-170) and index.
|
198 |
Model selection based speaker adaptation and its application to nonnative speech recognition
He, Xiaodong, January 2003
Thesis (Ph. D.)--University of Missouri-Columbia, 2003. / Typescript. Vita. Includes bibliographical references (leaves 99-110). Also available on the Internet.
|
199 |
Model selection based speaker adaptation and its application to nonnative speech recognition
He, Xiaodong, January 2003
Thesis (Ph. D.)--University of Missouri-Columbia, 2003. / Typescript. Vita. Includes bibliographical references (leaves 99-110). Also available on the Internet.
|
200 |
Network training for continuous speech recognition
Alphonso, Issac John. January 2003
Thesis (M.S.)--Mississippi State University. Department of Electrical and Computer Engineering. / Title from title screen. Includes bibliographical references.
|