291
The use of belief networks in natural language understanding and dialog modeling. January 2001
Wai, Chi Man Carmen.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 129-136). Abstracts in English and Chinese.
Contents:
1 Introduction: overview; natural language understanding; BNs for handling speech recognition errors; BNs for dialog modeling; thesis goals; thesis outline.
2 Background: natural language understanding (rule-based, stochastic, and phrase-spotting approaches); handling recognition errors in spoken queries; spoken dialog systems (finite-state networks, form-based approaches, sequential decision approaches, machine learning approaches); belief networks (introduction, Bayesian inference, applications).
3 Belief Networks for Natural Language Understanding: the ATIS domain; problem formulation; semantic tagging; belief network development (concept selection, Bayesian inferencing, thresholding, goal identification); experiments on natural language understanding (mutual information versus information gain, varying the input dimensionality, multiple goals and rejection, comparing grammars); benchmark with decision trees; performance on natural language understanding; handling speech recognition errors in spoken queries (corpus preparation, enhanced belief network topology, experiments, significance testing, error analysis).
4 Belief Networks for Mixed-Initiative Dialog Modeling: the CU FOREX domain (domain-specific constraints, two interaction modalities); the belief networks (informational goal inference, detection of missing/spurious concepts); integrating two interaction modalities; incorporating out-of-vocabulary words (natural language queries, directed queries); evaluation of the BN-based dialog model.
5 Scalability and Portability of the Belief Network-based Dialog Model: migration to the ATIS domain; scalability (informational goal inference, detection of missing/spurious concepts, context inheritance); portability (general principles for probability assignment, performance with hand-assigned probabilities, error analysis); enhancements for discourse query understanding (combining trained and handcrafted probabilities, handcrafted topology for BNs, performance of the enhanced BN-based dialog model).
6 Conclusions: summary; contributions; future work.
Bibliography. Appendices: A, the two original SQL queries; B, the two grammars, GH and GsA; C, probability propagation in belief networks (computing the a posteriori probabilities P*(G) from input concepts and P*(Cj) by backward inference); D, the 23 concepts for the handcrafted BN.
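The probability propagation listed in Appendix C, inferring the a posteriori goal probability P*(G) from observed input concepts, can be illustrated with a naive-Bayes-style sketch. This is illustrative only: the goal name, concept names, prior, and conditional probabilities below are invented for the example, and the thesis's actual network topologies and trained parameters are not reproduced here.

```python
# Hedged sketch: naive-Bayes-style goal inference from binary concept inputs.
# All probabilities and concept/goal names are invented for illustration;
# the thesis's actual BN topologies and trained parameters are not shown.

def posterior_goal(prior_g, p_concept_given_g, p_concept_given_not_g, observed):
    """P*(G | concepts), assuming concepts conditionally independent given G."""
    like_g, like_not_g = prior_g, 1.0 - prior_g
    for concept, present in observed.items():
        p1 = p_concept_given_g[concept]
        p0 = p_concept_given_not_g[concept]
        like_g *= p1 if present else (1.0 - p1)
        like_not_g *= p0 if present else (1.0 - p0)
    return like_g / (like_g + like_not_g)

# Hypothetical ATIS-style goal "flight_info" with three input concepts.
p_given_g = {"city_from": 0.9, "city_to": 0.85, "fare": 0.1}
p_given_not_g = {"city_from": 0.3, "city_to": 0.25, "fare": 0.6}
obs = {"city_from": True, "city_to": True, "fare": False}

p = posterior_goal(0.2, p_given_g, p_given_not_g, obs)
print(f"P*(flight_info | concepts) = {p:.3f}")
```

Thresholding such a posterior, as in Section 3.4.3 of the outline, decides whether a goal is accepted, which is what enables rejection and the identification of multiple goals.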
292
An HMM-based speech recognition IC. January 2003
Han Wei.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 60-61). Abstracts in English and Chinese; acknowledgements; contents; lists of figures and tables.
Contents:
1 Introduction: speech recognition; ASIC design with HDLs.
2 Theory of HMM-Based Speech Recognition: speaker-dependent and speaker-independent recognition; frames and feature vectors; hidden Markov models (Markov models, elements of an HMM, types of HMMs, continuous observation densities, the three basic problems for HMMs); probability evaluation (the Viterbi algorithm, alternative Viterbi implementation).
3 HMM-Based Isolated Word Recognizer Design Methodology: speech recognition based on single mixtures; speech recognition based on double mixtures.
4 VLSI Implementation of the Speech Recognizer: system requirements; implementation of a speech recognizer with a single-mixture HMM; implementation with a double-mixture HMM; extended usage with higher-order mixture HMMs; pipelining and system timing.
5 Simulation and IC Testing: simulation results; testing.
6 Discussion and Conclusion.
Reference. Appendices: I, Verilog code of the double-mixture HMM-based speech recognition IC at RTL level (subtracter, multiplier, core adder, registers for X, constants, scores and the final score, subtractor and comparator, shifter, look-up table, controller, and top module); II, chip microphotograph; III, pin assignment of the IC; IV, the testing board.
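The "alternative Viterbi implementation" named in Chapter 2 most plausibly refers to evaluating the Viterbi recursion in the log domain, where multiplications of probabilities become additions, the form that maps naturally onto adder-based hardware; that reading is an assumption here, not a claim about the thesis. A minimal software sketch of log-domain Viterbi scoring for an isolated-word HMM follows, with all model values invented; the thesis's fixed-point RTL design is of course not reproduced.

```python
import math

# Hedged sketch: log-domain Viterbi scoring of one observation sequence
# against one left-to-right HMM, as used for isolated-word recognition.
# For recognition you would score against each word's model and pick the max.
# All transition/emission values below are invented.

LOG_ZERO = -1e30  # stands in for log(0)

def viterbi_log_score(log_pi, log_a, log_b):
    """log_pi[i]: initial; log_a[i][j]: transition; log_b[t][i]: emission."""
    n_states = len(log_pi)
    delta = [log_pi[i] + log_b[0][i] for i in range(n_states)]
    for t in range(1, len(log_b)):
        delta = [
            max(delta[i] + log_a[i][j] for i in range(n_states)) + log_b[t][j]
            for j in range(n_states)
        ]
    return max(delta)  # best log likelihood of the sequence under this model

# Toy 3-state left-to-right model and a 4-frame observation sequence.
log_pi = [0.0, LOG_ZERO, LOG_ZERO]
log_a = [[math.log(0.6), math.log(0.4), LOG_ZERO],
         [LOG_ZERO, math.log(0.7), math.log(0.3)],
         [LOG_ZERO, LOG_ZERO, 0.0]]
log_b = [[-1.2, -3.0, -4.0],
         [-2.0, -1.0, -3.5],
         [-3.0, -1.5, -1.1],
         [-4.0, -2.5, -0.9]]

print(viterbi_log_score(log_pi, log_a, log_b))
```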
293
Automatic speech recognition system for people with speech disorders. Ramaboka, Manthiba Elizabeth, January 2018
Thesis (M.Sc. (Computer Science)) -- University of Limpopo, 2018.
The conversion of speech to text is essential for communication between speech-impaired and visually impaired people. The focus of this study was to develop and evaluate a baseline ASR system, trained on normal speech, for correcting disordered speech. Normal and disordered speech data were sourced from the Lwazi project and UCLASS, respectively. The normal speech data were used to train the ASR system, and the disordered speech was used to evaluate its performance. Features were extracted using the Mel-frequency cepstral coefficient (MFCC) method in the processing stage, and cepstral mean and variance normalisation (CMVN) was applied to normalise the features. A third-order language model was trained using the SRI Language Modelling (SRILM) toolkit. A recognition accuracy of 65.58% was obtained. A refinement approach was then applied to the recognised utterances to remove repetitions from stuttered speech. The approach showed that 86% of repeated words in stuttered speech could be removed, yielding an improved hypothesised text output. Further refinement of the ASR post-processing module is likely to achieve near-100% correction of stuttered speech.
Keywords: automatic speech recognition (ASR), speech disorder, stuttering.
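The post-processing step described above, deleting repeated words from the recognised hypothesis, can be sketched as a simple filter over the decoded word sequence. The version below only collapses exact adjacent repetitions; the abstract does not specify the thesis's actual refinement procedure, so treat this as an illustrative assumption.

```python
# Hedged sketch: collapse adjacent repeated words in an ASR hypothesis,
# a crude stand-in for the repetition-removal refinement described above.
# Real stuttering repetitions can also be partial words, which a
# word-level filter like this cannot catch.

def remove_repetitions(hypothesis: str) -> str:
    cleaned = []
    for word in hypothesis.split():
        if not cleaned or word.lower() != cleaned[-1].lower():
            cleaned.append(word)
    return " ".join(cleaned)

print(remove_repetitions("i i want want to to to go home"))
# -> "i want to go home"
```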
294
Investigation into automatic speech recognition of different dialects of Northern Sotho. Mapeka, Madimetja Asaph, January 2005
Thesis (MSc. (Computer Science)) -- University of Limpopo, 2005. For the abstract, refer to the document. Funded by Telkom (SA), HP (SA) and the National Research Fund.
295
Using Blind Source Separation and a Compact Microphone Array to Improve the Error Rate of Speech Recognition. Hoffman, Jeffrey Dean, 01 December 2016
Automatic speech recognition has become a standard feature on many consumer electronics and automotive products, and the accuracy of the decoded speech has improved dramatically over time. Often, designers of these products achieve this accuracy by employing microphone arrays and beamforming algorithms to reduce interference. However, beamforming microphone arrays are too large for small-form-factor products such as smart watches. Yet these small-form-factor products, which have precious little space for tactile user input (e.g. knobs, buttons and touch screens), would benefit immensely from a user interface based on reliably accurate automatic speech recognition.
This thesis proposes a solution for interference mitigation that employs blind source separation with a compact array of commercially available unidirectional microphone elements. Such an array provides adequate spatial diversity to enable blind source separation and would easily fit in a smart watch or similar small-form-factor product. The solution is characterized using publicly available speech audio clips recorded for the purpose of testing automatic speech recognition algorithms. The proposal is modeled in different interference environments and the efficacy of the solution is evaluated. Factors affecting the performance of the solution are identified and their influence quantified. An expectation is presented for the quality of separation, as well as for the resulting improvement in word error rate achieved by decoding the separated speech estimate rather than the mixture obtained from a single unidirectional microphone element. Finally, directions for future work are proposed, which have the potential to improve the performance of the solution, making it a commercially viable product.
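As a rough illustration of the separation stage, the sketch below applies independent component analysis to a two-microphone instantaneous mixture. This is a simplification chosen for brevity: real room acoustics produce convolutive mixtures, which a practical system would have to address (typically in the frequency domain), and scikit-learn's FastICA here is only a stand-in, not the algorithm evaluated in the thesis. The signals and mixing gains are invented.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Hedged sketch: ICA on a 2-mic instantaneous mixture of two sources.
# Real microphone-array recordings are convolutive mixtures; this toy
# setup only illustrates the blind-source-separation idea.

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000)
speech = np.sign(np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 220 * t)  # stand-in "speech"
noise = rng.laplace(size=t.size)                                           # stand-in interferer
sources = np.c_[speech, noise]

mixing = np.array([[1.0, 0.6],   # invented mixing gains for the two mics
                   [0.5, 1.0]])
mics = sources @ mixing.T        # what the two microphone elements observe

ica = FastICA(n_components=2, random_state=0)
estimates = ica.fit_transform(mics)  # separated source estimates (up to scale/order)
print(estimates.shape)               # (16000, 2)
```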
296
Incorporation of syntax and semantics to improve the performance of an automatic speech recognizer. Rapholo, Moyahabo Isaiah, January 2012
Thesis (M.Sc. (Computer Science)) -- University of Limpopo, 2012.
Automatic Speech Recognition (ASR) is a technology that allows a computer to identify spoken words and translate them into text. Speech recognition systems have begun to be used in many application areas, such as healthcare, automotive, e-commerce, and the military, but their use is often limited by poor performance.
In this research we look at improving the performance of a baseline ASR system by incorporating syntactic structures into the grammar of an existing Northern Sotho ASR system based on hidden Markov models (HMMs). The syntactic structures are applied to the vocabulary of the healthcare application domain. The Backus-Naur Form (BNF) and the Extended Backus-Naur Form (EBNF) were used to specify the grammar. The experimental results show an overall improvement over the baseline ASR system and hence provide a basis for following this approach.
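To make the grammar-constraint idea concrete, the sketch below defines a tiny BNF-style rule set for hypothetical healthcare-domain utterances and expands it into the finite set of sentences a recognizer would be allowed to hypothesise. The rules, vocabulary, and phrases are invented (and in English); the thesis's actual Northern Sotho grammar is not shown.

```python
from itertools import product

# Hedged sketch: a tiny BNF-like grammar (invented) for a healthcare ASR
# domain, expanded into its full sentence set. Constraining the recognizer's
# search to such sentences rules out syntactically impossible hypotheses.

grammar = {
    "<request>": [["<verb>", "<object>"]],
    "<verb>": [["book"], ["cancel"]],
    "<object>": [["an", "appointment"], ["a", "prescription", "refill"]],
}

def expand(symbol):
    if symbol not in grammar:           # terminal word
        return [[symbol]]
    sentences = []
    for production in grammar[symbol]:
        parts = [expand(s) for s in production]
        for combo in product(*parts):
            sentences.append([w for part in combo for w in part])
    return sentences

for words in expand("<request>"):
    print(" ".join(words))
# book an appointment / book a prescription refill / cancel ...
```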
297
Forensic speaker analysis and identification by computer: a Bayesian approach anchored in the cepstral domain. Khodai-Joopari, Mehrdad, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW, January 2007
This thesis advances understanding of the forensic value of automatic speech parameters by addressing the following question: what is the potential of the speech cepstrum as a forensic-acoustic parameter? Despite many advances in automatic speech and speaker recognition, robust and unconstrained progress in technical forensic speaker identification has been partly impeded by our incomplete understanding of the interaction and relation between forensic phonetics and the techniques employed in state-of-the-art automatic speech and speaker recognition. The posed question underlies the recurrent and longstanding issue of acoustic parameterisation in forensic phonetics, where 1) speaker identification must often be carried out under less than optimal conditions, and 2) views differ on the usefulness and trustworthiness of formant frequency measurements. To this end, a new formulation for the forensic evaluation of speech data was derived, which is effectively a spectral likelihood ratio with enhanced sensitivity to the local peaks of the formant structure of the speech spectrum of vowel sounds, while retaining the characteristics of the Bayesian framework. This new hybrid formula was used together with a novel approach, founded on a statistically-based matched-pairs technique, to account for the various levels of variation inherent in speech recordings, thereby providing a spectrally meaningful measure of the variation between two speech spectra and hence of the true worth of speech samples as forensic evidence. The experimental results are based on a forensically realistic database of a relatively large population of 297 native speakers of Japanese. In sum, the research conducted in this thesis is a major step forward for the forensic-phonetic field, broadening the objective basis of forensic speaker identification. Beyond advancing knowledge in the field, the semi-data-independent nature of the new formula has great implications for technical forensic speaker identification. It also provides a valuable biometric tool, with both academic and commercial potential, for crime investigation in a field that already suffers from a lack of adequate data.
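The Bayesian framework referred to above reduces, at its core, to a likelihood ratio: the probability of the observed evidence under the same-speaker hypothesis versus the different-speaker hypothesis. The sketch below computes such a ratio with simple Gaussian score models over a one-dimensional cepstral distance; the thesis's actual spectral formulation and matched-pairs technique are considerably more elaborate, and every number here is invented.

```python
from statistics import NormalDist

# Hedged sketch: a forensic likelihood ratio
#   LR = p(E | same speaker) / p(E | different speakers)
# for a single cepstral-distance measurement. The Gaussian score models
# and all parameters are invented for illustration.

# Score distributions estimated (hypothetically) from a background population:
same_speaker = NormalDist(mu=0.8, sigma=0.4)   # small distances when same speaker
diff_speaker = NormalDist(mu=2.5, sigma=0.9)   # larger distances otherwise

evidence = 1.1  # observed distance between suspect and trace recordings

lr = same_speaker.pdf(evidence) / diff_speaker.pdf(evidence)
print(f"likelihood ratio = {lr:.2f}")  # LR > 1 supports the same-speaker hypothesis
```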
298
Short-Time Phase Spectrum in Human and Automatic Speech Recognition. Alsteris, Leigh, January 2006
Incorporating information from the short-time phase spectrum into a feature set for automatic speech recognition (ASR) may possibly serve to improve recognition accuracy. Currently, however, it is common practice to discard this information in favour of features that are derived purely from the short-time magnitude spectrum. There are two reasons for this: 1) the results of some well-known human listening experiments have indicated that the short-time phase spectrum conveys a negligible amount of intelligibility at the small window durations of 20-40 ms used for ASR spectral analysis, and 2) using the short-time phase spectrum directly for ASR has proven difficult from a signal processing viewpoint, due to phase-wrapping and other problems.

In this thesis, we explore the possibility of using short-time phase spectrum information for ASR by considering the two points mentioned above. To address the first point, we conduct our own set of human listening experiments. Contrary to previous studies, our results indicate that the short-time phase spectrum can indeed contribute significantly to speech intelligibility over small window durations of 20-40 ms. Also, the results of these listening experiments, in addition to some ASR experiments, indicate that at least part of this intelligibility may be supplementary to that provided by the short-time magnitude spectrum.

To address the second point (i.e., the signal processing difficulties), it may be necessary to transform the short-time phase spectrum into a more physically meaningful representation from which useful features could possibly be extracted. Specifically, we investigate the frequency-derivative (or group delay function, GDF) and the time-derivative (or instantaneous frequency distribution, IFD) as potential candidates for this intermediate representation. We have performed various experiments which show that the GDF and IFD may be useful for ASR. We conduct several ASR experiments to test a feature set derived from the GDF. We find that, in most cases, these features perform worse than the standard MFCC features. Therefore, we suggest that a short-time phase spectrum feature set may ultimately be derived from a concatenation of information from both the GDF and IFD representations. For best performance, the feature set may also need to be concatenated with short-time magnitude spectrum information.

Further to addressing the two aforementioned points, we also discuss a number of other speech applications in which the short-time phase spectrum has proven to be very useful. We believe that an appreciation for how the short-time phase spectrum has been used for other tasks, in addition to the results of our research, will prompt fellow researchers to also investigate its potential for use in ASR.
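Of the two representations mentioned, the group delay function has a convenient closed form in the standard signal-processing literature: with X(w) the DFT of x[n] and Y(w) the DFT of n*x[n], the group delay is tau(w) = (X_R Y_R + X_I Y_I) / |X(w)|^2. A minimal NumPy sketch follows; it computes the GDF of a short synthetic frame and is not drawn from the thesis itself.

```python
import numpy as np

# Hedged sketch: group delay function (GDF) of one short analysis frame,
# via the standard identity tau(w) = (Xr*Yr + Xi*Yi) / |X|^2,
# where X = DFT(x[n]) and Y = DFT(n * x[n]). Result is in samples.

def group_delay(frame: np.ndarray) -> np.ndarray:
    n = np.arange(frame.size)
    X = np.fft.rfft(frame)
    Y = np.fft.rfft(n * frame)
    eps = 1e-12  # guard against spectral nulls, where the GDF blows up
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)

# A 32 ms synthetic vowel-like frame at 16 kHz: two damped resonances.
fs = 16000
t = np.arange(int(0.032 * fs)) / fs
frame = np.exp(-60 * t) * (np.sin(2 * np.pi * 700 * t)
                           + 0.5 * np.sin(2 * np.pi * 1200 * t))
frame *= np.hamming(frame.size)

tau = group_delay(frame)
print(tau[:5])  # GDF peaks sit near the resonance (formant) frequencies
```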
299
Combining acoustic analysis and phonotactic analysis to improve automatic speech recognition. Nulsen, Susan, January 1998
This thesis addresses the problem of automatic speech recognition, specifically, how to transform an acoustic waveform into a string of words or phonemes. A preliminary chapter gives linguistic information potentially useful in automatic speech recognition. This is followed by a description of the Wave Analysis Laboratory (WAL), a rule-based system which detects features in speech and was designed as the acoustic front end of a speech recognition system. Temporal reasoning as used in WAL rules is examined. The use of WAL in recognizing one particular class of speech sounds, the nasal consonants, is described in detail.

The remainder of the thesis looks at the statistical analysis of samples of spontaneous speech. An orthographic transcription of a large sample of spontaneous speech is automatically translated into phonemes. Tables of the frequencies of word-initial and word-final phoneme clusters are constructed to illustrate some of the phonotactic constraints of the language. Statistical data is used to assign phonemes to phonotactic classes. These classes are unlike the acoustic classes, although there is a general distinction between the vowels, the consonants and the word boundary.

A way of measuring the phonetic balance of a sample of speech is described. This can be used as a means of ranking potential test samples in terms of how well they represent the language.

A phoneme n-gram model is used to measure the entropy of the language. The broad acoustic encoding output from WAL is used with this language model to reconstruct a small test sample. "Branching", a simpler alternative to perplexity, is introduced and found to give similar results to perplexity. Finally, the drop in branching is calculated as knowledge of various sets of acoustic classes is considered.

In the work described in this thesis the main contributions made to automatic speech recognition and the study of speech are in the development of the Wave Analysis Laboratory and in the analysis of speech from a phonotactic point of view. The phoneme cluster frequencies provide new information on spoken language, as do the phonotactic classes. The measures of phonetic balance and branching provide additional tools for use in the development of speech recognition systems.
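As a concrete illustration of the entropy measurement described above, the sketch below estimates per-phoneme entropy and perplexity from bigram counts over a toy phoneme corpus. The corpus and the add-one smoothing are invented for the example, and "branching", the thesis's simpler alternative, is not reproduced since the abstract does not define it.

```python
import math
from collections import Counter

# Hedged sketch: per-symbol entropy and perplexity of a phoneme bigram model,
# estimated on a toy corpus. "#" marks word boundaries; the corpus and the
# add-one smoothing are invented for illustration.

corpus = "# k ae t # s ae t # k ae n #".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])   # history counts
vocab = set(corpus)

def bigram_prob(prev, cur):
    # Add-one smoothing so unseen bigrams get nonzero probability.
    return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab))

log2_sum = sum(math.log2(bigram_prob(p, c)) for p, c in zip(corpus, corpus[1:]))
entropy = -log2_sum / (len(corpus) - 1)           # bits per phoneme
print(f"entropy    = {entropy:.2f} bits/phoneme")
print(f"perplexity = {2 ** entropy:.2f}")         # the effective branching factor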
300
Learning pronunciation variation: a data-driven approach to rule-based lexicon adaptation for automatic speech recognition. Amdal, Ingunn, January 2002
To achieve a robust system the variation seen for different speaking styles must be handled. An investigation of standard automatic speech recognition techniques for different speaking styles showed that lexical modelling using general-purpose variants gave small improvements, but the errors differed compared with using only one canonical pronunciation per word. Modelling the variation using the acoustic models (using context dependency and/or speaker-dependent adaptation) gave a significant improvement, but the resulting performance for non-native and spontaneous speech was still far from that for read speech.

In this dissertation a complete data-driven approach to rule-based lexicon adaptation is presented, where the effect of the acoustic models is incorporated in the rule pruning metric. Reference and alternative transcriptions were aligned by dynamic programming, but with a data-driven method to derive the phone-to-phone substitution costs. The costs were based on the statistical co-occurrence of phones (association strength). Rules for pronunciation variation were derived from this alignment, and pruned using a new metric based on acoustic log likelihood. Well-trained acoustic models are capable of modelling much of the variation seen, and using the acoustic log likelihood to assess the pronunciation rules prevents the lexical modelling from adding variation already accounted for, as shown for direct pronunciation variation modelling.

For the non-native task, data-driven pronunciation modelling by learning pronunciation rules gave a significant performance gain. Acoustic log likelihood rule pruning performed better than rule probability pruning.

For spontaneous dictation the pronunciation variation experiments did not improve the performance. The answer to how to better model the variation for spontaneous speech seems to lie neither in the acoustical nor the lexical modelling. The main differences between read and spontaneous speech are the grammar used and disfluencies like restarts and long pauses. The language model may thus be the best starting point for further research to achieve better performance for this speaking style.
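The alignment step described above, dynamic programming with learned phone-to-phone substitution costs, can be sketched as a weighted edit distance. In this illustration the cost table is a hand-invented placeholder standing in for the association-strength-derived costs the dissertation actually learns from data, and the phone strings are made up.

```python
# Hedged sketch: weighted edit-distance alignment of a reference phone
# transcription against an alternative one. The substitution cost table is
# an invented placeholder; the dissertation derives its costs from phone
# co-occurrence statistics (association strength).

INS_DEL_COST = 1.0

def sub_cost(a, b, table):
    if a == b:
        return 0.0
    return table.get((a, b), table.get((b, a), 1.2))  # default: unrelated phones

def align_cost(ref, alt, table):
    n, m = len(ref), len(alt)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * INS_DEL_COST
    for j in range(1, m + 1):
        d[0][j] = j * INS_DEL_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + sub_cost(ref[i - 1], alt[j - 1], table),
                d[i - 1][j] + INS_DEL_COST,   # deletion in the alternative
                d[i][j - 1] + INS_DEL_COST,   # insertion in the alternative
            )
    return d[n][m]

# Invented costs: phonetically close pairs are cheaper to substitute.
costs = {("t", "d"): 0.3, ("s", "z"): 0.3, ("e", "ei"): 0.4}
print(align_cost("b e t e r".split(), "b ei d e r".split(), costs))  # -> 0.7
```

From such alignments, each mismatch becomes a candidate rewrite rule (here, for instance, t -> d), which the dissertation then prunes by its acoustic log-likelihood gain.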