441 |
The use of belief networks in natural language understanding and dialog modeling. January 2001.
Wai, Chi Man Carmen. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 129-136). / Abstracts in English and Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Overview --- p.1 / Chapter 1.2 --- Natural Language Understanding --- p.3 / Chapter 1.3 --- BNs for Handling Speech Recognition Errors --- p.4 / Chapter 1.4 --- BNs for Dialog Modeling --- p.5 / Chapter 1.5 --- Thesis Goals --- p.8 / Chapter 1.6 --- Thesis Outline --- p.8 / Chapter 2 --- Background --- p.10 / Chapter 2.1 --- Natural Language Understanding --- p.11 / Chapter 2.1.1 --- Rule-based Approaches --- p.12 / Chapter 2.1.2 --- Stochastic Approaches --- p.13 / Chapter 2.1.3 --- Phrase-Spotting Approaches --- p.16 / Chapter 2.2 --- Handling Recognition Errors in Spoken Queries --- p.17 / Chapter 2.3 --- Spoken Dialog Systems --- p.19 / Chapter 2.3.1 --- Finite-State Networks --- p.21 / Chapter 2.3.2 --- The Form-based Approaches --- p.21 / Chapter 2.3.3 --- Sequential Decision Approaches --- p.22 / Chapter 2.3.4 --- Machine Learning Approaches --- p.24 / Chapter 2.4 --- Belief Networks --- p.27 / Chapter 2.4.1 --- Introduction --- p.27 / Chapter 2.4.2 --- Bayesian Inference --- p.29 / Chapter 2.4.3 --- Applications of the Belief Networks --- p.32 / Chapter 2.5 --- Chapter Summary --- p.33 / Chapter 3 --- Belief Networks for Natural Language Understanding --- p.34 / Chapter 3.1 --- The ATIS Domain --- p.35 / Chapter 3.2 --- Problem Formulation --- p.36 / Chapter 3.3 --- Semantic Tagging --- p.37 / Chapter 3.4 --- Belief Networks Development --- p.38 / Chapter 3.4.1 --- Concept Selection --- p.39 / Chapter 3.4.2 --- Bayesian Inferencing --- p.40 / Chapter 3.4.3 --- Thresholding --- p.40 / Chapter 3.4.4 --- Goal Identification --- p.41 / Chapter 3.5 --- Experiments on Natural Language Understanding --- p.42 / Chapter 3.5.1 --- Comparison between Mutual Information and Information Gain --- p.42 / Chapter 3.5.2 --- Varying the Input Dimensionality --- p.44 / Chapter 3.5.3 --- Multiple Goals and Rejection --- p.46 / Chapter 3.5.4 --- Comparing Grammars --- p.47 / Chapter 3.6 --- Benchmark with Decision Trees --- p.48 / Chapter 3.7 --- Performance on Natural Language Understanding --- p.51 / Chapter 3.8 --- Handling Speech Recognition Errors in Spoken Queries --- p.52 / Chapter 3.8.1 --- Corpus Preparation --- p.53 / Chapter 3.8.2 --- Enhanced Belief Network Topology --- p.54 / Chapter 3.8.3 --- BNs for Handling Speech Recognition Errors --- p.55 / Chapter 3.8.4 --- Experiments on Handling Speech Recognition Errors --- p.60 / Chapter 3.8.5 --- Significance Testing --- p.64 / Chapter 3.8.6 --- Error Analysis --- p.65 / Chapter 3.9 --- Chapter Summary --- p.67 / Chapter 4 --- Belief Networks for Mixed-Initiative Dialog Modeling --- p.68 / Chapter 4.1 --- The CU FOREX Domain --- p.69 / Chapter 4.1.1 --- Domain-Specific Constraints --- p.69 / Chapter 4.1.2 --- Two Interaction Modalities --- p.70 / Chapter 4.2 --- The Belief Networks --- p.70 / Chapter 4.2.1 --- Informational Goal Inference --- p.72 / Chapter 4.2.2 --- Detection of Missing / Spurious Concepts --- p.74 / Chapter 4.3 --- Integrating Two Interaction Modalities --- p.78 / Chapter 4.4 --- Incorporating Out-of-Vocabulary Words --- p.80 / Chapter 4.4.1 --- Natural Language Queries --- p.80 / Chapter 4.4.2 --- Directed Queries --- p.82 / Chapter 4.5 --- Evaluation of the BN-based Dialog Model --- p.84 / Chapter 4.6 --- Chapter Summary --- p.87 / Chapter 5 --- Scalability and Portability of Belief Network-based
Dialog Model --- p.88 / Chapter 5.1 --- Migration to the ATIS Domain --- p.89 / Chapter 5.2 --- Scalability of the BN-based Dialog Model --- p.90 / Chapter 5.2.1 --- Informational Goal Inference --- p.90 / Chapter 5.2.2 --- Detection of Missing / Spurious Concepts --- p.92 / Chapter 5.2.3 --- Context Inheritance --- p.94 / Chapter 5.3 --- Portability of the BN-based Dialog Model --- p.101 / Chapter 5.3.1 --- General Principles for Probability Assignment --- p.101 / Chapter 5.3.2 --- Performance of the BN-based Dialog Model with Hand-Assigned Probabilities --- p.105 / Chapter 5.3.3 --- Error Analysis --- p.108 / Chapter 5.4 --- Enhancements for Discourse Query Understanding --- p.110 / Chapter 5.4.1 --- Combining Trained and Handcrafted Probabilities --- p.110 / Chapter 5.4.2 --- Handcrafted Topology for BNs --- p.111 / Chapter 5.4.3 --- Performance of the Enhanced BN-based Dialog Model --- p.117 / Chapter 5.5 --- Chapter Summary --- p.120 / Chapter 6 --- Conclusions --- p.122 / Chapter 6.1 --- Summary --- p.122 / Chapter 6.2 --- Contributions --- p.126 / Chapter 6.3 --- Future Work --- p.127 / Bibliography --- p.129 / Chapter A --- The Two Original SQL Query --- p.137 / Chapter B --- The Two Grammars, GH and GsA --- p.139 / Chapter C --- Probability Propagation in Belief Networks --- p.149 / Chapter C.1 --- Computing the a posteriori probability of P*(G) based on input concepts --- p.151 / Chapter C.2 --- Computing the a posteriori probability of P*(Cj) by backward inference --- p.154 / Chapter D --- Total 23 Concepts for the Handcrafted BN --- p.156
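The chapter outline above names a goal-identification pipeline (concept selection, Bayesian inference, thresholding, goal identification). Purely as an illustrative sketch, and not the thesis's actual network topology or probabilities, the fragment below infers a goal posterior from binary concept indicators with a naive-Bayes-structured belief network and applies a threshold; all concept names, probabilities, and the 0.5 cut-off are assumptions.

```python
# Hedged sketch: naive-Bayes-style belief network for goal identification.
# Concept names, probabilities, and the threshold are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class GoalNetwork:
    prior: float                  # P(G = 1)
    p_concept_given_goal: dict    # P(C_j = 1 | G = 1) for each concept
    p_concept_given_not: dict     # P(C_j = 1 | G = 0) for each concept

    def posterior(self, observed: dict) -> float:
        """Return P(G = 1 | observed concepts) by Bayes' rule."""
        num = self.prior
        den = 1.0 - self.prior
        for concept, present in observed.items():
            p1 = self.p_concept_given_goal[concept]
            p0 = self.p_concept_given_not[concept]
            num *= p1 if present else (1.0 - p1)
            den *= p0 if present else (1.0 - p0)
        return num / (num + den)

# Example: a hypothetical "flight information" goal with three input concepts.
net = GoalNetwork(
    prior=0.3,
    p_concept_given_goal={"FLIGHT": 0.9, "CITY_ORIGIN": 0.8, "FARE": 0.2},
    p_concept_given_not={"FLIGHT": 0.2, "CITY_ORIGIN": 0.3, "FARE": 0.4},
)
post = net.posterior({"FLIGHT": True, "CITY_ORIGIN": True, "FARE": False})
identified = post >= 0.5   # thresholding step; 0.5 is an assumed cut-off
print(f"P(goal | concepts) = {post:.3f}, identified = {identified}")
```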
|
442 |
An HMM-based speech recognition IC. January 2003.
Han Wei. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 60-61). / Abstracts in English and Chinese. / Abstract --- p.i / 摘要 --- p.ii / Acknowledgements --- p.iii / Contents --- p.iv / List of Figures --- p.vi / List of Tables --- p.vii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1. --- Speech Recognition --- p.1 / Chapter 1.2. --- ASIC Design with HDLs --- p.3 / Chapter 2 --- Theory of HMM-Based Speech Recognition --- p.6 / Chapter 2.1. --- Speaker-Dependent and Speaker-Independent --- p.6 / Chapter 2.2. --- Frame and Feature Vector --- p.6 / Chapter 2.3. --- Hidden Markov Model --- p.7 / Chapter 2.3.1. --- Markov Model --- p.8 / Chapter 2.3.2. --- Hidden Markov Model --- p.9 / Chapter 2.3.3. --- Elements of an HMM --- p.10 / Chapter 2.3.4. --- Types of HMMs --- p.11 / Chapter 2.3.5. --- Continuous Observation Densities in HMMs --- p.13 / Chapter 2.3.6. --- Three Basic Problems for HMMs --- p.15 / Chapter 2.4. --- Probability Evaluation --- p.16 / Chapter 2.4.1. --- The Viterbi Algorithm --- p.17 / Chapter 2.4.2. --- Alternative Viterbi Implementation --- p.19 / Chapter 3 --- HMM-based Isolated Word Recognizer Design Methodology --- p.20 / Chapter 3.1. --- Speech Recognition Based On Single Mixture --- p.23 / Chapter 3.2. --- Speech Recognition Based On Double Mixtures --- p.25 / Chapter 4 --- VLSI Implementation of the Speech Recognizer --- p.29 / Chapter 4.1. --- The System Requirements --- p.29 / Chapter 4.2. --- Implementation of a Speech Recognizer with a Single-Mixture HMM --- p.30 / Chapter 4.3. --- Implementation of a Speech Recognizer with a Double-Mixture HMM --- p.39 / Chapter 4.4. --- Extend Usage in High Order Mixtures HMM --- p.46 / Chapter 4.5. --- Pipelining and the System Timing --- p.50 / Chapter 5 --- Simulation and IC Testing --- p.53 / Chapter 5.1. --- Simulation Result --- p.53 / Chapter 5.2. --- Testing --- p.55 / Chapter 6 --- Discussion and Conclusion --- p.58 / Reference --- p.60 / Appendix I Verilog Code of the Double-Mixture HMM Based Speech Recognition IC (RTL Level) --- p.62 / Subtracter --- p.62 / Multiplier --- p.63 / Core_Adder --- p.65 / Register for X --- p.66 / Subtractor and Comparator --- p.67 / Shifter --- p.68 / Look-Up Table --- p.71 / Register for Constants --- p.79 / Register for Scores --- p.80 / Final Score Register --- p.84 / Controller --- p.86 / Top --- p.97 / Appendix II Chip Microphotograph --- p.103 / Appendix III Pin Assignment of the Speech Recognition IC --- p.104 / Appendix IV The Testing Board of the IC --- p.108
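The outline above indicates that word scoring is performed with the Viterbi algorithm (plus a hardware-friendly alternative implementation). As a hedged software sketch only, the fragment below computes a log-domain Viterbi score for a left-to-right HMM; the thesis realises an equivalent add/compare/select recursion in fixed-point Verilog hardware, and the model sizes and probabilities here are toy assumptions.

```python
# Hedged sketch: log-domain Viterbi scoring for a left-to-right HMM.
# Sizes and probabilities are toy assumptions, not the IC's parameters.
import numpy as np

def viterbi_score(log_pi, log_A, log_B):
    """Best-path log-likelihood of an observation sequence.

    log_pi : (N,)    initial state log-probabilities
    log_A  : (N, N)  state transition log-probabilities
    log_B  : (T, N)  per-frame observation log-likelihoods b_j(o_t)
    """
    T, N = log_B.shape
    delta = log_pi + log_B[0]                      # initialisation
    for t in range(1, T):
        # max over predecessor states, then add the frame likelihood
        delta = np.max(delta[:, None] + log_A, axis=0) + log_B[t]
    return float(np.max(delta))                    # termination

# Toy 3-state left-to-right model and a 5-frame utterance.
rng = np.random.default_rng(0)
log_pi = np.log(np.array([1.0, 1e-10, 1e-10]))
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
log_A = np.log(np.where(A > 0, A, 1e-10))
log_B = rng.normal(loc=-5.0, scale=1.0, size=(5, 3))
print("word score:", viterbi_score(log_pi, log_A, log_B))
```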
|
443 |
Sistemas de adaptação ao locutor utilizando autovozes. / Speaker adaptation system using eigenvoices. Liselene de Abreu Borges, 20 December 2001.
This work describes two speaker adaptation techniques that use a small amount of adaptation data in a speech recognition system: Maximum Likelihood Linear Regression (MLLR) and Eigenvoices. Both re-estimate the Gaussian means of a continuous-density hidden Markov model (HMM) system. MLLR estimates a set of linear transformations for the Gaussian mean parameters of the system. The Eigenvoice technique is based on prior knowledge of inter-speaker variation; to obtain this prior knowledge, which is captured in the eigenvoices, principal component analysis (PCA) is applied. Mean-adaptation tests were carried out on an isolated-word, restricted-vocabulary speech recognition system. With a large amount of adaptation data (more than 70% of the vocabulary words), the Eigenvoice technique did not produce results as strong as MLLR; with a small amount of adaptation data (less than 15% of the vocabulary words), the Eigenvoice technique outperformed MLLR.
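To make the eigenvoice idea concrete, a minimal numerical sketch follows: the Gaussian means of several reference speakers are stacked into "supervectors", PCA extracts the eigenvoices, and a new speaker's means are re-estimated as the mean supervector plus a low-dimensional combination of eigenvoices. The dimensions, the number of retained eigenvoices, and the least-squares weight estimate are illustrative assumptions; a full eigenvoice implementation would typically estimate the weights by maximum likelihood over the adaptation data rather than the simple fit used here.

```python
# Hedged sketch of eigenvoice adaptation: PCA over speaker "supervectors"
# (all HMM Gaussian means concatenated), then projection of a new speaker.
# Dimensions are toy assumptions; the weight estimate is a least-squares
# stand-in for a maximum-likelihood procedure.
import numpy as np

rng = np.random.default_rng(1)
n_speakers, dim = 50, 120          # 50 reference speakers, 120-dim supervectors
S = rng.normal(size=(n_speakers, dim))

# Eigenvoices = principal components of the reference supervectors.
mean_voice = S.mean(axis=0)
U, s, Vt = np.linalg.svd(S - mean_voice, full_matrices=False)
K = 5                              # keep the first K eigenvoices (assumed)
E = Vt[:K]                         # (K, dim)

# New speaker: only a few supervector components are observed from the
# small amount of adaptation data; estimate the weights from those alone.
observed_idx = rng.choice(dim, size=30, replace=False)
target = rng.normal(size=dim)      # stand-in for the true speaker supervector
w, *_ = np.linalg.lstsq(E[:, observed_idx].T,
                        target[observed_idx] - mean_voice[observed_idx],
                        rcond=None)
adapted = mean_voice + w @ E       # re-estimated Gaussian means for all states
print("eigenvoice weights:", np.round(w, 3))
```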
|
444 |
Identificação de locutor usando modelos de misturas de gaussianas. / Speaker identification using Gaussian mixture models. Denis Pirttiaho Cardoso, 03 April 2009.
Speaker identification is the task of selecting one speaker from a set of enrolled members. In this work, the experiments were performed with a text-independent, cohort Gaussian mixture model (GMM) speaker identification system. Tests used the TIMIT speech database and its version corrupted by a noisy telephone channel, NTIMIT. The vocal tract is represented by Mel-frequency cepstral coefficients (MFCCs) obtained from a filter bank or, alternatively, from linear prediction cepstral coefficients. In addition, cepstral mean subtraction is applied when the NTIMIT database is used, to reduce the channel distortion intrinsic to it. The portion of each utterance over which the MFCCs are computed is selected by a voice activity detector (VAD). VADs are generally sensitive to the signal-to-noise ratio of the utterance, so they must be adapted to the system's operating conditions; a signal-to-noise ratio estimator based on the Minima Controlled Recursive Averaging (MCRA) method is therefore integrated into the proposed VAD so that both clean and noisy speech can be handled. For utterances with a high signal-to-noise ratio, such as those from the TIMIT database, the most effective MFCC extraction method was the standard filter-bank approach, whereas for noisy speech the combination of cepstral mean subtraction and MFCCs derived from linear prediction cepstral coefficients gave the best results.
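As a rough sketch of the pipeline the abstract describes (MFCC features, per-utterance cepstral mean subtraction, and one GMM per enrolled speaker scored by log-likelihood), the following fragment uses librosa and scikit-learn; the feature and model sizes (13 MFCCs, 32 diagonal-covariance mixtures) and the file paths are assumptions rather than the thesis's settings, and the VAD/MCRA stage is omitted.

```python
# Hedged sketch: closed-set speaker identification with MFCCs, cepstral mean
# subtraction (CMS), and one GMM per enrolled speaker. Feature and model
# sizes (13 MFCCs, 32 mixtures) are assumptions, not the thesis's settings.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def features(wav_path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)
    return mfcc - mfcc.mean(axis=0)                             # CMS per utterance

def enroll(train_files_by_speaker):
    """Fit one diagonal-covariance GMM per speaker from training utterances."""
    models = {}
    for spk, files in train_files_by_speaker.items():
        X = np.vstack([features(f) for f in files])
        models[spk] = GaussianMixture(n_components=32,
                                      covariance_type="diag").fit(X)
    return models

def identify(models, test_file):
    """Return the enrolled speaker whose GMM gives the highest log-likelihood."""
    X = features(test_file)
    return max(models, key=lambda spk: models[spk].score(X))

# Usage (hypothetical file paths):
# models = enroll({"spk1": ["spk1_a.wav"], "spk2": ["spk2_a.wav"]})
# print(identify(models, "unknown.wav"))
```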
|
446 |
The Effect of the Slope of the Psychometric Function on the Measurement of Speech Recognition Threshold Using a Male Talker. Bakhsh, Nujod Ali, 01 June 2018.
Speech audiometry is the aspect of audiology that provides critical information on how individuals hear one of the most important sounds of daily life: speech. The speech recognition threshold (SRT) is a measure of speech audiometry that is widely used to provide information on an individual's capacity to hear speech. Over time, researchers and clinicians have worked to improve the SRT by developing and modifying a variety of word lists to be used during testing. Eventually, spondaic words were selected as the best stimuli for the SRT. The spondaic words had to meet four criteria: familiarity, phonetic dissimilarity, normal sampling of English sounds, and homogeneity with respect to audibility. This study examined the aspect of homogeneity with regard to slope of the psychometric function. Specifically, whether slope of the psychometric function had an effect on the number of words used to obtain the SRT, and thus reduce test time, as well as whether slope had an effect on the relationship between the SRT and the pure-tone average (PTA). It was hypothesized that words with a steep slope would significantly reduce test time and yield a close SRT-PTA agreement. Three word lists (steep, medium, and shallow sloping words), all recorded by a male talker, were used to obtain the SRT on 40 participants (ages 18-30 years). Statistical analysis showed significant differences in the number of words to obtain the SRT and the SRT-PTA agreement. However, when the differences were examined from a clinical perspective, the results were negligible. When compared with words with medium and steep slopes, words with shallow slope required an average of four extra words to obtain the SRT, which does not result in a meaningful reduction in test time. For clinical purposes, it appears that the slope of the psychometric function does not need to be taken into consideration for the SRT. Clinicians may use a variety of words as long as they meet the original four criteria for selection of spondees.
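For readers unfamiliar with the slope measure discussed here, the sketch below fits a logistic psychometric function to percent-correct scores at several presentation levels and reports the 50% threshold and the slope at 50% in %/dB; the example levels and scores are invented for illustration and are not data from this study.

```python
# Hedged sketch: fit a logistic psychometric function to percent-correct data
# and report the 50% threshold and the slope at 50% in %/dB. The example
# levels and scores are illustrative, not measurements from the study.
import numpy as np
from scipy.optimize import curve_fit

def logistic(level_db, threshold_db, slope_at_50):
    """Proportion correct; slope_at_50 is expressed as proportion per dB."""
    return 1.0 / (1.0 + np.exp(-4.0 * slope_at_50 * (level_db - threshold_db)))

levels = np.array([-4, -2, 0, 2, 4, 6], dtype=float)        # dB HL (assumed)
p_correct = np.array([0.05, 0.20, 0.45, 0.75, 0.90, 0.98])  # assumed scores

(threshold, slope), _ = curve_fit(logistic, levels, p_correct, p0=[1.0, 0.1])
print(f"50% threshold: {threshold:.1f} dB HL")
print(f"slope at 50%: {100 * slope:.1f} %/dB")
```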
|
447 |
The Effect of the Slope of the Psychometric Function on the Measurement of Speech Recognition Threshold Using a Female Talker. Reese, Jessica Lee, 01 June 2018.
Speech audiometry has long been a component of a thorough audiological examination, and the speech recognition threshold (SRT) is perhaps its most widely used measurement. For decades, researchers and clinicians have worked to create and fine-tune word lists for use in SRT testing, aiming to improve the accuracy of classifying a client's ability to hear and comprehend speech. Experts in the field have agreed to follow four tenets of speech audiometry when selecting word sets. This study examined whether stimulus lists for SRT measurement could be improved with respect to the tenet of homogeneity of audibility if the slope of the psychometric function were a selection consideration. The study was performed with the hypothesis that steeply sloping words would significantly reduce the number of words needed to obtain the SRT. Three word lists, all recorded by a female talker and comprising steeply sloping, medium sloping, and shallow sloping words, were used in the study. Participants with normal hearing between the ages of 18 and 30 years provided data used to calculate SRT measurements for all three lists in each ear. The results showed a significant difference in the number of words needed to obtain the SRT when comparing the steep and shallow word sets and the shallow and medium word sets. Steeply sloping words required the fewest words to obtain the SRT (M = 17.02); shallow sloping words required the most (M = 18.88), a difference of 1.86 words. While statistically significant, a reduction of fewer than 2 words over the course of SRT testing does not amount to a substantial saving of time for the clinician. For clinical application, the slope of the psychometric function of the words used in SRT measurement need not be a primary consideration when developing stimulus lists.
|
448 |
Development of Psychometrically Equivalent Speech Recognition Threshold Materials for Native Cebuano Speakers. Anderson, Melissa Dawn, 01 December 2016.
While there is a clear and immediate need for reliable speech audiometry materials to evaluate the speech recognition threshold (SRT), such recorded materials are not available in Cebuano, a language of the Philippines with 15.8 million speakers. The purpose of this study was to develop, digitally record, evaluate, and psychometrically equate a set of Cebuano trisyllabic words for use in measuring the SRT. To create the SRT materials, common Cebuano trisyllabic words were digitally recorded by a male talker of Cebuano and presented for evaluation to 20 native speakers of Cebuano with normal hearing. Based on psychometric performance, a set of 21 trisyllabic words with psychometric function slopes >7%/dB was selected and digitally adjusted so that each word's threshold could be equated to the pure-tone average. The resulting mean psychometric function slope at 50% for the 21 trisyllabic SRT materials was 10.2%/dB. The results of the current study are comparable to those found in other languages. Digital recordings of the trisyllabic words are available on compact disc.
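The digital adjustment step can be illustrated briefly: once each word's 50% threshold is estimated, its recording level is shifted by the difference between that threshold and a common target so that every word reaches 50% intelligibility at roughly the same level. The thresholds, target, and word labels below are assumed values for illustration only.

```python
# Hedged sketch: equate words by shifting each recording's level so its 50%
# threshold lands on a common target. Thresholds, target, and word labels
# are assumed example values, not the study's measurements.
word_thresholds_db = {          # estimated 50% point per word (assumed)
    "word_01": 6.3,
    "word_02": 9.1,
    "word_03": 4.8,
}
target_db = 7.0                 # e.g., the listeners' mean pure-tone average (assumed)

# A word with a higher-than-target threshold is amplified (positive gain),
# lowering its threshold to the target; an easier word is attenuated.
adjust_db = {w: round(t - target_db, 1) for w, t in word_thresholds_db.items()}
for word, gain in adjust_db.items():
    print(f"{word}: apply {gain:+.1f} dB")
```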
|
449 |
Development of Word Recognition Materials for Native Cebuano Speakers. Gordon, Sarah Mickele, 01 April 2017.
Within recent decades, speech audiometry materials have been developed in various languages to identify and evaluate hearing impairment in native speakers more accurately. This advantage, however, has not been available to native Cebuano speakers. The purpose of this study was to develop, digitally record, evaluate, and psychometrically equate a set of Cebuano bisyllabic word lists for use in measuring word recognition ability. The process began by recording 260 commonly used bisyllabic Cebuano words spoken by a native speaker noted for the quality and pleasantness of his speech. These recordings were then evaluated by 20 normally hearing native Cebuano listeners (21 to 63 years old). Of these words, 200 were selected and divided into 4 lists of 50 bisyllabic words and 8 half-lists of 25 bisyllabic words. Statistical analysis of the word recognition materials found no significant difference among the lists or half-lists. The mean psychometric function slope at 50% for the bisyllabic word lists and half-lists was 7.3%/dB, and the mean 50% threshold for the lists was 19.7 dB HL (SD = 0.1 dB). Adjustments were not necessary. The results of the current study are comparable to those found in other languages. Digital recordings of the bisyllabic word lists are available on compact disc.
|
450 |
Psychometrically Equivalent Trisyllabic Words for Testing Spanish Pediatric Speech Recognition Thresholds. Graham, Jessica Lee, 01 March 2016.
The purpose of this study was to use previously recorded Spanish trisyllabic words, already tested on adults, to measure the speech recognition threshold (SRT) of Spanish-speaking children in order to (a) determine the words' appropriateness for testing children and (b) compare psychometric functions between adults and children. A selection of 28 frequently used trisyllabic words was chosen from previously recorded samples of male and female adult native speakers of Spanish. These words were then presented to 20 native Spanish-speaking children with normal hearing between the ages of 4 and 8 years. The words were presented starting at -5 dB HL and ascending in 5 dB increments until the presentation level reached 15 dB HL. Using logistic regression, psychometric functions were calculated for each word. The resulting pediatric thresholds were 8.7 dB higher for male talkers and 11.0 dB higher for female talkers than previously reported adult thresholds. These results indicate a clinically significant threshold difference between pediatric and adult populations. Future research should measure the SRT in children of varying ages to determine the age at which the SRT approximates adult performance.
|