1 |
A correlogram approach to speaker identification based on a human auditory model /Ertas, Figen January 1997 (has links)
No description available.
|
2 |
Measurement, analysis, and detection of nasalization in speech /Niu, Xiaochuan 03 1900 (has links) (PDF)
Ph.D. / Computer Science and Electrical Engineering / Nasalization refers to the process of speech production in which significant amounts of airflow and sound energy are transmitted through the nasal tract. In phonetics, nasalization is necessary for certain phonemes to be produced in normal speech, and it can also be a normal consequence of coarticulation. In disordered speech, however, inappropriate nasalization can be one of the causes of reduced speech intelligibility. Instrumental measurement and analysis techniques are needed to better understand the relationship between physiological status and the aerodynamic and acoustic effects of nasalization during speech. The main aim of the research presented in this dissertation is to investigate the aerodynamic and acoustic effects of nasalization, and to develop objective approaches to measure, analyze, and detect nasalized segments in speech. Based on an extensive survey of the existing literature on measurements of velopharyngeal function, acoustic models of speech production, analysis methods and results for normal nasalization, and analysis methods for resonance disorders, it is understood that the final acoustic representation of nasalization is a complex outcome affected by the degree of velopharyngeal opening, the variation of vocal tract configurations, the mixture of multiple acoustic channels, and speaker differences. It is proposed to incorporate additional available information beyond single-channel acoustic signals in the analysis of nasalization. In our research, a parallel study of acoustic and aerodynamic signals reveals the complementary information within the signals. In addition, dual-channel acoustic studies help to clarify the acoustic relationship between the oral and nasal cavities, and show inherent advantages over single-channel analysis. Based on the derivation and analysis of dual-channel acoustic properties, automatic detectors of nasalization are developed and successfully tested. The techniques developed in these explorations provide novel instrumental and analysis approaches for applications such as phonetic studies of the normal nasalization process, clinical assessment of disordered nasal resonance, and feature extraction for speech recognition.
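The detector developed in the dissertation is not reproduced in this record; as a rough, hypothetical illustration of the dual-channel idea it describes, the sketch below computes a frame-level nasal-to-total energy ratio (a nasalance-style measure) from separately recorded oral and nasal microphone channels and flags frames that exceed a threshold. The frame length, hop size, and threshold are assumptions, not values from the thesis.

```python
import numpy as np

def frame_energies(signal, frame_len, hop):
    """Short-time energy of a 1-D signal, framed with the given length and hop."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.array([
        np.sum(signal[i * hop:i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])

def nasalance_ratio(oral, nasal, frame_len=512, hop=256, eps=1e-12):
    """Frame-level nasal-to-total energy ratio from dual-channel recordings.

    Values near 1 mean most acoustic energy is in the nasal channel;
    values near 0 mean it is predominantly oral.
    """
    e_oral = frame_energies(np.asarray(oral, dtype=float), frame_len, hop)
    e_nasal = frame_energies(np.asarray(nasal, dtype=float), frame_len, hop)
    n = min(len(e_oral), len(e_nasal))
    return e_nasal[:n] / (e_oral[:n] + e_nasal[:n] + eps)

def detect_nasalized_frames(oral, nasal, threshold=0.5):
    """Flag frames whose nasal energy share exceeds a (hypothetical) threshold."""
    return nasalance_ratio(oral, nasal) > threshold
```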
|
3 |
A statistical approach to formant tracking /Gayvert, Robert T. January 1988 (has links)
Thesis (M.S.)--Rochester Institute of Technology, 1989. / Includes bibliographical references (leaves 20-21).
|
4 |
Automatic formant labeling in continuous speech /Richards, Elizabeth A. January 1989 (has links)
Thesis (M.S.)--Rochester Institute of Technology, 1989. / Includes bibliographical references (leaves 81-85).
|
5 |
Speech accent identification and speech recognition enhancement by speaker accent adaptation /Tanabian, Mohammad M., January 1900 (has links)
Thesis (M.Sc.) - Carleton University, 2005. / Includes bibliographical references (p. 150-155). Also available in electronic format on the Internet.
|
6 |
The role of coarticulation in speech-on-speech recognition /Jett, Brandi 23 May 2019 (has links)
No description available.
|
7 |
Cochlear implant sound coding with across-frequency delays /Taft, Daniel Adam January 2009 (has links)
The experiments described in this thesis investigate the temporal relationship between frequency bands in a cochlear implant sound processor. The initial studies examined cochlea-based traveling wave delays for cochlear implant sound processing strategies; these were later broadened into studies of an ensemble of across-frequency delays. / Before incorporating cochlear delays into a cochlear implant processor, a set of suitable delays was determined with a psychoacoustic calibration to pitch perception, since normal cochlear delays are a function of frequency. The first experiment assessed the perception of pitch evoked by electrical stimuli from cochlear implant electrodes. Six cochlear implant users with acoustic hearing in their non-implanted ears were recruited, since they were able to compare electric stimuli to acoustic tones. Traveling wave delays were then computed for each subject using the frequencies matched to their electrodes. These were similar across subjects, ranging over 0-6 milliseconds along the electrode array. / The next experiment applied the calibrated delays to the ACE strategy filter outputs before maxima selection. The effects on speech perception in noise were assessed with cochlear implant users, and a small but significant improvement was observed. A subsequent sensitivity analysis indicated that accurate calibration of the delays might not be necessary; instead, a range of across-frequency delays might be similarly beneficial. / A computational investigation was performed next, in which a corpus of recorded speech was passed through the ACE cochlear implant sound processing strategy to determine how across-frequency delays altered the patterns of stimulation. A range of delay vectors was used in combination with a number of processing parameter sets and noise levels. The results showed that additional stimuli from broadband sounds (such as the glottal pulses of vowels) are selected when frequency bands are desynchronized with across-frequency delays. Background noise contains fewer dominant impulses than a single talker and so is not enhanced in this way. / In the following experiment, speech perception with an ensemble of across-frequency delays was assessed with eight cochlear implant users. Reverse cochlear delays (high-frequency delays) were equivalent to conventional cochlear delays. Benefit was diminished for larger delays. Speech recognition scores were at baseline with random delay assignments. An information transmission analysis of speech in quiet indicated that the discrimination of voiced cues was most improved with across-frequency delays. For some subjects, this was seen as improved vowel discrimination based on formant locations and improved transmission of the place of articulation of consonants. / A final study indicated that the benefits to speech perception with across-frequency delays are diminished when the number of maxima selected per frame is increased above 8 out of 22 frequency bands.
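The abstract describes applying across-frequency delays to the ACE filterbank outputs before maxima selection. As a simplified sketch of that processing order, and not the thesis's actual implementation, the code below delays each channel's envelope by a per-channel offset and then performs n-of-m maxima selection per frame. The 22-channel, 8-maxima configuration matches the numbers quoted above, while the delay profile and the use of whole analysis frames as the delay unit are illustrative assumptions.

```python
import numpy as np

def apply_channel_delays(envelopes, delays_frames):
    """Shift each channel's envelope by its own delay (in analysis frames).

    envelopes: array of shape (n_channels, n_frames)
    delays_frames: per-channel delay in whole frames (non-negative ints)
    """
    n_channels, n_frames = envelopes.shape
    delayed = np.zeros_like(envelopes)
    for ch, d in enumerate(delays_frames):
        if d < n_frames:
            delayed[ch, d:] = envelopes[ch, :n_frames - d]
    return delayed

def select_maxima(frame, n_maxima=8):
    """n-of-m selection: keep the n largest channel envelopes in one frame."""
    keep = np.argsort(frame)[-n_maxima:]
    mask = np.zeros_like(frame, dtype=bool)
    mask[keep] = True
    return np.where(mask, frame, 0.0)

# Illustrative use: 22 channels with delays growing linearly across the array,
# a stand-in for a travelling-wave-like delay profile (0-6 "frames" of delay).
rng = np.random.default_rng(0)
envelopes = np.abs(rng.standard_normal((22, 100)))
delays = np.linspace(0, 6, 22).round().astype(int)
delayed = apply_channel_delays(envelopes, delays)
stimulation = np.stack([select_maxima(delayed[:, t]) for t in range(delayed.shape[1])], axis=1)
```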
|
8 |
Speech-in-Speech Recognition: Understanding the Effect of Different Talker Maskers /Brown, Stephanie Danielle 23 May 2019 (has links)
No description available.
|
9 |
A Study On Language Modeling For Turkish Large Vocabulary Continuous Speech Recognition /Bayer, Ali Orkan 01 June 2005 (has links) (PDF)
This study focuses on large-vocabulary continuous speech recognition for Turkish. Continuous speech recognition for Turkish cannot be performed accurately with the classical language models used in the field, because the agglutinative nature of the language degrades their performance. In this thesis, acoustic models using different parameters are first constructed and tested. Then, three types of n-gram language models are built: class-based models, stem-based models, and stem-end-based models. Two-pass recognition is performed using the Hidden Markov Model Toolkit (HTK), testing the system first with the bigram models and then with the trigram models. The study finds that trigram models over stems and endings give better results, since their coverage of the vocabulary is better.
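The abstract does not give the details of the stem and ending models; as a loose sketch of the sub-word idea (splitting words into stems and endings and estimating an n-gram model over the resulting units), the code below uses a deliberately naive, hypothetical splitter and add-one-smoothed bigram counts. A real system would use a Turkish morphological analyzer and the smoothing provided by a language-modelling toolkit.

```python
from collections import Counter, defaultdict

def split_word(word):
    """Hypothetical stem/ending splitter: a real system would use a
    morphological analyzer; here the last three characters are treated as
    the ending, purely for illustration."""
    if len(word) > 4:
        return [word[:-3], "+" + word[-3:]]
    return [word]

def train_bigram(sentences):
    """Add-one-smoothed bigram model over stem/ending units."""
    unigrams, bigrams = Counter(), defaultdict(Counter)
    for sentence in sentences:
        units = ["<s>"] + [u for w in sentence.split() for u in split_word(w)] + ["</s>"]
        unigrams.update(units)
        for prev, cur in zip(units, units[1:]):
            bigrams[prev][cur] += 1
    vocab_size = len(unigrams)

    def prob(prev, cur):
        # P(cur | prev) with add-one smoothing over the unit vocabulary.
        return (bigrams[prev][cur] + 1) / (sum(bigrams[prev].values()) + vocab_size)

    return prob

# prob = train_bigram(corpus_sentences)  # corpus_sentences: list of Turkish sentences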
|
10 |
Synchronous HMMs for audio-visual speech processing /Dean, David Brendan January 2008 (has links)
Both human perceptual studies and automatic machine-based experiments have shown that visual information from a speaker's mouth region can improve the robustness of automatic speech processing tasks, especially in the presence of acoustic noise. By taking advantage of the complementary nature of the acoustic and visual speech information, audio-visual speech processing (AVSP) applications can work reliably in more real-world situations than would be possible with traditional acoustic speech processing applications. The two most prominent applications of AVSP for viable human-computer interfaces involve the recognition of the speech events themselves, and the recognition of speakers' identities based upon their speech. However, while these two fields of speech and speaker recognition are closely related, there has been little systematic comparison of the two tasks under similar conditions in the existing literature. Accordingly, the primary focus of this thesis is to compare the suitability of general AVSP techniques for speech or speaker recognition, with a particular focus on synchronous hidden Markov models (SHMMs). The cascading appearance-based approach to visual speech feature extraction has been shown to work well in removing irrelevant static information from the lip region, greatly improving visual speech recognition performance. This thesis demonstrates that these dynamic visual speech features also provide an improvement in speaker recognition, showing that speakers can be visually recognised by how they speak, in addition to their appearance alone. The thesis investigates a number of novel techniques for training and decoding of SHMMs that improve the audio-visual speech modelling ability of the SHMM approach over the existing state-of-the-art joint-training technique. Novel experiments are conducted to demonstrate that the reliability of the two streams during training is of little importance to the final performance of the SHMM. Additionally, two novel techniques for normalising the acoustic and visual state classifiers within the SHMM structure are demonstrated for AVSP. Fused hidden Markov model (FHMM) adaptation is introduced as a novel method of adapting SHMMs from existing well-performing acoustic hidden Markov models (HMMs). This technique is demonstrated to provide improved audio-visual modelling over the jointly-trained SHMM approach at all levels of acoustic noise for the recognition of audio-visual speech events. However, the close coupling of the SHMM approach is shown to be less useful for speaker recognition, where a late integration approach is demonstrated to be superior.
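The thesis's SHMM training and decoding techniques are not detailed in this abstract; as a minimal sketch of the general idea of scoring a synchronous state from both streams, the code below combines per-state acoustic and visual Gaussian log-likelihoods with a stream weight. The feature dimensions, single-Gaussian state model, and weight value are illustrative assumptions rather than the configuration used in the thesis.

```python
import numpy as np

def gaussian_log_likelihood(x, mean, var):
    """Log-likelihood of observation x under a diagonal-covariance Gaussian."""
    x, mean, var = map(np.asarray, (x, mean, var))
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def state_log_likelihood(audio_obs, video_obs, state, audio_weight=0.7):
    """Stream-weighted score for one synchronous HMM state.

    `state` holds one Gaussian per stream; the stream weights (summing to 1
    here) are illustrative, not values taken from the thesis.
    """
    la = gaussian_log_likelihood(audio_obs, state["audio_mean"], state["audio_var"])
    lv = gaussian_log_likelihood(video_obs, state["video_mean"], state["video_var"])
    return audio_weight * la + (1.0 - audio_weight) * lv

# Illustrative use with made-up feature dimensions:
state = {
    "audio_mean": np.zeros(39), "audio_var": np.ones(39),  # e.g. MFCC-style features
    "video_mean": np.zeros(20), "video_var": np.ones(20),  # e.g. appearance-based lip features
}
score = state_log_likelihood(np.random.randn(39), np.random.randn(20), state)
```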
|