391 |
Model-based speech separation and enhancement with single-microphone input. / CUHK electronic theses & dissertations collection, January 2008 (has links)
Experiments were carried out on continuous real speech mixed with either a competing speech source or broadband noise. Results show that the separation outputs bear spectral trajectories similar to those of the ideal source signals. For speech mixtures, the proposed algorithm is evaluated in two ways: segmental signal-to-interference ratio (segSIR) and Itakura-Saito distortion (dIS). It is found that (1) the interference signal power is reduced, in terms of segSIR improvement, even under the harsh condition of comparable target speech and interference powers; and (2) the dIS between the estimated source and the clean speech source is significantly smaller than before processing. These results demonstrate the capability of the proposed algorithm to extract individual sources from a mixture signal by reducing the interference signal and generating appropriate spectral trajectories for the individual source estimates. / Our approach is based on findings from psychoacoustics. To separate individual sound sources in a mixture signal, humans exploit perceptual cues such as harmonicity, continuity, context information and prior knowledge of familiar auditory patterns. Furthermore, the application of prior knowledge of speech for top-down separation (called schema-based grouping) is known to be powerful, yet remains largely unexplored. In this thesis, a bi-directional, model-based speech separation and enhancement algorithm that exploits speech schemas is proposed. As model patterns are employed to generate successive spectral envelopes in an utterance, the output speech is expected to be natural and intelligible. / The proposed separation algorithm regenerates a target speech source by estimating its spectral envelope and harmonic structure. In the first stage, an optimal sequence of Wiener filters is determined for interference removal. Specifically, acoustic models of speech schemas, represented by candidate line spectrum pair (LSP) patterns, are matched against the input mixture and the given transcription, if available, in a top-down manner. The retrieved LSP patterns constitute a spectral evolution that is synchronized with the target speech source. With this evolution, the mixture spectrum is then filtered to approximate the target source at an appropriate signal level. In the second stage, residual harmonic structure from interfering sources is eliminated by comb filtering, with the filters designed according to the results of pitch tracking (a minimal sketch of this two-stage filtering follows this record). / This thesis focuses on the speech source separation problem in a single-microphone scenario. Possible applications of speech separation include speech recognition, auditory prostheses and surveillance systems. Sound signals typically reach our ears as a mixture of desired signals, other competing sounds and background noise; example scenarios are talking with someone in a crowd of other speakers, or listening to an orchestra with a number of instruments playing concurrently. These sounds often overlap in time and frequency. While humans attend to individual sources remarkably well under such adverse conditions, even with a single ear, the performance of most speech processing systems degrades easily. Therefore, modeling how the human auditory system performs is one viable way to extract target speech sources from the mixture before any vulnerable downstream processing. / Lee, Siu Wa. / "April 2008." / Adviser: Chung Ching. / Source: Dissertation Abstracts International, Volume: 70-03, Section: B, page: 1846. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2008.
/ Includes bibliographical references (p. 233-252). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307.
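For illustration only, the two-stage scheme described in this abstract (schema-driven Wiener filtering followed by pitch-tracked comb filtering) can be sketched in a few lines of NumPy. This is a minimal single-frame toy, not the thesis implementation: oracle power spectra stand in for the schema-derived LSP envelope estimates, the interferer's pitch is assumed known in place of a pitch tracker, and all names (wiener_gain, harmonic_notch, seg_sir) are hypothetical.

```python
import numpy as np

def wiener_gain(target_psd, interferer_psd, floor=1e-10):
    """Per-bin Wiener gain (stage 1). In the thesis the target envelope is
    retrieved from LSP-pattern speech schemas; here both PSDs are given."""
    return target_psd / np.maximum(target_psd + interferer_psd, floor)

def harmonic_notch(spectrum, f0_interferer, sample_rate, n_fft, width_bins=1):
    """Zero out bins near the interferer's harmonics (stage 2 comb filtering).
    A crude stand-in for the pitch-tracking-driven comb filters."""
    out = spectrum.copy()
    bin_hz = sample_rate / n_fft
    k = 1
    while k * f0_interferer < sample_rate / 2:
        centre = int(round(k * f0_interferer / bin_hz))
        lo, hi = max(centre - width_bins, 0), min(centre + width_bins + 1, len(out))
        out[lo:hi] = 0.0
        k += 1
    return out

def seg_sir(target_frames, interference_frames, eps=1e-10):
    """Segmental signal-to-interference ratio in dB, averaged over frames."""
    ratios = 10 * np.log10((np.sum(target_frames ** 2, axis=1) + eps)
                           / (np.sum(interference_frames ** 2, axis=1) + eps))
    return float(np.mean(ratios))

# Toy frame: target and interfering sinusoids mixed at similar power,
# i.e. the "harsh condition" of comparable target and interference levels.
sr, n_fft = 16000, 512
t = np.arange(n_fft) / sr
window = np.hanning(n_fft)
target = np.sin(2 * np.pi * 220 * t)       # stand-in for the target source
interferer = np.sin(2 * np.pi * 150 * t)   # stand-in for the competing source
mix = np.fft.rfft((target + interferer) * window)

print("segSIR of raw mixture: %.1f dB"
      % seg_sir(target[None, :], interferer[None, :]))  # roughly 0 dB

# Stage 1: Wiener filtering with oracle PSDs (schema-derived in the thesis).
tgt_psd = np.abs(np.fft.rfft(target * window)) ** 2
itf_psd = np.abs(np.fft.rfft(interferer * window)) ** 2
stage1 = wiener_gain(tgt_psd, itf_psd) * mix

# Stage 2: comb filtering to remove the interferer's harmonic structure.
stage2 = harmonic_notch(stage1, f0_interferer=150.0, sample_rate=sr, n_fft=n_fft)
estimate = np.fft.irfft(stage2, n=n_fft)   # time-domain estimate of the target
```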
|
392 |
Effects of speech and noise on Cantonese speech intelligibility / Mak, Cheuk-yan, Charin. January 2006 (has links)
Thesis (M. Sc.)--University of Hong Kong, 2006. / Title proper from title frame. Also available in printed format.
|
393 |
Speech synthesis via adaptive Fourier decomposition / Liu, Zhu Lin. January 2011 (has links)
University of Macau / Faculty of Science and Technology / Department of Mathematics
|
394 |
CIAIR In-Car Speech Corpus: Influence of Driving Status / Kawaguchi, Nobuo; Matsubara, Shigeki; Takeda, Kazuya; Itakura, Fumitada. 03 1900 (has links)
No description available.
|
395 |
An upper bound for tactile recognition of speech / McClellan, Richard Paul, 1944- January 1967 (has links)
No description available.
|
396 |
Improved clipped speech systems / Boulay, Paul Frederick, 1936- January 1964 (has links)
No description available.
|
397 |
A new homomorphic vocoder framework using analysis-by-synthesis excitation analysis / Chung, Jae H. 05 1900 (has links)
No description available.
|
398 |
Analysis and compensation of stressed and noisy speech with application to robust automatic recognition / Hansen, John H. L. 08 1900 (has links)
No description available.
|
399 |
Very low bit rate speech coding using the line spectrum pair transformation of the LPC coefficients / Crosmer, Joel R. 08 1900 (has links)
No description available.
|
400 |
The development of audiovisual speech perception / Hockley, Neil Spencer. January 1994 (has links)
The developmental process of audiovisual speech perception was examined in this experiment using the McGurk paradigm (McGurk & MacDonald, 1976), in which a visual recording of a person saying a particular syllable is synchronized with the auditory presentation of another syllable. Previous studies have shown that audiovisual speech perception in adults and older children is strongly influenced by visual speech information, whereas children under five are influenced almost exclusively by the auditory input (McGurk & MacDonald, 1976; Massaro, 1984; Massaro, Thompson, Barron, & Laren, 1986). In this investigation, 46 children aged between 4:7 and 12:4 and 15 adults were presented with conflicting audiovisual syllables constructed according to the McGurk paradigm. The results indicated that the influence of auditory information decreased with age, while the influence of visual information increased with age. In addition, an adult-like response pattern was observed in only half of the children in the oldest child subject group (10-12 years old), suggesting that the integration of auditory and visual speech information continues to develop beyond the age of twelve.
|