Global ETD Search

61	Effects of noise type on speech understanding Ng, H. N., Elaine, 吳凱寧. January 2006 (has links) Speech and Hearing Sciences / Master / Master of Science in Audiology Audiometry. Speech perception. Speech processing systems.
62	EFFICIENT CODING OF SPEECH SYNTHESIS DATA. Hosne-Sanaye, Simin. January 1984 (has links) No description available. Speech synthesis. Speech processing systems. Coding theory.
63	Voice recognition systems : assessment of implementation aboard U.S. naval ships Wilson, Shawn C. 03 1900 (has links) Approved for public release; distribution is unlimited. / Technological advances have had profound effects on the conduct of military operations in both peacetime and war. One advance that has had a great impact outside the military by reducing human intervention is Voice Recognition (VR) technology. This thesis will examine the implementation of a Voice Recognition System as a shipdriving device and as a means of decreasing the occurrence of mishaps while reducing the level of fatigue of watchstanders on the bridge. Chapter I will discuss the need for the United States Navy to investigate the implementation of a Voice Recognition System to help reduce the probability of mishaps occurring. Chapter II will explain voice recognition technology, how it works, and how the proposed system can be fielded aboard U.S. Navy ships. Chapter III will examine the opinions (on the implementation of a Voice Recognition System) of officers charged with the safe navigation of naval ships. Chapter IV will review the concerns of officers, and will justify the implementation by answering these concerns. The conclusion will reiterate the advances in voice recognition, and why a Voice Recognition System should be implemented on the bridges of U.S. Navy ships. / Lieutenant, United States Navy Voiceprints Speech processing systems Automatic speech recognition
64	Query-by-example spoken term detection for low-resource languages / CUHK electronic theses & dissertations collection January 2014 (has links) In this thesis, we consider the problem of query-by-example (QbyE) spoken term detection (STD) for low-resource languages. The problem is to automatically detect and locate the occurrences of a query term in a large audio database. The query term is given in the form of one or more audio examples. This research is motivated by the demand for information retrieval technologies that can handle speech data of low-resource languages. The major technical difficulty is that manual transcriptions and linguistic knowledge are not available for these languages. / The framework of acoustic segment modeling (ASM) is adopted for unsupervised training of a speech tokenizer. Three novel algorithms are developed for segment labeling in the ASM framework. The proposed algorithms are based on the use of different class-by-segment posterior representations and spectral clustering techniques. The posterior representations are shown to be more robust than conventional spectral representations. Spectral clustering has achieved significant success in many applications. Reformulations of spectral clustering algorithms are made to make them computationally feasible for clustering a large number of speech segments. Experiments on a multilingual speech database demonstrate the advantage of the proposed algorithms over existing approaches. / The speech tokenizer obtained with ASM is applied to QbyE STD. The detection of spoken queries is based on a frame-based template matching framework. The ASM tokenizer serves as the front-end to generate posterior features, which are used for temporal template matching by dynamic time warping (DTW). Experiments show that the ASM tokenizer outperforms a GMM tokenizer and language-mismatched phoneme recognizers. Moreover, a two-step approach is proposed for efficient search. 
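/ The frame-based template matching framework for QbyE STD is enhanced in three ways. A novel DTW matrix combination approach is proposed for the fusion of multiple systems with different posterior features. Pseudo-relevance feedback is used for query expansion, and score normalization is applied to calibrate the score distributions of different query terms. Experimental results show that the performance of the QbyE STD system is significantly improved by the three approaches. / Keyword detection is a technique for locating the occurrences of a given keyword in a large speech database. It is of great value in both academic research and practical applications. Conventional research on keyword detection has mainly targeted resource-rich languages. This thesis studies keyword detection for low-resource languages. Under the conditions assumed in this thesis, the target language lacks sufficient resources to train a speech recognition system, and the keyword is given in the form of audio examples. / This thesis adopts the acoustic segment modeling (ASM) framework for unsupervised training of a speech recognizer. We propose three new methods for clustering speech segments within the ASM framework. Our methods are based on a new, robust speech segment feature and employ spectral clustering techniques. Experiments show that our methods outperform three commonly used baseline methods and achieve better modeling results. / We apply the ASM recognizer in a keyword detection system based on template matching. In this system, the ASM recognizer serves as a front-end feature transformation module that extracts posterior probability features. To improve detection efficiency, we also propose a two-step detection method. Experimental results show that our methods achieve relatively high detection accuracy. / To further improve detection accuracy, this thesis optimizes the template-matching-based keyword detection system from three angles. First, we propose system fusion on the distance matrices of dynamic time warping. Second, we propose using pseudo-relevance feedback to obtain more keyword examples. Finally, we normalize the system scores, which facilitates setting a unified score threshold. Experimental results show that all three methods effectively improve the performance of the keyword detection system. / Wang, Haipeng. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2014. / Includes bibliographical references (leaves 110-127). / Abstracts also in Chinese. / Title from PDF title page (viewed on 05, December, 2016). / Detailed summary in vernacular field only. Speech processing systems TK7882.S65 W349 2014eb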
65	A robust low bit rate quad-band excitation LSP vocoder. January 1994 (has links) by Chiu Kim Ming. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. / Includes bibliographical references (leaves 103-108). / Chapter 1 --- Introduction / Chapter 1.1 --- Speech production / Chapter 1.2 --- Low bit rate speech coding / Chapter 2 --- Speech analysis & synthesis / Chapter 2.1 --- Linear prediction of speech signal / Chapter 2.2 --- LPC vocoder / Chapter 2.2.1 --- Pitch and voiced/unvoiced decision / Chapter 2.2.2 --- Spectral envelope representation / Chapter 2.3 --- Excitation / Chapter 2.3.1 --- Regular pulse excitation and Multipulse excitation / Chapter 2.3.2 --- Coded excitation and vector sum excitation / Chapter 2.4 --- Multiband excitation / Chapter 2.5 --- Multiband excitation vocoder / Chapter 3 --- Dual-band and Quad-band excitation / Chapter 3.1 --- Dual-band excitation / Chapter 3.2 --- Quad-band excitation / Chapter 3.3 --- Parameters determination / Chapter 3.3.1 --- Pitch detection / Chapter 3.3.2 --- Voiced/unvoiced pattern generation / Chapter 3.4 --- Excitation generation / Chapter 4 --- A low bit rate Quad-Band Excitation LSP Vocoder / Chapter 4.1 --- Architecture of QBELSP vocoder / Chapter 4.2 --- Coding of excitation parameters / Chapter 4.2.1 --- Coding of pitch value / Chapter 4.2.2 --- Coding of voiced/unvoiced pattern / Chapter 4.3 --- Spectral envelope estimation and coding / Chapter 4.3.1 --- Spectral envelope & the gain value / Chapter 4.3.2 --- Line Spectral Pairs (LSP) / Chapter 4.3.3 --- Coding of LSP frequencies / Chapter 4.3.4 --- Coding of gain value / Chapter 5 --- Performance evaluation / Chapter 5.1 --- Spectral analysis / Chapter 5.2 --- Subjective listening test / Chapter 5.2.1 --- Mean Opinion Score (MOS) / Chapter 5.2.2 --- Diagnostic Rhyme Test (DRT) / Chapter 6 --- Conclusions and discussions / References / Appendix A Subroutine of pitch detection / Appendix B Subroutine of voiced/unvoiced decision / Appendix C Subroutine of LPC coefficients calculation using Durbin's recursive method / Appendix D Subroutine of LSP calculation using Chebyshev Polynomials / Appendix E Single syllable word pairs for Diagnostic Rhyme Test. Vocoder Data Compression (Telecommunication) Speech processing systems
66	An automatic speaker recognition system. January 1989 (has links) by Yu Chun Kei. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1989. / Bibliography: leaves 86-88. Automatic speech recognition Speech processing systems
67	Spectrogram generation with a minicomputer and a graphics terminal Sauder, Ronald Dale January 2010 (has links) Typescript, etc. / Digitized by Kansas Correctional Industries Speech processing systems Optical data processing
68	Some analyses of the speech of hearing-impaired speakers using digital signal processing techniques Briery, Debra Jane January 2011 (has links) Digitized by Kansas Correctional Industries Deaf--Means of communication Speech processing systems
69	Spoken language identification with prosodic features. / CUHK electronic theses & dissertations collection / Digital dissertation consortium January 2011 (has links) The PAM-based prosodic LID system is compared with other prosodic LID systems on a task of pairwise language identification. The advantages of comprehensive modeling of prosodic features are clearly demonstrated. Analysis reveals the confusion patterns among target languages, as well as the feature-language relationship. The PAM-based prosodic LID system is combined with a state-of-the-art phonotactic system by score-level fusion. Complementary effects are demonstrated between the two different features in the LID problem. An additional operation on score calibration, which further improves the LID system performance, is also introduced. / There are no conventional ways to model prosody. We use a large prosodic feature set which covers fundamental frequency (F0), duration and intensity. It also considers various extraction and normalization methods for each type of feature. In terms of modeling, the vector space modeling approach is adopted. We introduce a framework called prosodic attribute model (PAM) to model the acoustic correlates of prosodic events in a flexible manner. Feature selection and preliminary LID tests are carried out to derive a preferred term-document matrix construction for modeling. / This thesis focuses on the use of prosodic features for automatic spoken language identification (LID). LID is the problem of automatically determining the language of spoken utterances. After three decades of research, the state-of-the-art LID systems seem to give a saturating performance. To meet the tight requirements on accuracy, prosody is proposed as an alternative feature source to provide complementary information to LID. / Ng, Wai Man. / Adviser: Tan Lee. / Source: Dissertation Abstracts International, Volume: 73-04, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. 
/ Includes bibliographical references (leaves 112-125). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. Prosodic analysis (Linguistics) Speech processing systems
70	Speaker recognition using complementary information from vocal source and vocal tract. / CUHK electronic theses & dissertations collection January 2005 (has links) Experimental results show that source-tract information fusion can also improve the robustness of speaker recognition systems in mismatched conditions. For example, relative improvements of 15.3% and 12.6% have been achieved for speaker identification and verification, respectively. / For speaker verification, a text-dependent weighting scheme is developed. Analysis results show that the source-tract discrimination ratio varies significantly across different sounds due to the diversity of vocal system configurations in speech production. This thesis analyzes the source-tract speaker discrimination ratio for the 10 Cantonese digits, upon which a digit-dependent source-tract weighting scheme is developed. Information fusion with such digit-dependent weights yields a relative improvement of 39.6% in verification performance in matched conditions. / This thesis investigates the feasibility of using both vocal source and vocal tract information to improve speaker recognition performance. Conventional speaker recognition systems typically employ vocal tract related acoustic features, e.g., the Mel-frequency cepstral coefficients (MFCC), for discriminative purposes. Motivated by the physiological significance of the vocal source and vocal tract system in speech production, this thesis develops a speaker recognition system to effectively incorporate these two complementary information sources for improved performance and robustness. / This thesis presents a novel approach to representing the speaker-specific vocal source characteristics. The linear predictive (LP) residual signal is adopted as a good representative of the vocal source excitation, in which the speaker-specific information resides in both the time and frequency domains. 
The Haar transform and wavelet transform are applied for multi-resolution analyses of the LP residual signal. The resulting vocal source features, namely the Haar octave coefficients of residues (HOCOR) and wavelet octave coefficients of residues (WOCOR), can effectively extract the speaker-specific spectro-temporal characteristics of the LP residual signal. In particular, with the pitch-synchronous wavelet transform, the WOCOR feature set is capable of capturing the pitch-related low frequency properties and the high frequency information associated with pitch epochs, as well as their temporal variations within a pitch period and over consecutive periods. The generated vocal source and vocal tract features are complementary to each other since they are derived from two orthogonal components, the LP residual signal and LP coefficients. Therefore they can be fused to provide better speaker recognition performance. A preliminary scheme of fusing MFCC and WOCOR together showed that the identification and verification performance can be improved by 34.6% and 23.6%, respectively, both in matched conditions. / To maximize the benefit obtained through the fusion of source and tract information, speaker discrimination dependent fusion techniques have been developed. For speaker identification, a confidence measure, which indicates the reliability of the vocal source feature in speaker identification, is derived based on the discrimination ratio between the source and tract features in each identification trial. Information fusion with the confidence measure offers better weighted scores given by the two features and avoids possible errors introduced by incorporating source information, thereby improving the identification performance further. Compared with MFCC, a relative improvement of 46.8% has been achieved. / Zheng Nengheng. / "November 2005." / Adviser: Pak-Chung Ching. / Source: Dissertation Abstracts International, Volume: 67-11, Section: B, page: 6647. 
/ Thesis (Ph.D.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (p. 123-135). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307. Human-computer interaction Speech processing systems
