About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world.
71

A robust low bit rate quad-band excitation LSP vocoder.

January 1994
by Chiu Kim Ming. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. / Includes bibliographical references (leaves 103-108).

Contents:
Chapter 1  Introduction (p.1)
1.1  Speech production (p.2)
1.2  Low bit rate speech coding (p.4)
Chapter 2  Speech analysis & synthesis (p.8)
2.1  Linear prediction of speech signal (p.8)
2.2  LPC vocoder (p.11)
2.2.1  Pitch and voiced/unvoiced decision (p.11)
2.2.2  Spectral envelope representation (p.15)
2.3  Excitation (p.16)
2.3.1  Regular pulse excitation and multipulse excitation (p.16)
2.3.2  Coded excitation and vector sum excitation (p.19)
2.4  Multiband excitation (p.22)
2.5  Multiband excitation vocoder (p.25)
Chapter 3  Dual-band and quad-band excitation (p.31)
3.1  Dual-band excitation (p.31)
3.2  Quad-band excitation (p.37)
3.3  Parameters determination (p.41)
3.3.1  Pitch detection (p.41)
3.3.2  Voiced/unvoiced pattern generation (p.43)
3.4  Excitation generation (p.47)
Chapter 4  A low bit rate quad-band excitation LSP vocoder (p.51)
4.1  Architecture of QBELSP vocoder (p.51)
4.2  Coding of excitation parameters (p.58)
4.2.1  Coding of pitch value (p.58)
4.2.2  Coding of voiced/unvoiced pattern (p.60)
4.3  Spectral envelope estimation and coding (p.62)
4.3.1  Spectral envelope & the gain value (p.62)
4.3.2  Line spectral pairs (LSP) (p.63)
4.3.3  Coding of LSP frequencies (p.68)
4.3.4  Coding of gain value (p.77)
Chapter 5  Performance evaluation (p.80)
5.1  Spectral analysis (p.80)
5.2  Subjective listening test (p.93)
5.2.1  Mean Opinion Score (MOS) (p.93)
5.2.2  Diagnostic Rhyme Test (DRT) (p.96)
Chapter 6  Conclusions and discussions (p.99)
References (p.103)
Appendix A  Subroutine of pitch detection (p.A-I - A-III)
Appendix B  Subroutine of voiced/unvoiced decision (p.B-I - B-V)
Appendix C  Subroutine of LPC coefficients calculation using Durbin's recursive method (p.C-I - C-II)
Appendix D  Subroutine of LSP calculation using Chebyshev polynomials (p.D-I - D-III)
Appendix E  Single syllable word pairs for Diagnostic Rhyme Test (p.E-I)
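Appendix C of this thesis references Durbin's recursive method for computing LPC coefficients. A minimal illustrative sketch in Python (not the thesis's own code; the interface and autocorrelation input are assumptions):

```python
def levinson_durbin(r, order):
    """Solve the LPC normal equations by Durbin's recursion.

    r     : autocorrelation sequence, r[0..order]
    order : prediction order p
    Returns the predictor coefficients a[1..p] (so that the error
    filter is A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p) and the final
    prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient from the current residual correlation.
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err
        # Symmetric in-place update of the coefficient vector.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err
```

For an AR(1) signal with pole 0.5 (autocorrelation 0.5**k), the recursion recovers the single predictor coefficient -0.5 and leaves the second-order coefficient at zero.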
72

An automatic speaker recognition system.

January 1989
by Yu Chun Kei. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1989. / Bibliography: leaves 86-88.
73

Spectrogram generation with a minicomputer and a graphics terminal

Sauder, Ronald Dale January 2010
Typescript, etc. / Digitized by Kansas Correctional Industries
74

Some analyses of the speech of hearing-impaired speakers using digital signal processing techniques

Briery, Debra Jane January 2011
Digitized by Kansas Correctional Industries
75

Spoken language identification with prosodic features.

January 2011
This thesis focuses on the use of prosodic features for automatic spoken language identification (LID). LID is the problem of automatically determining the language of spoken utterances. After three decades of research, state-of-the-art LID systems appear to be reaching a performance plateau. To meet tight accuracy requirements, prosody is proposed as an alternative feature source providing complementary information for LID.

There is no conventional way to model prosody. We use a large prosodic feature set covering fundamental frequency (F0), duration and intensity, and consider various extraction and normalization methods for each type of feature. For modeling, the vector space modeling approach is adopted. We introduce a framework called the prosodic attribute model (PAM) to model the acoustic correlates of prosodic events in a flexible manner. Feature selection and preliminary LID tests are carried out to derive a preferred term-document matrix construction for modeling.

The PAM-based prosodic LID system is compared with other prosodic LID systems on a pairwise language identification task. The advantages of comprehensive modeling of prosodic features are clearly demonstrated. Analysis reveals the confusion patterns among target languages, as well as the feature-language relationship. The PAM-based system is then combined with a state-of-the-art phonotactic system by score-level fusion, demonstrating the complementary effects of the two different features in the LID problem. An additional score calibration operation, which further improves LID system performance, is also introduced.

Ng, Wai Man. / Adviser: Tan Lee. / Source: Dissertation Abstracts International, Volume: 73-04, Section: B. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 112-125). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
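The abstract mentions combining the prosodic system with a phonotactic system by score-level fusion. A minimal sketch of linear score-level fusion (the language labels, scores and weight below are hypothetical; the thesis's actual fusion and calibration details are not given in this record):

```python
def fuse_scores(prosodic, phonotactic, w=0.3):
    """Weighted linear score-level fusion of two LID subsystems.

    prosodic, phonotactic : dicts mapping language -> subsystem score
                            (assumed already calibrated to a common scale)
    w                     : weight given to the prosodic subsystem
    Returns a dict of fused per-language scores; the identified
    language is the argmax over this dict.
    """
    return {lang: w * prosodic[lang] + (1 - w) * phonotactic[lang]
            for lang in prosodic}
```

In practice the weight would be tuned on a development set, and each subsystem's scores would first be calibrated (e.g. to log-likelihood ratios) so the linear combination is meaningful.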
76

Speaker recognition using complementary information from vocal source and vocal tract.

January 2005
This thesis investigates the feasibility of using both vocal source and vocal tract information to improve speaker recognition performance. Conventional speaker recognition systems typically employ vocal tract related acoustic features, e.g. the Mel-frequency cepstral coefficients (MFCC), for discriminative purposes. Motivated by the physiological significance of the vocal source and vocal tract system in speech production, this thesis develops a speaker recognition system that effectively incorporates these two complementary information sources for improved performance and robustness.

The thesis presents a novel approach to representing speaker-specific vocal source characteristics. The linear predictive (LP) residual signal is adopted as a good representative of the vocal source excitation, in which the speaker-specific information resides in both the time and frequency domains. Haar transform and wavelet transform are applied for multi-resolution analyses of the LP residual signal. The resulting vocal source features, namely the Haar octave coefficients of residues (HOCOR) and wavelet octave coefficients of residues (WOCOR), can effectively extract the speaker-specific spectro-temporal characteristics of the LP residual signal. In particular, with pitch-synchronous wavelet transform, the WOCOR feature set is capable of capturing the pitch-related low frequency properties and the high frequency information associated with pitch epochs, as well as their temporal variations within a pitch period and over consecutive periods. The generated vocal source and vocal tract features are complementary to each other since they are derived from two orthogonal components, the LP residual signal and the LP coefficients. They can therefore be fused to provide better speaker recognition performance. A preliminary scheme fusing MFCC and WOCOR improved identification and verification performance by 34.6% and 23.6%, respectively, both in matched conditions.

To maximize the benefit obtained through the fusion of source and tract information, speaker discrimination dependent fusion techniques have been developed. For speaker identification, a confidence measure, which indicates the reliability of the vocal source feature in each identification trial, is derived from the discrimination ratio between the source and tract features. Information fusion with the confidence measure gives better weighted scores from the two features and avoids possible errors introduced by incorporating source information, thereby further improving identification performance. Compared with MFCC alone, a relative improvement of 46.8% has been achieved.

For speaker verification, a text-dependent weighting scheme is developed. Analysis shows that the source-tract discrimination ratio varies significantly across different sounds due to the diversity of vocal system configurations in speech production. The thesis analyzes the source-tract speaker discrimination ratio for the 10 Cantonese digits, upon which a digit-dependent source-tract weighting scheme is developed. Information fusion with such digit-dependent weights relatively improves verification performance by 39.6% in matched conditions.

Experimental results show that source-tract information fusion can also improve the robustness of speaker recognition systems in mismatched conditions; for example, relative improvements of 15.3% and 12.6% have been achieved for speaker identification and verification, respectively.

Zheng Nengheng. / "November 2005." / Adviser: Pak-Chung Ching. / Source: Dissertation Abstracts International, Volume: 67-11, Section: B, page: 6647. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (p. 123-135). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307.
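The vocal source representation described above starts from the LP residual, obtained by inverse-filtering the speech signal with its LP coefficients. A sketch of that inverse-filtering step (plain Python with an assumed interface; the thesis's HOCOR/WOCOR features are built on top of this residual and are not shown):

```python
def lp_residual(signal, a):
    """Compute the LP residual e[n] = s[n] + sum_k a[k] s[n-k].

    signal : speech samples s[n]
    a      : predictor coefficients a[1..p] of the error filter
             A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p
    Samples before n = 0 are treated as zero, so the first p residual
    samples include filter start-up transients.
    """
    p = len(a)
    residual = []
    for n in range(len(signal)):
        e = signal[n]
        for k in range(1, p + 1):
            if n - k >= 0:
                e += a[k - 1] * signal[n - k]
        residual.append(e)
    return residual
```

For a signal that exactly follows s[n] = 0.5 s[n-1], inverse filtering with a = [-0.5] drives the residual to zero after the first sample, confirming that the residual carries only what the LP model cannot predict.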
77

Image processing methods to segment speech spectrograms for word level recognition

Al-Darkazali, Mohammed January 2017
The ultimate goal of automatic speech recognition (ASR) research is to allow a computer to recognize speech in real time, with full accuracy, independent of vocabulary size, noise, speaker characteristics or accent. Today, systems are trained to learn an individual speaker's voice and larger vocabularies statistically, but accuracy is not ideal. A small gap between actual speech and its acoustic representation in the statistical mapping prevents Hidden Markov Model (HMM) methods from matching the acoustic speech signals, and consequently leads to classification errors. These errors in the low-level recognition stage of ASR inevitably produce errors at the higher levels. Therefore, it seems that ASR requires additional research ideas to be incorporated within current speech recognition systems. This study seeks a new perspective on speech recognition. It incorporates a new approach for speech recognition, supporting it with wider previous research, validating it with a lexicon of 533 words and integrating it with a current speech recognition method to overcome the existing limitations.

The study focusses on applying image processing to speech spectrogram images (SSI). We thus develop a new writing system, which we call the Speech-Image Recogniser Code (SIR-CODE). The SIR-CODE refers to the transposition of the speech signal to an artificial domain (the SSI) that allows the classification of the speech signal into segments. The SIR-CODE allows the matching of all speech features (formants, power spectrum, duration, cues of articulation places, etc.) in one process. This was made possible by adding a Realization Layer (RL) on top of the traditional speech recognition layer (based on HMM) to check all sequential phones of a word in a single-step matching process. The study shows that the method gives better recognition results than HMMs alone, leading to accurate and reliable ASR in noisy environments. Therefore, the addition of the RL for SSI matching is a highly promising solution to compensate for the failure of HMMs in low-level recognition. In addition, the same concept of employing SSIs can be used for whole sentences to reduce classification errors in HMM-based high-level recognition. The SIR-CODE bridges the gap between theory and practice of phoneme recognition by matching the SSI patterns at the word level. Thus, it can be adapted for dynamic time warping on the SIR-CODE segments, which can help to achieve ASR based on SSI matching alone.
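The record describes classifying the speech signal into segments via the spectrogram image. As a toy illustration of one plausible first step (an assumption for illustration, not the thesis's SIR-CODE method), contiguous high-energy time frames of a magnitude spectrogram can be grouped into candidate word-level segments:

```python
def segment_columns(spec, thresh):
    """Crude word-level segmentation of a spectrogram image.

    spec   : 2-D list, rows = frequency bins, columns = time frames
    thresh : energy threshold separating speech from background
    Returns a list of (start, end) frame index pairs (end exclusive)
    where the per-frame summed magnitude stays above the threshold.
    """
    energy = [sum(col) for col in zip(*spec)]  # energy per time frame
    segments, start = [], None
    for t, e in enumerate(energy):
        if e > thresh and start is None:
            start = t                          # segment opens
        elif e <= thresh and start is not None:
            segments.append((start, t))        # segment closes
            start = None
    if start is not None:                      # segment runs to the end
        segments.append((start, len(energy)))
    return segments
```

A real system would smooth the energy contour and merge short gaps before declaring word boundaries; this sketch only shows the thresholding idea.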
78

Speech signal analysis.

January 1997
by Bill, Kan Shek Chow. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. / Includes bibliographical references (leaves 39-40).

Contents:
Chapter 1  Introduction (p.1)
Chapter 2  The spectrogram (p.4)
2.1  Speech signal background (p.4)
2.2  Windowed Fourier transform (p.4)
2.3  Kernel function (p.6)
2.4  Spectrum analysis (p.7)
2.5  Spectrogram (p.9)
2.6  Reducing dimension of the spectrogram - Filter banks (p.12)
2.7  Recent experiment on filter banks (p.12)
Chapter 3  Spectrogram compression (p.15)
3.1  Capturing the movement of the spectrum along time (p.16)
3.2  Informative statistics - peak distance (p.18)
3.3  Estimated spectrogram (p.21)
3.4  Relationship between spectrogram and the speech signal (p.22)
Chapter 4  The phase problem (p.27)
4.1  The role of the Fourier phase (p.27)
4.2  Iteration scheme (p.27)
4.3  Smoothing on the noise - interpolation (p.34)
Chapter 5  Conclusion and further discussion (p.37)
5.1  Conclusion (p.37)
5.2  Further discussion (p.38)
References (p.39)
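Chapter 2 of this thesis covers the windowed Fourier transform and the spectrogram. A minimal, self-contained sketch of that computation (Hann window, naive DFT; a teaching illustration rather than the thesis's implementation):

```python
import cmath
import math

def spectrogram(x, win_len, hop):
    """Magnitude spectrogram via a Hann-windowed DFT of each frame.

    x       : list of real samples
    win_len : analysis window length in samples
    hop     : hop size between successive frames
    Returns a list of frames; each frame is a list of DFT magnitudes
    for bins 0 .. win_len//2 (the non-negative frequencies).
    """
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        # Apply the Hann window to the current frame.
        frame = [x[start + n] *
                 (0.5 - 0.5 * math.cos(2 * math.pi * n / (win_len - 1)))
                 for n in range(win_len)]
        # Naive DFT of the windowed frame (O(N^2); an FFT would be
        # used in practice).
        mags = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                        for n in range(win_len)))
                for k in range(win_len // 2 + 1)]
        frames.append(mags)
    return frames
```

A pure cosine at DFT bin 2 of an 8-sample frame produces its largest magnitude at bin 2, as expected.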
79

Speaker adaptation in joint factor analysis based text independent speaker verification

Shou-Chun, Yin, 1980- January 2006
No description available.
80

Speech analysis techniques useful for low or variable bit rate coding

Kim, Hyun Soo, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW January 2005
We investigate, improve and develop speech analysis techniques that can be used to enhance various speech processing systems, especially low bit rate or variable bit rate coding of speech. A coding technique based on the sinusoidal representation of speech is investigated and implemented. Based on this study of the sinusoidal model of speech, improved analysis techniques for voicing decision, pitch estimation and spectral estimation are developed, as well as a noise reduction technique. We investigate the properties and limitations of the spectral envelope estimation vocoder (SEEVOC). We generalize, optimize and improve the SEEVOC, and also compare it with linear prediction (LP) in the presence of noise. The properties and applications of morphological filters for speech analysis are investigated. We introduce a novel nonlinear spectral envelope estimation method based on morphological operations, which is found to be very robust against noise; this method is also compared with the SEEVOC method. A simple method for the optimum selection of the structuring set size without using prior pitch information is proposed for many purposes. The morphological approach is then used for a new pitch estimation method and for the general sinusoidal analysis of speech or audio. Many of the new methods are based on a novel systematic analysis of the peak features of signals, including the study of higher-order peaks. We propose a novel peak feature algorithm, which measures the peak characteristics of the speech signal in the time domain, to be used for end-point detection and segmentation of speech. This nonparametric algorithm is flexible, efficient and very robust in noise. Several simple voicing measures are proposed and used in a new speech classifier. The harmonic-plus-noise decomposition technique is improved and extended to give an alternative to the methods used in the sinusoidal analysis method. Its applications to pitch estimation, speech classification and noise reduction are investigated.
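The abstract above describes a nonlinear spectral envelope estimator built from morphological operations. A grayscale closing (dilation followed by erosion) along the frequency axis illustrates the core idea: valleys between harmonic peaks narrower than the structuring set are filled, leaving an upper envelope. This is a sketch of the general technique only; the thesis's structuring set size selection is not shown.

```python
def dilate(x, size):
    """Grayscale dilation: running maximum over a window of `size` bins."""
    h = size // 2
    return [max(x[max(0, i - h):i + h + 1]) for i in range(len(x))]

def erode(x, size):
    """Grayscale erosion: running minimum over a window of `size` bins."""
    h = size // 2
    return [min(x[max(0, i - h):i + h + 1]) for i in range(len(x))]

def spectral_envelope(spectrum, size):
    """Morphological closing of a magnitude spectrum.

    Dilation bridges the dips between harmonic peaks; the following
    erosion pulls the result back down onto the peaks, yielding an
    upper spectral envelope that sits on top of the harmonics.
    """
    return erode(dilate(spectrum, size), size)
```

The closing never falls below the input, so the result is a valid upper envelope; choosing the window slightly wider than the harmonic spacing is what removes the pitch ripple.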
