  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Image processing methods to segment speech spectrograms for word level recognition

Al-Darkazali, Mohammed January 2017
The ultimate goal of automatic speech recognition (ASR) research is to allow a computer to recognize speech in real time, with full accuracy, independent of vocabulary size, noise, speaker characteristics or accent. Today, systems are trained to learn an individual speaker's voice and larger vocabularies statistically, but accuracy is not ideal. A small gap between actual speech and acoustic speech representation in the statistical mapping causes a failure to produce a match of the acoustic speech signals by Hidden Markov Model (HMM) methods and consequently leads to classification errors. These errors in the low level recognition stage of ASR produce unavoidable errors at the higher levels. Therefore, it seems that ASR requires additional research ideas to be incorporated within current speech recognition systems. This study seeks a new perspective on speech recognition. It incorporates a new approach for speech recognition, supporting it with wider previous research, validating it with a lexicon of 533 words and integrating it with a current speech recognition method to overcome the existing limitations. The study focusses on applying image processing to speech spectrogram images (SSI). We thus develop a new writing system, which we call the Speech-Image Recogniser Code (SIR-CODE). The SIR-CODE refers to the transposition of the speech signal to an artificial domain (the SSI) that allows the classification of the speech signal into segments. The SIR-CODE allows the matching of all speech features (formants, power spectrum, duration, cues of articulation places, etc.) in one process. This was made possible by adding a Realization Layer (RL) on top of the traditional speech recognition layer (based on HMM) to check all sequential phones of a word in a single-step matching process. The study shows that the method gives better recognition results than HMMs alone, leading to accurate and reliable ASR in noisy environments.
Therefore, the addition of the RL for SSI matching is a highly promising solution to compensate for the failure of HMMs in low level recognition. In addition, the same concept of employing SSIs can be used for whole sentences to reduce classification errors in HMM-based high level recognition. The SIR-CODE bridges the gap between theory and practice of phoneme recognition by matching the SSI patterns at the word level. Thus, it can be adapted for dynamic time warping on the SIR-CODE segments, which can help to achieve ASR based on SSI matching alone.
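The abstract above mentions dynamic time warping (DTW) over SIR-CODE segments. As a rough illustration only, the following is a minimal sketch of textbook DTW on two numeric sequences. It is not the thesis's implementation: the function name `dtw_distance` and the use of plain numbers in place of SIR-CODE segments are assumptions made for this example.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Hypothetical stand-in: real SIR-CODE segments would need their own
    local distance in place of abs().
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]
```

Because the warping path may repeat elements, a sequence and a time-stretched copy of it (for example `[1, 2, 3]` and `[1, 2, 2, 3]`) align at zero cost, which is the property that makes DTW attractive for comparing utterances of different durations.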
72

Speech signal analysis.

January 1997
by Bill, Kan Shek Chow. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. / Includes bibliographical references (leaves 39-40). / Chapter 1. Introduction / Chapter 2. The spectrogram / 2.1 Speech signal background / 2.2 Windowed Fourier transform / 2.3 Kernel function / 2.4 Spectrum analysis / 2.5 Spectrogram / 2.6 Reducing dimension of the spectrogram: filter banks / 2.7 Recent experiment on filter banks / Chapter 3. Spectrogram compression / 3.1 Capturing the movement of the spectrum along time / 3.2 Informative statistics: peak distance / 3.3 Estimated spectrogram / 3.4 Relationship between spectrogram and the speech signal / Chapter 4. The phase problem / 4.1 The role of the Fourier phase / 4.2 Iteration scheme / 4.3 Smoothing on the noise: interpolation / Chapter 5. Conclusion and further discussion / 5.1 Conclusion / 5.2 Further discussion / References
73

Speaker adaptation in joint factor analysis based text independent speaker verification

Shou-Chun, Yin, 1980- January 2006
No description available.
74

Speech analysis techniques useful for low or variable bit rate coding

Kim, Hyun Soo, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW January 2005
We investigate, improve and develop speech analysis techniques which can be used to enhance various speech processing systems, especially low bit rate or variable bit rate coding of speech. The coding technique based on the sinusoidal representation of speech is investigated and implemented. Based on this study of the sinusoidal model of speech, improved analysis techniques for voicing determination, pitch estimation and spectral estimation are developed, as well as a noise reduction technique. We investigate the properties and limitations of the spectral envelope estimation vocoder (SEEVOC). We generalize, optimize and improve the SEEVOC and also compare it with LP in the presence of noise. The properties and applications of morphological filters for speech analysis are investigated. We introduce and investigate a novel nonlinear spectral envelope estimation method based on morphological operations, which is found to be very robust against noise. This method is also compared with the SEEVOC method. A simple, general-purpose method for the optimum selection of the structuring set size without using prior pitch information is proposed. The morphological approach is then used for a new pitch estimation method and for the general sinusoidal analysis of speech or audio. Many of the new methods are based on a novel systematic analysis of the peak features of signals, including the study of higher order peaks. We propose a novel peak feature algorithm, which measures the peak characteristics of the speech signal in the time domain, to be used for end point detection and segmentation of speech. This nonparametric algorithm is flexible, efficient and very robust in noise. Several simple voicing measures are proposed and used in a new speech classifier. The harmonic-plus-noise decomposition technique is improved and extended to give an alternative to the methods used in the sinusoidal analysis method. Its applications to pitch estimation, speech classification and noise reduction are investigated.
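To make the idea of a morphological spectral envelope concrete, here is a minimal sketch of a grey-scale closing (dilation followed by erosion) with a flat structuring element applied to a magnitude spectrum. This is a generic morphological closing, not the estimator developed in the thesis. The function names and the half-width parameter `k` are hypothetical, and the thesis's method for choosing the structuring set size is not reproduced here.

```python
def dilate(x, k):
    """Flat dilation: running maximum over a window of half-width k."""
    n = len(x)
    return [max(x[max(0, i - k):min(n, i + k + 1)]) for i in range(n)]


def erode(x, k):
    """Flat erosion: running minimum over a window of half-width k."""
    n = len(x)
    return [min(x[max(0, i - k):min(n, i + k + 1)]) for i in range(n)]


def closing_envelope(spectrum, k):
    """Morphological closing: fills valleys narrower than the window,
    leaving a curve that rides on top of the spectral peaks."""
    return erode(dilate(spectrum, k), k)
```

For a harmonic spectrum, a window wide enough to span the gap between adjacent harmonics bridges the valleys between them, so the result traces the peaks rather than the fine harmonic structure. By construction the closing is never below the input at any bin.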
75

Effects of noise type on speech understanding

Ng, H. N., Elaine. January 2006
Thesis (M. Sc.)--University of Hong Kong, 2006. / Title proper from title frame. Also available in printed format.
76

An analysis-by-synthesis approach to sinusoidal modeling applied to speech and music signal processing

George, E. Bryan 12 1900
No description available.
77

A study of convex optimization for discriminative training of hidden Markov models in automatic speech recognition

Yin, Yan. January 2008
Thesis (M.Sc.)--York University, 2008. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 101-109). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR45978
78

On the use of frame and segment-based methods for the detection and classification of speech sounds and features

Hou, Jun, January 2009
Thesis (Ph. D.)--Rutgers University, 2009. / "Graduate Program in Electrical and Computer Engineering." Includes bibliographical references (p. 121-126).
79

Improved polynomial segment model for speech recognition

Li, Chak Fai. January 2004
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004. / Includes bibliographical references (leaves 80-84). Also available in electronic version. Access restricted to campus users.
80

Maximum likelihood normalization for robust speech recognition

Lai, Yiu Pong. January 2003
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003. / Includes bibliographical references (leaves 98-103). Also available in electronic version. Access restricted to campus users.
