  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
131

Deep Neural Network Approach for Single Channel Speech Enhancement Processing

Li, Dongfu January 2016
Speech intelligibility represents how comprehensible speech is. It is more important than speech quality in some applications. Single channel speech intelligibility enhancement is much more difficult than multi-channel intelligibility enhancement. It has recently been reported that training-based single channel speech intelligibility enhancement algorithms perform better than Signal to Noise Ratio (SNR) based algorithms. In this thesis, a training-based Deep Neural Network (DNN) is used to improve single channel speech intelligibility. To increase the performance of the DNN, the Multi-Resolution Cochleagram (MRCG) feature set is used as the input of the DNN. MATLAB objective test results show that the MRCG-DNN approach is more robust than a Gaussian Mixture Model (GMM) approach. The MRCG-DNN also works better than other DNN training algorithms. Various conditions such as different speakers, different noise conditions and reverberation were tested in the thesis.
132

Investigation of time-domain measurements for analysis and machine recognition of speech

Ito, Mabo Robert January 1971
At present in speech analysis and mechanical speech recognition work, spectral measurements are the conventional form of signal representation and acoustical descriptions of speech sounds are usually given in terms of this form of representation. In this thesis, certain time-domain measurements are investigated as an alternative form of signal representation and as a basis for acoustical characterization of speech sounds. The primary measurements studied are the short-time averages of the zero-crossing rate of the acoustic waveform and the distribution patterns of the time intervals between zero-crossings. These measurements are found to be easy to implement with digital techniques and are implemented through digital computer simulation. Other advantages of these measurements include effectiveness in handling the large intensity range of speech sounds and ability to track rapid transient phenomena such as the release of unvoiced stops. Computer software for an interactive graphics facility was developed for acquisition, presentation, manipulation and analysis of the acoustic speech data. One of the pattern analysis programs, for the display of time-interval distribution data, yielded a visual presentation which could be compared to frequency spectrograms. Theoretical expressions are developed to relate the time-domain and spectral representation for some phone types and these relationships are compared with experimental results. The above theoretical expressions show that important spectral characterization features are accounted for. These findings, combined with empirical observation of the utility of the time-domain signal representation in phonetic characterization, indicate that this form of representation is a useful alternative to the spectral representation. The speech materials employed were selected to study temporal structures and contextual variations of acoustic properties and to provide quantitative data useful for word recognition applications.
The vowels, fricatives and stops were the main phoneme classes studied. Quantitative data on the acoustic properties of the selected phonemes is presented and discussed in terms of i) our own spectral data, ii) other data reported in the literature and iii) simple production models. The time-domain signal representation was found to provide an effective means of analyzing and characterizing the acoustically complex stops and voiced fricatives. For the vowels and unvoiced fricatives, which are well suited to spectral analysis, the time domain measurements were found to yield very simple and direct characterization features. Some limited phonemic decomposition and machine recognition work is described which demonstrates the design of useful characterization features and provides a basis for further work. / Applied Science, Faculty of / Electrical and Computer Engineering, Department of / Graduate
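The two primary measurements named in this abstract, the short-time average zero-crossing rate and the distribution of time intervals between zero-crossings, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the frame length and hop parameters, and the choice to skip exact-zero samples are all assumptions made for illustration.

```python
from collections import Counter


def zero_crossings(samples):
    """Return the indices at which the waveform changes sign.

    Assumption: samples that are exactly zero carry no sign and are skipped.
    """
    indices = []
    prev_sign = 0
    for i, x in enumerate(samples):
        sign = (x > 0) - (x < 0)
        if sign == 0:
            continue
        if prev_sign != 0 and sign != prev_sign:
            indices.append(i)
        prev_sign = sign
    return indices


def short_time_zcr(samples, frame_len, hop):
    """Zero-crossings per sample, averaged over each analysis frame."""
    crossings = set(zero_crossings(samples))
    rates = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        count = sum(1 for i in range(start, start + frame_len) if i in crossings)
        rates.append(count / frame_len)
    return rates


def interval_histogram(samples):
    """Count how often each interval (in samples) between crossings occurs."""
    idx = zero_crossings(samples)
    return Counter(b - a for a, b in zip(idx, idx[1:]))


# Hypothetical test signal: a square wave with a period of 8 samples.
square = [1, 1, 1, 1, -1, -1, -1, -1] * 4
print(short_time_zcr(square, frame_len=16, hop=16))
print(interval_histogram(square))
```

For a periodic signal the interval histogram concentrates at half the period, which hints at why a display of such distributions over time could be compared to a frequency spectrogram.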
133

Development of tests and preprocessing algorithms for evaluation and improvement of speech recognition units

Wasmeier, Hans January 1986
This study considered the evaluation of commercially available isolated word, speaker dependent, speech recognition units, and preprocessing techniques that may be used for improving their performance. The problem was considered in three separate stages. A series of tests were designed to exercise an isolated word, speaker dependent, speech recognition unit. These tests provided a sound basis for determining a given unit's strengths and weaknesses. This knowledge permits a more informed decision on the best recognition device for a given price range. As well, this knowledge may be used in the design of a robust vocabulary, and creation of guidelines for best performance. The test vocabularies were based on the forty English phonemes identified by Rabiner and Schafer [28] and the test variations were representative of common variations which may be expected in normal use. A digital archive system was implemented for storing the voice input of test subjects. This facility provided a data base for an investigation of preprocessing techniques. As well, it permits the testing of different speech recognition units with the same voice input, providing a platform for device comparison. Several speech preprocessing and performance improvement techniques were then investigated. Specifically, two types of time normalization, the enhancement of low energy phonemes and a change in training technique were investigated. These techniques permit a more accurate analysis of the failure mechanism of the speech recognition unit. They may also provide the basis for a speech preprocessor design which could be placed in front of a commercial speech recognition unit. A commercially available speech recognition unit, the NEC SR100, was used as a measure of the effectiveness of the tests and of the improvements. 
Results of the study indicated that the designed tests and the preprocessing and performance improvement techniques investigated were useful in identifying the speech recognition unit's weaknesses. Also, depending on the economics of implementation, it was found that preprocessing may provide a cost-effective solution to some of the recognition unit's shortcomings. / Applied Science, Faculty of / Electrical and Computer Engineering, Department of / Graduate
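The abstract mentions two types of time normalization without saying which ones were used. As a purely illustrative sketch, the Python below shows one common form, linear time normalization, which stretches or compresses an utterance to a fixed number of frames by linear interpolation. The function name, the use of one scalar value per frame, and the choice of linear interpolation are assumptions, not details taken from the thesis.

```python
def linear_time_normalize(frames, target_len):
    """Resample a sequence of per-frame values to target_len frames.

    Assumption: each frame is a single number (e.g. frame energy);
    values between original frames are linearly interpolated.
    """
    if not frames or target_len < 1:
        raise ValueError("need at least one frame and a positive target length")
    if len(frames) == 1 or target_len == 1:
        return [frames[0]] * target_len
    scale = (len(frames) - 1) / (target_len - 1)
    out = []
    for k in range(target_len):
        pos = k * scale                      # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, len(frames) - 1)
        frac = pos - lo
        out.append((1 - frac) * frames[lo] + frac * frames[hi])
    return out


# Hypothetical usage: stretch a 3-frame contour to 5 frames.
print(linear_time_normalize([0.0, 2.0, 4.0], 5))
```

Normalizing every utterance to the same length lets a fixed-size template be compared frame by frame, which is one reason such a step might sit in front of an isolated word recognizer.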
134

Speaker-independent access to a large lexicon

Mathan, Luc Stefan January 1987
No description available.
135

Perceptual postfiltering for low bit rate speech coders

Chen, Wei, 1976- January 2007
No description available.
136

An investigation of digital vocoders.

Trottier, Lorne Ira. January 1973
No description available.
137

Speaker normalizing transforms in speech recognition by computer

Sejnoha, Vladimir. January 1982
No description available.
138

Speaker recognition using digit utterances

Scrimgeour, J. Michael. January 1984
No description available.
139

Experiments on automatic phonetic segmentation and transcription of speech

Lennig, Matthew. January 1983
No description available.
140

Pattern recognition of spoken words based on Haar functions

Chi, Ben-chen January 1973
No description available.
