  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
251

Estimation of the vocal tract shape from the acoustic waveform.

Paul, Douglas Baker January 1976
Thesis. 1976. Ph.D.--Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. / Microfiche copy available in Archives and Engineering. / Vita. / Bibliography: leaves 138-140. / Ph.D.
252

Non-uniform time-scale modification of speech

Holtzman Dantus, Samuel January 1980
Thesis (Elec.E)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1980. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Bibliography: leaves 173-175. / by Samuel Holtzman Dantus. / Elec.E
253

The effect of amplitude compression on the intelligibility of speech for persons with sensorineural hearing loss.

Lippmann, Richard Paul January 1978
Thesis. 1978. Ph.D.--Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Includes bibliographies. / Ph.D.
254

Time-varying linear predictive coding of speech signals.

Hall, Mark Gilbert January 1977
Thesis. 1977. M.S.--Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Includes bibliographical references. / M.S.
255

Time-scale modification of speech based on short-time Fourier analysis.

Portnoff, Michael Rodney January 1978
Thesis. 1978. Sc.D.--Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Vita. / Bibliography: p. 142-145. / Sc.D.
256

Acoustic characteristics and intelligibility of clear and conversational speech at the segmental level

Chen, Francine Robina January 1980
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1980. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Bibliography: leaves 116-117. / by Francine Robina Chen. / M.S.
257

Robust methods for Chinese spoken document retrieval.

January 2003
Hui Pui Yu. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 158-169). / Abstracts in English and Chinese. / Abstract --- p.2 / Acknowledgements --- p.6 / Chapter 1 --- Introduction --- p.23 / Chapter 1.1 --- Spoken Document Retrieval --- p.24 / Chapter 1.2 --- The Chinese Language and Chinese Spoken Documents --- p.28 / Chapter 1.3 --- Motivation --- p.33 / Chapter 1.3.1 --- Assisting the User in Query Formation --- p.34 / Chapter 1.4 --- Goals --- p.34 / Chapter 1.5 --- Thesis Organization --- p.35 / Chapter 2 --- Multimedia Repository --- p.37 / Chapter 2.1 --- The Cantonese Corpus --- p.37 / Chapter 2.1.1 --- The RealMedia Collection --- p.39 / Chapter 2.1.2 --- The MPEG-1 Collection --- p.40 / Chapter 2.2 --- The Multimedia Markup Language --- p.42 / Chapter 2.3 --- Chapter Summary --- p.44 / Chapter 3 --- Monolingual Retrieval Task --- p.45 / Chapter 3.1 --- Properties of Cantonese Video Archive --- p.45 / Chapter 3.2 --- Automatic Speech Transcription --- p.46 / Chapter 3.2.1 --- Transcription of Cantonese Spoken Documents --- p.47 / Chapter 3.2.2 --- Indexing Units --- p.48 / Chapter 3.3 --- Known-Item Retrieval Task --- p.49 / Chapter 3.3.1 --- Evaluation - Average Inverse Rank --- p.50 / Chapter 3.4 --- Retrieval Model --- p.51 / Chapter 3.5 --- Experimental Results --- p.52 / Chapter 3.6 --- Chapter Summary --- p.53 / Chapter 4 --- The Use of Audio and Video Information for Monolingual Spoken Document Retrieval --- p.55 / Chapter 4.1 --- Video-based Segmentation --- p.56 / Chapter 4.1.1 --- Metric Computation --- p.57 / Chapter 4.1.2 --- Shot Boundary Detection --- p.58 / Chapter 4.1.3 --- Shot Transition Detection --- p.67 / Chapter 4.2 --- Audio-based Segmentation --- p.69 / Chapter 4.2.1 --- Gaussian Mixture Models --- p.69 / Chapter 4.2.2 --- Transition Detection --- p.70 / Chapter 4.3 --- Performance Evaluation --- p.72 / Chapter 4.3.1 --- Automatic Story Segmentation --- p.72 / Chapter 4.3.2 --- Video-based Segmentation Algorithm --- p.73 / Chapter 4.3.3 --- Audio-based Segmentation Algorithm --- p.74 / Chapter 4.4 --- Fusion of Video- and Audio-based Segmentation --- p.75 / Chapter 4.5 --- Retrieval Performance --- p.76 / Chapter 4.6 --- Chapter Summary --- p.78 / Chapter 5 --- Document Expansion for Monolingual Spoken Document Retrieval --- p.79 / Chapter 5.1 --- Document Expansion using Selected Field Speech Segments --- p.81 / Chapter 5.1.1 --- Annotations from MmML --- p.81 / Chapter 5.1.2 --- Selection of Cantonese Field Speech --- p.83 / Chapter 5.1.3 --- Re-weighting Different Retrieval Units --- p.84 / Chapter 5.1.4 --- Retrieval Performance with Document Expansion using Selected Field Speech --- p.84 / Chapter 5.2 --- Document Expansion using N-best Recognition Hypotheses --- p.87 / Chapter 5.2.1 --- Re-weighting Different Retrieval Units --- p.90 / Chapter 5.2.2 --- Retrieval Performance with Document Expansion using N-best Recognition Hypotheses --- p.90 / Chapter 5.3 --- Document Expansion using Selected Field Speech and N-best Recognition Hypotheses --- p.92 / Chapter 5.3.1 --- Re-weighting Different Retrieval Units --- p.92 / Chapter 5.3.2 --- Retrieval Performance with Different Indexed Units --- p.93 / Chapter 5.4 --- Chapter Summary --- p.94 / Chapter 6 --- Query Expansion for Cross-language Spoken Document Retrieval --- p.97 / Chapter 6.1 --- The TDT-2 Corpus --- p.99 / Chapter 6.1.1 --- English Textual Queries --- p.100 / Chapter 6.1.2 --- Mandarin Spoken Documents --- p.101 / Chapter 6.2 --- Query Processing --- p.101 / Chapter 6.2.1 --- Query Weighting --- p.101 / Chapter 6.2.2 --- Bigram Formation --- p.102 / Chapter 6.3 --- Cross-language Retrieval Task --- p.103 / Chapter 6.3.1 --- Indexing Units --- p.104 / Chapter 6.3.2 --- Retrieval Model --- p.104 / Chapter 6.3.3 --- Performance Measure --- p.105 / Chapter 6.4 --- Relevance Feedback --- p.106 / Chapter 6.4.1 --- Pseudo-Relevance Feedback --- p.107 / Chapter 6.5 --- Retrieval Performance --- p.107 / Chapter 6.6 --- Chapter Summary --- p.109 / Chapter 7 --- Conclusions and Future Work --- p.111 / Chapter 7.1 --- Future Work --- p.114 / Chapter A --- XML Schema for Multimedia Markup Language --- p.117 / Chapter B --- Example of Multimedia Markup Language --- p.128 / Chapter C --- Significance Tests --- p.135 / Chapter C.1 --- Selection of Cantonese Field Speech Segments --- p.135 / Chapter C.2 --- Fusion of Video- and Audio-based Segmentation --- p.137 / Chapter C.3 --- Document Expansion with Reporter Speech --- p.137 / Chapter C.4 --- Document Expansion with N-best Recognition Hypotheses --- p.140 / Chapter C.5 --- Document Expansion with Reporter Speech and N-best Recognition Hypotheses --- p.140 / Chapter C.6 --- Query Expansion with Pseudo Relevance Feedback --- p.142 / Chapter D --- Topic Descriptions of TDT-2 Corpus --- p.145 / Chapter E --- Speech Recognition Output from Dragon in CLSDR Task --- p.148 / Chapter F --- Parameters Estimation --- p.152 / Chapter F.1 --- "Estimating the Number of Relevant Documents, Nr" --- p.152 / Chapter F.2 --- "Estimating the Number of Terms Added from Relevant Documents, Nrt, to Original Query" --- p.153 / Chapter F.3 --- "Estimating the Number of Non-relevant Documents, Nn, from the Bottom-scoring Retrieval List" --- p.153 / Chapter F.4 --- "Estimating the Number of Terms, Selected from Non-relevant Documents (Nnt), to be Removed from Original Query" --- p.154 / Chapter G --- Abbreviations --- p.155 / Bibliography --- p.158
258

Using duration information in HMM-based automatic speech recognition.

January 2005
Zhu Yu. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (leaves 100-104). / Abstracts in English and Chinese. / Chapter CHAPTER 1 --- INTRODUCTION --- p.1 / Chapter 1.1. --- Speech and its temporal structure --- p.1 / Chapter 1.2. --- Previous work on the modeling of temporal structure --- p.1 / Chapter 1.3. --- Integrating explicit duration modeling in HMM-based ASR system --- p.3 / Chapter 1.4. --- Thesis outline --- p.3 / Chapter CHAPTER 2 --- BACKGROUND --- p.5 / Chapter 2.1. --- Automatic speech recognition process --- p.5 / Chapter 2.2. --- HMM for ASR --- p.6 / Chapter 2.2.1. --- HMM for ASR --- p.6 / Chapter 2.2.2. --- HMM-based ASR system --- p.7 / Chapter 2.3. --- General approaches to explicit duration modeling --- p.12 / Chapter 2.3.1. --- Explicit duration modeling --- p.13 / Chapter 2.3.2. --- Training of duration model --- p.16 / Chapter 2.3.3. --- Incorporation of duration model in decoding --- p.18 / Chapter CHAPTER 3 --- CANTONESE CONNECTED-DIGIT RECOGNITION --- p.21 / Chapter 3.1. --- Cantonese connected digit recognition --- p.21 / Chapter 3.1.1. --- Phonetics of Cantonese and Cantonese digit --- p.21 / Chapter 3.2. --- The baseline system --- p.24 / Chapter 3.2.1. --- Speech corpus --- p.24 / Chapter 3.2.2. --- Feature extraction --- p.25 / Chapter 3.2.3. --- HMM models --- p.26 / Chapter 3.2.4. --- HMM decoding --- p.27 / Chapter 3.3. --- Baseline performance and error analysis --- p.27 / Chapter 3.3.1. --- Recognition performance --- p.27 / Chapter 3.3.2. --- Performance for different speaking rates --- p.28 / Chapter 3.3.3. --- Confusion matrix --- p.30 / Chapter CHAPTER 4 --- DURATION MODELING FOR CANTONESE DIGITS --- p.41 / Chapter 4.1. --- Duration features --- p.41 / Chapter 4.1.1. --- Absolute duration feature --- p.41 / Chapter 4.1.2. --- Relative duration feature --- p.44 / Chapter 4.2. --- Parametric distribution for duration modeling --- p.47 / Chapter 4.3. --- Estimation of the model parameters --- p.51 / Chapter 4.4. --- Speaking-rate-dependent duration model --- p.52 / Chapter CHAPTER 5 --- USING DURATION MODELING FOR CANTONESE DIGIT RECOGNITION --- p.57 / Chapter 5.1. --- Baseline decoder --- p.57 / Chapter 5.2. --- Incorporation of state-level duration model --- p.59 / Chapter 5.3. --- Incorporation word-level duration model --- p.62 / Chapter 5.4. --- Weighted use of duration model --- p.65 / Chapter CHAPTER 6 --- EXPERIMENT RESULT AND ANALYSIS --- p.66 / Chapter 6.1. --- Experiments with speaking-rate-independent duration models --- p.66 / Chapter 6.1.1. --- Discussion --- p.68 / Chapter 6.1.2. --- Analysis of the error patterns --- p.71 / Chapter 6.1.3. --- "Reduction of deletion, substitution and insertion" --- p.72 / Chapter 6.1.4. --- Recognition performance at different speaking rates --- p.75 / Chapter 6.2. --- Experiments with speaking-rate-dependent duration models --- p.77 / Chapter 6.2.1. --- Using true speaking rate --- p.77 / Chapter 6.2.2. --- Using estimated speaking rate --- p.79 / Chapter 6.3. --- Evaluation on another speech database --- p.80 / Chapter 6.3.1. --- Experimental setup --- p.80 / Chapter 6.3.2. --- Experiment results and analysis --- p.82 / Chapter CHAPTER 7 --- CONCLUSIONS AND FUTURE WORK --- p.87 / Chapter 7.1. --- Conclusion and understanding of current work --- p.87 / Chapter 7.2. --- Future work --- p.89 / Chapter A --- APPENDIX --- p.90 / BIBLIOGRAPHY --- p.100
259

Model-based classification of speech audio

Unknown Date
This work explores the process of model-based classification of speech audio signals using low-level feature vectors. The process of extracting low-level features from audio signals is described along with a discussion of established techniques for training and testing mixture model-based classifiers and using these models in conjunction with feature selection algorithms to select optimal feature subsets. The results of a number of classification experiments using a publicly available speech database, the Berlin Database of Emotional Speech, are presented. This includes experiments in optimizing feature extraction parameters and comparing different feature selection results from over 700 candidate feature vectors for the tasks of classifying speaker gender, identity, and emotion. In the experiments, final classification accuracies of 99.5%, 98.0% and 79% were achieved for the gender, identity and emotion tasks respectively. / by Chris Thoman. / Thesis (M.S.C.S.)--Florida Atlantic University, 2009. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2009. Mode of access: World Wide Web.
260

An error detection and correction framework to improve large vocabulary continuous speech recognition. / CUHK electronic theses & dissertations collection

January 2009
In addition to the ED-EC framework, this thesis proposes a discriminative lattice rescoring (DLR) algorithm to facilitate the investigation of the extensibility of the framework. The DLR method recasts a discriminative n-gram model as a pseudo-conventional n-gram model and then uses this recast model to perform lattice rescoring. DLR improves the efficiency of discriminative n-gram modeling and facilitates combined processing of discriminative n-gram modeling with other post-processing techniques such as the ED-EC framework. / This thesis proposes an error detection and correction (ED-EC) framework to incorporate advanced linguistic knowledge sources into large vocabulary continuous speech recognition. Previous efforts that apply sophisticated language models (LMs) in speech recognition normally face a serious efficiency problem due to the intense computation required by these models. The ED-EC framework aims to achieve the full benefit of complex linguistic sources while at the same time maximizing efficiency. The framework attempts to apply computationally expensive LMs only where needed in the input speech. First, the framework detects recognition errors in the output of an efficient state-of-the-art decoding procedure. Then, it corrects the detected errors with the aid of sophisticated LMs by (1) creating alternatives for each detected error and (2) applying advanced models to distinguish among the alternatives. In this thesis, we implement a prototype of the ED-EC framework on the task of Mandarin dictation. This prototype detects recognition errors based on generalized word posterior probabilities, selects alternatives for errors from recognition lattices generated during decoding and adopts an advanced LM that combines mutual information, word trigrams and POS trigrams. The experimental results indicate the practical feasibility of the ED-EC framework, for which the optimal gain of the focused LM is theoretically achievable at low computational cost. 
On a general-domain test set, a 6.0% relative reduction in character error rate (CER) over the performance of a state-of-the-art baseline recognizer is obtained. In terms of efficiency, while both the detection of errors and the creation of alternatives are efficient, the application of the computationally expensive LM is concentrated on less than 50% of the utterances. We further demonstrate that the potential benefit of using the ED-EC framework in improving the recognition performance is tremendous. If error detection is perfect and alternatives for an error are guaranteed to include the correct one, the relative CER reduction over the baseline performance will increase to 36.0%. We also illustrate that the ED-EC framework is robust on unseen data and can be conveniently extended to other recognition systems. / Zhou, Zhengyu. / Adviser: Helen Mei-Ling Meng. / Source: Dissertation Abstracts International, Volume: 72-11, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves 142-155). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
