  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
321

Useful Transcriptions of Webcast Lectures

Munteanu, Cosmin 25 September 2009
Webcasts are an emerging technology enabled by the expanding availability and capacity of the World Wide Web. This has led to an increase in the number of lectures and academic presentations being broadcast over the Internet. Ideally, repositories of such webcasts would be used in the same manner as libraries: users could search for, retrieve, or browse through textual information. However, one major obstacle prevents webcast archives from becoming the digital equivalent of traditional libraries: information is mainly transmitted and stored in spoken form. Although voice is present in all webcasts, users do not benefit from it beyond simple playback. My goal has been to exploit this information-rich resource and improve webcast users' experience in browsing and searching for specific information. I achieve this by combining research in Human-Computer Interaction and Automatic Speech Recognition, with the ultimate aim of integrating text transcripts of lectures into webcast archives. In this dissertation, I show that the usefulness of automatically generated transcripts of webcast lectures can be improved by speech recognition techniques aimed specifically at increasing the accuracy of webcast transcriptions, and by the development of an interactive collaborative interface that facilitates users' contributions to machine-generated transcripts. I first investigate the user needs for transcription accuracy in webcast archives and show that users' performance and their perception of transcript quality are affected by the Word Error Rate (WER). A WER equal to or less than 25% is acceptable for use in webcast archives. As current Automatic Speech Recognition (ASR) systems can only deliver, in realistic lecture conditions, WERs of around 45-50%, I propose and evaluate a webcast system extension that engages users in collaboratively editing imperfect ASR transcripts in a wiki-like manner.
My research on ASR focuses on reducing the WER for lectures by making use of available external knowledge sources, such as documents on the World Wide Web and lecture slides, to better model the conversational and the topic-specific styles of lectures. I show that this approach results in relative WER reductions of 11%. Further ASR improvements are proposed that combine the research on language modelling with aspects of collaborative transcript editing. Extracting information about the most frequent ASR errors from user-edited partial transcripts, and attempting to correct such errors when they occur in the remaining transcripts, can lead to an additional 10 to 18% relative reduction in lecture WER.
322

A generalization of the minimum classification error (MCE) training method for speech recognition and detection

Fu, Qiang 15 January 2008
The model training algorithm is a critical component in statistical pattern recognition approaches based on Bayes decision theory. Conventional applications of Bayes decision theory usually assume uniform error cost and result in ubiquitous use of the maximum a posteriori (MAP) decision policy and the paradigm of distribution estimation as standard practice in the design of a statistical pattern recognition system. The minimum classification error (MCE) training method is proposed to overcome some substantial limitations of the conventional distribution estimation methods. In this thesis, three aspects of the MCE method are generalized. First, an optimal classifier/recognizer design framework is constructed, aiming at minimizing non-uniform error cost. A generalized training criterion named weighted MCE is proposed for pattern and speech recognition tasks with non-uniform error cost. Second, the MCE method for speech recognition tasks requires appropriate management of multiple recognition hypotheses for each data segment. A modified version of the MCE method with a new approach to selecting and organizing recognition hypotheses is proposed for continuous phoneme recognition. Third, the minimum verification error (MVE) method for detection-based automatic speech recognition (ASR) is studied. The MVE method can be viewed as a special version of the MCE method which aims at minimizing detection/verification errors. We present many experiments on pattern recognition and speech recognition tasks to justify the effectiveness of our generalizations.
323

Statistical language modelling for large vocabulary speech recognition

McGreevy, Michael January 2006
The move towards larger vocabulary Automatic Speech Recognition (ASR) systems places greater demands on language models. In a large vocabulary system, acoustic confusion is greater, thus there is more reliance placed on the language model for disambiguation. In addition to this, ASR systems are increasingly being deployed in situations where the speaker is not conscious of their interaction with the system, such as in recorded meetings and surveillance scenarios. This results in more natural speech, which contains many false starts and disfluencies. In this thesis we investigate a novel approach to the modelling of speech corrections. We propose a syntactic model of speech corrections, and seek to determine if this model can improve on the performance of standard language modelling approaches when applied to conversational speech. We investigate a number of related variations to our basic approach and compare these approaches against the class-based N-gram. We also investigate the modelling of styles of speech. Specifically, we investigate whether the incorporation of prior knowledge about sentence types can improve the performance of language models. We propose a sentence mixture model based on word-class N-grams, in which the sentence mixture models and the word-class membership probabilities are jointly trained. We compare this approach with word-based sentence mixture models.
324

An analysis of blind signal separation for real time application

Smith, Daniel. January 2006
Thesis (Ph.D.)--University of Wollongong, 2006. / Typescript. Includes bibliographical references: leaves 236-258.
325

An exploration of the impact of speech recognition technologies on group efficiency and effectiveness during an electronic idea generation scenario

Prince, Bradley Justin. Cegielski, Casey. January 2006
Dissertation (Ph.D.)--Auburn University, 2006. / Abstract. Includes bibliographic references.
326

Word hypothesis of phonetic strings using hidden Markov models

Engbrecht, Jeffery W. January 1990
Thesis (M.S.)--Rochester Institute of Technology, 1990. / Includes bibliographical references (leaves 51-53).
327

Aspects of intonation and prosody in Bininj gun-wok: autosegmental-metrical analysis

Bishop, Judith Bronwyn. January 2002
Thesis (Ph.D.)--University of Melbourne, Dept. of Linguistics and Applied Linguistics, 2003. / Typescript (photocopy). Includes bibliographical references (leaves 439-476).
328

FRIC: an expert system to recognize fricatives

Atkinson, Karen A. January 1987
Thesis (M.S.)--Rochester Institute of Technology, 1987. / Typescript. Includes bibliographical references (leaves 54-56).
329

Prototype de système de reconnaissance de parole par réseau de neurones utilisant une analyse par démodulation [Prototype of a neural-network speech recognition system using demodulation-based analysis]

Garcia, Miguel, January 1997
Thesis (M.Eng.)--Université du Québec à Chicoutimi, 1997. / Electronic document also available in PDF format. CaQCU
330

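Discrimination parole/musique et étude de nouveaux paramètres et modèles pour un système d'identification du locuteur dans le contexte de conférences téléphoniques [Speech/music discrimination and a study of new parameters and models for a speaker identification system in the context of telephone conferences]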

Ezzaidi, Hassan, January 2002
Thesis (D.Eng.)--Université du Québec à Chicoutimi, 2002. / Bibliography: leaves 113-125. Electronic document also available in PDF format. CaQCU
