
Useful Transcriptions of Webcast Lectures

Webcasts are an emerging technology enabled by the expanding availability and capacity of the World Wide Web. This has led to an increase in the number of lectures and academic presentations being broadcast over the Internet. Ideally, repositories of such webcasts would be used in the same manner as libraries: users could search for, retrieve, or browse through textual information. However, one major obstacle prevents webcast archives from becoming the digital equivalent of traditional libraries: information is mainly transmitted and stored in spoken form. Although speech is present in all webcasts, users do not benefit from it beyond simple playback. My goal has been to exploit this information-rich resource and improve webcast users' experience in browsing and searching for specific information. I achieve this by combining research in Human-Computer Interaction and Automatic Speech Recognition, with the ultimate aim of integrating text transcripts of lectures into webcast archives.

In this dissertation, I show that the usefulness of automatically generated transcripts of webcast lectures can be improved in two ways: through speech recognition techniques aimed specifically at increasing the accuracy of webcast transcriptions, and through an interactive collaborative interface that facilitates users' contributions to machine-generated transcripts. I first investigate user needs for transcription accuracy in webcast archives and show that users' performance and their perception of transcript quality are affected by the Word Error Rate (WER). A WER equal to or less than 25% is acceptable for use in webcast archives. As current Automatic Speech Recognition (ASR) systems can only deliver WERs of around 45-50% in realistic lecture conditions, I propose and evaluate a webcast system extension that engages users to collaborate, in a wiki manner, on editing imperfect ASR transcripts.
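
For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the scoring tool used in the thesis. The function names `word_error_rate` and `is_acceptable` are hypothetical, tokenization by whitespace is an assumption, and the 0.25 threshold simply restates the 25% figure above.

```python
ACCEPTABLE_WER = 0.25  # the 25% threshold stated in the abstract


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def is_acceptable(wer: float, threshold: float = ACCEPTABLE_WER) -> bool:
    """True when the WER is at or below the acceptability threshold."""
    return wer <= threshold
```

Because insertions count as errors, a WER computed this way can exceed 100% when the hypothesis is much longer than the reference.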

My research on ASR focuses on reducing the WER for lectures by making use of available external knowledge sources, such as documents on the World Wide Web and lecture slides, to better model the conversational and the topic-specific styles of lectures. I show that this approach results in relative WER reductions of 11%. Further ASR improvements are proposed that combine the research on language modelling with aspects of collaborative transcript editing. Extracting information about the most frequent ASR errors from user-edited partial transcripts, and attempting to correct such errors when they occur in the remaining transcripts, can lead to an additional 10 to 18% relative reduction in lecture WER.
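
The reductions above are relative, not absolute: an 11% relative reduction means the WER drops by 11% of its own value, not by 11 percentage points. The sketch below illustrates the arithmetic only. The function names are hypothetical, and the baseline used in the usage lines is taken from the 45-50% range quoted earlier purely for illustration; the abstract does not state which baseline each reduction was measured against, or whether the two reductions compound.

```python
def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline WER removed by an improvement."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


def apply_relative_reduction(wer: float, reduction: float) -> float:
    """WER remaining after removing the given fraction of it."""
    return wer * (1.0 - reduction)


# Illustrative only: a hypothetical 45% baseline with an 11% relative reduction.
illustrative_baseline = 0.45
after_language_modelling = apply_relative_reduction(illustrative_baseline, 0.11)
print(f"{after_language_modelling:.4f}")
```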

Identifier: oai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/17804
Date: 25 September 2009
Creators: Munteanu, Cosmin
Contributors: Baecker, Ronald; Penn, Gerald
Source Sets: University of Toronto
Language: en_ca
Detected Language: English
Type: Thesis
