Effects of Transcription Errors on Supervised Learning in Speech Recognition

Supervised learning with hidden Markov models has been used for several years to train acoustic models for automatic speech recognition. Clean transcriptions typically form the basis for this training regimen. However, results have shown that training on readily available transcriptions that are sometimes erroneous (e.g., closed captions) does not degrade performance significantly. This work analyzes the effects of mislabeled data on recognition accuracy. For this purpose, training is performed on manually corrupted training data, and the results are observed on three databases: TIDigits, Alphadigits, and SwitchBoard. For Alphadigits, with 16% of the data mislabeled, system performance degrades by 12% relative to the baseline. For a complex task like SwitchBoard, with 16% of the training data mislabeled, performance degrades by 8.5% relative to the baseline. The training process is robust to mislabeled data because the Gaussian mixtures used to model the underlying distribution tend to cluster around the majority of the correct data. The outliers (incorrect data) do not contribute significantly to the reestimation process.
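
The clustering argument in the abstract can be illustrated with a toy sketch. This is not the thesis's HMM or Baum-Welch setup. It assumes two hypothetical classes with one-dimensional Gaussian features (means 0 and 5, unit variance), swaps 16% of the labels at random, and fits a two-component Gaussian mixture by EM to the data labeled as the first class. All function names and parameters here are illustrative assumptions, not taken from the thesis.

```python
import math
import random


def corrupt_labels(labels, rate, classes, rng):
    """Replace a fraction `rate` of the labels with a different class."""
    corrupted = list(labels)
    n_bad = round(rate * len(labels))
    for i in rng.sample(range(len(labels)), n_bad):
        corrupted[i] = rng.choice([c for c in classes if c != labels[i]])
    return corrupted


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, iters=50):
    """EM for a two-component 1-D Gaussian mixture.

    Returns [(weight, mean, variance), ...] sorted by descending weight.
    """
    means = [min(data), max(data)]  # simple initialization at the extremes
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2
                        for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-3)  # floor to avoid collapse
    return sorted(zip(weights, means, variances), reverse=True)


def run_demo(seed=0, rate=0.16, n_per_class=1000):
    """Corrupt labels, then model the data labeled 'a' two different ways."""
    rng = random.Random(seed)
    true_labels = ["a"] * n_per_class + ["b"] * n_per_class
    class_means = {"a": 0.0, "b": 5.0}  # assumed toy class means
    features = [rng.gauss(class_means[c], 1.0) for c in true_labels]
    noisy_labels = corrupt_labels(true_labels, rate, ["a", "b"], rng)
    data_a = [x for x, c in zip(features, noisy_labels) if c == "a"]
    single_mean = sum(data_a) / len(data_a)  # single-Gaussian mean estimate
    components = fit_gmm_1d(data_a)
    return single_mean, components


if __name__ == "__main__":
    single_mean, components = run_demo()
    print("single-Gaussian mean for class 'a':", single_mean)
    for weight, mean, var in components:
        print("mixture component:", weight, mean, var)
```

In this toy setting, the mean of a single Gaussian is pulled toward the mislabeled points, while the dominant mixture component stays close to the correctly labeled majority and the smaller component absorbs most of the outliers. Real acoustic models are far higher-dimensional and are trained with Baum-Welch reestimation, so the sketch conveys only the intuition, not the reported error rates.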

Baum Welch

SwitchBoard

HMM

Identifier	oai:union.ndltd.org:MSSTATE/oai:scholarsjunction.msstate.edu:td-2814
Date	13 December 2003
Creators	Sundaram, Ramasubramanian H
Publisher	Scholars Junction
Source Sets	Mississippi State University
Detected Language	English
Type	text
Format	application/pdf
Source	Theses and Dissertations
