
An error detection and correction framework to improve large vocabulary continuous speech recognition. / CUHK electronic theses & dissertations collection

This thesis proposes an error detection and correction (ED-EC) framework for incorporating advanced linguistic knowledge sources into large vocabulary continuous speech recognition. Previous efforts to apply sophisticated language models (LMs) in speech recognition typically face a serious efficiency problem because of the intensive computation these models require. The ED-EC framework aims to obtain the full benefit of complex linguistic sources while maximizing efficiency, by applying computationally expensive LMs only where they are needed in the input speech. First, the framework detects recognition errors in the output of an efficient state-of-the-art decoding procedure. It then corrects the detected errors with the aid of sophisticated LMs by (1) creating alternatives for each detected error and (2) applying advanced models to distinguish among the alternatives. In this thesis, we implement a prototype of the ED-EC framework on a Mandarin dictation task. The prototype detects recognition errors based on generalized word posterior probabilities, selects alternatives for errors from the recognition lattices generated during decoding, and adopts an advanced LM that combines mutual information, word trigrams and part-of-speech (POS) trigrams. The experimental results indicate that the ED-EC framework is practically feasible and that the optimal gain of the focused LM is theoretically achievable at low computational cost. On a general-domain test set, a 6.0% relative reduction in character error rate (CER) over a state-of-the-art baseline recognizer is obtained. In terms of efficiency, both error detection and alternative creation are fast, and the application of the computationally expensive LM is concentrated on fewer than 50% of the utterances. We further demonstrate that the potential benefit of the ED-EC framework for improving recognition performance is substantial: if error detection is perfect and the alternatives for an error are guaranteed to include the correct one, the relative CER reduction over the baseline increases to 36.0%. We also show that the ED-EC framework is robust on unseen data and can be conveniently extended to other recognition systems.

In addition to the ED-EC framework, this thesis proposes a discriminative lattice rescoring (DLR) algorithm to investigate the extensibility of the framework. The DLR method recasts a discriminative n-gram model as a pseudo-conventional n-gram model and then uses the recast model to perform lattice rescoring. DLR improves the efficiency of discriminative n-gram modeling and facilitates combining discriminative n-gram modeling with other post-processing techniques such as the ED-EC framework.

Zhou, Zhengyu.
Adviser: Helen Mei-Ling Meng.
Source: Dissertation Abstracts International, Volume: 72-11, Section: B, page: .
Thesis (Ph.D.)--Chinese University of Hong Kong, 2009.
Includes bibliographical references (leaves 142-155).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012]. System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-]. System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.
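The abstract above describes a three-step post-processing loop: confidence-based error detection using generalized word posterior probabilities (GWPP), alternative generation from the recognition lattice, and rescoring of the alternatives with a computationally expensive combined LM. The Python sketch below illustrates the shape of such a loop under simplifying assumptions; the Word structure, the fixed GWPP threshold, and toy_lm_score are hypothetical stand-ins rather than the thesis' actual implementation.

```python
# Minimal sketch of an ED-EC style post-processing loop.
# All data structures, thresholds, and scoring functions are illustrative
# assumptions, not the implementation used in the thesis.

from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Word:
    text: str                     # word in the first-pass (1-best) hypothesis
    gwpp: float                   # generalized word posterior probability
    alternatives: List[str] = field(default_factory=list)  # competitors from the lattice


def detect_errors(hypothesis: Sequence[Word], threshold: float = 0.5) -> List[int]:
    """Flag positions whose confidence (GWPP) falls below a threshold."""
    return [i for i, w in enumerate(hypothesis) if w.gwpp < threshold]


def correct_errors(
    hypothesis: Sequence[Word],
    error_positions: Sequence[int],
    lm_score: Callable[[List[str]], float],
) -> List[str]:
    """Rescore lattice alternatives at each flagged position with the expensive
    combined LM and keep the highest-scoring candidate."""
    words = [w.text for w in hypothesis]
    for i in error_positions:
        candidates = [words[i]] + list(hypothesis[i].alternatives)
        words[i] = max(candidates,
                       key=lambda c: lm_score(words[:i] + [c] + words[i + 1:]))
    return words


if __name__ == "__main__":
    # Toy stand-in for the combined LM (mutual information + word/POS trigrams).
    def toy_lm_score(sentence: List[str]) -> float:
        return 1.0 if "recognition" in sentence else 0.0

    hyp = [
        Word("speech", 0.92),
        Word("wrecked", 0.21, ["recognition", "wreck a"]),
        Word("errors", 0.85),
    ]
    flagged = detect_errors(hyp)                       # only position 1 is flagged
    print(correct_errors(hyp, flagged, toy_lm_score))  # ['speech', 'recognition', 'errors']
```

Note that the expensive scorer is invoked only for the flagged positions, which is the efficiency argument the abstract makes: the sophisticated LM touches only the regions the detector marks as likely errors.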

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_344580
Date: January 2009
Contributors: Zhou, Zhengyu, Chinese University of Hong Kong Graduate School. Division of Systems Engineering and Engineering Management.
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, theses
Format: electronic resource, microform, microfiche, 1 online resource (xiv, 155 leaves : ill. (some col.))
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
