
Embedded speech recognition systems

Apart from recognition accuracy, decoding speed and vocabulary size, another consideration when developing a practical ASR application is the adaptability of the system. An ASR system is more useful if it can cope with changes introduced by users, for example, new words and new grammar rules. In addition, the system can also automatically update the underlying knowledge sources, such as language model probabilities, for better recognition accuracy. Because the knowledge sources need to be adaptable, combining them statically is inflexible: once all the knowledge sources have been combined into one static search space, on-line modification becomes difficult. The second objective of the thesis is to develop an algorithm that allows dynamic integration of knowledge sources during decoding. In this approach, each knowledge source is represented by a weighted finite state transducer (WFST). The knowledge source that is subject to adaptation is factorized from the entire search space, and the adapted knowledge source is then combined with the others during decoding. In this thesis, we propose a generalized dynamic WFST composition algorithm, which avoids the creation of non-coaccessible paths, performs weight look-ahead and does not impose any constraints on the topology of the WFSTs. Experimental results on the Wall Street Journal (WSJ1) 20k-word trigram task show that our proposed approach has better word accuracy versus real-time factor characteristics than other dynamic composition approaches.
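The abstract does not spell out the algorithm, so the following is only a rough illustration of what dynamic (on-the-fly) composition means, not the thesis's generalized method. The Python sketch expands pair states of A∘B only when a search first reaches them, and applies a one-step label look-ahead so that some dead-end (non-coaccessible) pair states are never created. It assumes tropical-semiring costs and epsilon-free matching labels (A has no epsilon outputs, B no epsilon inputs), performs no weight look-ahead, and only catches dead ends one arc away. The names `Wfst`, `LazyCompose` and `best_path`, and the toy machines, are hypothetical.

```python
import heapq
from collections import defaultdict


class Wfst:
    """Epsilon-free WFST over the tropical semiring (costs add, best = min)."""

    def __init__(self, start, finals, arcs):
        self.start = start
        self.finals = finals                  # state -> final cost
        self.arcs = defaultdict(list, arcs)   # state -> [(ilabel, olabel, cost, next)]


class LazyCompose:
    """On-the-fly composition A o B: a pair state (qa, qb) is expanded only
    when the search first asks for its arcs, so A o B is never built in full.
    Assumes A has no epsilon outputs and B has no epsilon inputs."""

    def __init__(self, a, b):
        self.a, self.b = a, b
        self.start = (a.start, b.start)
        self.cache = {}  # expanded pair state -> list of composed arcs

    def final(self, state):
        qa, qb = state
        if qa in self.a.finals and qb in self.b.finals:
            return self.a.finals[qa] + self.b.finals[qb]
        return None

    def _lookahead_ok(self, qa, qb):
        # One-step label look-ahead: keep (qa, qb) only if it is final in both
        # machines or some output label leaving qa can be read by B at qb.
        if self.final((qa, qb)) is not None:
            return True
        b_inputs = {ilabel for ilabel, _, _, _ in self.b.arcs[qb]}
        return any(olabel in b_inputs for _, olabel, _, _ in self.a.arcs[qa])

    def arcs(self, state):
        if state not in self.cache:
            qa, qb = state
            out = []
            for ia, oa, ca, na in self.a.arcs[qa]:
                for ib, ob, cb, nb in self.b.arcs[qb]:
                    if oa == ib and self._lookahead_ok(na, nb):
                        out.append((ia, ob, ca + cb, (na, nb)))
            self.cache[state] = out
        return self.cache[state]


def best_path(fst):
    """Dijkstra over the lazily expanded machine; returns (cost, output labels)."""
    dist = {fst.start: 0.0}
    back = {fst.start: None}
    heap = [(0.0, fst.start)]
    best_final = None
    while heap:
        d, s = heapq.heappop(heap)
        if d > dist[s]:
            continue  # stale heap entry
        fw = fst.final(s)
        if fw is not None and (best_final is None or d + fw < best_final[0]):
            best_final = (d + fw, s)
        for _, olabel, cost, nxt in fst.arcs(s):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                back[nxt] = (s, olabel)
                heapq.heappush(heap, (nd, nxt))
    if best_final is None:
        return None
    labels, s = [], best_final[1]
    while back[s] is not None:
        s, olabel = back[s]
        labels.append(olabel)
    return best_final[0], labels[::-1]


# Hypothetical toy example: A maps pronunciations to words, B is a tiny word grammar.
A = Wfst(0, {0: 0.0}, {0: [("k ae t", "cat", 0.5, 0),
                           ("d ao g", "dog", 0.7, 0),
                           ("k ae p", "cap", 0.6, 0)]})
B = Wfst(0, {2: 0.0}, {0: [("cat", "cat", 1.0, 1), ("dog", "dog", 2.0, 1),
                           ("cap", "cap", 0.1, 3)],  # state 3 is a dead end
                       1: [("dog", "dog", 0.5, 2), ("cat", "cat", 0.2, 2)]})
composed = LazyCompose(A, B)
cost, words = best_path(composed)
print(round(cost, 6), words)      # 2.2 ['cat', 'cat']
print((0, 3) in composed.cache)   # False: the dead-end pair was never created
```

In the toy example the "cap" arc is the cheapest first step (cost 0.7), so a search without look-ahead would expand the dead-end pair (0, 3) before anything else; the look-ahead rejects it when the arc is generated. Only the three pair states that the search actually visits are ever built.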

Identifier: oai:union.ndltd.org:ADTP/275445
Date: January 2008
Creators: Cheng, Octavian
Publisher: ResearchSpace@Auckland
Source Sets: Australasian Digital Theses Program
Language: English
Rights: Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. http://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm. Copyright: The author
