Global ETD Search

The selective use of gaze in automatic speech recognition

The performance of automatic speech recognition (ASR) degrades significantly in natural environments compared with laboratory assessments. Acoustic noise is a major source of interference and reduces speech intelligibility during the ASR process. It causes two main problems: contamination of the speech signal, and changes in the speakers' vocal and non-vocal behaviour. These phenomena create a mismatch between ASR training and recognition conditions, which leads to considerable performance degradation. Popular approaches to improving noise robustness exploit prior knowledge of the acoustic noise in speech enhancement, feature extraction and recognition models. An alternative approach presented in this thesis is to introduce eye gaze as an extra modality. Eye gaze behaviours play roles in interaction and carry information about cognition and visual attention, but not all of them are relevant to speech. Gaze behaviours are therefore used selectively to improve ASR performance. This is achieved by inference procedures using noise-dependent models of gaze behaviours and their temporal and semantic relationships with speech. 'Selective gaze-contingent ASR' systems are proposed and evaluated on a corpus of eye movement and related speech in different clean and noisy environments. The best-performing systems utilise both acoustic and language model adaptation.

https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.607322

006.4

Identifier	oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:607322
Date	January 2014
Creators	Shen, Ao
Publisher	University of Birmingham
Source Sets	Ethos UK
Detected Language	English
Type	Electronic Thesis or Dissertation
Source	http://etheses.bham.ac.uk//id/eprint/5202/
