Global ETD Search

Deriving and exploiting situational information in speech : investigations in a simulated search and rescue scenario

The need for automatic recognition and understanding of speech is emerging in tasks that involve processing large volumes of natural conversations. In application domains such as Search and Rescue, automated systems that extract mission-critical information from speech communications have the potential to make a real difference. Spoken language understanding has commonly been approached by identifying units of meaning (such as sentences, named entities, and dialogue acts) that provide a basis for further discourse analysis. However, this fine-grained identification of fundamental units of meaning is sensitive to the high error rates that arise in the automatic transcription of noisy speech. This thesis demonstrates that topic segmentation and identification techniques can be employed for information extraction from spoken conversations because they are robust to such errors. Two novel topic-based approaches are presented for extracting situational information within the search and rescue context. The first approach shows that identifying changes in the context and content of first responders' reports over time can provide an estimate of their location. The second approach presents a speech-based topological map estimation technique that is inspired, in part, by automatic mapping algorithms commonly used in robotics. The proposed approaches are evaluated on a goal-oriented conversational speech corpus, which was designed and collected based on an abstract communication model between a first responder and a task leader during a search process. Results confirm that a highly imperfect transcription of noisy speech has limited impact on information extraction performance compared with that obtained on the transcription of clean speech data.
This thesis also shows that speech recognition accuracy can benefit from rescoring the initial transcription hypotheses based on the derived high-level location information. A new two-pass speech decoding architecture is presented. In this architecture, the location estimate from a first decoding pass is used to dynamically adapt a general language model, which is then used to rescore the initial recognition hypotheses. This decoding strategy yields a statistically significant gain in the recognition accuracy of spoken conversations in high background noise. It is concluded that the techniques developed in this thesis can be extended to other application domains that deal with large volumes of natural spoken conversations.
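
The two-pass idea described in the abstract can be illustrated with a toy sketch. The abstract does not specify the models or the adaptation method, so everything below is an assumption made for illustration: unigram language models stored as dictionaries, location estimation by picking the location model that best explains the first-pass text, adaptation by linear interpolation, and rescoring over an n-best list. All function names, weights, and probabilities are hypothetical and are not taken from the thesis.

```python
import math

FLOOR = 1e-6  # assumed floor probability for words missing from a model


def lm_logprob(words, lm):
    """Log-probability of a word sequence under a unigram model (dict word -> prob)."""
    return sum(math.log(max(lm.get(w, 0.0), FLOOR)) for w in words)


def best_hypothesis(nbest, lm, lm_weight=1.0):
    """Pick the text whose acoustic score plus weighted LM score is highest.

    nbest is a list of (text, acoustic_log_score) pairs.
    """
    return max(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(h[0].split(), lm))[0]


def estimate_location(text, location_lms):
    """Assumed stand-in for topic identification: most likely location model."""
    words = text.split()
    return max(location_lms, key=lambda loc: lm_logprob(words, location_lms[loc]))


def interpolate(general_lm, location_lm, lam):
    """Linear interpolation: lam * location model + (1 - lam) * general model."""
    vocab = set(general_lm) | set(location_lm)
    return {
        w: lam * location_lm.get(w, 0.0) + (1.0 - lam) * general_lm.get(w, 0.0)
        for w in vocab
    }


def two_pass_decode(nbest, general_lm, location_lms, lam=0.5):
    """First pass with the general LM, then rescore with a location-adapted LM."""
    first_pass_text = best_hypothesis(nbest, general_lm)
    location = estimate_location(first_pass_text, location_lms)
    adapted_lm = interpolate(general_lm, location_lms[location], lam)
    return location, best_hypothesis(nbest, adapted_lm)
```

In this sketch the first pass may pick an acoustically confusable word that the general model prefers; once the remaining words point to a location, the adapted model can shift the rescoring toward vocabulary typical of that location.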

http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.707126

006.4

Identifier	oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:707126
Date	January 2017
Creators	Mokaram Ghotoorlar, Saeid
Contributors	Moore, Roger K. ; Barker, Jon
Publisher	University of Sheffield
Source Sets	Ethos UK
Detected Language	English
Type	Electronic Thesis or Dissertation
Source	http://etheses.whiterose.ac.uk/16769/
