
Engaging Speech UI's - How to address a speech recognition interface

Speech recognition has existed for a long time in various forms, often used for recognizing commands, performing speech-to-text transcription, or a mix of the two. This thesis investigates how the input affordances for such speech-based interactions should be designed to enable intuitive engagement in a multimodal user interface. At the time of writing, efforts in user interface design typically revolve around the established desktop metaphor, where vision is the primary sense. Since speech recognition is based on the sense of hearing, previous work related to GUI design cannot be applied directly to a speech interface. Just as traditional GUIs evolved to embrace the desktop metaphor and matured into supporting modern touch-based experiences, speech interaction needs to undergo a similar evolutionary process before designers can begin to understand its inherent characteristics and make informed assumptions about appropriate interaction mechanics. To investigate interface addressability and affordance accessibility, a prototype speech interface for a Windows 8 tablet PC was created. The prototype extended Windows 8’s modern touch-optimized interface with speech interaction. The thesis’s outcome is based on a user-centered evaluation of this prototype. The outcome consists of additional knowledge about foundational interaction mechanics for addressing and engaging a speech interface. These mechanics are key aspects to consider when developing full-featured speech recognition interfaces. This thesis aims to provide a first stepping stone towards understanding how speech interfaces should be designed. Additionally, the thesis has investigated related interaction aspects such as required feedback and considerations when designing a multimodal user interface that includes touch and speech input methods.
The thesis also finds that a speech transcription or dictation interface needs more interaction mechanics than its inherent start and stop to become usable and useful.

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:mau-20591
Date: January 2014
Creators: Söderberg, Hampus
Publisher: Malmö högskola, Fakulteten för teknik och samhälle (TS), Malmö högskola/Teknik och samhälle
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
