This work systematically investigates when and how combining speech output with visual text may facilitate the processing and comprehension of sentences. It is proposed that a redundant multimodal presentation of speech and text has the potential to improve sentence processing but also to disrupt it severely. The effectiveness of the presentation is assumed to depend on the linguistic complexity of the sentence, the memory demands incurred by the selected multimodal configuration, and the characteristics of the user. The thesis employs both theoretical and empirical methods to examine this claim. On the theoretical front, the research makes explicit the features of multimodal sentence presentation and the structures and processes involved in multimodal language processing. Two constructs are presented: a multimodal design space (MMDS) and a multimodal user model (MMUM). The dimensions of the MMDS include aspects of (i) the sentence (linguistic complexity, cf. Gibson, 1991), (ii) the presentation (configurations of media), and (iii) user cost (a function of the first two dimensions). The second construct, the MMUM, is a cognitive model of the user. It attempts to characterise the cognitive structures and processes underlying multimodal language processing, including the supervisory attentional mechanisms that coordinate the processing of language in parallel modalities. The model includes an account of individual differences in verbal working memory (WM) capacity (cf. Just and Carpenter, 1992) and can predict the variation in cognitive cost experienced by the user when presented with different content in a variety of multimodal configurations. The work attempts to validate the central propositions of the MMUM through three controlled user studies. The experimental findings support the validity of some features of the MMUM but also indicate the need for further refinement. Overall, they suggest that durable text may reduce the processing cost of demanding sentences delivered by speech, whereas adding speech to such sentences when they are presented visually increases processing cost. Speech can be added to various visual forms of text only if the linguistic complexity of the sentence imposes a low to moderate load on the user. These conclusions are translated into a set of guidelines for the effective multimodal presentation of sentences. A final study examines the validity of some of these guidelines in an applied setting. Its results highlight the need for enhanced experimental control, but they also demonstrate that the approach used in this research can validate specific assumptions regarding the relationship between cognitive cost, sentence complexity, and multimodal configuration, and can thereby inform the design of effective multimodal user interfaces.
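As an illustration only (not part of the thesis), the reported guidelines could be encoded as a simple decision rule. The following minimal sketch assumes a categorical load scale and a two-way modality choice; the function name, load categories, and returned configurations are all hypothetical.

```python
# Illustrative sketch only -- not from the thesis. Encodes the abstract's
# presentation guidelines as a decision rule. All names, categories, and
# return values here are hypothetical assumptions.

def choose_presentation(linguistic_load: str, primary_modality: str) -> str:
    """Suggest a multimodal configuration for a sentence.

    linguistic_load: 'low', 'moderate', or 'high' (hypothetical scale)
    primary_modality: 'speech' or 'text'
    """
    if primary_modality == "speech":
        if linguistic_load == "high":
            # Guideline: durable text may reduce the processing cost
            # of demanding sentences delivered by speech.
            return "speech + durable text"
        return "speech (redundant text optional)"
    if linguistic_load in ("low", "moderate"):
        # Guideline: speech can accompany visual text only when the
        # sentence imposes a low to moderate load on the user.
        return "text + speech"
    # Guideline: adding speech to demanding visually presented
    # sentences increases processing cost.
    return "text only"
```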
Identifier | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:419482
Date | January 2005 |
Creators | Shmueli, Yael |
Publisher | University College London (University of London) |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://discovery.ucl.ac.uk/1446688/ |