
Realising affect-sensitive multimodal human-computer interface : hardware and software infrastructure

With the industry's recent paradigm shift from PC-centred applications to services delivered through ubiquitous computing in a more human-centred manner, multimodal human-computer interfaces (MHCI) have become an emerging research topic. As an important but often neglected aspect, the lack of appropriate system integration tools hinders the development of MHCI systems. Therefore, the work presented in this thesis aims at delivering a hardware/software infrastructure to facilitate the full development cycle of MHCI systems. Specifically, we first built a hardware platform for synchronised, multimodal data capture to support and facilitate automatic human behaviour understanding from multiple audiovisual sensors. Then we developed a software framework, called the HCI^2 Framework, to facilitate the modular development and rapid prototyping of readily applicable MHCI systems. As a proof of concept, we also present an affect-sensitive game with the humanoid robot NAO developed using the HCI^2 Framework.

Studies on automatic human behaviour understanding require high-bandwidth recording from multiple cameras, as well as from other sensors such as microphones and eye-gaze trackers. In addition, sensor fusion should be realised with high accuracy so as to achieve tight synchronisation between sensors and, in turn, enable studies of the correlation between various behavioural signals. Using commercial off-the-shelf components may compromise quality and accuracy due to several issues, including handling the combined data rate from multiple sensors, unknown offset and rate discrepancies between independent hardware clocks, the absence of trigger inputs or outputs in the hardware, and the existence of different methods for time-stamping the recorded data. To achieve accurate synchronisation, we centralise the synchronisation task by recording all trigger or timestamp signals with a multi-channel audio interface.
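The centralised-timebase idea described above can be sketched as follows. Assuming each timestamp event is observed both as a reading of the sensor's own clock and as a sample index on the audio interface's recording, a simple least-squares line through the pairs recovers the sensor clock's unknown offset and rate discrepancy. This is an illustrative sketch, not the thesis's actual implementation; the function names and example numbers are hypothetical.

```python
# Hypothetical sketch: aligning an independent sensor clock to the common
# timebase defined by the multi-channel audio interface (96 kHz, per the text).
# Each timestamp event yields a pair (sensor-clock reading, audio sample index);
# a least-squares line through the pairs estimates offset and rate discrepancy.

AUDIO_RATE_HZ = 96_000  # sampling rate of the audio interface

def fit_clock_mapping(sensor_times, audio_samples):
    """Least-squares fit: audio_time ~= rate * sensor_time + offset."""
    n = len(sensor_times)
    audio_times = [s / AUDIO_RATE_HZ for s in audio_samples]
    mean_x = sum(sensor_times) / n
    mean_y = sum(audio_times) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sensor_times, audio_times))
    var = sum((x - mean_x) ** 2 for x in sensor_times)
    rate = cov / var
    offset = mean_y - rate * mean_x
    return rate, offset

def to_common_time(sensor_time, rate, offset):
    """Map a raw sensor-clock reading onto the common audio timebase."""
    return rate * sensor_time + offset

# Example: a sensor clock running 100 ppm fast with a 0.5 s offset.
sensor = [0.0, 1.0, 2.0, 3.0, 4.0]
audio = [int((0.5 + t * 1.0001) * AUDIO_RATE_HZ) for t in sensor]
rate, offset = fit_clock_mapping(sensor, audio)
```

Once the mapping is known, every sample from that sensor can be re-timestamped on the shared audio timebase, which is what makes cross-sensor correlation studies possible.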
For sensors that lack an external trigger signal, we let the computer that captures the sensor data periodically generate timestamp signals from its serial port output. These signals can also be used as a common time base to synchronise multiple asynchronous audio interfaces. The resulting data recording platform, built upon two consumer-grade PCs, is capable of capturing 8-bit video data with 1024 x 1024 spatial and 59.1 Hz temporal resolution from at least 14 cameras, together with 8 channels of 24-bit audio at 96 kHz and eye-gaze tracking results sampled at 60 or 120 Hz. The attained synchronisation accuracy is unprecedented to date.

To facilitate rapid development of readily applicable MHCI systems using algorithms designed to detect and track behavioural signals (e.g. a face detector, facial fiducial point tracker, expression recogniser, etc.), a software integration framework is required. The proposed software framework, called the HCI^2 Framework, is built upon the publish/subscribe (P/S) architecture. It implements a shared-memory-based data transport protocol for message delivery and a TCP-based system management protocol; the latter ensures that the integrity of the system's structure is maintained at runtime. With the inclusion of 'bridging modules', the HCI^2 Framework is interoperable with other software frameworks, including Psyclone and ActiveMQ. In addition to the core communication middleware, we also present the integrated development environment (IDE) of the HCI^2 Framework. It provides a complete graphical environment to support every step of a typical MHCI system development process, including module development, debugging, packaging, and management, as well as whole-system management and testing. Quantitative evaluation indicates that our framework outperforms other similar tools in terms of average message latency and maximum data throughput under a typical single-PC scenario.
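The publish/subscribe architecture the framework is built on can be illustrated with a toy in-process dispatcher. The real HCI^2 Framework delivers messages through a shared-memory transport and manages modules over TCP; the sketch below only shows the topic/subscriber decoupling that lets modules such as a face detector and an expression recogniser communicate without knowing about each other. All names here are hypothetical.

```python
# Toy sketch of the publish/subscribe (P/S) pattern: publishers send messages
# to named topics; the dispatcher delivers each message to every subscriber
# of that topic. Modules never reference each other directly.

from collections import defaultdict

class Dispatcher:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a module's callback for messages on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to every module subscribed to the topic."""
        for callback in self._subscribers[topic]:
            callback(message)

# Example: an 'expression recogniser' module publishing a result that a
# 'game logic' module consumes.
bus = Dispatcher()
received = []
bus.subscribe("expression", received.append)
bus.publish("expression", {"label": "smile", "confidence": 0.9})
```

The same decoupling is what makes 'bridging modules' possible: a bridge simply subscribes to local topics and republishes the messages into an external system such as Psyclone or ActiveMQ (and vice versa), without any other module changing.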
To demonstrate the HCI^2 Framework's capabilities in integrating heterogeneous modules, we present several example modules working with a variety of hardware and software. We also present two use cases of the HCI^2 Framework: a computer game, called CamGame, based on hand-held marker(s) and low-cost camera(s), and the human affective-signal analysis component of the Fun Robotic Outdoor Guide (FROG) project. Using the HCI^2 Framework, we further developed the Mimic-Me Game, an interactive game played with the NAO humanoid robot. The game involves the robot 'mimicking' the player's facial expression using a combination of body gestures and audio cues. A multimodal dialogue model has been designed and implemented to enable the robot to interact with the human player in a naturalistic way, using only natural language, head movement, and facial expressions.
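The core of the mimicking mechanic can be sketched as a mapping from a recognised expression label to a robot behaviour, i.e. a gesture paired with an audio cue. The expression labels, gesture names, and audio-cue file names below are purely illustrative assumptions, not the thesis's actual vocabulary or the NAO API.

```python
# Hypothetical sketch of the Mimic-Me mapping: the recognised player
# expression is translated into a (gesture, audio cue) pair for the robot.
# Unrecognised labels fall back to an idle behaviour with no audio.

BEHAVIOURS = {
    "happy":    ("raise_arms", "laugh.wav"),
    "sad":      ("lower_head", "sigh.wav"),
    "surprise": ("step_back",  "gasp.wav"),
}

def mimic(expression_label):
    """Return the (gesture, audio cue) the robot uses to 'mimic' the player."""
    return BEHAVIOURS.get(expression_label, ("idle", None))
```

In the running system, each module in this chain (expression recogniser, dialogue model, robot controller) would be a separate framework module connected through the publish/subscribe middleware rather than direct function calls.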
Date January 2014
Creators Shen, Jie
Contributors Pantic, Maja
Publisher Imperial College London
Source Sets Ethos UK
Detected Language English
Type Electronic Thesis or Dissertation
