In this work we present several contributions towards the automatic recognition of BSL signs from continuous signing video sequences. Specifically, we address three main points: (i) automatic detection and tracking of the hands using a generative model of the image; (ii) automatic learning of signs from TV broadcasts using the supervisory information available from subtitles; and (iii) generalisation from sign examples of one signer to recognition of signs from different signers. Our source material consists of many hours of video with continuous signing and corresponding subtitles, recorded from BBC digital television. This material is very challenging for a number of reasons, including self-occlusions of the signer, self-shadowing, blur due to the speed of motion, and in particular the changing background.

Knowledge of hand position and hand shape is a prerequisite for automatic sign language recognition. We cast the problem of detecting and tracking the hands as inference in a generative model of the image, and propose a complete model which accounts for the positions and self-occlusions of the arms. Reasonable configurations are obtained by efficiently sampling from a pictorial structure proposal distribution. Our method exceeds the state of the art in the length and stability of continuous limb tracking.

Previous research in sign language recognition has typically required manual training data to be generated for each sign, e.g. a signer performing each sign in controlled conditions, which is a time-consuming and expensive procedure. We show that, for a given signer, a large number of BSL signs can be learned automatically from TV broadcasts using the supervisory information available from subtitles broadcast simultaneously with the signing. We achieve this by modelling the problem as one of multiple instance learning. In this way we are able to extract the sign of interest from hours of signing footage, despite the very weak and "noisy" supervision from the subtitles.

Lastly, we show that automatic recognition of signs can be extended to multiple signers. Using automatically extracted examples from a single signer, we train discriminative classifiers and show that these can successfully classify and localise signs performed by new signers. This demonstrates that the descriptor we extract for each frame (i.e. hand position, hand shape, and hand orientation) generalises between different signers.
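The multiple instance learning idea described above can be illustrated with a small sketch. The following is a minimal, hypothetical illustration rather than the thesis's actual algorithm: each subtitle-aligned signing sequence is treated as a bag of short temporal windows of per-frame descriptors, and for each "positive" bag (whose subtitle contains the target English word) the window most consistent with the other positive bags and least similar to the "negative" bags is selected as the hypothesised sign. All function names, the window length, and the toy descriptors are assumptions for illustration only.

```python
import numpy as np

def windows(seq, length):
    """All contiguous windows of `length` frames from a (T, D) per-frame
    descriptor sequence, each flattened to a vector of size length * D."""
    T, D = seq.shape
    if T < length:
        return np.empty((0, length * D))
    return np.stack([seq[t:t + length].reshape(-1) for t in range(T - length + 1)])

def similarity(candidate, bag_windows):
    """Similarity of one candidate window to the closest window in a bag
    (negative squared Euclidean distance, so larger means more similar)."""
    if len(bag_windows) == 0:
        return -np.inf
    return -np.min(np.sum((bag_windows - candidate) ** 2, axis=1))

def select_sign_windows(pos_bags, neg_bags, length=10):
    """For each positive bag, pick the temporal window that agrees most with
    the other positive bags and least with the negative bags; this window is
    hypothesised to contain the sign of interest."""
    pos_w = [windows(b, length) for b in pos_bags]
    neg_w = [windows(b, length) for b in neg_bags]
    picks = []
    for i, cands in enumerate(pos_w):
        best_t, best_s = None, -np.inf
        for t, cand in enumerate(cands):
            # Reward agreement with the other positive sequences ...
            s = np.mean([similarity(cand, w) for j, w in enumerate(pos_w) if j != i])
            # ... and penalise similarity to sequences without the target word.
            s -= np.max([similarity(cand, w) for w in neg_w])
            if s > best_s:
                best_t, best_s = t, s
        picks.append(best_t)  # start frame of the chosen window in bag i
    return picks

# Toy usage with random per-frame descriptors standing in for the real
# hand position / shape / orientation features.
rng = np.random.default_rng(0)
pos_bags = [rng.normal(size=(200, 16)) for _ in range(5)]
neg_bags = [rng.normal(size=(200, 16)) for _ in range(5)]
print(select_sign_windows(pos_bags, neg_bags, length=10))
```

In practice the per-frame descriptor would encode the tracked hand position, hand shape, and orientation mentioned in the abstract, and the selection would operate over the very weak, noisy subtitle supervision rather than the toy data used here.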
Identifier | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:526577
Date | January 2010 |
Creators | Buehler, Patrick |
Contributors | Zisserman, Andrew; Everingham, Mark; Brady, Michael
Publisher | University of Oxford |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://ora.ox.ac.uk/objects/uuid:2930e980-4307-41bf-b4ff-87e8c4d0d722 |