
Automated Audio-visual Activity Analysis

Current computer vision techniques can effectively monitor gross activities in sparse environments. Unfortunately, visual stimulus alone is often not sufficient to reliably discriminate between many types of activity. In many cases where the visual information required for a particular task is subtle or non-existent, there is audio stimulus that is highly salient for classification or anomaly detection. Unfortunately, unlike visual events, independent sounds are often ambiguous and insufficient to define useful events on their own. Without an effective method for learning causally linked temporal sequences of sound events coupled to visual events, sound events are generally useful only for detecting independently anomalous sounds, e.g., a gunshot or breaking glass. This paper outlines a method for automatically detecting audio and visual events in a particular environment, determining statistical anomalies, clustering the detected events into meaningful clusters, and learning salient temporal relationships between the audio and visual events. The result is a compact description of the different types of compound audio-visual events in an environment.
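
As a rough illustration of the coupling idea (a minimal sketch, not the paper's actual algorithm), the Python below pairs detected audio events with visual events that occur nearby in time, accumulating co-occurrence counts for compound audio-visual events and flagging sounds with no visual counterpart as candidates for independent-sound anomalies. The pair_events function, the event labels, the timestamps, and the 2-second window are all illustrative assumptions.

from collections import Counter

def pair_events(audio_events, visual_events, window=2.0):
    """Pair each (time, label) audio event with visual events occurring
    within `window` seconds; return co-occurrence counts and the audio
    events left unpaired (candidates for independent-sound anomalies)."""
    counts = Counter()
    unpaired = []
    for t_a, a in audio_events:
        # Visual events close enough in time to be coupled to this sound.
        partners = [v for t_v, v in visual_events if abs(t_a - t_v) <= window]
        if partners:
            for v in partners:
                counts[(a, v)] += 1
        else:
            unpaired.append((t_a, a))
    return counts, unpaired

# Hypothetical detections: door slams coupled to a person-exits-car track,
# plus a glass break with no visual counterpart.
audio = [(1.0, "door_slam"), (5.2, "glass_break"), (9.1, "door_slam")]
visual = [(0.6, "person_exits_car"), (8.8, "person_exits_car")]
counts, anomalies = pair_events(audio, visual)
print(counts)     # Counter({('door_slam', 'person_exits_car'): 2})
print(anomalies)  # [(5.2, 'glass_break')]

In a full system, the co-occurrence statistics would be learned over long observation periods, so that a frequently recurring audio-visual pairing becomes a named compound event and deviations from the learned statistics score as anomalies.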

Identifier: oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/30568
Date: 20 September 2005
Creators: Stauffer, Chris
Source Sets: M.I.T. Theses and Dissertation
Language: en_US
Detected Language: English
Format: 9 p., 32903979 bytes, 1153580 bytes, application/postscript, application/pdf
Relation: Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory
