
Automated Audio-visual Activity Analysis

Current computer vision techniques can effectively monitor gross activities in sparse environments. Unfortunately, visual stimulus alone is often not sufficient to reliably discriminate between many types of activity. In many cases where the visual information required for a particular task is subtle or non-existent, there is audio stimulus that is highly salient for classification or anomaly detection. Unfortunately, unlike visual events, independent sounds are often ambiguous and insufficient to define useful events on their own. Without an effective method for learning causally linked temporal sequences of sound events coupled to visual events, sound events are generally useful only for detecting independently anomalous sounds, e.g., a gunshot or breaking glass. This paper outlines a method for automatically detecting audio and visual events in a particular environment, determining statistical anomalies, clustering the detected events into meaningful clusters, and learning salient temporal relationships between the audio and visual events. The result is a compact description of the different types of compound audio-visual events in an environment.
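
As a rough illustration of the coupling idea (a minimal sketch, not the paper's actual algorithm), the Python below pairs detected audio events with visual events that occur nearby in time, accumulating co-occurrence counts for compound audio-visual events and flagging sounds with no visual counterpart as candidates for independent-sound anomalies. The pair_events function, the event labels, the timestamps, and the 2-second window are all illustrative assumptions.

from collections import Counter

def pair_events(audio_events, visual_events, window=2.0):
    """Pair each (time, label) audio event with visual events occurring
    within `window` seconds; return co-occurrence counts and the audio
    events left unpaired (candidates for independent-sound anomalies)."""
    counts = Counter()
    unpaired = []
    for t_a, a in audio_events:
        # Visual events close enough in time to be coupled to this sound.
        partners = [v for t_v, v in visual_events if abs(t_a - t_v) <= window]
        if partners:
            for v in partners:
                counts[(a, v)] += 1
        else:
            unpaired.append((t_a, a))
    return counts, unpaired

# Hypothetical detections: door slams coupled to a person-exits-car track,
# plus a glass break with no visual counterpart.
audio = [(1.0, "door_slam"), (5.2, "glass_break"), (9.1, "door_slam")]
visual = [(0.6, "person_exits_car"), (8.8, "person_exits_car")]
counts, anomalies = pair_events(audio, visual)
print(counts)     # Counter({('door_slam', 'person_exits_car'): 2})
print(anomalies)  # [(5.2, 'glass_break')]

In a full system, the co-occurrence statistics would be learned over long observation periods, so that a frequently recurring audio-visual pairing becomes a named compound event and deviations from the learned statistics score as anomalies.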

Identifier: oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/30568
Date: 20 September 2005
Creators: Stauffer, Chris
Source Sets: M.I.T. Theses and Dissertation
Language: en_US
Detected Language: English
Format: 9 p., 32903979 bytes, 1153580 bytes, application/postscript, application/pdf
Relation: Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory
