
From multitarget tracking to event recognition in videos

This dissertation addresses two fundamental problems in computer vision:
multitarget tracking and event recognition in videos. These problems are challenging
because uncertainty may arise from a host of sources, including motion blur,
occlusions, and dynamic cluttered backgrounds. We show that these challenges can be
successfully addressed by using a multiscale, volumetric video representation and by
taking into account the constraints between events that domain knowledge provides.
The dissertation presents our two alternative approaches to multitarget tracking. The
first approach seeks to transitively link object detections across consecutive video
frames by finding the maximum independent set of a graph of all object detections.
Two maximum-independent-set algorithms are specified, and their convergence
properties are theoretically analyzed. The second approach hierarchically partitions the
space-time volume of a video into tracks of objects, producing a segmentation graph of
that video. The resulting tracks encode rich contextual cues between salient video parts
in space and time, and thus facilitate both event recognition and spatiotemporal
segmentation.
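
As an illustration of the first tracking approach, the following is a minimal sketch of linking detections by selecting an independent set in a conflict graph. It rests on assumptions not stated in the abstract: each node is a hypothetical candidate link between a detection in one frame and a detection in the next, each node carries a hypothetical similarity score, and two candidates conflict when they share a detection. The greedy heuristic shown is a simple stand-in and is not one of the two algorithms specified in the dissertation.

```python
def greedy_independent_set(weights, conflicts):
    """Pick non-conflicting nodes greedily, highest weight first.

    weights:   dict mapping node -> score (hypothetical link similarity)
    conflicts: iterable of (node, node) pairs that cannot both be selected
    """
    neighbors = {node: set() for node in weights}
    for a, b in conflicts:
        neighbors[a].add(b)
        neighbors[b].add(a)

    chosen, blocked = [], set()
    # Sort by descending weight; ties are broken by node name for determinism.
    for node in sorted(weights, key=lambda n: (-weights[n], n)):
        if node not in blocked:
            chosen.append(node)
            blocked |= neighbors[node]  # neighbors can no longer be selected
    return chosen


# Candidate links: (detection in frame t, detection in frame t+1) -> score.
candidates = {
    ("a1", "b1"): 0.9,
    ("a1", "b2"): 0.4,
    ("a2", "b1"): 0.5,
    ("a2", "b2"): 0.8,
}
# Two candidate links conflict if they reuse the same detection.
conflicts = [
    (p, q)
    for p in candidates
    for q in candidates
    if p < q and (p[0] == q[0] or p[1] == q[1])
]

links = greedy_independent_set(candidates, conflicts)
print(links)  # [('a1', 'b1'), ('a2', 'b2')]
```

The greedy choice yields a maximal independent set, which is not guaranteed to be a maximum one; it is used here only to show how the conflict graph turns data association into a selection problem.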
We also describe our two alternative approaches to event recognition. The first
approach seeks to learn a structural probabilistic model of an event class from training
videos represented by hierarchical segmentation graphs. The graph model is then used
for inference of event occurrences in new videos. Learning and inference algorithms
are formulated within the same framework, and their convergence rates are theoretically
analyzed. The second approach to event recognition uses probabilistic first-order logic
for reasoning over continuous time intervals. We specify the syntax of this
probabilistic event logic, together with its learning and inference algorithms.
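
To make the idea of weighted logical rules over time intervals concrete, here is a minimal sketch under stated assumptions. The `Interval` type, the `during` relation, the example rule ("a shot happens during a possession"), and the weight are all hypothetical and chosen for illustration; they are not the syntax or the algorithms defined in the dissertation.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int  # first frame of the event
    end: int    # last frame of the event (inclusive)


def during(inner: Interval, outer: Interval) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


def rule_score(shots, possessions, weight=2.0):
    """Sum the rule weight over every shot that satisfies the soft rule
    'a shot occurs during some possession' (a hypothetical rule)."""
    total = 0.0
    for shot in shots:
        if any(during(shot, p) for p in possessions):
            total += weight
    return total


possessions = [Interval(0, 50), Interval(80, 120)]
shots = [Interval(40, 45), Interval(60, 65)]

score = rule_score(shots, possessions)
print(score)  # 2.0: only the first shot falls inside a possession
```

A higher score marks an interpretation of the video that agrees with more of the weighted domain rules, so competing interpretations can be ranked by their scores.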
Qualitative and quantitative results on benchmark video datasets are also presented.
The results demonstrate that our approaches provide consistent video interpretation
with respect to acquired domain knowledge. We outperform most of the state-of-the-art
approaches on benchmark datasets. We also present our new basketball dataset that
complements existing benchmarks with new challenges.

Graduation date: 2011
Access restricted to the OSU Community at author's request from May 12, 2011 - May 12, 2012

Identifier: oai:union.ndltd.org:ORGSU/oai:ir.library.oregonstate.edu:1957/21315
Date: 12 May 2011
Creators: Brendel, William
Contributors: Todorovic, Sinisa
Source Sets: Oregon State University
Language: en_US
Detected Language: English
Type: Thesis/Dissertation
