Recognition of complex events in consumer-uploaded Internet videos, captured under real-world settings, has emerged as a challenging area of research across both the computer vision and multimedia communities. In this dissertation, we present a systematic decomposition of complex events into hierarchical components, analyze in depth how existing research addresses the various levels of this hierarchy, and identify three key stages where we make novel contributions, keeping complex events in focus. These are as follows: (a) Extraction of novel semi-global features: first, we introduce a Lie-algebra-based representation of the dominant camera motion present while capturing videos and show how it can be used as a complementary feature for video analysis. Second, we propose compact clip-level descriptors of a video based on the covariance of appearance and motion features, which we further use in a sparse coding framework to recognize realistic actions and gestures. (b) Construction of intermediate representations: we propose an efficient probabilistic representation of low-level features computed from videos, based on Maximum Likelihood Estimates, which demonstrates state-of-the-art performance in large-scale visual concept detection. (c) Modeling temporal interactions between intermediate concepts: using block Hankel matrices and harmonic analysis of slowly evolving Linear Dynamical Systems, we propose two new discriminative feature spaces for complex event recognition and demonstrate significantly improved recognition rates over previously proposed approaches.
Identifier | oai:union.ndltd.org:ucf.edu/oai:stars.library.ucf.edu:etd-3604 |
Date | 01 January 2013 |
Creators | Bhattacharya, Subhabrata |
Publisher | STARS |
Source Sets | University of Central Florida |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Electronic Theses and Dissertations |