1 |
Trajectories As a Unifying Cross Domain Feature for Surveillance Systems
Wan, Yiwen 12 1900
Manual video analysis is a tedious task. An efficient solution that automates the process and assists operators is highly important. A major goal of video analysis is understanding and recognizing human activities captured by surveillance cameras, a very challenging problem; the activities can be either individual or interactional among multiple objects. This involves extracting relevant spatial and temporal information from visual images. Most video analytics systems are constrained by specific environmental situations. Different domains may require different specific knowledge to express the characteristics of interesting events. Spatial-temporal trajectories have been utilized to capture motion characteristics of activities. The focus of this dissertation is on how trajectories can be utilized to assist in developing video analytic systems in the context of surveillance. The research reported in this dissertation begins with real-time highway traffic monitoring and dynamic traffic pattern analysis and in the end generalizes the knowledge to event and activity analysis in a broader context. The main contributions are: the use of the graph-theoretic dominant set approach for the classification of traffic trajectories; the ability to first partition the trajectory clusters using entry and exit point awareness, which significantly improves clustering effectiveness and reduces computational time and complexity in the on-line processing of new trajectories; a novel tracking method that uses the extended 3-D Hungarian algorithm with a Kalman filter to preserve the smoothness of motion; a novel camera calibration method to determine the second vanishing point with no operator assistance; and a logic reasoning framework together with a new set of context-free LLEs that could be utilized across different domains. Additional efforts have been made toward three comprehensive surveillance systems that incorporate the main contributions mentioned above.
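The entry and exit point partitioning mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the zone layout `ZONES` and the helpers `nearest_zone` and `partition_by_entry_exit` are hypothetical names introduced here, and trajectories are assumed to be plain lists of (x, y) points. The sketch only shows why grouping by (entry, exit) pair first can cut work: a new trajectory needs to be compared against one group rather than against every cluster.

```python
from collections import defaultdict
from math import hypot

# Hypothetical zone centers in image coordinates; not taken from the dissertation.
ZONES = {"north": (50.0, 0.0), "south": (50.0, 100.0), "east": (100.0, 50.0)}

def nearest_zone(point, zones=ZONES):
    """Return the name of the zone whose center is closest to `point`."""
    x, y = point
    return min(zones, key=lambda name: hypot(x - zones[name][0], y - zones[name][1]))

def partition_by_entry_exit(trajectories, zones=ZONES):
    """Group trajectories by the (entry zone, exit zone) of their endpoints."""
    groups = defaultdict(list)
    for traj in trajectories:
        key = (nearest_zone(traj[0], zones), nearest_zone(traj[-1], zones))
        groups[key].append(traj)
    return dict(groups)

trajectories = [
    [(48.0, 2.0), (50.0, 50.0), (52.0, 97.0)],  # enters north, exits south
    [(51.0, 1.0), (49.0, 40.0), (50.0, 99.0)],  # enters north, exits south
    [(49.0, 3.0), (70.0, 30.0), (98.0, 51.0)],  # enters north, exits east
]
groups = partition_by_entry_exit(trajectories)
```

A finer clustering step (the abstract names a dominant set approach) would then run inside each group; that step is omitted from this sketch.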
|
2 |
Efficient Localization of Human Actions and Moments in Videos
Escorcia, Victor 07 1900
We are stumbling across a video tsunami flooding our communication channels.
The ubiquity of digital cameras and social networks has increased the amount of visual
media content generated and shared by people, in particular videos. Cisco reports
that 82% of the internet traffic would be in the form of videos by 2022. The computer
vision community has embraced this challenge by offering the first building blocks to
translate the visual data in segmented video clips into semantic tags. However, users
usually need to go beyond tagging at the video level. For example, someone may
want to retrieve important moments such as the “first steps of her child” from a large
collection of untrimmed videos, or retrieve all the instances of a home run from an
unsegmented video of baseball. In the face of this data deluge, it becomes crucial
to develop efficient and scalable algorithms that can intelligently localize semantic
visual content in untrimmed videos.
In this work, I address three different challenges in the localization of actions in
videos. First, I develop deep-based action proposals and detection models that take a
video and generate action-agnostic and class-specific temporal segments, respectively.
These models retrieve temporal locations with high accuracy in an efficient manner,
faster than real-time. Second, I propose the new task to retrieve and localize temporal
moments from a collection of videos given a natural language query. To tackle this
challenge, I introduce an efficient and effective model that aligns the text query to
individual clips of fixed length while still retrieving moments spanning multiple clips.
This approach not only allows smooth interactions with users via natural language
queries but also reduces the index size and search time for retrieving the moments.
Lastly, I introduce the concept of actor-supervision that exploits the inherent compositionality
of actions, in terms of transformations of actors, to achieve spatiotemporal
localization of actions without the need for action box annotations. By designing efficient
models that scan a single video in real time and that retrieve and localize moments of
interest from multiple videos, together with an effective strategy to localize actions without
resorting to action box annotations, this thesis provides insights that put us closer to
the goal of general video understanding.
|