High-level activity learning and recognition in structured environments

Automatic recognition of events in video is an immensely challenging problem. If solved, the number of potential domains in which such a system could be deployed is vast and growing, including traffic monitoring, surveillance, security, elderly care and semantic video search, to name but a few. Much prior research in the area has focused on producing a solution that is tailored towards one of these applications, applying the methods that are most appropriate given the constraints of the target domain. For the moment, this remains to some extent the only practical way to approach the problem. The aim of this thesis is to build a high-level framework for event recognition which is, in the main, generic and widely transferable, yet allows domain-appropriate elements to be incorporated. A detector for low-level events is constructed, based on dense extraction of Histograms of Optical Flow. This descriptor has only recently been adopted by the event detection community, and as such there are aspects of the features which have not been optimized. This thesis performs extensive experimentation on the normalization scheme and finds that the strategy most widely in use is suboptimal compared to one of the alternatives proposed. The detector is then trained on a challenging real-world domain to run in a sliding-window fashion on continuous video input. A high-level model which exploits temporal relations between different event types is constructed. The model is designed with transferability and computational tractability in mind. Several methods are benchmarked for learning the distributions over time differences between pairs of events. Three different connection strategies are proposed and evaluated for creating a tree-structured prior that permits fast, exact inference. An efficient iterative optimization scheme is presented for handling scenarios which contain unknown numbers of event instances.
Finally, the model is extended in a Conditional Random Field framework that allows weights to be learned to balance the response from independent detectors with the pairwise temporal relationships.
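
As an illustration of why a tree-structured prior permits fast, exact inference, the following is a minimal sketch of max-sum message passing over event times. It is not the thesis's implementation: the Gaussian model of time differences, the function names (`gaussian_log_pdf`, `map_event_times`), the event labels and all numeric scores are assumptions made only for this example.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a Gaussian (an assumed model for time differences)."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def map_event_times(unary, children, pairwise, root):
    """Exact MAP inference by max-sum message passing on a tree.

    unary:    dict event -> list of detector log-scores, one per frame
    children: dict event -> list of child events (defines the tree)
    pairwise: dict (parent, child) -> (mean, std) of t_child - t_parent
    Returns (best total log-score, dict event -> chosen frame).
    """
    n_frames = len(unary[root])
    back = {}  # back[child][t_parent] = best frame for child given parent frame

    def upward(node):
        # score[t]: best log-score of the subtree rooted at node, with node at frame t
        score = list(unary[node])
        for child in children.get(node, []):
            child_score = upward(child)
            mean, std = pairwise[(node, child)]
            best_frames = []
            for t in range(n_frames):
                best_val, best_tc = max(
                    (child_score[tc] + gaussian_log_pdf(tc - t, mean, std), tc)
                    for tc in range(n_frames)
                )
                score[t] += best_val
                best_frames.append(best_tc)
            back[child] = best_frames
        return score

    root_score = upward(root)
    best_root = max(range(n_frames), key=lambda t: root_score[t])

    # Downward pass: read off each child's best frame given its parent's frame.
    assignment = {root: best_root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            assignment[child] = back[child][assignment[node]]
            stack.append(child)
    return root_score[best_root], assignment


# Hypothetical toy problem: three event types over 10 frames.
n = 10
unary = {name: [-5.0] * n for name in ("A", "B", "C")}
unary["A"][2] = 0.0                       # strong detection of A at frame 2
unary["B"][5], unary["B"][8] = -1.0, 0.0  # B's strongest response is at frame 8
unary["C"][3] = 0.0
children = {"A": ["B", "C"]}
pairwise = {("A", "B"): (3.0, 1.0), ("A", "C"): (1.0, 1.0)}

score, times = map_event_times(unary, children, pairwise, root="A")
print(times)
```

In this toy case the temporal prior overrides the stronger but implausibly late response for B at frame 8 in favour of the weaker response at frame 5. Each edge costs O(T^2) for T candidate frames, so the total cost grows linearly with the number of event types, which is what makes the tree restriction tractable.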

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:564189
Date: January 2012
Creators: Greenall, John Patrick
Contributors: Cohn, A. ; Hogg, D.
Publisher: University of Leeds
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: http://etheses.whiterose.ac.uk/3231/
