Video is a sequence of 2D images of the 3D world generated by a camera. As the camera moves relative to the real scene and elements of that scene themselves move, correlated frame-to-frame changes in the video images are induced. Humans easily identify such changes as scene motion and can readily assess attempts to quantify it. For a machine, the identification of the 2D frame-to-frame motion is difficult. This problem is addressed by the computer vision process of tracking. Tracking underpins the solution to the problem of augmenting general video sequences with artificial imagery, a staple task in the visual effects industry. The problem is difficult because tracking in general video sequences is complicated by the presence of non-rigid motion, repeated texture and arbitrary occlusions. Existing methods provide solutions that rely on imposing limitations on the scenes that can be processed or that rely on human artistry and hard work. I introduce new paradigms, frameworks and algorithms for overcoming the challenges of processing general video and thus provide solutions that fill the gap between the `automated' and `manual' approaches. The work is easily sectioned into three parts, which can be considered separately or taken together for dealing with video without limitations. The initial focus is on directly addressing practical issues of human interaction in the tracking process: a new solution is developed by explicitly incorporating the user into an interactive algorithm. It is a novel tracking system based on fast full-frame patch searching and high-speed optimal track determination. This approach makes only minimal assumptions about motion and appearance, making it suitable for the widest variety of input video. I detail an implementation of the new system using k-d trees and dynamic programming. The second distinct contribution is an important extension to tracking algorithms in general. It can be noted that existing tracking algorithms occupy a spectrum in their use of global motion information. Local methods are easily confused by occlusions, repeated texture and image noise. Global motion models offer strong predictions to see through these difficulties and have been used in restricted circumstances, but are defeated by scenes containing independently moving objects or modest levels of non-rigid motion. I present a well principled way of combining local and global models to improve tracking, especially in these highly problematic cases. By viewing rank-constrained tracking as a probabilistic model of 2D tracks instead of 3D motion, I show how one can obtain a robust motion prior that can be easily incorporated in any existing tracking algorithm. The development of the global motion prior is based on rank-constrained factorization of measurement matrices. A common difficulty comes from the frequent occurrence of occlusions in video, which means that the relevant matrices are often not complete due to missing data. This defeats standard factorization algorithms. To fully explain and understand the algorithmic complexities of factorization in this practical context, I present a common notation for the direct comparison of existing algorithms and propose a new family of hybrid approaches that combine the superb initial performance of alternation methods with the convergence power of the Newton algorithm. Together, these investigations provide a wide-ranging, yet coherent exploration of tracking non-rigid objects in video.
Identifer | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:491325 |
Date | January 2008 |
Creators | Buchanan, Aeron Morgan |
Contributors | Fitzgibbon, A. W. |
Publisher | University of Oxford |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://ora.ox.ac.uk/objects/uuid:82efb277-abc9-4725-9506-5d114a83bd96 |
Page generated in 0.0015 seconds