The recognition of time-varying-geometry (TVG) objects (in particular, humans) and their actions is a complex task due to common real-world sensing challenges, such as obstacles and environmental variations, as well as issues specific to TVG objects, such as self-occlusion. Herein, it is proposed that a multi-camera active-vision system, which dynamically selects camera poses in real time, be used to choose camera views on-line for near-optimal sensing-task performance. Active vision for TVG objects requires an on-line sensor-planning strategy that incorporates into the pose-selection process information about the object itself, including its current action, and about the state of the environment, including obstacles. Thus, the focus of this research is the development of a novel methodology for real-time sensing-system reconfiguration (active vision), designed specifically for the recognition of a single TVG object and its actions in a cluttered, dynamic environment that may contain multiple other dynamic (maneuvering) obstacles.
The proposed methodology was developed as a complete, customizable sensing-system framework: a ten-stage, real-time pipeline architecture that can be readily modified to suit a variety of specific TVG action-sensing tasks. Its stages are as follows:

- Sensor Agents capture and correct camera images, removing noise and lens distortion, and segment the images into regions of interest.
- A Synchronization Agent aligns multiple images from different cameras to a single ‘world-time.’
- Point-Tracking and De-Projection Agents detect, identify, and track points of interest in the resultant 2-D images, and form constraints in normalized camera coordinates from the tracked pixel coordinates.
- A 3-D Solver Agent combines all constraints to estimate world-coordinate positions for all visible features of the 3-D articulated model of the object of interest (OoI).
- A Form-Recovery Agent uses an iterative process to combine model constraints, detected feature points, and other contextual information into an estimate of the OoI’s current form.
- An Action-Recognition Agent uses this estimate, together with a feature-vector descriptor, to determine which action from a library of known actions the OoI is performing, if any.
- A Prediction Agent estimates future OoI and obstacle poses from past detected locations, and future OoI forms from the current action and past forms.
- A Central Planning Agent uses all of the data accumulated in the pipeline to implement a formal mathematical optimization derived from the general sensing problem: it selects desirable, feasible, and achievable camera poses for the next sensing instant by optimizing a visibility metric that is positively related to sensing-task performance.
- Finally, a Referee Agent examines the complete set of chosen poses for consistency, enforces global rules not captured by the optimization, and maintains system functionality if no suitable solution can be found.
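The pose-selection step described above can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in, not the thesis's actual formulation: a toy 2-D visibility metric (unobstructed line of sight to the predicted OoI position, with circular obstacles, favoring closer views) and a simple exhaustive search over candidate camera poses.

```python
import math

def _segment_hits(a, b, circle):
    """True if segment a-b passes through a circular obstacle (cx, cy, r)."""
    (ax, ay), (bx, by) = a, b
    cx, cy, r = circle
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    # Parameter of the point on the segment closest to the circle's center.
    t = 0.0 if length_sq == 0 else max(
        0.0, min(1.0, ((cx - ax) * dx + (cy - ay) * dy) / length_sq))
    px, py = ax + t * dx, ay + t * dy
    return math.hypot(cx - px, cy - py) <= r

def visibility(pose, ooi_pos, obstacles):
    """Toy visibility metric: zero if any obstacle blocks the line of
    sight to the OoI, otherwise higher for closer viewpoints."""
    if any(_segment_hits(pose, ooi_pos, obs) for obs in obstacles):
        return 0.0
    return 1.0 / (1.0 + math.hypot(ooi_pos[0] - pose[0],
                                   ooi_pos[1] - pose[1]))

def plan_pose(candidate_poses, predicted_ooi, predicted_obstacles):
    """Pick the feasible candidate pose that maximizes the visibility
    metric at the next sensing instant (exhaustive search stand-in for
    the formal optimization)."""
    return max(candidate_poses,
               key=lambda p: visibility(p, predicted_ooi, predicted_obstacles))
```

In this sketch, a camera at `(0, 0)` looking toward an OoI at `(5, 5)` past an obstacle at `(2.5, 2.5)` scores zero, so the planner would instead choose an unobstructed candidate such as `(5, 0)`; the Referee Agent's consistency checks and the achievability constraints of real camera motion are omitted.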
To validate the proposed methodology, rigorous experiments are also presented herein. They confirm the basic assumptions of active vision for TVG objects and characterize the gains in sensing-task performance. Simulated experiments provide a method for rapid evaluation of new sensing tasks; they demonstrate a tangible increase in single-action recognition performance over a static-camera sensing system. Furthermore, they illustrate the need for feedback in the pose-selection process, allowing the system to incorporate knowledge of the OoI’s form and action. Later real-world, multi-action, and multi-level-action experiments demonstrate the same tangible increase when sensing real-world objects performing multiple actions, which may occur simultaneously or at differing levels of detail.
A final set of real-world experiments characterizes the real-time performance of the proposed methodology in relation to several important system design parameters, such as the number of obstacles in the environment and the size of the action library. Overall, it is concluded that the proposed system tangibly increases TVG action-sensing performance and can be generalized to a wide range of applications, including human-action sensing. Future research is proposed to extend these methods to deformable objects and multiple objects of interest.
Identifier | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OTU.1807/31849 |
Date | 10 January 2012 |
Creators | Mackay, Matthew Donald |
Contributors | Benhabib, Beno |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | en_ca |
Detected Language | English |
Type | Thesis |