• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 2
  • 1
  • Tagged with
  • 3
  • 3
  • 3
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Observations on Cortical Mechanisms for Object Recognition andsLearning

Poggio, Tomaso, Hurlbert, Anya 01 December 1993 (has links)
This paper sketches a hypothetical cortical architecture for visual 3D object recognition based on a recent computational model. The view-centered scheme relies on modules for learning from examples, such as Hyperbf-like networks. Such models capture a class of explanations we call Memory-Based Models (MBM) that contains sparse population coding, memory-based recognition, and codebooks of prototypes. Unlike the sigmoidal units of some artificial neural networks, the units of MBMs are consistent with the description of cortical neurons. We describe how an example of MBM may be realized in terms of cortical circuitry and biophysical mechanisms, consistent with psychophysical and physiological data.
2

Study Of Human Activity In Video Data With An Emphasis On View-invariance

Ashraf, Nazim 01 January 2012 (has links)
The perception and understanding of human motion and action is an important area of research in computer vision that plays a crucial role in various applications such as surveillance, HCI, ergonomics, etc. In this thesis, we focus on the recognition of actions in the case of varying viewpoints and different and unknown camera intrinsic parameters. The challenges to be addressed include perspective distortions, differences in viewpoints, anthropometric variations, and the large degrees of freedom of articulated bodies. In addition, we are interested in methods that require little or no training. The current solutions to action recognition usually assume that there is a huge dataset of actions available so that a classifier can be trained. However, this means that in order to define a new action, the user has to record a number of videos from different viewpoints with varying camera intrinsic parameters and then retrain the classifier, which is not very practical from a development point of view. We propose algorithms that overcome these challenges and require just a few instances of the action from any viewpoint with any intrinsic camera parameters. Our first algorithm is based on the rank constraint on the family of planar homographies associated with triplets of body points. We represent action as a sequence of poses, and decompose the pose into triplets. Therefore, the pose transition is broken down into a set of movement of body point planes. In this way, we transform the non-rigid motion of the body points into a rigid motion of body point iii planes. We use the fact that the family of homographies associated with two identical poses would have rank 4 to gauge similarity of the pose between two subjects, observed by different perspective cameras and from different viewpoints. This method requires only one instance of the action. We then show that it is possible to extend the concept of triplets to line segments. In particular, we establish that if we look at the movement of line segments instead of triplets, we have more redundancy in data thus leading to better results. We demonstrate this concept on “fundamental ratios.” We decompose a human body pose into line segments instead of triplets and look at set of movement of line segments. This method needs only three instances of the action. If a larger dataset is available, we can also apply weighting on line segments for better accuracy. The last method is based on the concept of “Projective Depth”. Given a plane, we can find the relative depth of a point relative to the given plane. We propose three different ways of using “projective depth:” (i) Triplets - the three points of a triplet along with the epipole defines the plane and the movement of points relative to these body planes can be used to recognize actions; (ii) Ground plane - if we are able to extract the ground plane, we can find the “projective depth” of the body points with respect to it. Therefore, the problem of action recognition would translate to curve matching; and (iii) Mirror person - We can use the mirror view of the person to extract mirror symmetric planes. This method also needs only one instance of the action. Extensive experiments are reported on testing view invariance, robustness to noisy localization and occlusions of body points, and action recognition. The experimental results are very promising and demonstrate the efficiency of our proposed invariants. iv
3

Multi-view Geometric Constraints For Human Action Recognition And Tracking

Gritai, Alexei 01 January 2007 (has links)
Human actions are the essence of a human life and a natural product of the human mind. Analysis of human activities by a machine has attracted the attention of many researchers. This analysis is very important in a variety of domains including surveillance, video retrieval, human-computer interaction, athlete performance investigation, etc. This dissertation makes three major contributions to automatic analysis of human actions. First, we conjecture that the relationship between body joints of two actors in the same posture can be described by a 3D rigid transformation. This transformation simultaneously captures different poses and various sizes and proportions. As a consequence of this conjecture, we show that there exists a fundamental matrix between the imaged positions of the body joints of two actors, if they are in the same posture. Second, we propose a novel projection model for cameras moving at a constant velocity in 3D space, \emph cameras, and derive the Galilean fundamental matrix and apply it to human action recognition. Third, we propose a novel use for the invariant ratio of areas under an affine transformation and utilizing the epipolar geometry between two cameras for 2D model-based tracking of human body joints. In the first part of the thesis, we propose an approach to match human actions using semantic correspondences between human bodies. These correspondences are used to provide geometric constraints between multiple anatomical landmarks ( e.g. hands, shoulders, and feet) to match actions observed from different viewpoints and performed at different rates by actors of differing anthropometric proportions. The fact that the human body has approximate anthropometric proportion allows for innovative use of the machinery of epipolar geometry to provide constraints for analyzing actions performed by people of different anthropometric sizes, while ensuring that changes in viewpoint do not affect matching. A novel measure in terms of rank of matrix constructed only from image measurements of the locations of anatomical landmarks is proposed to ensure that similar actions are accurately recognized. Finally, we describe how dynamic time warping can be used in conjunction with the proposed measure to match actions in the presence of nonlinear time warps. We demonstrate the versatility of our algorithm in a number of challenging sequences and applications including action synchronization , odd one out, following the leader, analyzing periodicity etc. Next, we extend the conventional model of image projection to video captured by a camera moving at constant velocity. We term such moving camera Galilean camera. To that end, we derive the spacetime projection and develop the corresponding epipolar geometry between two Galilean cameras. Both perspective imaging and linear pushbroom imaging form specializations of the proposed model and we show how six different ``fundamental" matrices including the classic fundamental matrix, the Linear Pushbroom (LP) fundamental matrix, and a fundamental matrix relating Epipolar Plane Images (EPIs) are related and can be directly recovered from a Galilean fundamental matrix. We provide linear algorithms for estimating the parameters of the the mapping between videos in the case of planar scenes. For applying fundamental matrix between Galilean cameras to human action recognition, we propose a measure that has two important properties. First property makes it possible to recognize similar actions, if their execution rates are linearly related. Second property allows recognizing actions in video captured by Galilean cameras. Thus, the proposed algorithm guarantees that actions can be correctly matched despite changes in view, execution rate, anthropometric proportions of the actor, and even if the camera moves with constant velocity. Finally, we also propose a novel 2D model based approach for tracking human body parts during articulated motion. The human body is modeled as a 2D stick figure of thirteen body joints and an action is considered as a sequence of these stick figures. Given the locations of these joints in every frame of a model video and the first frame of a test video, the joint locations are automatically estimated throughout the test video using two geometric constraints. First, invariance of the ratio of areas under an affine transformation is used for initial estimation of the joint locations in the test video. Second, the epipolar geometry between the two cameras is used to refine these estimates. Using these estimated joint locations, the tracking algorithm determines the exact location of each landmark in the test video using the foreground silhouettes. The novelty of the proposed approach lies in the geometric formulation of human action models, the combination of the two geometric constraints for body joints prediction, and the handling of deviations in anthropometry of individuals, viewpoints, execution rate, and style of performing action. The proposed approach does not require extensive training and can easily adapt to a wide variety of articulated actions.

Page generated in 0.0757 seconds