221
Learning Probabilistic Models for Visual Motion. Ross, David A. 26 February 2009.
A fundamental goal of computer vision is the ability to analyze motion. This can range from the simple task of locating or tracking a single rigid object as it moves across an image plane, to recovering the full pose parameters of a collection of nonrigid objects interacting in a scene. The current state of computer vision research, as with the preponderance of challenges that comprise "artificial intelligence", is that the abilities of humans can only be matched in very narrow domains by carefully and specifically engineered systems.
The key to broadening the applicability of these successful systems is to imbue them with the flexibility to handle new inputs, and to adapt automatically without the manual intervention of human engineers. In this research we attempt to address this challenge by proposing solutions to motion analysis tasks that are based on machine learning.
We begin by addressing the challenge of tracking a rigid object in video, presenting two complementary approaches. First we explore the problem of learning a particular choice of appearance model---principal components analysis (PCA)---from a very limited set of training data. However, PCA is far from the only appearance model available. This raises the question: given a new tracking task, how should one select the most appropriate models of appearance and dynamics? Our second approach proposes a data-driven solution to this problem, allowing the choice of models, along with their parameters, to be learned from a labelled video sequence.
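To make the appearance-model idea concrete, the sketch below (illustrative only, not the algorithm developed in the thesis) fits a PCA subspace to patches of the tracked object and scores candidate windows by reconstruction error; all data and sizes are invented.

```python
# A minimal sketch (illustrative only, not the thesis's algorithm):
# score candidate tracking windows by how well a learned PCA
# appearance model reconstructs them. Data and sizes are invented.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-ins for vectorized image patches of the tracked object
# collected from earlier frames (20 patches of 16x16 pixels).
training_patches = rng.normal(size=(20, 256))

# Learn a low-dimensional appearance subspace.
pca = PCA(n_components=5).fit(training_patches)

def appearance_score(patch):
    """Negative reconstruction error: higher means the candidate
    window looks more like the learned object appearance."""
    recon = pca.inverse_transform(pca.transform(patch[None, :]))
    return -np.sum((patch - recon) ** 2)

# Pick the best of several candidate windows in the current frame.
candidates = rng.normal(size=(10, 256))
best = max(range(len(candidates)), key=lambda i: appearance_score(candidates[i]))
print("best candidate window:", best)
```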
Next we consider motion analysis at a higher level of organization. Given a set of trajectories obtained by tracking various feature points, how can we discover the underlying non-rigid structure of the object or objects? We propose a solution that models the observed sequence in terms of probabilistic "stick figures", under the assumption that the relative joint angles between sticks can change over time, but their lengths and connectivities are fixed.
We demonstrate the ability to recover the invariant structure and the pose of articulated objects from a number of challenging datasets.
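To make the stick-figure representation concrete, the toy sketch below (invented for this note, not the thesis's probabilistic model) fixes stick lengths and connectivity and varies only the joint angles across frames.

```python
# A toy 2D stick figure (invented for this note, not the thesis's
# probabilistic model): lengths and connectivity are fixed; only the
# angles change over time. Absolute angles are used for simplicity
# (the thesis works with relative joint angles).
import numpy as np

# Each stick: (index of the joint it attaches to, fixed length).
sticks = [(0, 1.0), (1, 0.7)]  # stick 1 hangs off the tip of stick 0

def joint_positions(root, angles):
    """Joint locations given the root position and one angle per stick."""
    joints = [np.asarray(root, dtype=float)]
    for (parent, length), theta in zip(sticks, angles):
        offset = length * np.array([np.cos(theta), np.sin(theta)])
        joints.append(joints[parent] + offset)
    return joints

# Pose varies across frames only through the angles.
for t, pose in enumerate([(0.0, 0.5), (0.3, 0.9)]):
    print(f"frame {t}:", [p.round(2) for p in joint_positions((0, 0), pose)])
```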

222
Visual Object Recognition Using Generative Models of Images. Nair, Vinod. 01 September 2010.
Visual object recognition is one of the key human capabilities that we would like machines to have. The problem is the following: given an image of an object (e.g. someone's face), predict its label (e.g. that person's name) from a set of possible object labels. The predominant approach to solving the recognition problem has been to learn a discriminative model, i.e. a model of the conditional probability $P(l|v)$ over possible object labels $l$ given an image $v$.
Here we consider an alternative class of models, broadly referred to as \emph{generative models}, that learn the latent structure of the image so as to explain how it was generated. This is in contrast to discriminative models, which dedicate their parameters exclusively to representing the conditional distribution $P(l|v)$. Making finer distinctions among generative models, we consider a supervised generative model of the joint distribution $P(v,l)$ over image-label pairs, an unsupervised generative model of the distribution $P(v)$ over images alone, and an unsupervised \emph{reconstructive} model, which includes models such as autoencoders that can reconstruct a given image, but do not define a proper distribution over images. The goal of this thesis is to empirically demonstrate various ways of using these models for object recognition. Its main conclusion is that such models are not only useful for recognition, but can even outperform purely discriminative models on difficult recognition tasks.
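The generative/discriminative contrast can be made concrete with off-the-shelf stand-ins (scikit-learn models, not the models studied in the thesis): Gaussian naive Bayes fits the joint distribution $P(v,l)$, while logistic regression spends all of its parameters on the conditional $P(l|v)$.

```python
# Concrete contrast using scikit-learn stand-ins (not the models
# studied in the thesis): Gaussian naive Bayes models the joint
# P(v, l); logistic regression models the conditional P(l | v) only.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two synthetic "image" classes as stand-ins for real data.
v = np.vstack([rng.normal(0, 1, (100, 20)), rng.normal(1, 1, (100, 20))])
l = np.array([0] * 100 + [1] * 100)

generative = GaussianNB().fit(v, l)  # fits P(v | l) and P(l), hence P(v, l)
discriminative = LogisticRegression(max_iter=1000).fit(v, l)  # fits P(l | v) directly

# Both yield P(l | v) for recognition; only the generative model can
# also score how probable an image is under each class.
print(generative.predict_proba(v[:1]))
print(discriminative.predict_proba(v[:1]))
```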
We explore four types of applications of generative/reconstructive models for recognition: 1) incorporating complex domain knowledge into the learning by inverting a synthesis model, 2) using the latent image representations of generative/reconstructive models for recognition, 3) optimizing a hybrid generative-discriminative loss function, and 4) creating additional synthetic data for training more accurate discriminative models. Taken together, the results for these applications support the idea that generative/reconstructive models and unsupervised learning have a key role to play in building object recognition systems.
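Of these, application 4 is the simplest to sketch: enlarge a labelled training set with label-preserving transformations of existing images. The snippet below is a generic augmentation stand-in, not the synthesis procedure used in the thesis.

```python
# A generic sketch of application 4 (not the thesis's synthesis
# procedure): enlarge a labelled training set with label-preserving
# transformations of existing images.
import numpy as np
from scipy.ndimage import rotate, shift

rng = np.random.default_rng(0)
images = rng.random((10, 28, 28))   # stand-in training images
labels = np.arange(10) % 2

augmented, aug_labels = [], []
for img, lab in zip(images, labels):
    augmented.append(img)
    aug_labels.append(lab)
    # Small random rotations and translations leave the label unchanged.
    augmented.append(rotate(img, angle=rng.uniform(-10, 10), reshape=False))
    aug_labels.append(lab)
    augmented.append(shift(img, shift=rng.uniform(-2, 2, size=2)))
    aug_labels.append(lab)

print(len(augmented), "training images after augmentation")
```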

225
Polygonal meshing for stereo video surface reconstruction. Gill, Sunbir. January 2007.
Thesis (M.Sc.)--York University, 2007. Graduate Programme in Computer Science. Typescript. Includes bibliographical references (leaves 118-124). Also available on the Internet. Mode of access: via web browser at the following URL: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR38774

226
A region merging methodology for color and texture image segmentation. Tan, Zhigang. January 2009.
Thesis (Ph.D.)--University of Hong Kong, 2010. Includes bibliographical references (p. 139-144). Also available in print.

227
GPU acceleration of object classification algorithms using NVIDIA CUDA. Harvey, Jesse Patrick. January 2009.
Thesis (M.S.)--Rochester Institute of Technology, 2009. Typescript. Includes bibliographical references (leaves 76-80).

228
3D shape recovery under multiple viewpoints and single viewpoint. Chen, Zhihu (陈志湖). January 2012.
This thesis introduces novel algorithms for 3D shape recovery under multiple viewpoints and under a single viewpoint. The surface of a 3D object is reconstructed either by graph-cuts using images from multiple viewpoints, by depth from reflection under a fixed viewpoint, or by depth from refraction under a fixed viewpoint.
The first part of this thesis revisits the graph-cuts based approach to the multi-view stereo problem and proposes a novel foreground/background energy. Unlike traditional graph-cuts based methods, which focus on the photo-consistency energy, this thesis targets a robust and unbiased data-dependent foreground/background energy. It is shown that with the proposed foreground/background energy, the object surface can be recovered from noisy depth maps even in the absence of the photo-consistency energy, which demonstrates the effectiveness of the proposed energy.
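The role a foreground/background (unary) energy plays in a graph-cuts formulation can be sketched with an s-t min-cut on a toy three-pixel graph; here networkx stands in for a dedicated maxflow solver, and all energies are made up for illustration.

```python
# A toy s-t min-cut showing where a foreground/background (unary)
# energy enters a graph-cuts formulation. networkx stands in for a
# dedicated maxflow solver; all energies are made up for illustration.
import networkx as nx

# Three pixels in a row, with unary costs for each label.
fg_cost = {0: 0.2, 1: 0.9, 2: 0.8}  # cost of labelling each pixel foreground
bg_cost = {0: 0.8, 1: 0.1, 2: 0.3}  # cost of labelling each pixel background
smoothness = 0.4                    # penalty when neighbouring labels differ

G = nx.DiGraph()
for p in range(3):
    G.add_edge("s", p, capacity=bg_cost[p])  # cut iff p is labelled background
    G.add_edge(p, "t", capacity=fg_cost[p])  # cut iff p is labelled foreground
for p, q in [(0, 1), (1, 2)]:
    G.add_edge(p, q, capacity=smoothness)
    G.add_edge(q, p, capacity=smoothness)

# Pixels left on the source side of the minimum cut are foreground.
cut_value, (source_side, _) = nx.minimum_cut(G, "s", "t")
foreground = sorted(n for n in source_side if n != "s")
print("minimum energy:", cut_value, "foreground pixels:", foreground)
```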
In the second part of this thesis, a novel method for shape recovery is proposed based on the reflection of light in a spherical mirror. Unlike existing methods, which require prior knowledge of the position and radius of the spherical mirror, it is shown in this thesis that the object can be reconstructed up to an unknown scale using an unknown spherical mirror.
This thesis finally considers recovering object surfaces based on refraction of light and presents a novel depth from refraction method. A scene is captured several times by a fixed camera, with the first image (referred to as the direct image) captured directly by the camera and the others (referred to as the refracted images) by placing a transparent medium with two parallel planar faces between the scene and the camera. With a known pose and refractive index of the medium, a depth map of the scene is then recovered from the displacements of scene points in the images. Unlike traditional depth from refraction methods which require extra steps to estimate the pose and the refractive index of the medium, this thesis presents a novel method to estimate them from the direct and refracted images of the scene. It is shown that the pose of the medium can be recovered from one direct image and one refracted image. It is also shown that the refractive index of the medium can be recovered with a third image captured with the medium placed in a different pose.
Doctoral thesis (Doctor of Philosophy), Computer Science.
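The geometric building block here is standard optics, not the thesis's full pose/refractive-index estimation procedure: a ray crossing a slab with two parallel planar faces exits parallel to itself, laterally displaced by an amount fixed by Snell's law. The sketch below computes that displacement.

```python
# Standard slab optics (not the thesis's full estimation procedure):
# a ray crossing a medium with two parallel planar faces exits
# parallel to itself, laterally displaced by an amount fixed by
# Snell's law.
import math

def lateral_shift(theta_i, thickness, n):
    """Lateral displacement of a ray hitting a slab of refractive
    index n (surrounded by air) at incidence angle theta_i (radians)."""
    theta_r = math.asin(math.sin(theta_i) / n)  # Snell: sin(i) = n * sin(r)
    return thickness * math.sin(theta_i - theta_r) / math.cos(theta_r)

# Example: 30-degree incidence on a 10 mm slab of glass (n ~ 1.5).
d = lateral_shift(math.radians(30), thickness=10.0, n=1.5)
print(f"lateral displacement: {d:.2f} mm")   # about 1.9 mm
```

Because this ray displacement produces an image displacement that depends on each scene point's depth, measuring shifts between the direct and refracted images constrains the depth map.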

229
Learning structural SVMs and its applications in computer vision. Kuang, Zhanghui (旷章辉). January 2014.
Many computer vision problems involve building automatic systems that extract complex high-level information from visual data. Such problems can often be modeled using structural models, which relate raw input variables to structured high-level output variables. The structural support vector machine (SVM) is a discriminative method for learning structural models. It allows flexible feature construction with good robustness against overfitting, and thus provides state-of-the-art prediction accuracy for structural prediction tasks in computer vision.
This thesis first studies the application of structural SVMs to interactive image segmentation. A novel interactive image segmentation technique is proposed that automatically learns segmentation parameters tailored to each and every image. Unlike existing work, the proposed method does not require any offline parameter tuning or training stage, and is capable of determining image-specific parameters from a few simple user interactions with the target image. The segmentation problem is modeled as inference in a conditional random field (CRF) over the segmentation mask, conditioned on the target image.
This CRF is parametrized by the weights for different terms (e.g., color, texture and smoothing). These weight parameters are learned via a one-slack structural SVM, which is solved using a constraint approximation scheme and the cutting plane algorithm. Experimental results show that the proposed method, by learning image-specific parameters automatically, outperforms other state-of-the-art interactive image segmentation techniques.
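The parametrization being learned can be sketched as a linear energy: each CRF term contributes a feature of the (image, mask) pair, and the learned weight vector combines them. The feature definitions below are invented placeholders, not the thesis's actual color/texture/smoothing terms.

```python
# A sketch of the linear parametrization the structural SVM learns:
# the CRF energy is a weighted sum of per-term features of an
# (image, mask) pair. Feature definitions are invented placeholders.
import numpy as np

def energy_features(image, mask):
    color = np.abs(image - mask.mean()).sum()       # hypothetical color term
    texture = np.abs(np.diff(image, axis=0)).sum()  # hypothetical texture term
    smoothing = (np.diff(mask, axis=0) != 0).sum() + (np.diff(mask, axis=1) != 0).sum()
    return np.array([color, texture, smoothing], dtype=float)

def energy(weights, image, mask):
    # Linear in the weights: exactly the form that one-slack structural
    # SVM training with cutting planes is designed to fit.
    return weights @ energy_features(image, mask)

rng = np.random.default_rng(0)
image = rng.random((8, 8))
mask = (image > 0.5).astype(int)
weights = np.array([1.0, 0.2, 0.5])  # in the thesis these are learned per image
print("energy:", energy(weights, image, mask))
```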
This thesis then uses structural SVMs to speed up large-scale relatively-paired space analysis. A new multi-modality analysis technique based on relatively-paired observations from multiple modalities is proposed. Relative-pairing information is encoded using relative proximities of observations in a latent common space. By building a discriminative model and maximizing a distance margin, a projection function that maps observations into the latent common space is learned for each modality. However, training on large-scale relatively-paired observations can be extremely time-consuming. To this end, the training is reformulated as learning a structural model, which can be optimized by the cutting plane algorithm, where only a few training samples are involved in each iteration. Experimental results validate the effectiveness and efficiency of the proposed technique.
Doctoral thesis (Doctor of Philosophy), Computer Science.
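A toy version of the relative-pairing idea follows, with plain subgradient descent standing in for the cutting-plane training described above; data, dimensions, and pairings are invented. A triplet (i, j, k) means X[i] is more strongly paired with Y[j] than with Y[k].

```python
# Toy relatively-paired analysis: learn one linear projection per
# modality so that relatively-paired observations land closer in the
# common space than unpaired ones, by a margin. Plain subgradient
# descent stands in for the cutting-plane training; data are invented.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))   # observations from modality 1
Y = rng.normal(size=(50, 8))    # observations from modality 2
triplets = [(i, i, (i + 1) % 50) for i in range(50)]

W1 = rng.normal(scale=0.1, size=(5, 10))  # projects modality 1 into a 5-d common space
W2 = rng.normal(scale=0.1, size=(5, 8))   # projects modality 2
lr, margin = 0.01, 1.0

for _ in range(200):
    for i, j, k in triplets:
        a, p, n = W1 @ X[i], W2 @ Y[j], W2 @ Y[k]
        # Hinge loss on the squared-distance margin in the common space.
        if np.sum((a - p) ** 2) - np.sum((a - n) ** 2) + margin > 0:
            W1 -= lr * 2 * ((a - p) - (a - n))[:, None] * X[i][None, :]
            W2 -= lr * (-2 * (a - p))[:, None] * Y[j][None, :]
            W2 -= lr * (2 * (a - n))[:, None] * Y[k][None, :]

violations = sum(
    np.sum((W1 @ X[i] - W2 @ Y[j]) ** 2) + margin
    > np.sum((W1 @ X[i] - W2 @ Y[k]) ** 2)
    for i, j, k in triplets
)
print("remaining margin violations:", violations)
```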

230
The Design of a Fully Autonomous RC Racecar. Black, Richard A. 10 1900.
This paper discusses the design of an autonomous remote-controlled racecar built to play a one-on-one match of capture the flag. A competition was held; its results are presented and conclusions drawn.