  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
  Our metadata is collected from universities around the world.
1

Some topics on similarity metric learning

Cao, Qiong January 2015 (has links)
The success of many computer vision problems and machine learning algorithms critically depends on the quality of the chosen distance metrics or similarity functions. Because real data are inherently task- and data-dependent, learning an appropriate distance metric or similarity function from data for each specific task is usually superior to using the default Euclidean distance or cosine similarity. This thesis focuses on developing new metric and similarity learning models for three tasks: unconstrained face verification, person re-identification and kNN classification. Unconstrained face verification is a binary matching problem whose target is to predict whether two images/videos show the same person or not. Person re-identification, in turn, handles pedestrian matching and ranking across non-overlapping camera views. Both vision problems are very challenging because of the large transformation differences in images or videos caused by pose, expression, occlusion, problematic lighting and viewpoint. To address these concerns, two novel methods are proposed. First, we introduce a new dimensionality reduction method called Intra-PCA, designed for robustness to large transformation differences. We show that Intra-PCA significantly outperforms classic dimensionality reduction methods (e.g. PCA and LDA). Second, we propose a novel regularization framework called Sub-SML to learn distance metrics and similarity functions for unconstrained face verification and person re-identification. The main novelty of our formulation is to combine the robustness of Intra-PCA to large transformation variations with the discriminative power of metric and similarity learning, a property that most existing methods lack.
Turning to kNN classification, which relies on a distance metric to identify the nearest neighbors, we revisit several popular metric learning methods and develop a general formulation called DMLp for learning a distance metric from data. To obtain the optimal solution, a gradient-based optimization algorithm is proposed which only needs to compute the largest eigenvector of a matrix per iteration. Although a large number of studies have been devoted to metric/similarity learning based on different objective functions, few address the generalization analysis of such methods. We describe a novel approach to the generalization analysis of metric/similarity learning which can handle general matrix regularization terms, including the Frobenius norm, sparse L1-norm, mixed (2,1)-norm and trace norm. The novel models developed in this thesis are evaluated on four challenging databases: the Labeled Faces in the Wild dataset for unconstrained face verification in still images; the YouTube Faces database for video-based face verification in the wild; the Viewpoint Invariant Pedestrian Recognition database for person re-identification; and the UCI datasets for kNN classification. Experimental results show that the proposed methods yield competitive or state-of-the-art performance.
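The kNN setting described in this abstract swaps the default Euclidean distance for a learned Mahalanobis metric d_M(x, x') = (x - x')^T M (x - x') with M positive semi-definite. A minimal numpy sketch of that plug-in step (function and variable names are illustrative, not from the thesis):

```python
import numpy as np

def mahalanobis_knn_predict(X_train, y_train, x, M, k=3):
    # Squared Mahalanobis distance d_M(x, x_i) = (x - x_i)^T M (x - x_i),
    # where M is a learned positive semi-definite matrix.
    diffs = X_train - x
    dists = np.einsum('ij,jk,ik->i', diffs, M, diffs)
    # Majority vote among the k nearest training points under d_M.
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
M = np.eye(2)  # with M = I this reduces to plain Euclidean kNN
print(mahalanobis_knn_predict(X, y, np.array([0.2, 0.1]), M))  # 0
```

A metric learner such as the thesis's DMLp would supply a better M than the identity used here.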
2

Performance characterization of boosting in computer vision

Li, Weiliang. January 2005 (has links)
Thesis (Ph. D.)--Lehigh University, 2005. / Includes vita. Includes bibliographical references (leaves 163-177).
3

Autonomous visual learning for robotic systems

Beale, Dan January 2012 (has links)
This thesis investigates the problem of visual learning using a robotic platform. Given a set of objects, the robot's task is to autonomously manipulate, observe, and learn. This allows the robot to recognise objects in a novel scene and pose, or separate them into distinct visual categories. The main focus of the work is on autonomously acquiring object models through robotic manipulation. Autonomous learning is important for robotic systems. In the context of vision, it allows a robot to adapt to new and uncertain environments, updating its internal model of the world. It also reduces the amount of human supervision needed for building visual models. This leads to machines which can operate in environments with rich and complicated visual information, such as the home or industrial workspace, and in environments which are potentially hazardous for humans. The hypothesis is that inducing robot motion on objects aids the learning process. It is shown that extra information from the robot's sensors is enough to localise an object and distinguish it from the background, and that decisive planning allows the object to be separated and observed from a variety of different poses, giving a good foundation for building a robust classification model. Contributions include a new segmentation algorithm, a new classification model for object learning, and a method for allowing a robot to supervise its own learning in cluttered and dynamic environments.
4

Graph based semi-supervised learning in computer vision

Huang, Ning, January 2009 (has links)
Thesis (Ph. D.)--Rutgers University, 2009. / "Graduate Program in Biomedical Engineering." Includes bibliographical references (p. 54-55).
5

Long term appearance-based mapping with vision and laser

Paul, Rohan January 2012 (has links)
This thesis is about appearance-based topological mapping for mobile robots using vision and laser. Our goal is life-long continual operation in outdoor unstructured workspaces. We present a new probabilistic framework for appearance-based mapping and navigation incorporating spatial and visual appearance. Locations are encoded probabilistically as random graphs possessing latent distributions over visual features and pair-wise Euclidean distances, generating observations modeled as 3D constellations of features observed via noisy range and visual detectors. Multi-modal distributions over inter-feature distances are learnt using non-parametric kernel density estimation. Inference is accelerated by executing a Delaunay tessellation of the observed graph with minimal loss in performance, scaling log-linearly with scene complexity. Next, we demonstrate how a robot can, through introspection and then targeted data retrieval, improve its own place recognition performance. We introduce the idea of a dynamic sampling set, the onboard workspace representation, which adapts with the increasing visual experience of a continually operating robot. Based on a topic-based probabilistic model of images, we use a measure of perplexity to evaluate how well a working set of background images explains the robot's online view of the world. Offline, the robot then searches an external resource for additional background images that bolster its ability to localize in its environment when used next. Finally, we present an online and incremental approach allowing an exploring robot to generate apt and compact summaries of its life experience using canonical images that capture the essence of the robot's visual experience, illustrating both what was ordinary and what was extraordinary. Leveraging probabilistic topic models and an incremental graph clustering technique, we present an algorithm that scales well with time and variation of experience, generating a summary that evolves incrementally with the novelty of data.
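The non-parametric kernel density estimation over inter-feature distances mentioned in this abstract can be sketched in a few lines; the toy multi-modal distance samples below are our own stand-in for the thesis's range/visual observations:

```python
import numpy as np

def kde(samples, x, bandwidth=0.1):
    # Gaussian-kernel density estimate: the average of normalized Gaussian
    # kernels centred on each observed sample, evaluated at the points x.
    z = (x - samples[:, None]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=0) / (len(samples) * bandwidth * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
# Two clusters of observed pairwise distances -> a multi-modal density,
# the situation the thesis handles with non-parametric KDE.
samples = np.concatenate([rng.normal(1.0, 0.05, 200), rng.normal(2.5, 0.1, 200)])
density = kde(samples, np.array([1.0, 1.75, 2.5]))
print(density.argmin() == 1)  # the valley between the two modes has the lowest density
```

A parametric (single-Gaussian) fit would smear these two modes into one, which is why a non-parametric estimate is used.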
6

Learning real-time object detectors: probabilistic generative approaches

Fasel, Ian Robert. January 2006 (has links)
Thesis (Ph. D.)--University of California, San Diego, 2006. / Title from first page of PDF file (viewed July 24, 2006). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (p. 87-91).
7

Discriminative hand-object pose estimation from depth images using convolutional neural networks

Goudie, Duncan January 2018 (has links)
This thesis investigates the task of estimating the pose of a hand interacting with an object from a depth image. The main contribution of this thesis is the development of our discriminative one-shot hand-object pose estimation system. To the best of our knowledge, this is the first attempt at a one-shot hand-object pose estimation system. It is a two-stage system consisting of convolutional neural networks. The first stage segments the object out of the hand from the depth image. This hand-minus-object depth image is combined with the original input depth image to form a 2-channel image for use in the second stage, pose estimation. We show that using this 2-channel image produces better pose estimation performance than a single-stage pose estimation system taking just the input depth map as input. We also believe that we are amongst the first to research hand-object segmentation. We use fully convolutional neural networks to perform hand-object segmentation from a depth image and show that this approach is superior to random decision forests for the task. Datasets were created to train our hand-object pose estimator stage and hand-object segmentation stage. The hand-object pose labels were estimated semi-automatically with a combined manual annotation and generative approach, and the segmentation labels were inferred automatically with colour thresholding. To the best of our knowledge, there were no public datasets for these two tasks when we were developing our system. These datasets have been or are in the process of being publicly released.
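The 2-channel input this abstract describes can be illustrated with numpy; how the object pixels are blanked out of the hand-minus-object channel (zeroing them here) is our assumption, not necessarily the thesis's convention:

```python
import numpy as np

# Toy depth map (metres) and a toy first-stage segmentation mask
# marking the pixels assigned to the object.
depth = np.full((4, 4), 0.5, dtype=np.float32)
seg_mask = np.zeros((4, 4), dtype=bool)
seg_mask[1:3, 1:3] = True

# Channel 2: the hand-minus-object depth image (object pixels removed).
hand_minus_object = depth.copy()
hand_minus_object[seg_mask] = 0.0

# Stack original depth and segmented depth into the 2-channel image
# consumed by the second-stage pose network.
two_channel = np.stack([depth, hand_minus_object], axis=0)
print(two_channel.shape)  # (2, 4, 4)
```

The pose network thus sees both the full scene and an explicit hand-only view, which is the claimed advantage over a single-channel input.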
8

Deep Learning of Human Emotion Recognition in Videos

Li, Yuqing January 2017 (has links)
No description available.
9

Real-time face localization in video sequences / Real-time face recognizer

Juráček, Aleš January 2009 (has links)
This diploma thesis deals with face detection in images. It outlines the problems of computer vision, artificial intelligence and machine learning, and describes in detail the detector proposed by Viola and Jones, which uses the AdaBoost learning algorithm. This method was deliberately chosen for its speed and detection accuracy. The detector was implemented in C/C++ using the OpenCV library, and the "MIT CVCL Face Database" was used for training. The main goal was to propose a face detector that is also usable on video sequences.
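The speed of the Viola-Jones detector this abstract relies on comes largely from the integral image, which turns every Haar-like rectangle sum into a constant-time lookup. A minimal numpy sketch of that core trick (the thesis itself uses OpenCV's C/C++ implementation):

```python
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of img over the rectangle [0..y, 0..x] (inclusive).
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, h, w):
    # Sum over any rectangle in O(1) via four integral-image lookups --
    # the operation Haar-feature evaluation reduces to in Viola-Jones.
    total = ii[top + h - 1, left + w - 1]
    if top > 0:
        total -= ii[top - 1, left + w - 1]
    if left > 0:
        total -= ii[top + h - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 9 + 10 = 30
```

A Haar feature is then just the difference of two or three such rectangle sums, so its cost is independent of the rectangle size.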
10

Learning to Predict Dense Correspondences for 6D Pose Estimation

Brachmann, Eric 17 January 2018 (has links)
Object pose estimation is an important problem in computer vision with applications in robotics, augmented reality and many other areas. An established strategy for object pose estimation consists of, firstly, finding correspondences between the image and the object’s reference frame, and, secondly, estimating the pose from outlier-free correspondences using Random Sample Consensus (RANSAC). The first step, namely finding correspondences, is difficult because object appearance varies depending on perspective, lighting and many other factors. Traditionally, correspondences have been established using handcrafted methods like sparse feature pipelines. In this thesis, we introduce a dense correspondence representation for objects, called object coordinates, which can be learned. By learning object coordinates, our pose estimation pipeline adapts to various aspects of the task at hand. It works well for diverse object types, from small objects to entire rooms, varying object attributes, like textured or texture-less objects, and different input modalities, like RGB-D or RGB images. The concept of object coordinates allows us to easily model and exploit uncertainty as part of the pipeline such that even repeating structures or areas with little texture can contribute to a good solution. Although we can train object coordinate predictors independent of the full pipeline and achieve good results, training the pipeline in an end-to-end fashion is desirable. It enables the object coordinate predictor to adapt its output to the specificities of following steps in the pose estimation pipeline. Unfortunately, the RANSAC component of the pipeline is non-differentiable, which prohibits end-to-end training. Adopting techniques from reinforcement learning, we introduce Differentiable Sample Consensus (DSAC), a formulation of RANSAC which allows us to train the pose estimation pipeline in an end-to-end fashion by minimizing the expectation of the final pose error.
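The RANSAC hypothesize-and-verify loop this abstract builds on can be sketched on a toy 2D line-fitting problem; the line fit stands in for the thesis's pose solver, and all names and data here are illustrative:

```python
import numpy as np

def ransac_line(points, iters=200, thresh=0.1, seed=0):
    # Hypothesize-and-verify: sample a minimal set (two points), fit a line,
    # count inliers within thresh, and keep the hypothesis with the largest
    # consensus set. Pose RANSAC works the same way, with correspondences
    # and a pose solver in place of the two-point line fit.
    rng = np.random.default_rng(seed)
    best_inliers, best_model = 0, None
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        d = q - p
        n = np.array([-d[1], d[0]])          # normal to the hypothesized line
        norm = np.linalg.norm(n)
        if norm == 0:
            continue
        n = n / norm
        residuals = np.abs((points - p) @ n)  # point-to-line distances
        inliers = int((residuals < thresh).sum())
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (p, n)
    return best_model, best_inliers

# 90 points on y = x plus 10 gross outliers.
xs = np.linspace(0, 1, 90)
inlier_pts = np.stack([xs, xs], axis=1)
outliers = np.array([[0.1, 0.9], [0.9, 0.1], [0.2, 0.8], [0.8, 0.2], [0.3, 0.9],
                     [0.7, 0.05], [0.05, 0.7], [0.6, 0.95], [0.95, 0.6], [0.4, 0.9]])
points = np.concatenate([inlier_pts, outliers])
model, count = ransac_line(points)
print(count >= 90)
```

The `argmax` over hypotheses in this loop is the non-differentiable step; DSAC replaces that hard selection with a probabilistic one so the expected final error can be minimized end to end.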
