1. Multi-cue visual tracking: feature learning and fusion. Lan, Xiangyuan. 10 August 2016.
As an important and active research topic in the computer vision community, visual tracking is a key component in many applications, ranging from video surveillance and robotics to human-computer interaction. In this thesis, we propose new appearance models based on multiple visual cues and address several research issues in feature learning and fusion for visual tracking. Feature extraction and feature fusion are the two key modules for constructing the appearance model of a tracked target from multiple visual cues.

Feature extraction aims to extract informative features for the visual representation of the tracked target, and many kinds of hand-crafted feature descriptors capturing different types of visual information have been developed. However, since large appearance variations, e.g. occlusion and illumination changes, may occur during tracking, the target samples may be contaminated or corrupted, so the extracted raw features may not capture the intrinsic properties of the target appearance. Besides, without explicitly imposing discriminability, the extracted features may suffer from the background distraction problem. To extract uncontaminated, discriminative features from multiple visual cues, this thesis proposes a novel robust joint discriminative feature learning framework which is capable of 1) simultaneously and optimally removing corrupted features and learning reliable classifiers, and 2) exploiting both the consistent and the feature-specific discriminative information of multiple features. In this way, the features and classifiers learned from potentially corrupted tracking samples can be better utilized for target representation and foreground/background discrimination.

As shown by the Data Processing Inequality, fusion at the feature level preserves more information than fusion at the classifier level. In addition, not all visual cues/features are reliable, so combining all of them may not yield better tracking performance; it is therefore more reasonable to dynamically select and fuse multiple visual cues. Based on these considerations, this thesis proposes a novel joint sparse representation model in which feature selection, fusion, and representation are performed optimally in a unified framework. By taking advantage of sparse representation, unreliable features are detected and removed while reliable features are fused at the feature level for target representation. To capture the non-linear similarity of features, the model is further extended to perform feature fusion in kernel space. Experimental results demonstrate the effectiveness of the proposed model.

Since different visual cues extracted from the same object should share some commonalities in their representations, while each feature should also retain some diversity reflecting its complementary role in appearance modeling, another important problem in feature fusion is how to learn the commonality and diversity in the fused representations of multiple visual cues so as to enhance tracking accuracy. Different from existing multi-cue sparse trackers, which only consider the commonalities among the sparsity patterns of multiple visual cues, this thesis proposes a novel multiple sparse representation model for multi-cue visual tracking that jointly exploits the underlying commonalities and diversities of different visual cues by decomposing multiple sparsity patterns.
Moreover, this thesis introduces a novel online multiple metric learning method to efficiently and adaptively incorporate an appearance-proximity constraint, which ensures that the learned commonalities of the multiple visual cues are more representative. Experimental results on tracking benchmark videos and other challenging videos show that the proposed tracker achieves better performance than existing sparsity-based trackers and other state-of-the-art trackers.
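To make the shared-sparsity idea concrete, here is a minimal numerical sketch — not the thesis's exact model — of joint sparse representation over K visual cues: each cue k has its own template dictionary D_k, and an l2,1 penalty on the stacked coefficient matrix encourages all cues to select the same small set of templates (the learned commonality). All names, shapes and parameter values are illustrative.

```python
import numpy as np

def joint_sparse_code(dicts, obs, lam=0.1, n_iter=300):
    """Solve  min_W  sum_k ||y_k - D_k W[:, k]||^2 + lam * sum_j ||W[j, :]||_2
    by proximal gradient descent. dicts: K arrays (d_k, n_templates);
    obs: K observation vectors (d_k,). Rows of W surviving the row-wise
    shrinkage are templates jointly selected by all cues."""
    K, n = len(dicts), dicts[0].shape[1]
    W = np.zeros((n, K))
    L = 2.0 * max(np.linalg.norm(D, 2) ** 2 for D in dicts)  # Lipschitz bound
    for _ in range(n_iter):
        grad = np.stack([2.0 * dicts[k].T @ (dicts[k] @ W[:, k] - obs[k])
                         for k in range(K)], axis=1)
        V = W - grad / L
        norms = np.linalg.norm(V, axis=1, keepdims=True)
        shrink = np.maximum(1.0 - (lam / L) / np.maximum(norms, 1e-12), 0.0)
        W = V * shrink               # prox of the l2,1 norm (row-wise group shrinkage)
    return W

# Toy usage: two cues generated by the same template (index 3).
rng = np.random.default_rng(0)
dicts = [rng.standard_normal((20, 10)), rng.standard_normal((32, 10))]
obs = [D[:, 3] + 0.01 * rng.standard_normal(D.shape[0]) for D in dicts]
W = joint_sparse_code(dicts, obs)
print(np.argmax(np.linalg.norm(W, axis=1)))  # expected: 3
```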
2. Geometry and uncertainty in deep learning for computer vision. Kendall, Alex Guy. January 2019.
Deep learning and convolutional neural networks have become the dominant tools of computer vision. These techniques excel at learning complicated representations from data using supervised learning; in particular, image recognition models now outperform human baselines under constrained settings. However, the science of computer vision aims to build machines which can see, which requires models that extract richer information than recognition from images and video. In general, applying these deep learning models beyond recognition to other problems in computer vision is significantly more challenging.

This thesis presents end-to-end deep learning architectures for a number of core computer vision problems: scene understanding, camera pose estimation, stereo vision and video semantic segmentation. Our models outperform traditional approaches and advance the state of the art on a number of challenging computer vision benchmarks. However, these end-to-end models are often not interpretable and require enormous quantities of training data. To address this, we make two observations: (i) we do not need to learn everything from scratch, because we know a lot about the physical world, and (ii) we cannot know everything from data, so our models should be aware of what they do not know. This thesis explores these ideas using concepts from geometry and uncertainty. Specifically, we show how to improve end-to-end deep learning models by leveraging the underlying geometry of the problem: we explicitly model concepts such as epipolar geometry to learn with unsupervised learning, which improves performance. Secondly, we introduce ideas from probabilistic modelling and Bayesian deep learning to understand uncertainty in computer vision models, and show how to quantify different types of uncertainty, improving safety for real-world applications.
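As a concrete illustration of the second idea, the following is a hedged sketch of one standard recipe for uncertainty quantification, Monte Carlo dropout: keep dropout stochastic at test time and sample several forward passes. The toy network and sizes are placeholders, not the thesis's architectures.

```python
import torch
import torch.nn as nn

# Toy regressor with dropout; layer sizes are illustrative only.
net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(p=0.5),
                    nn.Linear(64, 1))

def mc_dropout_predict(model, x, n_samples=50):
    """Sample stochastic forward passes; the spread of the samples gives a
    rough estimate of the model's (epistemic) uncertainty."""
    model.train()                    # keeps Dropout active (safe here: no BatchNorm)
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.var(dim=0)

mean, var = mc_dropout_predict(net, torch.randn(8, 16))
print(mean.shape, var.shape)         # torch.Size([8, 1]) twice
```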
3. Vision-based human-directed robot guidance. Arthur, Richard B. 2004 (PDF).
Thesis (M.S.), Brigham Young University, Dept. of Computer Science, 2004. Includes bibliographical references (p. 103-107).
4. Activity recognition in desktop environments. Shen, Jianqiang. 2009.
Thesis (Ph.D.), Oregon State University, 2009. Printout. Includes bibliographical references (leaves 129-138). Also available on the World Wide Web.
5. A multi-level machine learning system for attention-based object recognition. Han, Ji Wan. January 2011.
This thesis develops a trainable object-recognition algorithm that represents objects by their salient features and applies an attention mechanism to speed up feature detection. A trainable, component-based object recognition system implementing the developed algorithm has been created. The system has two layers: the first contains several individual feature classifiers that detect, in input images, the salient features composing higher-level objects; the second judges whether the detected features form a valid object. An object is represented by a feature map, which stores the geometrical and hierarchical relations among features and higher-level objects and serves as the input to the second layer. The attention mechanism is applied to improve feature detection speed: once a few features have been detected, it leads the system to areas with a higher likelihood of containing the remaining features, thereby speeding up detection. Two major experiments are conducted, applying the developed system to discriminate faces from non-faces and to discriminate people from backgrounds in thermal images. The results show the success of the implemented system: the attention mechanism has a positive effect on feature detection and can save detection time, especially in terms of classifier calls.
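A minimal sketch of the attention idea under assumed geometry (not the thesis's implementation): once one salient feature has been found, candidate windows are re-ordered so that locations where the remaining features are expected, relative to the found one, are scanned first. The feature names and offsets below are hypothetical.

```python
import numpy as np

# Hypothetical expected offsets (dy, dx) of other features relative to a found one.
EXPECTED_OFFSETS = {"left_eye": {"right_eye": (0, 30), "mouth": (25, 15)}}

def prioritized_windows(found, windows, sigma=10.0):
    """found: {feature_name: (y, x)} already-detected features.
    windows: list of (y, x) candidate window centres.
    Returns windows sorted so likely feature locations are visited first."""
    scores = np.zeros(len(windows))
    for name, (fy, fx) in found.items():
        for _, (dy, dx) in EXPECTED_OFFSETS.get(name, {}).items():
            ey, ex = fy + dy, fx + dx                      # expected location
            d2 = np.array([(wy - ey) ** 2 + (wx - ex) ** 2 for wy, wx in windows])
            scores += np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian attention bump
    order = np.argsort(-scores)                            # highest score first
    return [windows[i] for i in order]

grid = [(y, x) for y in range(0, 100, 10) for x in range(0, 100, 10)]
print(prioritized_windows({"left_eye": (40, 30)}, grid)[:3])
```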
6. Deep learning based facial expression recognition and its applications. Jan, Asim. January 2017.
Facial expression recognition (FER) is a research area concerned with classifying human emotions from the expressions on the face. It can be used in applications such as biometric security, intelligent human-computer interaction, robotics, and clinical medicine for autism, depression, pain and mental health problems. This dissertation investigates advanced technologies for facial expression analysis and develops artificial intelligent systems for practical applications.

The first part of this work applies geometric and texture domain feature extractors along with various machine learning techniques to improve FER. Advanced 2D and 3D facial processing techniques, such as Edge Oriented Histograms (EOH) and Facial Mesh Distances (FMD), are then fused together using a framework designed to investigate their individual and combined domain performances. Following these tests, the face is broken down into facial parts using advanced facial alignment and localisation techniques. Deep learning, in the form of Convolutional Neural Networks (CNNs), is also explored for FER: a novel approach is used for the deep network architecture design, learning the facial parts jointly and showing an improvement over using the whole face. Joint Bayesian is also adapted, in the form of metric learning, to work with deep feature representations of the facial parts, providing a further improvement over using the deep network alone.

Dynamic emotion content is explored as a solution that provides richer information than still images. The motion occurring across the content is initially captured using the Motion History Histogram (MHH) descriptor and is critically evaluated. Based on this evaluation, several improvements are proposed through extensions such as the Average Spatial Pooling Multi-scale Motion History Histogram (ASMMHH). This extension adds two modifications: the first views the content in different spatial dimensions through spatial pooling, influenced by the structure of CNNs; the other captures motion at different speeds. Combined, they provide better performance than MHH and other popular techniques such as Local Binary Patterns - Three Orthogonal Planes (LBP-TOP).

Finally, dynamic emotion content is observed in the feature space, with sequences of images represented as sequences of extracted features. A novel technique called the Facial Dynamic History Histogram (FDHH) is developed to capture patterns of variation within the sequence of features, an approach not seen before. FDHH is applied in an end-to-end framework for applications in depression analysis and in evaluating induced emotions through a large set of video clips from various movies. With the combination of deep learning techniques and FDHH, state-of-the-art results are achieved for depression analysis.
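A simplified sketch of the FDHH idea, under stated assumptions: per-component binary change detection over the feature sequence, followed by a histogram counting how often runs of exactly m consecutive changes occur. The threshold and sizes are illustrative, and the thesis's exact formulation may differ.

```python
import numpy as np

def fdhh(features, thresh=0.05, max_pattern=5):
    """features: (T, D) sequence of per-frame feature vectors.
    Returns a (max_pattern, D) histogram: entry (m-1, d) counts runs of
    exactly m consecutive changes in component d (capped at max_pattern)."""
    change = np.abs(np.diff(features, axis=0)) >= thresh      # (T-1, D) binary
    D = features.shape[1]
    hist = np.zeros((max_pattern, D), dtype=int)
    run = np.zeros(D, dtype=int)
    for row in change:
        run[row] += 1
        ended = (~row) & (run > 0)                            # runs ending here
        np.add.at(hist, (np.minimum(run[ended], max_pattern) - 1,
                         np.nonzero(ended)[0]), 1)
        run[ended] = 0
    still_open = run > 0                                      # flush trailing runs
    np.add.at(hist, (np.minimum(run[still_open], max_pattern) - 1,
                     np.nonzero(still_open)[0]), 1)
    return hist

rng = np.random.default_rng(1)
print(fdhh(rng.random((30, 4))).sum(axis=0))   # number of runs per component
```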
7. Expert object recognition in video. McEuen, Matt. 2005.
Thesis (M.S.), Rochester Institute of Technology, 2005. Typescript. Includes bibliographical references (p. 91-93).
8. Facing uncertainty: 3D face tracking and learning with generative models. Marks, Tim K. 2006.
Thesis (Ph.D.), University of California, San Diego, 2006. Title from first page of PDF file (viewed February 27, 2006). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (p. 143-148).
9. Detection of Non-Ferrous Materials with Computer Vision. Almin, Fredrik. January 2020.
In one of the facilities at the Stena Recycling plant in Halmstad, Sweden, about 300 tonnes of metallic waste is processed each day with the aim of sorting out all non-ferrous material. At the end of this process, non-ferrous materials are manually sorted out from the ferrous materials. This thesis investigates a computer vision based approach to identify and localize the non-ferrous materials and eventually automate the sorting.

Images were captured of ferrous and non-ferrous materials. The images are processed and segmented to be used as annotation data for a deep convolutional neural segmentation network. Network models have been trained on different kinds and amounts of data. The resulting models are evaluated and tested in accordance with different evaluation metrics. Methods of creating advanced training data by merging imaging information were tested. Experiments with using classifier prediction confidence to identify objects of unknown classes were performed.

This thesis shows that it is possible to discern ferrous from non-ferrous material with a purely vision based system. The thesis also shows that it is possible to automatically create annotated training data. It becomes evident that it is possible to create better training data, tailored for the task at hand, by merging image data. A segmentation network trained on more than two classes yields lower prediction confidence for objects unknown to the classifier.

Substituting manual sorting with a purely vision based system seems like a viable approach. Before a substitution is considered, the automatic system needs to be evaluated in comparison to the manual sorting.
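A hedged sketch of the unknown-object idea: treat pixels whose maximum softmax probability falls below a threshold as belonging to no known class. The threshold and array shapes are illustrative; the thesis's networks and evaluation are more involved.

```python
import numpy as np

def label_with_unknown(logits, conf_thresh=0.6):
    """logits: (H, W, C) per-pixel class scores from a segmentation network.
    Returns per-pixel labels, with -1 marking low-confidence (unknown) pixels."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))   # stable softmax
    probs = e / e.sum(axis=-1, keepdims=True)
    conf = probs.max(axis=-1)
    labels = probs.argmax(axis=-1)
    labels[conf < conf_thresh] = -1
    return labels

rng = np.random.default_rng(2)
print(label_with_unknown(rng.standard_normal((4, 4, 3))))
```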
10. Hypothesis Generation for Object Pose Estimation: From Local Sampling to Global Reasoning. Michel, Frank. 14 February 2019.
Pose estimation has been studied since the early days of computer vision. The task of object pose estimation is to determine the transformation that maps an object from its inherent coordinate system into the camera-centric coordinate system; this transformation describes the translation of the object relative to the camera and the orientation of the object in three-dimensional space. Knowledge of an object's pose is a key ingredient in many application scenarios, such as robotic grasping, augmented reality, autonomous navigation and surveillance. A general estimation pipeline consists of four steps: extraction of distinctive points, creation of a hypothesis pool, hypothesis verification and, finally, hypothesis refinement. In this work, we focus on the hypothesis generation process and show that it is beneficial to utilize geometric knowledge in this process.
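As a concrete example of the transformation being estimated, here is a hedged sketch of recovering a rigid pose (R, t) from sampled 3D-3D correspondences with the Kabsch algorithm. This is a standard building block for hypothesis generation, not necessarily the thesis's exact procedure.

```python
import numpy as np

def rigid_transform(obj_pts, cam_pts):
    """obj_pts, cam_pts: (N, 3) matched points (N >= 3, non-degenerate).
    Returns R, t with  cam ~= R @ obj + t  (least-squares, Kabsch/SVD)."""
    co, cc = obj_pts.mean(axis=0), cam_pts.mean(axis=0)
    H = (obj_pts - co).T @ (cam_pts - cc)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cc - R @ co

# Toy check: recover a known rotation about z and a translation.
rng = np.random.default_rng(3)
obj = rng.standard_normal((6, 3))
a = np.pi / 5
R_true = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
cam = obj @ R_true.T + np.array([0.5, -0.2, 1.0])
R, t = rigid_transform(obj, cam)
print(np.allclose(R, R_true), np.allclose(t, [0.5, -0.2, 1.0]))   # True True
```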
We first address the problem of hypothesis generation for articulated objects. Instead of considering each object part individually, we model the object as a kinematic chain, which lets us use inter-part relationships when sampling pose hypotheses; we thereby need only K correspondences for an object consisting of K parts. We show that applying geometric knowledge about part relationships improves estimation accuracy under severe self-occlusion and low-quality correspondence predictions. In an extension, we employ global reasoning within the hypothesis generation process instead of sampling 6D pose hypotheses locally. To this end, we formulate a Conditional Random Field (CRF) operating on the image as a whole, inferring the pixels that are consistent with a single 6D pose. Within the CRF we use a strong geometric check that is able to assess the quality of correspondence pairs. We show that this global geometric check improves the accuracy of pose estimation under heavy occlusion.
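The geometric check can be illustrated with a small, hedged sketch: a rigid 6D pose preserves pairwise distances, so for two pixels with predicted object-space coordinates and known camera-space 3D points (e.g. from a depth sensor — an assumption here), the two distances must agree for the pair to support any common pose.

```python
import numpy as np

def pair_consistent(cam_pts, obj_pts, i, j, tol=0.01):
    """cam_pts: (N, 3) camera-space points; obj_pts: (N, 3) predicted
    object-space coordinates. A pair (i, j) can support a rigid pose only if
    it (approximately) preserves the inter-point distance."""
    d_cam = np.linalg.norm(cam_pts[i] - cam_pts[j])
    d_obj = np.linalg.norm(obj_pts[i] - obj_pts[j])
    return abs(d_cam - d_obj) < tol

# Pairs failing this check can be pruned before (or penalised within) the CRF.
```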