361 |
Object Part Localization Using Exemplar-based Models. Liu, Jiongxin. January 2017.
Object part localization is a fundamental problem in computer vision: it aims to let machines understand an object in an image as a configuration of parts. Because the visual features at parts are usually weak and misleading, spatial models are needed to constrain the part configuration, ensuring that the estimated part locations respect both the image cues and the shape prior. Unlike most state-of-the-art techniques, which employ parametric spatial models, we turn to non-parametric exemplars of part configurations. The benefit is twofold: instead of assuming a parametric yet imprecise distribution over the spatial relations of parts, exemplars directly encode the relations present in the training samples; and exemplars allow us to prune the search space of part configurations with high confidence.
This thesis consists of two parts: fine-grained classification and object part localization. We first verify the efficacy of parts in fine-grained classification, building working systems that automatically identify dog breeds, fish species, and bird species using parts localized on the object. We then explore multiple ways to enhance exemplar-based models so that they apply well to deformable objects such as birds and human bodies. Specifically, we propose to enforce pose and subcategory consistency in exemplar matching, yielding more reliable configuration hypotheses. We also propose a part-pair representation that composes novel shapes from multiple promising hypotheses. Finally, we adapt exemplars to a hierarchical representation and design a principled formulation that predicts the part configuration from multi-scale image cues and multi-level exemplars. These efforts consistently improve the accuracy of object part localization.
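To make the exemplar idea concrete, below is a minimal sketch of exemplar-guided matching, assuming parts are 2D points and per-part detectors have already produced candidate locations: each exemplar is aligned into the image through a pair of anchor candidates, and the remaining parts snap to their nearest detector hypotheses. The two-anchor alignment and all names are illustrative assumptions, not the thesis's actual pipeline.

```python
import numpy as np
from itertools import product

def similarity_from_two(pa, pb, qa, qb):
    """Similarity transform mapping segment pa->pb onto qa->qb (anchors
    must be distinct points), returned as a function over (N, 2) arrays."""
    s = complex(*(qb - qa)) / complex(*(pb - pa))  # scale+rotation as one complex number
    def apply(pts):
        z = pts[:, 0] + 1j * pts[:, 1]
        w = s * (z - complex(*pa)) + complex(*qa)
        return np.stack([w.real, w.imag], axis=1)
    return apply

def localize(exemplars, candidates, anchors=(0, 1), top=3):
    """exemplars: (E, P, 2) training part configurations.
    candidates: list of (K_p, 2) arrays of detector hypotheses per part.
    Returns the part configuration snapped from the best-fitting exemplar."""
    best, best_cost = None, np.inf
    a, b = anchors
    for ex in exemplars:
        # Hypothesize alignments only from top candidates of two anchor
        # parts -- the pruning role the exemplars play in the abstract.
        for qa, qb in product(candidates[a][:top], candidates[b][:top]):
            T = similarity_from_two(ex[a], ex[b], qa, qb)
            proj = T(ex)                     # exemplar parts mapped into the image
            config, cost = [], 0.0
            for p, cand in enumerate(candidates):
                d = np.linalg.norm(cand - proj[p], axis=1)
                j = int(np.argmin(d))        # snap to the nearest hypothesis
                config.append(cand[j])
                cost += d[j]
            if cost < best_cost:
                best, best_cost = np.array(config), cost
    return best
```

The pruning happens in the hypothesis loop: only a handful of anchor-candidate pairings are tried per exemplar, rather than all combinations of part candidates.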
|
362 |
Geometry and uncertainty in deep learning for computer vision. Kendall, Alex Guy. January 2019.
Deep learning and convolutional neural networks have become the dominant tools for computer vision. These techniques excel at learning complicated representations from data using supervised learning. In particular, image recognition models now out-perform human baselines under constrained settings. However, the science of computer vision aims to build machines which can see, and this requires models which can extract richer information than recognition from images and video. In general, applying deep learning models beyond recognition to other problems in computer vision is significantly more challenging. This thesis presents end-to-end deep learning architectures for a number of core computer vision problems: scene understanding, camera pose estimation, stereo vision, and video semantic segmentation. Our models outperform traditional approaches and advance the state of the art on a number of challenging computer vision benchmarks. However, these end-to-end models are often not interpretable and require enormous quantities of training data. To address this, we make two observations: (i) we do not need to learn everything from scratch, because we know a lot about the physical world; and (ii) we cannot know everything from data, so our models should be aware of what they do not know. This thesis explores these ideas using concepts from geometry and uncertainty. Specifically, we show how to improve end-to-end deep learning models by leveraging the underlying geometry of the problem: we explicitly model concepts such as epipolar geometry to learn with unsupervised learning, which improves performance. Secondly, we introduce ideas from probabilistic modelling and Bayesian deep learning to understand uncertainty in computer vision models, and show how to quantify different types of uncertainty, improving safety for real-world applications.
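As one concrete instance of the uncertainty theme, here is a minimal sketch of Monte Carlo dropout, a standard Bayesian deep learning approximation in which dropout is kept active at test time and the spread across stochastic forward passes serves as an epistemic uncertainty estimate. The tiny model and sample count are placeholders, not the thesis's architectures.

```python
import torch
import torch.nn as nn

# A toy classifier with dropout; any dropout-bearing network works here.
model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(64, 10),
)

def mc_dropout_predict(model, x, samples=50):
    model.train()  # keep dropout active at test time
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(x), dim=-1) for _ in range(samples)
        ])
    mean = probs.mean(dim=0)                # predictive distribution
    epistemic = probs.var(dim=0).sum(-1)    # spread across stochastic passes
    return mean, epistemic

x = torch.randn(4, 16)
mean, uncertainty = mc_dropout_predict(model, x)
```

High variance across passes flags inputs the model has not learned about, which is exactly the "know what you do not know" behaviour the abstract argues for.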
|
363 |
Detecting irregularity in videos using spatiotemporal volumes. January 2007.
Li, Yun. Thesis (M.Phil.)--Chinese University of Hong Kong, 2007. Includes bibliographical references (leaves 68-72). Abstracts in English and Chinese.
Contents:
Chapter 1 Introduction: Visual Detection; Irregularity Detection
Chapter 2 System Overview: Definition of Irregularity; Contributions; Review of Previous Work (Model-based Methods; Statistical Methods); System Outline
Chapter 3 Background Subtraction: Related Work; Adaptive Mixture Model (Online Model Update; Background Model Estimation; Foreground Segmentation)
Chapter 4 Feature Extraction: Various Feature Descriptors; Histogram of Oriented Gradients (Feature Descriptor; Feature Merits); Subspace Analysis (Principal Component Analysis; Subspace Projection)
Chapter 5 Bayesian Probabilistic Inference: Estimation of PDFs (K-Means Clustering; Kernel Density Estimation); MAP Estimation (ML Estimation and MAP Estimation; Detection through MAP); Efficient Implementation (K-D Trees; Nearest Neighbor Algorithm)
Chapter 6 Experiments and Conclusion: Outdoor Video Surveillance (Experiments 1-3); Classroom Monitoring (Experiment 4); Algorithm Evaluation; Conclusion
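The contents suggest a pipeline of volume descriptors, subspace projection, and kernel density estimation with a likelihood test. A minimal sketch under those assumptions follows; the descriptors are faked with random data and the threshold is illustrative, not the thesis's actual settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(500, 81))   # stand-in for descriptors of "regular" volumes
test_feats = rng.normal(size=(20, 81))     # descriptors of volumes to be checked

pca = PCA(n_components=16).fit(train_feats)            # subspace projection
kde = KernelDensity(kernel="gaussian", bandwidth=0.5)  # density model of regular behaviour
kde.fit(pca.transform(train_feats))

train_ll = kde.score_samples(pca.transform(train_feats))
test_ll = kde.score_samples(pca.transform(test_feats))
threshold = np.percentile(train_ll, 5)   # illustrative: flag the unlikely tail
irregular = test_ll < threshold          # low likelihood under the regular model
```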
|
364 |
Architecture distribuée dédiée aux applications de réalité augmentée mobile / Distributed architecture dedicated to mobile augmented reality applications. Chouiten, Mehdi. 31 January 2013.
Mobile augmented reality (AR) makes virtual and real worlds coexist in real time. Mobility is made easier by new devices such as smartphones, wearable computers, and smart objects (e.g., glasses) that embed a range of sensors (visual, inertial, ...). These devices, however, have limited computing power, which can be critical for the applications envisaged. One solution is to use distribution mechanisms to spread the processing over a heterogeneous set of machines (servers or other mobile terminals). The goal of this thesis is, first, to design a software architecture dedicated to distributed augmented reality, and more precisely to distributed applications able to run on ad-hoc networks of heterogeneous terminals deployed across a network. The second part of the thesis demonstrates the applicability and efficiency of the proposed architecture on concrete AR applications and explores original uses of distribution in AR, focusing on the added value in terms of features and possible operations compared with a classical (non-distributed) AR solution and with existing AR frameworks that support distributed applications.
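A minimal sketch of the kind of placement decision such a framework must make, choosing where each processing component runs from a simple compute-plus-transfer cost model. The cost model and the numbers are invented for illustration; the thesis's actual middleware is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    flops: float   # available compute, FLOP/s
    rtt_s: float   # network round trip to this node, seconds

def place(task_flop, payload_bytes, bandwidth, local, peers):
    """Pick the node minimizing estimated completion time."""
    def cost(n):
        transfer = 0.0 if n is local else n.rtt_s + payload_bytes / bandwidth
        return transfer + task_flop / n.flops
    return min([local, *peers], key=cost)

phone = Node("phone", flops=5e9, rtt_s=0.0)
server = Node("server", flops=2e11, rtt_s=0.03)
# A heavy tracking step: offloading wins despite the network hop.
best = place(task_flop=2e10, payload_bytes=2e5, bandwidth=1e7,
             local=phone, peers=[server])
```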
|
365 |
Wavelet based image texture segmentation using a modified K-means algorithm. Ng, Brian Walter. January 2003.
"August, 2003" Bibliography: p. 261-268. In this thesis, wavelet transforms are chosen as the primary analytical tool for texture analysis. Specifically, Dual-Tree Complex Wavelet Transform is applied to the texture segmentation problem. Several possibilities for feature extraction and clustering steps are examined, new schemes being introduced and compared to known techniques.
|
366 |
The detection of 2D image features using local energy. Robbins, Benjamin John. January 1996.
Accurate detection and localization of two-dimensional (2D) image features (or 'key-points') is important for vision tasks such as structure from motion, stereo matching, and line labeling. 2D image features are ideal for these tasks because they are high in information yet occur sparsely in typical images. Several methods for the detection of 2D image features have already been developed. However, it is difficult to assess their performance because no one has produced an adequate definition of corners that encompasses all types of 2D luminance variations making up 2D image features.

The lack of consensus on the definition of 2D image features is not surprising given the confusion surrounding the definition of 1D image features. The general perception of 1D image features has been that they correspond to 'edges' in an image, and so are points where the intensity gradient in some direction is a local maximum. The Sobel [68], Canny [7], and Marr-Hildreth [37] operators all use this model of 1D features, either implicitly or explicitly. However, other profiles in an image also make up valid 1D features, such as spike and roof profiles, as well as combinations of all these feature types. Spike and roof profiles can also be found by looking for points where the rate of change of the intensity gradient is locally maximal, as Canny did in defining a 'roof-detector' in much the same way he developed his 'edge-detector'. While this allows a wider variety of 1D feature profiles to be detected, it comes no closer to unifying the different feature types under an encompassing definition of 1D features.

The introduction of the local energy model of image features by Morrone and Owens [45] in 1987 provided a unified definition of 1D image features for the first time. They postulated that image features correspond to points in an image where there is maximal phase congruency in the frequency domain representation of the image; that is, image features correspond to points of maximal order in the phase domain of the image signal. These points of maximal phase congruency correspond to step-edge, roof, and ramp intensity profiles, and combinations thereof. They also correspond to the Mach bands perceived by humans in trapezoidal feature profiles.

This thesis extends the notion of phase congruency to 2D image features. As 1D image features correspond to points of maximal 1D order in the phase domain of the image signal, this thesis contends that 2D image features correspond to maximal 2D order in this domain. These points of maximal 2D phase congruency include all the different types of 2D image features: grey-level corners, line terminations, blobs, and a variety of junctions. Early attempts at 2D feature detection were simple 'corner detectors' based on a model of a grey-level corner, in much the same way that early 1D feature detectors were based on a model of step edges. Some recent attempts have included more complex models of 2D features, although this remains a more complex a priori judgement of the types of luminance profiles to be labeled as 2D features. This thesis develops the 2D local energy feature detector based on a new, unified definition of 2D image features that marks points of locally maximal 2D order in the phase domain representation of the image as 2D image features.

The performance of an implementation of 2D local energy is assessed and compared to several existing methods of 2D feature detection. This thesis also shows that, in contrast to most other methods of 2D feature detection, 2D local energy is an idempotent operator. The extension of phase congruency to 2D image features also unifies the detection of image features: 1D and 2D image features correspond to 1D and 2D order in the phase domain representation of the image, respectively. This definition imposes a hierarchy of image features, with 2D image features being a subset of 1D image features. This ordering has been implied ever since 1D features were used as candidate points for 2D feature detection by Kitchen [28] and others. Local energy enables the extraction of both 1D and 2D image features in a consistent manner: 2D image features are extracted from the 1D image features using the same operations that extract 1D image features from the input image.

This consistent approach allows the hierarchy of primitive image features to be naturally extended to higher-order image features, which can then be extracted from higher-order image data using the same hierarchical approach. This thesis shows how local energy can be naturally extended to the detection of 1D (surface) and higher-order image features in 3D data sets. Results are presented for the detection of 1D image features in 3D confocal microscope images, showing superior performance to the 3D extension of the Sobel operator [74].
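To illustrate the model in its simplest setting, here is a hedged 1D sketch: band-pass the signal, take the magnitude of its analytic signal (the quadrature even/odd pair), and mark energy maxima as features. Both the step and the spike below peak together, which is the unification the abstract describes; this is a simplified 1D reading of the model, not the thesis's 2D operator, and the filter parameters are illustrative.

```python
import numpy as np
from scipy.signal import argrelmax, butter, filtfilt, hilbert

def local_energy_1d(signal, fs=1.0, band=(0.02, 0.2)):
    b, a = butter(2, band, btype="band", fs=fs)  # band-pass filter
    f = filtfilt(b, a, signal)                   # even (in-phase) component
    return np.abs(hilbert(f))                    # sqrt(even^2 + odd^2)

x = np.zeros(400)
x[100:] += 1.0    # a step edge
x[250] += 2.0     # a spike ("line") feature
energy = local_energy_1d(x)
features = argrelmax(energy, order=10)[0]  # energy peaks mark both feature types
```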
|
367 |
Wavelet based image texture segmentation using a modified K-means algorithm / by Brian W. Ng. Ng, Brian Walter. January 2003.
"August, 2003" / Bibliography: p. 261-268. / xxvi, 268 p. : ill. (some col.) ; 30 cm. / Title page, contents and abstract only. The complete thesis in print form is available from the University Library. / In this thesis, wavelet transforms are chosen as the primary analytical tool for texture analysis. Specifically, Dual-Tree Complex Wavelet Transform is applied to the texture segmentation problem. Several possibilities for feature extraction and clustering steps are examined, new schemes being introduced and compared to known techniques. / Thesis (Ph.D.)--University of Adelaide, Dept. of Electrical and Electronic Engineering, 2003
|
368 |
Analysis of bio-based composites for image segmentation with the aid of games. Inouye, Jennifer A. 25 May 2012.
A fundamental problem in computer vision is to partition an image into meaningful segments. While image segmentation is required by many applications, this thesis focuses on segmentation of computed tomography (CT) images for analysis and quality control of composite materials. The key research contribution of this thesis is a novel image segmentation framework for including end-users in computation. This represents a departure from traditional methods, which segment images without considering domain knowledge or access to user feedback. Given a set of CT images of three different composite materials, we would like to create a database of annotated images for all the regions of interest; the annotated images can then be used to check the accuracy of segmentation algorithms. Because image annotation is time-consuming and tedious for a person to do, we propose to turn the task into a game. The game is aimed at making the annotation task easier, because it engages the imagination, creativity, and fellowship of all subjects involved. In particular, we are interested in games that can be played on the internet by many people, like the tasks on Amazon Mechanical Turk, so that the broader public can get involved. We create a Game with a Purpose (GWAP) called ESP 2.0 for creating image annotations, and thus enable benchmarking of existing segmentation algorithms on our database. / Graduation date: 2012
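For concreteness, here is a minimal sketch of the ESP-style agreement rule such games build on: a label counts as an annotation only when two players produce it independently, and agreed labels join a taboo list for later rounds. The data structures and labels are illustrative, not ESP 2.0's actual design.

```python
def play_round(labels_a, labels_b, taboo):
    """labels_a, labels_b: label sets typed by two independent players."""
    agreed = (set(labels_a) & set(labels_b)) - set(taboo)
    return agreed, taboo | agreed

taboo = set()
agreed, taboo = play_round({"fiber", "void", "resin"},
                           {"void", "crack", "fiber"}, taboo)
# agreed == {"fiber", "void"}; both are taboo in the next round
```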
|
369 |
The Combinatorics of Heuristic Search Termination for Object Recognition in Cluttered Environments. Grimson, W. Eric L. 01 May 1989.
Many recognition systems use constrained search to locate objects in cluttered environments. Earlier analysis showed that the expected search is quadratic in the number of model and data features, if all the data comes from one object, but is exponential when spurious data is included. To overcome this, many methods terminate search once an interpretation that is "good enough" is found. We formally examine the combinatorics of this, showing that correct termination procedures dramatically reduce search. We provide conditions on the object model and the scene clutter such that the expected search is quartic. These results are shown to agree with empirical data for cluttered object recognition.
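A hedged toy model of the effect follows: a depth-first constrained search assigns model features to data features (or a wildcard), with random consistency standing in for geometric constraints, and we count nodes expanded with and without a "good enough" termination threshold. All parameters are invented; this illustrates the qualitative claim, not the paper's exact combinatorics.

```python
import random

def nodes_expanded(m, d, p, threshold, terminate, rng):
    """m model features, d data features; each pairing passes the consistency
    check with probability p; with `terminate`, search stops at the first
    interpretation matching at least `threshold` model features."""
    nodes = 0
    def expand(level, matched):
        nonlocal nodes
        if terminate and matched >= threshold:
            return True                  # "good enough": stop the whole search
        if level == m:
            return False
        for _ in range(d):               # pair model feature `level` with a data feature
            nodes += 1
            if rng.random() < p and expand(level + 1, matched + 1) and terminate:
                return True
        nodes += 1                       # wildcard: leave the feature unmatched
        return expand(level + 1, matched) and terminate
    expand(0, 0)
    return nodes

rng = random.Random(0)
stopped = nodes_expanded(m=6, d=8, p=0.3, threshold=4, terminate=True, rng=rng)
full = nodes_expanded(m=6, d=8, p=0.3, threshold=4, terminate=False, rng=rng)
# `stopped` is typically far smaller than `full`
```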
|
370 |
Three-Dimensional Recognition of Solid Objects from a Two-Dimensional Image. Huttenlocher, Daniel Peter. 01 October 1988.
This thesis addresses the problem of recognizing solid objects in the three-dimensional world, using two-dimensional shape information extracted from a single image. Objects can be partly occluded and can occur in cluttered scenes. A model-based approach is taken, in which stored models are matched to an image. The matching problem is separated into two stages, which employ different representations of objects. The first stage uses the smallest possible number of local features to find transformations from a model to an image; this minimizes the amount of search required in recognition. The second stage uses the entire edge contour of an object to verify each transformation; this reduces the chance of finding false matches.
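A minimal 2D sketch of this two-stage idea: hypothesize a transformation from the smallest usable set of local features (two point correspondences fixing a 2D similarity here), then verify it against the full contour. The toy contour, the two-point alignment, and the tolerance are illustrative simplifications of the thesis's 3D-from-2D setting.

```python
import numpy as np

def transform_from_pairs(p1, p2, q1, q2):
    """Similarity transform taking model points p1, p2 to image points q1, q2."""
    s = complex(*(q2 - q1)) / complex(*(p2 - p1))  # scale+rotation as one complex number
    def apply(pts):
        z = pts[:, 0] + 1j * pts[:, 1]
        w = s * (z - complex(*p1)) + complex(*q1)
        return np.stack([w.real, w.imag], axis=1)
    return apply

def verify(model_contour, image_edges, T, tol=2.0):
    """Fraction of transformed model points lying near some image edge point."""
    proj = T(model_contour)
    d = np.linalg.norm(proj[:, None, :] - image_edges[None, :, :], axis=2)
    return (d.min(axis=1) < tol).mean()

# toy data: a model contour and the same contour rotated and shifted in the image
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
model = np.array([[x, y] for x in np.linspace(0.0, 10.0, 20) for y in (0.0, 10.0)])
image = model @ R.T + np.array([5.0, 3.0])

T = transform_from_pairs(model[0], model[1], image[0], image[1])
score = verify(model, image, T)   # close to 1.0 for a correct hypothesis
```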
|