21

Advancing large scale object retrieval

Arandjelovic, Relja January 2013 (has links)
The objective of this work is object retrieval in large scale image datasets, where the object is specified by an image query and retrieval should be immediate at run time. Such a system has a wide variety of applications including object or location recognition, video search, near duplicate detection and 3D reconstruction. The task is very challenging because of large variations in the imaged object appearance due to changes in lighting conditions, scale and viewpoint, as well as partial occlusions. A starting point of established systems which tackle the same task is detection of viewpoint invariant features, which are then quantized into visual words and efficient retrieval is performed using an inverted index. We make the following three improvements to the standard framework: (i) a new method to compare SIFT descriptors (RootSIFT) which yields superior performance without increasing processing or storage requirements; (ii) a novel discriminative method for query expansion; (iii) a new feature augmentation method. Scaling up to searching millions of images involves either distributing storage and computation across many computers, or employing very compact image representations on a single computer combined with memory-efficient approximate nearest neighbour search (ANN). We take the latter approach and improve VLAD, a popular compact image descriptor, using: (i) a new normalization method to alleviate the burstiness effect; (ii) vocabulary adaptation to reduce influence of using a bad visual vocabulary; (iii) extraction of multiple VLADs for retrieval and localization of small objects. We also propose a method, SCT, for extremely low bit-rate compression of descriptor sets in order to reduce the memory footprint of ANN. The problem of finding images of an object in an unannotated image corpus starting from a textual query is also considered. Our approach is to first obtain multiple images of the queried object using textual Google image search, and then use these images to visually query the target database. We show that issuing multiple queries significantly improves recall and enables the system to find quite challenging occurrences of the queried object. Current retrieval techniques work only for objects which have a light coating of texture, while failing completely for smooth (fairly textureless) objects best described by shape. We present a scalable approach to smooth object retrieval and illustrate it on sculptures. A smooth object is represented by its imaged shape using a set of quantized semi-local boundary descriptors (a bag-of-boundaries); the representation is suited to the standard visual word based object retrieval. Furthermore, we describe a method for automatically determining the title and sculptor of an imaged sculpture using the proposed smooth object retrieval system.
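
The RootSIFT step is simple enough to sketch. Below is a minimal NumPy version under the standard formulation (L1-normalise each descriptor, then take the element-wise square root, so that Euclidean distance on the result corresponds to the Hellinger kernel on the originals); the random descriptors are placeholders, not real SIFT output.

```python
import numpy as np

def root_sift(descriptors: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Map SIFT descriptors (one per row) to RootSIFT: L1-normalise,
    then take the element-wise square root. Comparing RootSIFT vectors
    with Euclidean distance is equivalent to comparing the original
    SIFT histograms with the Hellinger kernel."""
    descriptors = descriptors.astype(np.float32)
    descriptors /= descriptors.sum(axis=1, keepdims=True) + eps  # SIFT is non-negative
    return np.sqrt(descriptors)

# Placeholder 128-D descriptors for five keypoints
sift = np.random.rand(5, 128).astype(np.float32)
rsift = root_sift(sift)
assert np.allclose((rsift ** 2).sum(axis=1), 1.0, atol=1e-3)  # unit L2 norm
```

Because the mapping touches only the descriptor values, it drops into an existing pipeline without changing storage size or matching code, which is why it adds no processing or storage cost.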
22

Geometric context from single and multiple views

Flint, Alexander John January 2012 (has links)
In order for computers to interact with and understand the visual world, they must be equipped with reasoning systems that include high-level quantities such as objects, actions, and scenes. This thesis is concerned with extracting such representations of the world from visual input. The first part of this thesis describes an approach to scene understanding in which texture characteristics of the visual world are used to infer scene categories. We show that in the context of a moving camera, it is common to observe images containing very few individually salient image regions, yet overall texture structure often allows our system to derive powerful contextual cues about the environment. Our approach builds on ideas from texture recognition, and we show that our algorithm outperforms the well-known Gist descriptor on several classification tasks. In the second part of this thesis we are interested in scene understanding in the context of multiple calibrated views of a scene, as might be obtained from a Structure-from-Motion or Simultaneous Localization and Mapping (SLAM) system. Though such systems are capable of localizing the camera robustly and efficiently, the maps produced are typically sparse point clouds that are difficult to interpret and of little use for higher-level reasoning tasks such as scene understanding or human-machine interaction. In this thesis we begin to address this deficiency, presenting progress towards modeling scenes using semantically meaningful primitives such as floor, wall, and ceiling planes. To this end we adopt the indoor Manhattan representation, which was recently proposed for single-view reconstruction. This thesis presents the first in-depth description and analysis of this model in the literature. We describe a probabilistic model relating photometric features, stereo photo-consistencies, and 3D point clouds to Manhattan scene structure in a Bayesian framework. We then present a fast dynamic programming algorithm that solves exact MAP inference in this model in time linear in image size. We show detailed comparisons with the state of the art in both the single- and multiple-view contexts. Finally, we present a framework for learning within the indoor Manhattan hypothesis class. Our system is capable of extrapolating from labelled training examples to predict scene structure for unseen images. We cast learning as a structured prediction problem and show how to optimize with respect to two realistic loss functions. We present experiments in which we learn to recover scene structure from both single and multiple views; from the perspective of our learning algorithm these problems differ only by a change of feature space. This work constitutes one of the most complicated output spaces (in terms of internal constraints) yet considered within a structured prediction framework.
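
The thesis's exact inference operates on the full indoor Manhattan model; as a heavily simplified illustration of the underlying idea (exact MAP by dynamic programming over image columns, with cost linear in image width), the sketch below assigns one hypothetical orientation label per column with a constant label-switch penalty. The cost matrices here are toy placeholders, not the model's actual photometric or point-cloud terms.

```python
import numpy as np

def map_column_labels(unary: np.ndarray, switch_cost: float) -> np.ndarray:
    """Exact MAP over per-column labels by a Viterbi-style dynamic
    program: unary[c, l] is the cost of giving column c label l, and a
    constant penalty is paid whenever adjacent columns disagree.
    Runs in O(columns * labels^2)."""
    n_cols, n_labels = unary.shape
    cost = unary[0].copy()
    back = np.zeros((n_cols, n_labels), dtype=int)
    for c in range(1, n_cols):
        trans = cost[:, None] + switch_cost * (1.0 - np.eye(n_labels))
        back[c] = trans.argmin(axis=0)            # best predecessor per label
        cost = trans.min(axis=0) + unary[c]
    labels = np.empty(n_cols, dtype=int)
    labels[-1] = int(cost.argmin())
    for c in range(n_cols - 1, 0, -1):            # backtrack
        labels[c - 1] = back[c, labels[c]]
    return labels

# Toy example: 3 labels over 8 columns
rng = np.random.default_rng(0)
print(map_column_labels(rng.random((8, 3)), switch_cost=0.5))
```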
23

Image Chunking: Defining Spatial Building Blocks for Scene Analysis

Mahoney, James V. 01 August 1987 (has links)
Rapid judgments about the properties and spatial relations of objects are the crux of visually guided interaction with the world. Vision begins, however, with essentially pointwise representations of the scene, such as arrays of pixels or small edge fragments. For adequate time-performance in recognition, manipulation, navigation, and reasoning, the processes that extract meaningful entities from the pointwise representations must exploit parallelism. This report develops a framework for the fast extraction of scene entities, based on a simple, local model of parallel computation. An image chunk is a subset of an image that can act as a unit in the course of spatial analysis. A parallel preprocessing stage constructs a variety of simple chunks uniformly over the visual array. On the basis of these chunks, subsequent serial processes locate relevant scene components and assemble detailed descriptions of them rapidly. This thesis defines image chunks that facilitate the most potentially time-consuming operations of spatial analysis: boundary tracing, area coloring, and the selection of locations at which to apply detailed analysis. Fast parallel processes for computing these chunks from images, and chunk-based formulations of indexing, tracing, and coloring, are presented. These processes have been simulated and evaluated on the Lisp Machine and the Connection Machine.
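
As a toy illustration of why local, lockstep parallel updates make operations like area coloring fast, the sketch below labels connected regions by minimum-label propagation between 4-neighbours; every pixel applies the same local rule, and the iteration count scales with region diameter rather than pixel count. The update rule and data layout are illustrative assumptions, not the report's actual chunk processes.

```python
import numpy as np

def color_areas(mask: np.ndarray) -> np.ndarray:
    """Toy 'area coloring': label connected regions by repeated
    minimum-label propagation between 4-neighbours, the kind of
    simple, uniform, local update a parallel processor array can
    perform in lockstep. mask is True where a region pixel exists."""
    h, w = mask.shape
    big = h * w  # sentinel larger than any real label
    labels = np.where(mask, np.arange(h * w).reshape(h, w), big)
    while True:
        padded = np.pad(labels, 1, constant_values=big)
        neighbours = np.minimum.reduce([
            padded[:-2, 1:-1], padded[2:, 1:-1],   # up, down
            padded[1:-1, :-2], padded[1:-1, 2:],   # left, right
        ])
        new = np.where(mask, np.minimum(labels, neighbours), big)
        if np.array_equal(new, labels):
            return np.where(mask, new, -1)         # -1 marks background
        labels = new

# Two separate blobs receive two distinct labels
m = np.zeros((5, 7), dtype=bool)
m[1:3, 1:3] = True
m[3:5, 4:7] = True
print(color_areas(m))
```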
24

Sparse Methods in Image Understanding and Computer Vision

January 2013 (has links)
Image understanding has been playing an increasingly crucial role in vision applications. Sparse models form an important component of image understanding, since the statistics of natural images reveal the presence of sparse structure. Sparse methods lead to parsimonious models, in addition to being efficient for large-scale learning. In sparse modeling, data is represented as a sparse linear combination of atoms from a "dictionary" matrix. This dissertation focuses on understanding different aspects of sparse learning, thereby enhancing the use of sparse methods by incorporating tools from machine learning. With the growing need to adapt models to large-scale data, it is important to design dictionaries that can model the entire data space and not just the samples considered. By exploiting the relation of dictionary learning to 1-D subspace clustering, a multilevel dictionary learning algorithm is developed and shown to outperform conventional sparse models in compressed recovery and image denoising. Theoretical aspects of learning such as algorithmic stability and generalization are considered, and ensemble learning is incorporated for effective large-scale learning. In addition to building strategies for efficiently implementing 1-D subspace clustering, a discriminative clustering approach is designed to estimate the unknown mixing process in blind source separation. By exploiting the non-linear relation between image descriptors and allowing the use of multiple features, sparse methods can be made more effective in recognition problems. The idea of multiple kernel sparse representations is developed, and algorithms for learning dictionaries in the feature space are presented. Object recognition experiments on standard datasets show that the proposed approaches outperform other sparse-coding-based recognition frameworks. Furthermore, a segmentation technique based on multiple kernel sparse representations is developed and successfully applied to automated brain tumor identification. Using sparse codes to define the relation between data samples can lead to a more robust graph embedding for unsupervised clustering; by performing discriminative embedding using sparse-coding-based graphs, an algorithm for measuring the glomerular number in kidney MRI images is developed. Finally, approaches to building dictionaries for local sparse coding of image descriptors are presented and applied to object recognition and image retrieval. (Ph.D. dissertation, Electrical Engineering, 2013)
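
The core sparse-modeling step, representing a signal as a sparse combination of dictionary atoms, can be sketched with orthogonal matching pursuit, a standard greedy solver. The dictionary below is random for illustration; the dissertation's contributions lie in how dictionaries are learned, not in this solver.

```python
import numpy as np

def omp(D: np.ndarray, x: np.ndarray, k: int) -> np.ndarray:
    """Orthogonal Matching Pursuit: greedily pick k dictionary atoms
    by correlation with the residual, least-squares refitting the
    coefficients after each pick.
    D: (dim, n_atoms) with unit-norm columns; x: (dim,) signal."""
    residual, support = x.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    alpha = np.zeros(D.shape[1])
    alpha[support] = coeffs
    return alpha

# Toy check: recover a 2-sparse signal over a random dictionary
rng = np.random.default_rng(1)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)
x = 0.8 * D[:, 3] - 0.5 * D[:, 17]
print(np.nonzero(omp(D, x, k=2))[0])  # expect atoms 3 and 17 (with high probability)
```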
25

How polarimetry may contribute to understand reflective road scenes: theory and applications

Wang, Fan 16 June 2016 (has links)
Advanced Driver Assistance Systems (ADAS) aim to automate, adapt and enhance transportation systems for safety and better driving. Various research topics have emerged around ADAS, including object detection and recognition, image understanding and disparity map estimation. The presence of specular highlights restricts the accuracy of such algorithms, since highlights cover the original image texture and lead to a loss of information. Light polarization implicitly encodes object-related information such as surface direction, material nature and roughness. In the context of ADAS, we are motivated to further investigate the use of polarization imaging to remove image highlights and analyze road scenes. We first propose in this thesis to remove image specularity through polarization by applying a global energy minimization. Polarization information provides a color constraint that reduces the color distortion of the results, and the global smoothness assumption further integrates long-range information in the image to produce an improved diffuse image. We secondly propose to use polarization images as a new feature, since in road scenes high reflections appear only on certain objects such as cars. Polarization features are applied to image understanding and car detection in two different ways. The experimental results show that, once properly fused with RGB-based features, the complementary information provided by the polarization images improves the algorithm's accuracy. We finally test polarization imaging for depth estimation. A post-aggregation stereo matching method is first proposed and validated on a color database. A fusion rule is then proposed to use polarization imaging as a constraint on disparity map estimation. From these applications, we demonstrate the potential and feasibility of applying polarization imaging in outdoor tasks for ADAS.
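
A sketch of the basic polarimetric quantities involved, assuming four intensity images captured behind a linear polarizer at 0, 45, 90 and 135 degrees (a common acquisition setup, not necessarily the thesis's exact one): the linear Stokes parameters yield the degree and angle of linear polarization, and strongly polarized pixels typically flag specular, reflective surfaces.

```python
import numpy as np

def polarisation_features(i0, i45, i90, i135, eps=1e-6):
    """Per-pixel Stokes parameters from four polariser-angle images,
    plus the derived degree (DoLP) and angle (AoLP) of linear
    polarization."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)     # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)
    aolp = 0.5 * np.arctan2(s2, s1)
    return dolp, aolp

# Toy 1x2 image: one strongly polarised pixel, one unpolarised
i0   = np.array([[1.00, 0.5]]); i90  = np.array([[0.10, 0.5]])
i45  = np.array([[0.55, 0.5]]); i135 = np.array([[0.55, 0.5]])
dolp, aolp = polarisation_features(i0, i45, i90, i135)
print(dolp)  # high for the first pixel, ~0 for the second
```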
26

Coping with Limited Data: Machine-Learning-Based Image Understanding Applications to Fashion and Inkjet Imagery

Zhi Li 02 December 2019 (has links)
Machine learning has been revolutionizing our approach to image understanding problems. However, due to the unique nature of the problem, finding suitable data or learning properly from limited data is a constant challenge. In this work, we focus on building machine learning pipelines for fashion and inkjet image analysis with limited data.

We first look into the dire issue of missing and incorrect information on online fashion marketplaces. Unlike professional online fashion retailers, sellers on P2P marketplaces tend not to provide correct color category information, which is pivotal for fashion shopping. Therefore, to assist users in providing correct color information, we aim to build an image understanding pipeline that can extract the garment region in a fashion image and match the color of the fashion item to pre-defined color categories on the fashion marketplace. To cope with the lack of suitable data, we propose an autonomous garment color extraction system that uses both clustering and semantic segmentation algorithms to extract and identify fashion garments in the image. In addition, a psychophysical experiment is designed to collect human subjects' color naming schema, and a random forest classifier is trained to give a close prediction of the color label for the fashion item. Our system is able to perform pixel-level segmentation on fashion product portraits and to parse human body parts and various fashion categories when a person is present.

We also develop an inkjet printing analysis pipeline using pre-trained neural networks. Our pipeline is able to learn to perceive print quality, namely the high-frequency noise level of the test targets, without intensive training. Our research also suggests that, despite being trained on a large-scale dataset for object recognition, features generated from neural networks react to the textural component of the image without any localized features. In addition, we extend our pipeline to printer forensics, where it is able to identify the printer model by examining the inkjet dot pattern at a microscopic level. Overall, the data-driven computer vision approach presents great value and potential to improve future inkjet imaging technology, even when the data source is limited.
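
The print-quality pipeline itself builds on pre-trained network features; as a much simpler, hypothetical stand-in for quantifying the high-frequency noise level of a test target, one could measure the mean response of a Laplacian high-pass filter:

```python
import numpy as np

def high_freq_noise_score(img: np.ndarray) -> float:
    """Crude high-frequency noise score: mean absolute response of a
    5-point Laplacian over the image interior. A hypothetical stand-in
    for the learned perceptual measure; noisier prints score higher."""
    lap = (-4 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return float(np.abs(lap).mean())

rng = np.random.default_rng(2)
smooth = np.full((64, 64), 0.5)
noisy = smooth + 0.05 * rng.standard_normal((64, 64))
print(high_freq_noise_score(smooth), high_freq_noise_score(noisy))  # 0.0 < noisy
```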
27

Data augmentation and image understanding

Hernandez-Garcia, Alex 18 February 2022 (has links)
Interdisciplinary research is often at the core of scientific progress. This dissertation explores some advantageous synergies between machine learning, cognitive science and neuroscience. In particular, this thesis focuses on vision and images. The human visual system has been widely studied from both behavioural and neuroscientific points of view, as vision is the dominant sense of most people. In turn, machine vision has also been an active area of research, currently dominated by the use of artificial neural networks. This work focuses on learning representations that are more aligned with visual perception and biological vision. For that purpose, I have studied tools and aspects from cognitive science and computational neuroscience, and attempted to incorporate them into machine learning models of vision. A central subject of this dissertation is data augmentation, a commonly used technique for training artificial neural networks that augments the size of data sets through transformations of the images. Although often overlooked, data augmentation implements transformations that are perceptually plausible, since they correspond to transformations we see in our visual world: changes in viewpoint or illumination, for instance. Furthermore, neuroscientists have found that the brain represents objects invariantly under these transformations. Throughout this dissertation, I use these insights to analyse data augmentation as a particularly useful inductive bias, as a more effective regularisation method for artificial neural networks, and as a framework to analyse and improve the invariance of vision models to perceptually plausible transformations. Overall, this work aims to shed more light on the properties of data augmentation and to demonstrate the potential of interdisciplinary research.
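
A minimal sketch of the kind of perceptually plausible augmentation pipeline discussed, using torchvision; the specific transforms and parameters are illustrative choices, not the dissertation's exact setup.

```python
from torchvision import transforms

# Each transform mimics a change we routinely see in the visual world,
# which is what makes these augmentations perceptually plausible.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # viewpoint / scale
    transforms.RandomHorizontalFlip(),                     # mirror symmetry
    transforms.ColorJitter(brightness=0.3, contrast=0.3),  # illumination
    transforms.ToTensor(),
])

# Applied on the fly during training, so every epoch sees new variants:
# tensor = augment(pil_image)
```

Because the transforms are sampled anew at every pass, the network effectively trains on an unbounded family of label-preserving variants of each image, which is what makes data augmentation act as an inductive bias and regulariser.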
28

Constraint-Based Interpolation

Goggins, Daniel David 22 July 2005 (has links)
Image reconstruction is the process of converting a sampled image into a continuous one prior to transformation and resampling. This reconstruction can be more accurate if two things are known: the process by which the sampled image was obtained, and the general characteristics of the original image. We present a new reconstruction algorithm known as Constraint-Based Interpolation, which estimates the sampling functions found in cameras and analyzes properties of real-world images in order to produce quality real-world image magnifications. To accomplish this, Constraint-Based Interpolation uses a sensor model that pushes the pixels in an interpolation to more closely match the data in the sampled image. Real-world image properties are ensured with a level-set smoothing model that smooths "jaggies" and a sharpening model that alleviates blurring. This thesis describes the three models, their methods and constraints. The effects of the various models and constraints are shown, along with a human observer test. A variation of a previous interpolation technique, Quad-based Interpolation, and a new metric, gradient-weighted contour curvature, are also presented and analyzed.
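
As a hedged sketch of the sensor-model idea (not Goggins's actual constraint set), iterative back-projection alternates a smoothing step with a data constraint that pushes the magnified image to reproduce the observed samples under a box-average camera model; every modelling choice below is an illustrative simplification.

```python
import numpy as np

def downsample(img, f):
    """Box-average sensor model: each observed pixel integrates an
    f-by-f area of the fine image (a simple camera stand-in)."""
    h, w = img.shape
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def smooth(img):
    """Light 5-point averaging, standing in for the level-set
    smoothing that suppresses 'jaggies'."""
    p = np.pad(img, 1, mode='edge')
    return (p[1:-1, 1:-1] + p[:-2, 1:-1] + p[2:, 1:-1]
            + p[1:-1, :-2] + p[1:-1, 2:]) / 5.0

def magnify(sampled, f, n_iters=20):
    """Alternate smoothing with a sensor-model constraint that makes
    the magnified image reproduce the observed samples exactly."""
    up = np.kron(sampled, np.ones((f, f)))   # initial magnification
    for _ in range(n_iters):
        up = smooth(up)                      # smoothness model
        err = sampled - downsample(up, f)    # sensor-model violation
        up += np.kron(err, np.ones((f, f)))  # back-project the error
    return up

lo = np.array([[0.2, 0.8], [0.6, 0.4]])
hi = magnify(lo, f=4)
print(np.abs(downsample(hi, 4) - lo).max())  # ~0: data constraint holds
```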
29

Visual Geo-Localization and Location-Aware Image Understanding

Zamir, Amir Roshan 01 January 2014 (has links)
Geo-localization is the problem of discovering the location where an image or video was captured. Recently, large-scale geo-localization methods, which are devised for ground-level imagery and employ techniques similar to image matching, have attracted much interest. In these methods, given a reference dataset composed of geo-tagged images, the problem is to estimate the geo-location of a query by finding its matching reference images. In this dissertation, we address three questions central to geo-spatial analysis of ground-level imagery: 1) How to geo-localize images and videos captured at unknown locations? 2) How to refine the geo-location of already geo-tagged data? 3) How to utilize the extracted geo-tags? We present a new framework for geo-locating an image using a novel multiple-nearest-neighbor feature matching method based on Generalized Minimum Clique Graphs (GMCP). First, we extract local features (e.g., SIFT) from the query image and retrieve a number of nearest neighbors for each query feature from the reference dataset. Next, we apply our GMCP-based feature matching to select a single nearest neighbor for each query feature such that all matches are globally consistent. Our approach to feature matching is based on the proposition that the first nearest neighbors are not necessarily the best choices for finding correspondences in image matching. Therefore, the proposed method considers multiple reference nearest neighbors as potential matches and selects the correct ones by enforcing consistency among their global features (e.g., GIST) using GMCP. Our evaluations using a new dataset of 102k Street View images show that the proposed method outperforms the state of the art by 10 percent. Geo-localization of images can be extended to geo-localization of video. We have developed a novel method for estimating the geo-spatial trajectory of a moving camera with unknown intrinsic parameters at city scale. The proposed method is based on a three-step process: 1) individual geo-localization of video frames using Street View images to obtain the likelihood of the location (latitude and longitude) given the current observation; 2) Bayesian tracking to estimate the frame location and the video's temporal evolution using previous state probabilities and the current likelihood; and 3) a novel Minimum Spanning Trees based trajectory reconstruction to eliminate trajectory loops or noisy estimations. Thus far, we have assumed that reliable geo-tags for reference imagery are available through crowdsourcing. However, crowdsourced images are well known to suffer from the acute shortcoming of having inaccurate geo-tags. We have developed the first method for refinement of GPS tags, which automatically discovers the subset of corrupted geo-tags and refines them. We employ Random Walks to discover the uncontaminated subset of location estimations, and robustify Random Walks with a novel adaptive damping factor that conforms to the level of noise in the input. In location-aware image understanding, we are interested in improving image analysis by putting it in the right geo-spatial context. This approach is of particular importance as the majority of cameras and mobile devices are now equipped with GPS chips; developing techniques which can leverage the geo-tags of images to improve the performance of traditional computer vision tasks is therefore of particular interest. We have developed a location-aware multimodal approach which incorporates business directories, textual information, and web images to identify businesses in a geo-tagged query image.
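
The exact GMCP optimization is involved; the sketch below is a greedy, hypothetical stand-in that captures only the core idea of choosing among multiple nearest-neighbor candidates per query feature by trading local descriptor cost against global-feature (e.g., GIST) consistency with the matches selected so far.

```python
import numpy as np

def consistent_matches(local_cost, gist, alpha=1.0):
    """Greedy stand-in for GMCP matching. For each query feature, pick
    one of its k candidate reference matches by balancing descriptor
    distance against agreement with the mean global descriptor of the
    candidates already chosen.

    local_cost: (n_features, k) descriptor distances per candidate.
    gist: (n_features, k, d) global descriptor of each candidate's
          reference image."""
    n, k, _ = gist.shape
    chosen, mean_gist = [], None
    for i in range(n):
        cost = local_cost[i].copy()
        if mean_gist is not None:  # penalise globally inconsistent picks
            cost += alpha * np.linalg.norm(gist[i] - mean_gist, axis=1)
        chosen.append(int(cost.argmin()))
        picked = gist[np.arange(i + 1), chosen]
        mean_gist = picked.mean(axis=0)
    return chosen

# Toy run: 6 query features, 4 candidates each, 8-D global descriptors
rng = np.random.default_rng(3)
print(consistent_matches(rng.random((6, 4)), rng.random((6, 4, 8))))
```

Unlike this greedy pass, GMCP selects all matches jointly, which is what enforces true global consistency; the sketch only conveys the cost structure being optimized.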
30

Deformable lung registration for pulmonary image analysis of MRI and CT scans

Heinrich, Mattias Paul January 2013 (has links)
Medical imaging has seen rapid development in its clinical use for assessment of treatment outcome, disease monitoring and diagnosis over the last few decades. Yet the vast amount of available image data limits the practical use of this potentially very valuable source of information for radiologists and physicians. The design of computer-aided medical image analysis is therefore of great importance to imaging in clinical practice. This thesis deals with the problem of deformable image registration in the context of lung imaging, and addresses three of the major challenges of this application: designing an image similarity measure for multi-modal scans or scans of locally changing contrast; modelling complex lung motion, which includes sliding motion; and approximately globally optimal mathematical optimisation to deal with large motion of small anatomical features. The two most important contributions made in this thesis are the formulation of a multi-dimensional structural image representation, which is independent of modality, robust to intensity distortions and very discriminative for different image features, and a discrete optimisation framework, based on an image-adaptive graph structure, which enables very efficient optimisation of large dense displacement spaces and deals well with sliding motion. The derived methods are applied to two clinical applications in pulmonary image analysis: motion correction for breathing-cycle computed tomography (CT) volumes, and deformable multi-modal fusion of CT and magnetic resonance imaging (MRI) chest scans. The experimental validation demonstrates improved registration accuracy, high quality of the estimated deformations, and much lower computational complexity, all compared to several state-of-the-art deformable registration techniques.
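
In the spirit of the modality-independent structural representation described (cf. the author's MIND descriptor, though the details below are simplified assumptions), a self-similarity descriptor can be built from patch distances between each pixel and its neighbours, exponentiated so the result reflects local structure rather than absolute intensity.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def structural_descriptor(img: np.ndarray, patch: int = 3) -> np.ndarray:
    """Self-similarity descriptor: for each pixel, the mean squared
    difference between the patch around it and the patches around its
    four neighbours (np.roll wraps at borders, acceptable for a
    sketch), exponentiated and normalised per pixel."""
    feats = []
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        moved = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
        feats.append(uniform_filter((img - moved) ** 2, size=patch))
    d = np.stack(feats, axis=-1)                   # (h, w, 4) patch distances
    noise = d.mean(axis=-1, keepdims=True) + 1e-6  # local noise estimate
    desc = np.exp(-d / noise)
    return desc / desc.max(axis=-1, keepdims=True)

img = np.zeros((32, 32)); img[8:24, 8:24] = 1.0
a = structural_descriptor(img)
b = structural_descriptor(1.0 - img)  # same scene in an 'inverted' modality
print(np.abs(a - b).max())            # 0.0: structure, not intensity
```

Inverting the intensities leaves the descriptor unchanged, which is exactly the behaviour a multi-modal similarity measure needs before any optimisation over displacements takes place.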
