81

Region detection and matching for object recognition

Kim, Jaechul, 20 September 2013
In this thesis, I explore region detection and consider its impact on image matching for exemplar-based object recognition. Detecting regions is important for providing semantically meaningful spatial cues in images, while matching establishes similarity between visual entities, which is crucial for recognition. My thesis starts by detecting regions at both the local and object levels; I then leverage geometric cues of the detected regions to improve image matching for the ultimate goal of object recognition. More specifically, my thesis considers four key questions: 1) how can we extract distinctively-shaped local regions that also ensure repeatability for robust matching? 2) how can object-level shape inform bottom-up image segmentation? 3) how should the spatial layout imposed by segmented regions influence image matching for exemplar-based recognition? and 4) how can we exploit regions to improve the accuracy and speed of dense image matching? I propose novel algorithms to tackle these issues, addressing region-based visual perception from low-level local region extraction, to mid-level object segmentation, to high-level region-based matching and recognition.

First, I propose a Boundary Preserving Local Region (BPLR) detector to extract local shapes. My approach defines a novel spanning-tree-based image representation whose structure reflects shape cues combined from multiple segmentations, which in turn provide multiple initial hypotheses of the object boundaries. Unlike traditional local region detectors that rely on local cues like color and texture, BPLRs explicitly exploit the segmentation that encodes global object shape. Thus, they respect object boundaries more robustly and reduce noisy regions that straddle object boundaries. The resulting detector yields a dense set of local regions that are both distinctive in shape and repeatable for robust matching.

Second, building on the strength of BPLR regions, I develop an approach for object-level segmentation. The key insight of the approach is that object shapes are (at least partially) shared among different object categories: for example, among different animals, among different vehicles, or even among seemingly different objects. This shape-sharing phenomenon allows us to use partial shape matching via BPLR-detected regions to predict the global shape of possibly unfamiliar objects in new images. Unlike existing top-down methods, my approach requires no category-specific knowledge of the object to be segmented. In addition, because it relies on exemplar-based matching to generate shape hypotheses, my approach overcomes the viewpoint sensitivity of existing methods by allowing shape exemplars to span arbitrary poses and classes.

For the ultimate goal of region-based recognition, not only is it important to detect good regions, but we must also be able to match them reliably. Matching establishes similarity between visual entities (images, objects, or scenes), which is fundamental for visual recognition. Thus, in the third major component of this thesis, I explore how to leverage geometric cues of the segmented regions for accurate image matching. To this end, I propose a segmentation-guided local feature matching strategy, in which segmentation suggests the spatial layout among the matched local features within each region. To encode such spatial structures, I devise a string representation whose 1D nature enables efficient computation to enforce geometric constraints (a simplified sketch of this idea appears after this entry). The method is applied to exemplar-based object classification to demonstrate the impact of my segmentation-driven matching approach.

Finally, building on the idea of regions for geometric regularization in image matching, I consider how a hierarchy of nested image regions can be used to constrain dense image feature matches at multiple scales simultaneously. Moving beyond individual regions, the last part of my thesis studies how to exploit regions' inherent hierarchical structure to improve image matching. To this end, I propose a deformable spatial pyramid graphical model for image matching. The proposed model considers multiple spatial extents at once, from an entire image, to grid cells, to every single pixel, and strikes a balance between robust regularization by larger spatial supports on the one hand and accurate localization by finer regions on the other. Further, the pyramid model is suitable for fast coarse-to-fine hierarchical optimization. I apply the method to pixel label transfer tasks for semantic image segmentation, improving upon the state of the art in both accuracy and speed.

Throughout, I provide extensive evaluations on challenging benchmark datasets, validating the effectiveness of my approach. In contrast to traditional texture-based object recognition, my region-based approach enables the use of strong geometric cues such as shape and spatial layout that advance the state of the art in object recognition. I also show that regions' inherent hierarchical structure allows fast image matching for scalable recognition. The outcome realizes the promising potential of region-based visual perception. In addition, all my code for the local shape detector, object segmentation, and image matching is publicly available, which I hope will serve as a useful new addition to vision researchers' toolboxes.
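The string idea in the third component can be made concrete with a small sketch. The snippet below is a simplified stand-in, not the thesis's actual algorithm: it projects the matched features inside one region onto that region's principal axis in each image, reads off a 1D ordering (a "string") per image, and scores how order-preserving the match is. The function name and the scoring scheme are illustrative assumptions.

```python
import bisect

import numpy as np

def order_consistency(pts_a, pts_b):
    """Score geometric consistency of matched features within one region.

    pts_a, pts_b: (n, 2) arrays; row i of each is one matched keypoint pair.
    Each point set is projected onto its own principal axis to form a 1D
    ordering; the score is the normalized longest common subsequence of
    the two orderings (1.0 = perfectly order-preserving match).
    """
    def ordering(pts):
        centered = pts - pts.mean(axis=0)
        axis = np.linalg.svd(centered, full_matrices=False)[2][0]
        return np.argsort(centered @ axis)

    a, b = ordering(pts_a), ordering(pts_b)
    pos_in_b = np.empty(len(b), dtype=int)
    pos_in_b[b] = np.arange(len(b))
    seq = pos_in_b[a]  # positions in B's string, read in A's string order

    # LCS of two permutations == longest increasing subsequence of seq
    def lis_length(s):
        tails = []
        for x in s:
            i = bisect.bisect_left(tails, x)
            if i == len(tails):
                tails.append(x)
            else:
                tails[i] = x
        return len(tails)

    # The principal-axis sign is arbitrary, so also try the reversed string
    return max(lis_length(seq), lis_length(len(seq) - 1 - seq)) / len(seq)
```

Because both orderings are permutations of the same matches, the LCS reduces to a longest increasing subsequence, computable in O(n log n); this is the kind of efficiency a 1D representation buys over full 2D geometric verification.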
82

Object recognition and pose estimation for manipulation in nuclear materials handling applications

O'Neil, Brian Erick, 17 October 2013
This dissertation advances the capability of autonomous or semi-autonomous robotic manipulation systems by providing the tools required to turn depth sensor measurements into a meaningful representation of the objects present in the robot's environment. This process happens in two steps. First, the points from depth imagery are separated into clusters representing individual objects by a Euclidean clustering scheme. Each cluster is then passed to a recognition algorithm that determines what it is and where it is. This information allows the robot to determine a pose of the object for grasp planning or obstacle avoidance. To accomplish this, the recognition system must extract a mathematical representation of each point cluster. To this end, this dissertation presents a new feature descriptor, the Cylindrical Projection Histogram, which captures the shape, size, and viewpoint of the object while maintaining invariance to image scale. These features are used to train a classifier that can then determine the label and pose of each cluster identified in a scene. The results are used to inform a probabilistic model of the object that quantifies uncertainty and allows Bayesian updating of the object's label and position.

Experimental results on live data show a 97.2% correct recognition rate for a classifier based on the Cylindrical Projection Histogram. This is a significant improvement over another state-of-the-art feature, which gives an 89.6% recognition rate on the same object set. With statistical filtering over 10 frames, the raw recognition rates improve to 100% and 92.3%, respectively. For pose estimation, both features yield rotational errors between 12° and 30°, and position errors below 1 cm.

This work supports deployment of robotic manipulation systems in unstructured glovebox environments in US Department of Energy facilities. The recognition performance of the CPH classifier is adequate for this purpose. The pose estimation performance is sufficient for gross pick-and-place tasks on simple objects, but not for dexterous manipulation. However, the pose estimation, along with the probabilistic model, supports post-recognition pose refinement techniques.
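As a rough illustration of the cylindrical-coordinate idea behind such a descriptor (the thesis's exact CPH binning and normalization are not reproduced here; the axis choice, bin counts, and normalization below are all assumptions for illustration), a cluster's points can be converted to azimuth/height coordinates and histogrammed:

```python
import numpy as np

def cylindrical_projection_histogram(points, n_theta=12, n_z=8):
    """A loose sketch of a cylindrical projection histogram for one cluster.

    points: (n, 3) array of depth points for one segmented object, in the
    camera frame. The cylinder axis is taken as the camera z-axis here,
    purely as an assumption for illustration.
    """
    centered = points - points.mean(axis=0)
    theta = np.arctan2(centered[:, 1], centered[:, 0])  # azimuth around axis
    z = centered[:, 2]                                  # height along axis
    hist, _, _ = np.histogram2d(
        theta, z,
        bins=(n_theta, n_z),
        range=((-np.pi, np.pi), (z.min(), z.max() + 1e-9)),
    )
    # Normalizing the histogram makes the descriptor invariant to the
    # number of points, and hence to image scale
    return (hist / hist.sum()).ravel()
```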
83

How Manipulability (Graspability and Functional Usage) Influences Object Identification

Salmon, Joshua, 25 June 2013
In our environment we do two things with objects: identify them, and act on them. Perhaps not coincidentally, research has shown that the brain appears to have two distinct visual streams, one engaged during the identification of objects and one associated with action. Although these visual streams are distinct, there has been increasing interest in how the action and identification systems interact during grasping and identification tasks. In particular, the current research explored the role that previous motor experience with familiar manipulable objects might have on the time it takes healthy participants to identify these objects (relative to non-manipulable objects). Furthermore, previous research has shown that there are multiple, computationally and neuroanatomically distinct, action systems. The current research was particularly interested in the action systems involved in 1) grasping, and 2) functionally using an object. Work began by developing a new stimulus set of black-and-white photographs of manipulable and non-manipulable objects and collecting 'graspability' and 'functional usage' ratings (Chapter 2). This stimulus set was then used to show that high manipulability was related to faster naming but slower categorization (Chapter 3). In Chapter 4, the nature of these effects was explored by extending a computational model by Yoon, Heinke and Humphreys (2002). Results from Chapter 5 indicated independent roles of graspability and functional usage during tasks that required identification of objects presented either with or without a concurrent mask: graspability effects were larger for items that were not masked, while functional-usage effects were larger for items that were masked. Finally, Chapter 6 indicated that action effects during identification tasks depend partly on how realistic the depictions of the objects are; manipulability effects were larger for photographs than for line drawings of the same objects. These results have direct implications for the design of future identification tasks, but, more broadly, they speak to the interactive nature of the human mind: action representations can be invoked and measured during simple identification tasks, even when acting on the object is not required. (Manuscript-based dissertation: one introductory chapter, one concluding chapter, and five manuscripts, seven chapters in total.)
84

Biologically-inspired machine vision

Tsitiridis, Aristeidis, January 2013
This thesis summarises research on the improved design, integration and expansion of past cortex-like computer vision models, following biologically-inspired methodologies. By adopting early theories and algorithms as a building block, particular interest has been shown in algorithmic parameterisation, feature extraction, invariance properties and classification. Overall, the major original contributions of this thesis have been:
1. The incorporation of a salient feature-based method for semantic feature extraction and refinement in object recognition.
2. The design and integration of colour features coupled with the existing morphological-based features for efficient and improved biologically-inspired object recognition.
3. The introduction of the illumination invariance property with colour constancy methods under a biologically-inspired framework.
4. The development and investigation of rotation invariance methods to improve robustness and compensate for the lack of such a mechanism in the original models.
5. Adaptive Gabor filter design that captures texture information, enhancing the morphological description of objects in a visual scene and improving the overall classification performance (a minimal Gabor-bank sketch follows this entry).
6. Instigation of pioneering research on Spiking Neural Network classification for biologically-inspired vision.
Most of the above contributions have also been presented in two journal publications and five conference papers. The system has been fully developed and tested in MATLAB on a variety of image datasets either created for the purposes of this work or obtained from the public domain.
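To make the Gabor-filter contribution concrete, here is a minimal sketch of a conventional 2D Gabor kernel and a small orientation bank. The thesis's adaptive parameterisation is not reproduced; every parameter value below is an arbitrary illustration, and Python/NumPy is used here even though the original system was built in MATLAB.

```python
import numpy as np

def gabor_kernel(ksize, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real-valued Gabor kernel: a sinusoidal carrier under a Gaussian envelope."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates to the filter orientation theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr) ** 2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lambd + psi)
    return envelope * carrier

# A small bank over four orientations, e.g. for texture description;
# convolving an image with each kernel yields one response map per filter.
bank = [gabor_kernel(ksize=21, sigma=4.0, theta=t, lambd=8.0)
        for t in np.linspace(0, np.pi, 4, endpoint=False)]
```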
85

Accounting for Aliasing in Correlation Filters: Zero-Aliasing and Partial-Aliasing Correlation Filters

Fernandez, Joseph A., 01 May 2014
Correlation filters (CFs) are well established and useful tools for a variety of tasks in signal processing and pattern recognition, including automatic target recognition and tracking, biometrics, landmark detection, and human action recognition. Traditionally, CFs have been designed and implemented efficiently in the frequency domain using the discrete Fourier transform (DFT). However, the element-wise multiplication of two DFTs in the frequency domain corresponds to a circular correlation, which results in aliasing (i.e., distortion) in the correlation output. Prior CF research has largely ignored these aliasing effects by assuming that linear correlation is approximated by circular correlation.

In this work, we investigate the topic of aliasing in CFs in detail. First, we illustrate that the current formulation of CFs in the frequency domain is inherently flawed, as it unintentionally assumes circular correlation during the design phase. This means that existing CFs are not truly optimal. We introduce zero-aliasing correlation filters (ZACFs), which fix this formulation issue by ensuring that each CF design problem corresponds to a linear correlation rather than a circular correlation. By adopting the ZACF design modifications, we show that the recognition and localization performance of conventional CF designs can be significantly improved. We demonstrate these benefits using a variety of data sets and present solutions to the computational challenges associated with computing ZACFs.

After a CF is designed, it is used for object recognition by correlating it with a test signal. We investigate the use of the well-known overlap-add (OLA) and overlap-save (OLS) algorithms to improve the computation and memory requirements of this correlation operation for high-dimensional applications (e.g., video). Through this process, we highlight important tradeoffs between these two algorithms that have not previously been documented.

To improve the computation and memory requirements of OLA and OLS, we introduce a new block filtering scheme, denoted partial-aliasing OLA (PAOLA), that intentionally introduces aliasing into the output correlation. This aliasing causes conventional CFs to perform poorly. To remedy this, we introduce partial-aliasing correlation filters (PACFs), which are specifically designed to minimize this aliasing. We demonstrate through numerical results that PACFs outperform conventional CFs in the presence of aliasing.
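The aliasing at issue can be seen in a few lines of NumPy. This is a standard DFT identity rather than code from the thesis: multiplying two unpadded DFTs element-wise yields circular correlation, while zero-padding to at least 2N - 1 samples recovers the linear correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(8)   # test signal
h = rng.standard_normal(8)   # correlation filter template

# Unpadded DFT product -> CIRCULAR correlation (aliased: edges wrap around)
circ = np.fft.ifft(np.fft.fft(f) * np.conj(np.fft.fft(h))).real

# Zero-padding to n >= 2N - 1 before the DFT -> LINEAR correlation
n = 2 * len(f) - 1
lin = np.fft.ifft(np.fft.fft(f, n) * np.conj(np.fft.fft(h, n))).real

# Check against direct linear cross-correlation (lags -(N-1) .. N-1)
ref = np.correlate(f, h, mode="full")
assert np.allclose(np.roll(lin, len(f) - 1), ref)  # padded version matches
assert not np.allclose(circ, ref[len(f) - 1:])     # unpadded one is aliased
```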
86

Multiple-Cue Object Recognition for Interactionable Objects

Aboutalib, Sarah, 08 December 2010
Category-level object recognition is a fundamental capability for the potential use of robots in the assistance of humans in useful tasks. There have been numerous vision-based object recognition systems yielding fast and accurate results in constrained environments. However, by depending on visual cues, these techniques are susceptible to object variations in size, lighting, rotation, and pose, all of which cannot be avoided in real video data. Thus, the task of object recognition remains very challenging. My thesis work builds upon the fact that robots can observe humans interacting with the objects in their environment. We refer to the set of objects that can be involved in such interaction as 'interactionable' objects. The interaction of humans with 'interactionable' objects provides numerous non-visual cues to the identity of objects. In this thesis, I introduce a flexible object recognition approach called Multiple-Cue Object Recognition (MCOR) that can use multiple cues of any predefined type, whether they are cues intrinsic to the object or provided by observation of a human. In pursuit of this goal, the thesis provides several contributions: a representation for the multiple cues, including an object definition that allows for the flexible addition of these cues; weights that reflect the varying strengths of association between a particular cue and a particular object, learned using a probabilistic relational model, along with object displacement values for localizing the information in an image; tools for defining visual features, segmentation, tracking, and the values for the non-visual cues; and an object recognition algorithm for the incremental discrimination of potential object categories (illustrated in the sketch after this entry). We evaluate these contributions through a number of methods: simulation to demonstrate the learning of weights and recognition based on an analytical model, an analytical model that demonstrates the robustness of the MCOR framework, and recognition results on real video data from a number of datasets, including video taken from a humanoid robot (the Sony QRIO), video captured in a meeting setting, scripted scenarios from outside universities, and unscripted TV cooking data. Using these datasets, we demonstrate the basic features of the MCOR algorithm, including its ability to use multiple cues of different types, and we demonstrate the applicability of MCOR to an outside dataset. We show that MCOR achieves better recognition results than vision-only recognition systems, and that performance only improves with the addition of more cue types.
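The flavor of incremental, weighted cue combination can be sketched as follows. The object names, cue names, and weight values are made-up stand-ins; in MCOR the weights come from a learned probabilistic relational model, whereas this sketch simply accumulates weighted evidence and renormalizes.

```python
import numpy as np

# Hypothetical cue-to-object association weights (rows: objects, cols: cues)
objects = ["cup", "phone", "book"]
cues = ["red", "cylindrical", "held-to-ear", "page-flip"]
weights = np.array([
    [0.6, 0.8, 0.1, 0.0],   # cup
    [0.3, 0.2, 0.9, 0.0],   # phone
    [0.1, 0.0, 0.0, 0.9],   # book
])

def score_objects(observed):
    """Accumulate evidence for each object from the cues observed so far."""
    mask = np.array([c in observed for c in cues], dtype=float)
    scores = weights @ mask
    return scores / scores.sum()   # normalize to a posterior-like distribution

# Incremental discrimination: each new cue (visual or observed from a human
# interaction) sharpens the distribution over object categories.
print(dict(zip(objects, score_objects({"cylindrical"}))))
print(dict(zip(objects, score_objects({"cylindrical", "held-to-ear"}))))
```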
87

Searching for the Visual Components of Object Perception

Leeds, Daniel Demeny, 01 July 2013
The nature of the visual properties used for object perception in mid- and high-level vision areas of the brain is poorly understood. Past studies have employed simplistic stimuli probing models limited in descriptive power and mathematical underpinnings. Unfortunately, pursuit of more complex stimuli and properties requires searching through a wide, unknown space of models and of images. The difficulty of this pursuit is exacerbated in brain research by the limited number of stimulus responses that can be collected for a given human subject over the course of an experiment. To more quickly identify complex visual features underlying cortical object perception, I develop, test, and use a novel method in which stimuli for use in the ongoing study are selected in realtime based on fMRI-measured cortical responses to recently selected and displayed stimuli. A variation of the simplex method controls this ongoing selection as part of a search in visual space for images producing maximal activity, measured in realtime, in a pre-determined 1 cm³ brain region (a sketch of one such selection step follows this entry). I probe cortical selectivities during this search using photographs of real-world objects and synthetic "Fribble" objects. Real-world objects are used to understand perception of naturally-occurring visual properties. These objects are characterized by feature descriptors computed from the scale-invariant feature transform (SIFT), a popular computer vision method that is well established in its utility for aiding computer object recognition and that I recently found to account for intermediate-level representations in the visual object processing pathway in the brain. Fribble objects are used to study object perception in an arena in which visual properties are well defined a priori. They are constructed from multiple well-defined shapes, and variation of each of these component shapes produces a clear space of visual stimuli. I study the behavior of my novel realtime fMRI search method to assess its value in the investigation of cortical visual perception, and I study the complex visual properties my method identifies as highly activating selected brain regions in the visual object processing pathway. While there remain further technical and biological challenges to overcome, my method uncovers reliable and interesting cortical properties for most subjects, though only for selected searches performed for each subject. I identify brain regions selective for holistic and component object shapes and for varying surface properties, providing examples of more precise selectivities within classes of visual properties previously associated with cortical object representation. I also find examples of "surround suppression," in which cortical activity is inhibited upon viewing stimuli that deviate slightly from the visual properties preferred by a brain region, expanding on similar observations at lower levels of vision.
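One step of a simplex-driven stimulus search might look like the sketch below. This is a generic Nelder-Mead-style reflection over a stimulus embedding space, not the thesis's exact procedure; the function name, the choice of embedding, and the nearest-candidate selection are all assumptions.

```python
import numpy as np

def next_stimulus(embeddings, shown, responses):
    """One reflection step of a simplex-style search over a stimulus space.

    embeddings: (n_stimuli, d) array placing every candidate image in a
        visual feature space (e.g., SIFT-derived descriptors).
    shown: indices of stimuli presented so far (the current simplex).
    responses: measured fMRI activation for each shown stimulus.
    Returns the index of the unshown stimulus nearest the reflected point.
    """
    simplex = sorted(zip(shown, responses), key=lambda p: p[1])
    worst = embeddings[simplex[0][0]]            # lowest-activation vertex
    rest = [embeddings[i] for i, _ in simplex[1:]]
    centroid = np.mean(rest, axis=0)
    reflected = centroid + (centroid - worst)    # step away from the worst
    candidates = [i for i in range(len(embeddings)) if i not in set(shown)]
    dists = [np.linalg.norm(embeddings[i] - reflected) for i in candidates]
    return candidates[int(np.argmin(dists))]
```

The appeal of this scheme for fMRI work is that it needs only the scalar activation of each displayed stimulus, so the search can proceed online within a scanning session.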
88

Article identification for inventory list in a warehouse environment

Gao, Yang, January 2014
In this thesis, an object recognition system has been developed that uses local image features. In the system, multiple classes of objects can be recognized in an image. The system is divided into two parts: object detection and object identification. Object detection is based on SIFT features, which are invariant to image illumination, scaling and rotation. SIFT features extracted from a test image are used to perform reliable matching against a database of SIFT features from known object images. DBSCAN clustering is used for multiple-object detection, and the RANSAC method is used to reduce the number of false detections. Object identification is based on the 'Bag-of-Words' model, a method based on vector quantization of SIFT descriptors of image patches. In this model, k-means clustering and Support Vector Machine (SVM) classification are applied.
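The detection half of such a pipeline can be sketched with OpenCV. This is a generic SIFT + ratio-test + RANSAC sketch, not the thesis's code: the file names are placeholders, the thresholds are common defaults, and the DBSCAN grouping step for multiple object instances is omitted.

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()
img_obj = cv2.imread("article.png", cv2.IMREAD_GRAYSCALE)   # database image
img_scene = cv2.imread("shelf.png", cv2.IMREAD_GRAYSCALE)   # warehouse scene

kp1, des1 = sift.detectAndCompute(img_obj, None)
kp2, des2 = sift.detectAndCompute(img_scene, None)

# Lowe's ratio test on 2-NN matches keeps only distinctive correspondences
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]

# A RANSAC homography rejects false matches that survive the ratio test
if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is not None:
        print(f"{int(mask.sum())} inlier matches -> object instance found")
```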
89

"Flobject" Analysis: Learning about Static Images from Motion

Li, Patrick, 14 December 2011
A critical practical problem in the field of object recognition is an insufficient number of labeled training images, as manually labeling images is a time-consuming task. For this reason, unsupervised learning techniques are used to take advantage of unlabeled training images to extract image representations that are useful for classification. However, unsupervised learning is in general difficult. We propose simplifying the unsupervised training problem considerably by taking advantage of motion information. The output of our method is a model that can generate a vector representation of any static image, but the model is trained using images with additional motion information. To demonstrate the flobject analysis framework, we extend the latent Dirichlet allocation model to account for word-specific flow vectors. We show that the static image representations extracted using our model achieve higher classification rates and better generalization than standard topic models, spatial pyramid matching, and Gist descriptors.
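For context, the baseline being extended, a topic model over bags of visual words, can be sketched as below. This is standard LDA over quantized descriptors, not the flow-augmented flobject variant, and the random arrays are stand-ins for real SIFT descriptors.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-in for local descriptors (e.g., SIFT) from a set of images:
# descriptors[i] is an (n_i, 128) array for image i.
rng = np.random.default_rng(0)
descriptors = [rng.standard_normal((rng.integers(50, 100), 128))
               for _ in range(20)]

# 1) Quantize all descriptors into a visual-word vocabulary
kmeans = KMeans(n_clusters=64, n_init=3, random_state=0)
kmeans.fit(np.vstack(descriptors))

# 2) Build a bag-of-words count vector per image
counts = np.zeros((len(descriptors), 64), dtype=int)
for i, d in enumerate(descriptors):
    words, n = np.unique(kmeans.predict(d), return_counts=True)
    counts[i, words] = n

# 3) Per-image topic proportions serve as the static image representation
lda = LatentDirichletAllocation(n_components=8, random_state=0)
representation = lda.fit_transform(counts)   # shape: (n_images, 8)
```

The flobject extension conditions this kind of model on per-word optical flow during training, so that the topics it learns respect object boundaries even though only static images are seen at test time.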
