Global ETD Search

161	3d Object Recognition Using Scale Space Of Curvatures Akagunduz, Erdem 01 January 2011 In this thesis, a generic, scale- and resolution-invariant method to extract 3D features from 3D surfaces is proposed. Features are extracted with their scale (metric size and resolution) from range images using scale-space of 3D surface curvatures. Unlike previous scale-space approaches, connected components within the classified curvature scale-space are extracted as features. Furthermore, scales of features are extracted invariant of the metric size or the sampling of the range images. Geometric hashing is used for object recognition where scaled, occluded and both scaled and occluded versions of range images from a 3D object database are tested. The experimental results under varying scale and occlusion are compared with SIFT in terms of recognition capabilities. In addition, to emphasize the importance of using scale space of curvatures, the comparative recognition results obtained with single scale features are also presented.
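The geometric hashing step named in entry 161 can be illustrated with a minimal 2-D sketch. It is a toy under stated assumptions, not the thesis's method: it hashes plain 2-D points rather than scale-space curvature features from range images, and the names `frame_coords`, `bucket`, `build_table`, `recognize`, the `cell` size, and both point sets are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def frame_coords(p, q, r):
    """Coordinates of r in the frame that maps p to (0, 0) and q to (1, 0).

    These coordinates are unchanged by translation, rotation and uniform scaling.
    """
    bx, by = q[0] - p[0], q[1] - p[1]
    rx, ry = r[0] - p[0], r[1] - p[1]
    n2 = bx * bx + by * by
    return (rx * bx + ry * by) / n2, (ry * bx - rx * by) / n2


def bucket(u, v, cell):
    """Quantize frame coordinates into a hash-table key."""
    return round(u / cell), round(v / cell)


def build_table(models, cell=0.05):
    """Offline stage: store (model, basis) under every invariant coordinate."""
    table = defaultdict(set)
    for name, pts in models.items():
        for i, j in permutations(range(len(pts)), 2):
            for k, r in enumerate(pts):
                if k not in (i, j):
                    key = bucket(*frame_coords(pts[i], pts[j], r), cell)
                    table[key].add((name, i, j))
    return table


def recognize(table, scene, cell=0.05):
    """Online stage: try each scene basis and keep the best-voted model."""
    best_name, best_votes = None, 0
    for i, j in permutations(range(len(scene)), 2):
        votes = defaultdict(int)
        for k, r in enumerate(scene):
            if k not in (i, j):
                key = bucket(*frame_coords(scene[i], scene[j], r), cell)
                for entry in table.get(key, ()):
                    votes[entry] += 1
        for (name, _, _), count in votes.items():
            if count > best_votes:
                best_name, best_votes = name, count
    return best_name, best_votes


# Hypothetical model point sets (illustration only).
models = {
    "A": [(0, 0), (4, 0), (5, 3), (2, 5), (0, 3), (2, 2)],
    "B": [(0, 0), (1, 0), (0, 1), (7, 7)],
}
table = build_table(models)

# Scene: model A with its last point occluded, rotated by 90 degrees,
# scaled by 2 and translated by (10, -7).
scene = [(-2 * y + 10, 2 * x - 7) for x, y in models["A"][:-1]]
result = recognize(table, scene)
```

Because the stored coordinates are expressed relative to a basis pair, the scaled and partly occluded scene still votes for the model it came from; a real system would also verify the winning hypothesis geometrically.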
162	Channel-Coded Feature Maps for Computer Vision and Machine Learning Jonsson, Erik January 2008 This thesis is about channel-coded feature maps applied in view-based object recognition, tracking, and machine learning. A channel-coded feature map is a soft histogram of joint spatial pixel positions and image feature values. Typical useful features include local orientation and color. Using these features, each channel measures the co-occurrence of a certain orientation and color at a certain position in an image or image patch. Channel-coded feature maps can be seen as a generalization of the SIFT descriptor with the options of including more features and replacing the linear interpolation between bins by a more general basis function. The general idea of channel coding originates from a model of how information might be represented in the human brain. For example, different neurons tend to be sensitive to different orientations of local structures in the visual input. The sensitivity profiles tend to be smooth such that one neuron is maximally activated by a certain orientation, with a gradually decaying activity as the input is rotated. This thesis extends previous work on using channel-coding ideas within computer vision and machine learning. By differentiating the channel-coded feature maps with respect to transformations of the underlying image, a method for image registration and tracking is constructed. By using piecewise polynomial basis functions, the channel coding can be computed more efficiently, and a general encoding method for N-dimensional feature spaces is presented. Furthermore, I argue for using channel-coded feature maps in view-based pose estimation, where a continuous pose parameter is estimated from a query image given a number of training views with known pose.
The optimization of position, rotation and scale of the object in the image plane is then included in the optimization problem, leading to a simultaneous tracking and pose estimation algorithm. Apart from objects and poses, the thesis examines the use of channel coding in connection with Bayesian networks. The goal here is to avoid the hard discretizations usually required when Markov random fields are used on intrinsically continuous signals like depth for stereo vision or color values in image restoration. Channel coding has previously been used to design machine learning algorithms that are robust to outliers, ambiguities, and discontinuities in the training data. This is obtained by finding a linear mapping between channel-coded input and output values. This thesis extends this method with an incremental version and identifies and analyzes a key feature of the method -- that it is able to handle a learning situation where the correspondence structure between the input and output space is not completely known. In contrast to a traditional supervised learning setting, the training examples are groups of unordered input-output points, where the correspondence structure within each group is unknown. This behavior is studied theoretically and the effect of outliers and convergence properties are analyzed. All presented methods have been evaluated experimentally. The work has been conducted within the cognitive systems research project COSPAL funded by EC FP6, and much of the content has been put to use in the final COSPAL demonstrator system. computer vision machine learning object recognition pose estimation Image analysis Bildanalys
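The "soft histogram" idea in entry 162 can be sketched for a single scalar feature. The example below assumes a cos² kernel with unit channel spacing, which is one possible smooth basis function; the abstract itself mentions piecewise polynomial bases, so this is an illustration of the principle rather than the thesis's encoder. The function names `channel_encode` and `soft_histogram` are hypothetical.

```python
import math


def channel_encode(x, n_channels):
    """Encode scalar x into smooth, overlapping channels.

    Channel k is centred at integer position k and responds only when
    |x - k| < 1.5, so a value activates at most three neighbouring channels.
    """
    coeffs = []
    for k in range(n_channels):
        d = x - k
        coeffs.append(math.cos(math.pi * d / 3) ** 2 if abs(d) < 1.5 else 0.0)
    return coeffs


def soft_histogram(values, n_channels):
    """Accumulate the channel encodings of many values into one soft histogram."""
    hist = [0.0] * n_channels
    for x in values:
        for k, c in enumerate(channel_encode(x, n_channels)):
            hist[k] += c
    return hist


code = channel_encode(2.3, 6)
hist = soft_histogram([2.3, 2.3, 4.0], 6)
```

Unlike a hard histogram, a value of 2.3 contributes mostly to channel 2 and partly to channels 1 and 3, so small changes in the input change the representation smoothly.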
163	Terrain Object recognition and Context Fusion for Decision Support Lantz, Fredrik January 2008 A laser radar can be used to generate 3D data about the terrain at very high resolution. The development of new support technologies to analyze these data is critical to the effective and efficient use of these data in decision support systems, due to the large amounts of data that are generated. Adequate technology in this regard is currently not available and development of new methods and algorithms to this end are important goals of this work. A semi-qualitative data structure for terrain surface modelling has been developed. A categorization and triangulation process has also been developed to substitute the high resolution 3D model for this data structure. The qualitative part of the structure can be used for detection and recognition of terrain features. The quantitative part of the structure is, together with the qualitative part, used for visualization of the terrain surface. Substituting the 3D model for the semi-qualitative structures means that a data reduction is performed. A number of algorithms for detection and recognition of different terrain objects have been developed. The algorithms use the qualitative part of the previously developed semi-qualitative data structure as input. The approach taken is based on matching of symbols and syntactic pattern recognition. Results regarding the accuracy of the implemented algorithms for detection and recognition of terrain objects are visualized. A further important goal has been to develop a methodology for determining driveability using 3D-data and other geographic data. These data must be fused with vehicle data to determine the properties of the terrain context of our operations with respect to driveability. This fusion process is therefore called context fusion. The recognized terrain objects are used together with map data in this method.
The uncertainty associated with the imprecision of the data has been taken into account as well. / Report code: LiU-Tek-Lic-2008:29. Information Fusion Terrain Elevation Model Driveability Context Fusion Terrain Object Recognition Computer science Datalogi
164	Security with visual understanding : Kinect human recognition capabilities applied in a home security system Fluckiger, S Joseph 08 August 2012 Vision is the most celebrated human sense. Eighty percent of the information humans receive is obtained through vision. Machines capable of capturing images are now ubiquitous, but until recently, they have been unable to recognize objects in the images they capture. In effect, machines have been blind. This paper explores the revolutionary new capability of a camera to recognize whether a human is present in an image and take detailed measurements of the person’s dimensions. It explains how the hardware and software of the camera work to provide this remarkable capability in just 200 milliseconds per image. To demonstrate these capabilities, a home security application has been built called Security with Visual Understanding (SVU). SVU is a hardware/software solution that detects a human and then performs biometric authentication by comparing the dimensions of the seen person against a database of known people. If the person is unrecognized, an alarm is sounded, and a picture of the intruder is sent via SMS text message to the home owner. Analysis is performed to measure the tolerance of the SVU algorithm for differentiating between two people based on their body dimensions. Machine vision Human pose recognition Object recognition Kinect Security camera SVU
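The matching step described in entry 164, comparing measured body dimensions against a database of known people, could look roughly like the sketch below. Everything specific here is an assumption for illustration: the abstract does not say which dimensions SVU measures, which distance it uses, or what tolerance it applies, and the names `identify`, `known_people` and the sample measurements are hypothetical.

```python
import math


def identify(measured, known_people, tolerance):
    """Return the closest known person's name, or None if nobody is close enough.

    `measured` and every database entry are tuples of body dimensions in metres.
    """
    best_name, best_dist = None, float("inf")
    for name, dims in known_people.items():
        dist = math.dist(measured, dims)  # Euclidean distance (assumed metric)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= tolerance else None


# Hypothetical database: (height, shoulder width, arm length) in metres.
known_people = {
    "alice": (1.65, 0.38, 0.58),
    "bob": (1.82, 0.46, 0.63),
}

resident = identify((1.80, 0.45, 0.62), known_people, tolerance=0.05)
intruder = identify((1.95, 0.50, 0.70), known_people, tolerance=0.05)
```

A `None` result corresponds to the "unrecognized person" case in which the abstract says an alarm is raised; the choice of tolerance governs how reliably two people with similar dimensions can be told apart.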
165	Validation of an animal model of cognitive dysfunction associated with schizophrenia : development and validation of the novel object recognition task using behavioural manipulations and psychotomimetic dosing regimens to induce cognitive deficits of relevance to schizophrenia in hooded-Lister rats Grayson, Ben January 2012 Phencyclidine (PCP) is a non-competitive NMDA receptor antagonist that has been shown to induce schizophrenia-like psychotic symptoms that are clinically indistinguishable from schizophrenia in patients. When administered to rodents, PCP produces an array of behaviours that are characteristic of schizophrenia. Schizophrenia is associated with continual and treatment resistant cognitive deficits which are now recognised as a core feature of the disease. The aim of the studies reported in chapter 3 was to establish a set of objects with equal preference in the NOR (novel object recognition) test. Furthermore, the inter-trial-interval (ITI) of the NOR test was investigated in an attempt to elucidate the effects of time and location of the rats during the ITI on the cognitive impairments following sub-chronic PCP treatment. The experiments in chapter 4 were designed to compare the performance of male and female rats in the NOR test following treatment with acute d-amphetamine (d-amph), PCP and sub-chronic PCP treatment. In chapter 5, validation of the cognitive deficits induced by sub-chronic PCP treatment was assessed using carefully selected pharmacological agents. The aim of the studies in chapter 6 was to determine the effects of isolation rearing on cognitive performance in the NOR test following increasing ITIs. Additionally, the sensitivity of isolation reared rats compared to social controls following acute administration of PCP and d-amph was assessed using the NOR test.
Studies in chapter 8 utilised the 16-holeboard maze to determine the effects of acute treatment with d-amphetamine, PCP and scopolamine on working memory in the rat. NOR is a visual learning and memory test that measures recognition memory which is impaired in patients with schizophrenia. Studies presented in this thesis demonstrate the importance of careful pilot studies when selecting objects for use in the NOR test. Initial studies in sub-chronic PCP (2 mg/kg for 7 days followed by 7 days drug free) treated female hooded-Lister rats revealed a preference of the rats for the wooden cone object; subsequently this object was eliminated from further NOR experiments. Sub-chronic PCP treated rats were found to be highly susceptible to the disruptive influence of distraction during the short 1 min inter-trial-interval (ITI) in the NOR test. These results are consistent with clinical findings of the effects of distraction on cognition in schizophrenia patients. Following the initial validation experiments, a 1 min ITI in the home cage was selected for all subsequent NOR studies. Further experiments provided evidence to confirm that information presented in the acquisition trial is encoded but not retained in the retention trial of the NOR test by IV PCP-treated rats. Male rats were less sensitive to the recognition memory deficits induced by acute treatment with PCP and d-amphetamine compared with females. Following sub-chronic PCP treatment, both males and females showed object recognition deficits; however, the impairments were more robust in female rats. Female rats were therefore selected for all subsequent experiments. Pharmacological validation was carried out using carefully selected agents which were assessed for their ability to restore the sub-chronic PCP induced cognitive deficit in the object recognition test.
It was found that the classical antipsychotic agents haloperidol and fluphenazine, the benzodiazepine anxiolytic chlordiazepoxide and the SSRI antidepressant fluoxetine were ineffective. Further studies showed that the atypical antipsychotic agents, clozapine and risperidone, the analeptic agent modafinil, the nAChR full agonist nicotine, and full agonist and positive allosteric modulator of the α7 nAChR (PNU-282987 and PNU120596 respectively) reversed the recognition memory deficit induced by sub-chronic PCP treatment in the NOR test. Isolation rearing of rats at weaning is an environmental stressor that has relevance for modelling the symptomatology and pathology of schizophrenia. Isolates had a significantly increased locomotor activity (LMA) response to a novel environment and enhanced sensitivity to time delay-induced recognition memory deficits, compared with their socially reared counterparts. Isolates were less sensitive to an acute PCP-induced recognition memory deficit but more sensitive to an acute d-amphetamine induced recognition memory deficit in the NOR test compared to social controls. Preliminary results from the 16-holeboard maze experiments reveal that acute administration of the mAChR antagonist scopolamine, d-amphetamine, PCP and sub-chronic PCP treatment reduced working memory scores compared to vehicle treated controls. Taken together, these findings suggest that sub-chronic treatment with PCP induces cognitive deficits in behavioural tests of relevance to cognition associated with schizophrenia. This may allow the detection of novel pharmacotherapies to alleviate these cognitive deficits and exploration of the nature of cognitive disturbances in these patients. 615.1
166	Top-Down Bayesian Modeling and Inference for Indoor Scenes Del Pero, Luca January 2013 People can understand the content of an image without effort. We can easily identify the objects in it, and figure out where they are in the 3D world. Automating these abilities is critical for many applications, like robotics, autonomous driving and surveillance. Unfortunately, despite recent advancements, fully automated vision systems for image understanding do not exist. In this work, we present progress restricted to the domain of images of indoor scenes, such as bedrooms and kitchens. These environments typically have the "Manhattan" property that most surfaces are parallel to three principal ones. Further, the 3D geometry of a room and the objects within it can be approximated with simple geometric primitives, such as 3D blocks. Our goal is to reconstruct the 3D geometry of an indoor environment while also understanding its semantic meaning, by identifying the objects in the scene, such as beds and couches. We separately model the 3D geometry, the camera, and an image likelihood, to provide a generative statistical model for image data. Our representation captures the rich structure of an indoor scene, by explicitly modeling the contextual relationships among its elements, such as the typical size of objects and their arrangement in the room, and simple physical constraints, such as the requirement that 3D objects do not intersect. This ensures that the predicted image interpretation will be globally coherent geometrically and semantically, which allows tackling the ambiguities caused by projecting a 3D scene onto an image, such as occlusions and foreshortening. We fit this model to images using MCMC sampling. Our inference method combines bottom-up evidence from the data and top-down knowledge from the 3D world, in order to explore the vast output space efficiently.
Comprehensive evaluation confirms our intuition that global inference of the entire scene is more effective than estimating its individual elements independently. Further, our experiments show that our approach is competitive with, and often exceeds, the results of state-of-the-art methods. Bayesian inference Computer Vision Indoor scenes Object recognition Scene understanding Computer Science 3D reconstruction
167	Learning 3-D Models of Object Structure from Images Schlecht, Joseph January 2010 Recognizing objects in images is an effortless task for most people. Automating this task with computers, however, presents a difficult challenge attributable to large variations in object appearance, shape, and pose. The problem is further compounded by ambiguity from projecting 3-D objects into a 2-D image. In this thesis we present an approach to resolve these issues by modeling object structure with a collection of connected 3-D geometric primitives and a separate model for the camera. From sets of images we simultaneously learn a generative, statistical model for the object representation and parameters of the imaging system. By learning 3-D structure models we are going beyond recognition towards quantifying object shape and understanding its variation. We explore our approach in the context of microscopic images of biological structure and single view images of man-made objects composed of block-like parts, such as furniture. We express detected features from both domains as statistically generated by an image likelihood conditioned on models for the object structure and imaging system. Our representation of biological structure focuses on Alternaria, a genus of fungus comprising ellipsoid and cylinder shaped substructures. In the case of man-made furniture objects, we represent structure with spatially contiguous assemblages of blocks arbitrarily constructed according to a small set of design constraints. We learn the models with Bayesian statistical inference over structure and camera parameters per image, and for man-made objects, across categories, such as chairs. We develop a reversible-jump MCMC sampling algorithm to explore topology hypotheses, and a hybrid of Metropolis-Hastings and stochastic dynamics to search within topologies.
Our results demonstrate that we can infer both 3-D object and camera parameters simultaneously from images, and that doing so improves understanding of structure in images. We further show how 3-D structure models can be inferred from single view images, and that learned category parameters capture structure variation that is useful for recognition. 3-D Object recognition Computer vision Machine learning MCMC sampling Statistical inference
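Entry 167 names Metropolis-Hastings as one ingredient of its sampler. The sketch below shows only the basic random-walk form on a one-dimensional toy target; it does not attempt the reversible-jump moves or the stochastic-dynamics hybrid described in the abstract, and the target density, step size and function names are assumptions chosen for illustration.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    x, logp = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        cand = x + rng.gauss(0.0, step)
        cand_logp = log_target(cand)
        # Accept with probability min(1, p(cand) / p(x)).
        if rng.random() < math.exp(min(0.0, cand_logp - logp)):
            x, logp = cand, cand_logp
        samples.append(x)
    return samples


def log_gaussian(x):
    """Unnormalised log-density of a toy target: Gaussian, mean 3, std 1."""
    return -0.5 * (x - 3.0) ** 2


rng = random.Random(0)  # fixed seed so the run is repeatable
chain = metropolis_hastings(log_gaussian, x0=0.0, n_samples=20000, step=1.0, rng=rng)
kept = chain[2000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
```

The sampler only needs the target up to a normalising constant, which is why this family of methods suits models where the posterior over structure and camera parameters cannot be normalised in closed form.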
168	Predictive eyes precede retrieval : visual recognition as hypothesis testing Holm, Linus January 2007 Does visual recognition entail verifying an idea about what is perceived? This question was addressed in the three studies of this thesis. The main hypothesis underlying the investigation was that visual recognition is an active process involving hypothesis testing. Recognition of faces (Study 1), scenes (Study 2) and objects (Study 3) was investigated using eye movement registration as a window on the recognition process. In Study 1, a functional relationship between eye movements and face recognition was established. Restricting the eye movements reduced recognition performance. In addition, perceptual reinstatement as indicated by eye movement consistency across study and test was related to recollective experience at test. Specifically, explicit recollection was related to higher eye movement consistency than familiarity-based recognition and false rejections (Studies 1-2). Furthermore, valid expectations about a forthcoming stimulus scene produced eye movements which were more similar to those of an earlier study episode, compared to invalid expectations (Study 2). In Study 3 participants recognized fragmented objects embedded in nonsense fragments. Around 8 seconds prior to explicit recognition, participants began to fixate the object region rather than a similar control region in the stimulus pictures. Before participants indicated awareness of the object, they fixated it with an average of 9 consecutive fixations. Hence, participants were looking at the object as if they had recognized it before they became aware of its identity. Furthermore, prior object information affected eye movement sampling of the stimulus, suggesting that semantic memory was involved in guiding the eyes during object recognition even before the participants were aware of its presence.
Collectively, the studies support the view that gaze control is instrumental to visual recognition performance and that visual recognition is an interactive process between memory representation and information sampling. declarative memory face perception object recognition scene recognition eye movements visual awareness recollection familiarity Psychology Psykologi
169	Biologically-Based Interactive Neural Network Models for Visual Attention and Object Recognition Saifullah, Mohammad January 2012 The main focus of this thesis is to develop biologically-based computational models for object recognition. A series of models for attention and object recognition were developed in the order of increasing functionality and complexity. These models are based on information processing in the primate brain, and especially inspired by the theory of visual information processing along the two parallel processing pathways of the primate visual cortex. To capture the true essence of incremental, constraint satisfaction style processing in the visual system, interactive neural networks were used for implementing our models. Results from eye-tracking studies on the relevant visual tasks, as well as our hypothesis regarding the information processing in the primate visual system, were implemented in the models and tested with simulations. As a first step, a model based on the ventral pathway was developed to recognize single objects. Through systematic testing, structural and algorithmic parameters of these models were fine-tuned for performing their task optimally. In the second step, the model was extended by considering the dorsal pathway, which enables simulation of visual attention as an emergent phenomenon. The extended model was then investigated for visual search tasks. In the last step, we focussed on occluded and overlapped object recognition. A couple of eye-tracking studies were conducted in this regard and on the basis of the results we made some hypotheses regarding information processing in the primate visual system. The models were further advanced on the lines of the presented hypothesis, and simulated on the tasks of occluded and overlapped object recognition.
On the basis of the results and analysis of our simulations we have further found that the generalization performance of interactive hierarchical networks improves with the addition of a small amount of Hebbian learning to otherwise purely error-driven learning. We also concluded that the size of the receptive fields in our networks is an important parameter for the generalization task and depends on the object of interest in the image. Our results show that networks using hard-coded feature extraction perform better than the networks that use Hebbian learning for developing feature detectors. We have successfully demonstrated the emergence of visual attention within an interactive network and also the role of context in the search task. Simulation results with occluded and overlapped objects support our extended interactive processing approach, which is a combination of the interactive and top-down approach, to the segmentation-recognition issue. Furthermore, the simulation behavior of our models is in line with known human behavior for similar tasks. In general, the work in this thesis will improve the understanding and performance of biologically-based interactive networks for object recognition and provide a biologically-plausible solution to recognition of occluded and overlapped objects. Moreover, our models provide some suggestions for the underlying neural mechanism and strategies behind biological object recognition. Biologically-Based Models Object Recognition Visual Attention Interactive Neural Network Occlusion Overlapping
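The finding in entry 169 about adding "a small amount of Hebbian learning" to error-driven learning can be made concrete with a single weight update. The sketch assumes a linear unit, a delta-rule error term, a Hebbian term of the form y(x − w), and a mixing weight `k_hebb`; the abstract does not state the actual learning rules or parameter values, so all of these are illustrative choices, as is the function name `mixed_update`.

```python
def mixed_update(w, x, target, lrate=0.1, k_hebb=0.1):
    """One weight update mixing a Hebbian term with an error-driven term.

    k_hebb = 0 gives pure error-driven (delta-rule) learning;
    a small positive k_hebb adds a little Hebbian learning.
    """
    y = sum(wi * xi for wi, xi in zip(w, x))  # output of a linear unit
    new_w = []
    for wi, xi in zip(w, x):
        err = (target - y) * xi  # error-driven term (delta rule)
        hebb = y * (xi - wi)     # Hebbian term: pull weights toward active inputs
        new_w.append(wi + lrate * (k_hebb * hebb + (1.0 - k_hebb) * err))
    return new_w


w_mixed = mixed_update([0.5, 0.5], [1.0, 0.0], target=1.0)
w_error_only = mixed_update([0.5, 0.5], [1.0, 0.0], target=1.0, k_hebb=0.0)
```

With pure error-driven learning the weight on the inactive input stays unchanged, whereas the Hebbian term also nudges it, which is one way a small Hebbian component can act as a soft constraint on the learned representation.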
170	Contributions to a fast and robust object recognition in images Revaud, Jérôme 27 May 2011 In this thesis, we first present a contribution to overcome the problem of robustness for the recognition of object instances, then we directly extend this contribution to the detection and localization of classes of objects. In a first step, we have developed a method inspired by graph matching to address the problem of fast recognition of instances of specific objects in noisy conditions. This method makes it easy to combine any type of local feature (e.g. contours, textures ...) less affected by noise than keypoints, while bypassing the normalization problem and without penalizing the detection speed too much. Unlike other methods based on a global rigid transformation, our approach is robust to complex deformations such as those due to perspective or non-rigid deformations inherent to the model itself (e.g. a face, a flexible magazine). Our experiments on several datasets have shown the relevance of our approach. It is overall slightly less robust to occlusion than existing approaches, but it performs better in noisy conditions. In a second step, we have developed an approach for detecting classes of objects in the same spirit as the bag-of-visual-words model. For this we use our cascaded micro-classifiers to recognize visual words more distinctive than the classical words simply based on visual dictionaries. Training is divided into two parts: first, we generate cascades of micro-classifiers for recognizing local parts of the model pictures; second, we use a classifier to model the decision boundary between images of the class and those of the non-class. We show that the association of classical visual words (from keypoint patches) and our distinctive words results in a significant improvement.
The computation time is generally quite low, given the structure of the cascades, which minimizes the detection time, and the form of the classifier, which is extremely fast to evaluate. Computer Science/Other Computer science Pattern recognition Specific object recognition Graph matching Optimization Mobile robotics
