121 |
Reconnaissance de forme dans cybersécurité. Vashaee, Ali. January 2014 (has links)
Résumé : The expansion of images on the Web has created the need for accurate image ranking methods for several applications, notably cyber-security. Feature extraction is a crucial step in the image ranking procedure, given its direct impact on the performance of the final categorization and ranking of the images. The goal of this study is to analyse the state of the art of the different feature spaces in order to evaluate their efficiency in the context of object recognition for cyber-security applications. Experiments showed that the HOG and GIST feature descriptors achieve high performance, which however degrades under geometric transformations of the objects in the images. In order to obtain more reliable image ranking systems based on these descriptors, we proposed two methods. In the first method (PrMI) we focused on improving the invariance property of the ranking system while maintaining ranking performance. In this method, a rotation-invariant descriptor derived from HOG (RIHOG) is used in a top-down search technique for image ranking. The proposed method (PrMI) not only provides robustness against geometric transformations of the objects, but also achieves high performance similar to that of HOG. It is also computationally efficient, with complexity of order O(n). In the second proposed method (PrMII), we focused on ranking performance while maintaining the invariance property of the ranking system. Objects are localized in a scale-invariant fashion in the region-covariance feature space, and are then described with the HOG and GIST descriptors. This method yields better ranking performance than the methods implemented in this study and than some CBIR methods evaluated on the Caltech-256 data in previous work. // Abstract : The tremendous growth of accessible online images (Web images) provokes the need to perform accurate image ranking for applications like cyber-security. Feature extraction is an important step in image ranking procedures due to its direct impact on final categorization and ranking performance. The goal of this study is to analyse state-of-the-art feature spaces in order to evaluate their efficiency in the object recognition context and the image ranking framework for cyber-security applications. Experiments show that the HOG and GIST feature descriptors exhibit high ranking performance; however, these features are not rotation and scale invariant. In order to obtain more reliable image ranking systems based on these feature spaces, we proposed two methods. In the first method (PrMI) we focused on improving the invariance property of the ranking system while maintaining the ranking performance. In this method, a rotation-invariant feature descriptor is derived from HOG (RIHOG). This descriptor is used in a top-down searching technique to cover the scale variation of the objects in the images.
The proposed method (PrMI) not only provides robustness against geometrical transformations of objects but also provides high ranking performance close to that of HOG. It is also computationally efficient, with complexity around O(n). In the second proposed method (PrMII) we focused on the ranking performance while maintaining the invariance property of the ranking system. Objects are localized in a scale-invariant fashion in a Region Covariance feature space, and are then described using HOG and GIST features. Finally, to obtain a better evaluation of the performance of the proposed method, we compare it with existing research in the similar domain (CBIR) on Caltech-256. The proposed methods provide the highest ranking performance in comparison with the methods implemented in this study, and with some of the CBIR methods on the Caltech-256 dataset in previous works.
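The record does not reproduce the RIHOG construction itself, but a common way to obtain rotation invariance from a HOG-style descriptor is to circularly shift the orientation histogram so that the dominant bin comes first. A minimal sketch of that idea (the function names and binning choices are ours, not the thesis's):

```python
import numpy as np

def orientation_histogram(gx, gy, n_bins=9):
    """HOG-style statistic: histogram of gradient orientations,
    weighted by gradient magnitude and normalised to sum to 1."""
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    bins = (ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

def rotation_invariant(hist):
    """Cancel a global rotation by circularly shifting the histogram so
    the dominant orientation bin comes first (a rotation of the image
    only permutes the bins, which this shift undoes)."""
    return np.roll(hist, -int(np.argmax(hist)))
```

Rotating the underlying gradients by a multiple of the bin width shifts the histogram cyclically, so the shifted descriptors of an image and its rotated copy coincide.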
|
122 |
Channel-Coded Feature Maps for Computer Vision and Machine Learning. Jonsson, Erik. January 2008 (has links)
This thesis is about channel-coded feature maps applied in view-based object recognition, tracking, and machine learning. A channel-coded feature map is a soft histogram of joint spatial pixel positions and image feature values. Typical useful features include local orientation and color. Using these features, each channel measures the co-occurrence of a certain orientation and color at a certain position in an image or image patch. Channel-coded feature maps can be seen as a generalization of the SIFT descriptor with the options of including more features and replacing the linear interpolation between bins by a more general basis function. The general idea of channel coding originates from a model of how information might be represented in the human brain. For example, different neurons tend to be sensitive to different orientations of local structures in the visual input. The sensitivity profiles tend to be smooth such that one neuron is maximally activated by a certain orientation, with a gradually decaying activity as the input is rotated. This thesis extends previous work on using channel-coding ideas within computer vision and machine learning. By differentiating the channel-coded feature maps with respect to transformations of the underlying image, a method for image registration and tracking is constructed. By using piecewise polynomial basis functions, the channel coding can be computed more efficiently, and a general encoding method for N-dimensional feature spaces is presented. Furthermore, I argue for using channel-coded feature maps in view-based pose estimation, where a continuous pose parameter is estimated from a query image given a number of training views with known pose. The optimization of position, rotation and scale of the object in the image plane is then included in the optimization problem, leading to a simultaneous tracking and pose estimation algorithm. 
Apart from objects and poses, the thesis examines the use of channel coding in connection with Bayesian networks. The goal here is to avoid the hard discretizations usually required when Markov random fields are used on intrinsically continuous signals like depth for stereo vision or color values in image restoration. Channel coding has previously been used to design machine learning algorithms that are robust to outliers, ambiguities, and discontinuities in the training data. This is obtained by finding a linear mapping between channel-coded input and output values. This thesis extends this method with an incremental version and identifies and analyzes a key feature of the method -- that it is able to handle a learning situation where the correspondence structure between the input and output space is not completely known. In contrast to a traditional supervised learning setting, the training examples are groups of unordered input-output points, where the correspondence structure within each group is unknown. This behavior is studied theoretically and the effect of outliers and convergence properties are analyzed. All presented methods have been evaluated experimentally. The work has been conducted within the cognitive systems research project COSPAL funded by EC FP6, and much of the contents has been put to use in the final COSPAL demonstrator system.
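As a rough illustration of the encoding step described above (our toy code, using the standard cos² basis from the channel-coding literature rather than the thesis's piecewise-polynomial basis functions), a scalar feature value can be encoded into overlapping channels like this:

```python
import numpy as np

def channel_encode(x, centers, width):
    """Encode value(s) x into soft channel activations with a truncated
    cos^2 basis function. With width = 1.5 * channel spacing, the
    activations of the active channels sum to a constant (1.5), which
    is the classic channel-coding choice."""
    d = np.abs(np.asarray(x, dtype=float)[..., None] - centers)
    return np.where(d < width, np.cos(np.pi * d / (2 * width)) ** 2, 0.0)
```

A channel-coded feature map is then built by combining such encodings over spatial position and feature value (e.g. orientation, colour) and summing over pixels; linear interpolation in place of cos² recovers an ordinary soft histogram such as SIFT's.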
|
123 |
Computer vision-based detection of fire and violent actions performed by individuals in videos acquired with handheld devices. Moria, Kawther. 28 July 2016 (has links)
Advances in social networks and multimedia technologies greatly facilitate the recording and sharing of video data on violent social and/or political events via the Internet. These video data are a rich source of information in terms of identifying the individuals responsible for damaging public and private property through violent behavior. Any abnormal, violent individual behavior could trigger a cascade of undesirable events, such as vandalism and damage to stores and public facilities. When such incidents occur, investigators usually need to analyze thousands of hours of videos recorded using handheld devices in order to identify suspects. The exhaustive manual investigation of these video data is highly time- and resource-consuming. Automated detection techniques for abnormal events and actions based on computer vision would offer a more efficient solution to this problem.
The first contribution described in this thesis consists of a novel method for fire detection in riot videos acquired with handheld cameras and smart-phones. This is a typical example of computer vision in the wild, where we have no control over the data acquisition process, and where the quality of the video data varies considerably. The proposed spatial model is based on the Mixtures of Gaussians model and exploits color adjacency in the visible spectrum of incandescence. The experimental results demonstrate that using this spatial model in concert with motion cues leads to highly accurate results for fire detection in noisy, complex scenes of rioting crowds.
The second contribution consists of a method for detecting abnormal, violent actions that are performed by individual subjects and witnessed by passive crowds. The problem of abnormal individual behavior, such as a fight, witnessed by passive bystanders gathered into a crowd has not been studied before. We show that the presence of a passive, standing crowd is an important indicator that an abnormal action might occur; thus, detecting the standing crowd improves the performance of detecting the abnormal action. The proposed method performs crowd detection first, followed by the detection of abnormal motion events. Our main theoretical contribution consists in linking crowd detection to abnormal, violent actions, as well as in defining novel sets of features that characterize static crowds and abnormal individual actions in both the spatial and spatio-temporal domains. Experimental results are computed on a custom dataset, the Vancouver Riot Dataset, which we generated from amateur video footage acquired with handheld devices and uploaded to public social network sites. Our approach achieves good precision and recall values, which validates our system's reliability in localizing the crowds and the abnormal actions.
To summarize, this thesis focuses on the detection of two types of abnormal events occurring in violent street movements, using data gathered by passive participants in these movements with handheld devices. Although our data sets are drawn from a single social movement (the Vancouver 2011 Stanley Cup riot), we are confident that our approaches would generalize well and would be helpful to forensic activities performed in the context of other, similar violent events. / Graduate
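The flavour of a Mixture-of-Gaussians colour likelihood for incandescent (fire-like) pixels can be sketched as follows; this is our illustrative reconstruction, and the means, variances, and weights below are placeholders, not the spatial model or fitted parameters from the thesis:

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Isotropic multivariate normal density (covariance var * I)."""
    d = x - mean
    k = x.shape[-1]
    return np.exp(-0.5 * np.sum(d * d, axis=-1) / var) / np.sqrt((2 * np.pi * var) ** k)

def fire_likelihood(rgb, means, variances, weights):
    """Score a pixel (or array of pixels, colours in [0, 1]) under a
    Gaussian mixture of fire-like colours; higher means more fire-like."""
    return sum(w * gaussian_pdf(rgb, m, v)
               for m, v, w in zip(means, variances, weights))
```

In a full pipeline such a colour score would be thresholded and then combined with motion cues, as the abstract describes, before declaring a fire region.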
|
124 |
Interactive image search with attributes. Kovashka, Adriana Ivanova. 18 September 2014 (links)
An image retrieval system needs to be able to communicate with people using a common language, if it is to serve its user's information need. I propose techniques for interactive image search with the help of visual attributes, which are high-level semantic visual properties of objects (like "shiny" or "natural"), and are understandable by both people and machines. My thesis explores attributes as a novel form of user input for search. I show how to use attributes to provide relevance feedback for image search; how to optimally choose what to seek feedback on; how to ensure that the attribute models learned by a system align with the user's perception of these attributes; how to automatically discover the shades of meaning that users employ when applying an attribute term; and how attributes can help learn object category models. I use attributes to provide a channel on which the user of an image retrieval system can communicate her information need precisely and with as little effort as possible. One-shot retrieval is generally insufficient, so interactive retrieval systems seek feedback from the user on the currently retrieved results, and adapt their relevance ranking function accordingly. In traditional interactive search, users mark some images as "relevant" and others as "irrelevant", but this form of feedback is limited. I propose a novel mode of feedback where a user directly describes how high-level properties of retrieved images should be adjusted in order to more closely match her envisioned target images, using relative attribute feedback statements. For example, when conducting a query on a shopping website, the user might state: "I want shoes like these, but more formal." I demonstrate that relative attribute feedback is more powerful than traditional binary feedback. 
The images believed to be most relevant need not be most informative for reducing the system's uncertainty, so it might be beneficial to seek feedback on something other than the top-ranked images. I propose to guide the user through a coarse-to-fine search using a relative attribute image representation. At each iteration of feedback, the user provides a visual comparison between the attribute in her envisioned target and a "pivot" exemplar, where a pivot separates all database images into two balanced sets. The system actively determines along which of multiple such attributes the user's comparison should next be requested, based on the expected information gain that would result. The proposed attribute search trees allow us to limit the scan for candidate images on which to seek feedback to just one image per attribute, so it is efficient both for the system and the user. No matter what potentially powerful form of feedback the system offers the user, search efficiency will suffer if there is noise on the communication channel between the user and the system. Therefore, I also study ways to capture the user's true perception of the attribute vocabulary used in the search. In existing work, the underlying assumption is that an image has a single "true" label for each attribute that objective viewers could agree upon. However, multiple objective viewers frequently have slightly different internal models of a visual property. I pose user-specific attribute learning as an adaptation problem in which the system leverages any commonalities in perception to learn a generic prediction function. Then, it uses a small number of user-labeled examples to adapt that model into a user-specific prediction function. To further lighten the labeling load, I introduce two ways to extrapolate beyond the labels explicitly provided by a given user. 
While users differ in how they use the attribute vocabulary, there exist some commonalities and groupings of users around their attribute interpretations. Automatically discovering and exploiting these groupings can help the system learn more robust personalized models. I propose an approach to discover the latent factors behind how users label images with the presence or absence of a given attribute, from a sparse label matrix. I then show how to cluster users in this latent space to expose the underlying "shades of meaning" of the attribute, and subsequently learn personalized models for these user groups. Discovering the shades of meaning also serves to disambiguate attribute terms and expand a core attribute vocabulary with finer-grained attributes. Finally, I show how attributes can help learn object categories faster. I develop an active learning framework where the computer vision learning system actively solicits annotations from a pool of both object category labels and the objects' shared attributes, depending on which will most reduce total uncertainty for multi-class object predictions in the joint object-attribute model. Knowledge of an attribute's presence in an image can immediately influence many object models, since attributes are by definition shared across subsets of the object categories. The resulting object category models can be used when the user initiates a search via keywords such as "Show me images of cats" and then (optionally) refines that search with the attribute-based interactions I propose. My thesis exploits properties of visual attributes that allow search to be both effective and efficient, in terms of both user time and computation time. Further, I show how the search experience for each individual user can be improved, by modeling how she uses attributes to communicate with the retrieval system. 
I focus on the modes in which an image retrieval system communicates with its users by integrating the computer vision perspective and the information retrieval perspective to image search, so the techniques I propose are a promising step in closing the semantic gap. / text
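As an illustration of how relative attribute feedback such as "like these, but more formal" can drive a ranking update, here is a simplified counting scheme (our toy code; the thesis's ranking functions are learned, and all names below are ours):

```python
import numpy as np

def rerank(scores, feedback):
    """Re-rank database images given relative attribute feedback.

    scores   : (n_images, n_attributes) predicted attribute strengths
    feedback : list of (attribute_index, reference_image, 'more' | 'less')
    Images satisfying more of the user's comparative statements rank
    higher; ties keep their previous relative order (stable sort).
    """
    satisfied = np.zeros(len(scores))
    for attr, ref, direction in feedback:
        if direction == 'more':
            satisfied += scores[:, attr] > scores[ref, attr]
        else:
            satisfied += scores[:, attr] < scores[ref, attr]
    return np.argsort(-satisfied, kind='stable')
```

Each statement ("more formal than image 3", "less shiny than image 7") carves the database along one attribute, which is what makes this feedback channel more expressive than binary relevant/irrelevant marks.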
|
125 |
Sprite learning and object category recognition using invariant features. Allan, Moray. January 2007 (has links)
This thesis explores the use of invariant features for learning sprites from image sequences, and for recognising object categories in images. A popular framework for the interpretation of image sequences is the layers or sprite model of e.g. Wang and Adelson (1994), Irani et al. (1994). Jojic and Frey (2001) provide a generative probabilistic model framework for this task, but their algorithm is slow as it needs to search over discretised transformations (e.g. translations, or affines) for each layer. We show that by using invariant features (e.g. Lowe’s SIFT features) and clustering their motions we can reduce or eliminate the search and thus learn the sprites much faster. The algorithm is demonstrated on example image sequences. We introduce the Generative Template of Features (GTF), a parts-based model for visual object category detection. The GTF consists of a number of parts, and for each part there is a corresponding spatial location distribution and a distribution over ‘visual words’ (clusters of invariant features). We evaluate the performance of the GTF model for object localisation as compared to other techniques, and show that such a relatively simple model can give state-of-the-art performance. We also discuss the connection of the GTF to Hough-transform-like methods for object localisation.
|
126 |
Perceptual Mnemonic Medial Temporal Lobe Function in Individuals with Down Syndrome. Spanò, Goffredina. January 2012 (has links)
Behavioral data in individuals with Down syndrome (DS) and mouse models of the syndrome suggest impaired object processing. In this study we examined the component processes that may contribute to object memory deficits. A neuropsychological test battery was administered to individuals with DS (n=28), including tests targeting perirhinal cortex (PRC) and prefrontal cortex (PFC) function, tests of perception (i.e., convexity-based figure-ground perception), and tests of memory (object recognition and object-in-place learning). For comparison, equal numbers of typically developing chronological age-matched (CA, n=28) and mental age-matched (MA, n=28) controls were recruited. We observed object memory deficits in DS (p<0.001). In contrast, the DS group showed relatively intact use of convexity when making figure-ground judgments and spared PRC-dependent function, as compared to MA controls. In addition, measures of PFC function seemed to be related to performance on object recognition tasks. These findings suggest that the inputs into the MTL from low- and high-level perceptual processing streams may be intact in DS. The object memory deficits we observed might reflect impaired PFC function.
|
127 |
The influence of surface detail on object identification in Alzheimer's patients and healthy participants. Adlington, R. L. January 2009 (has links)
Image format (Laws, Adlington, Gale, Moreno-Martínez, & Sartori, 2007), ceiling effects in controls (Fung et al., 2001; Laws et al., 2005; Moreno-Martínez & Laws, 2007, 2008), and nuisance variables (Funnell & De Mornay Davis, 1996; Funnell & Sheridan, 1992; Stewart, Parkin & Hunkin, 1992) all influence the emergence of category specific deficits in Alzheimer's dementia (AD). Thus, the predominant use of line drawings of familiar, everyday items in category specific research is problematic. Moreover, it does not allow researchers to explore the extent to which format may influence object recognition. As such, the initial concern of this thesis was the development of a new corpus of 147 colour images of graded naming difficulty, the Hatfield Image Test (HIT; Adlington, Laws, & Gale, 2009), and the collection of relevant normative data, including ratings of age of acquisition, colour diagnosticity, familiarity, name agreement, visual complexity, and word frequency. Furthermore, greyscale and line-drawn versions of the HIT corpus were developed (and again, the associated normative data obtained) to permit research into the influence of image format on the emergence of category specific effects in patients with AD and in healthy controls.
Using the HIT, several studies were conducted, including: (i) a normative investigation of the effects of category and image format on naming accuracy and latencies in healthy controls; (ii) an exploration of the effects of image format (using the HIT images presented in colour, greyscale, and line-drawn formats) and category on the naming performance of AD patients and of age-matched controls performing below ceiling; (iii) a longitudinal investigation comparing AD patient performance to that of age-matched controls on a range of semantic tasks (naming, sorting, word-picture matching), using colour, greyscale, and line-drawn versions of the HIT; (iv) a comparison of naming in AD patients and age-matched controls on the HIT and the (colour, greyscale, and line-drawn) images from the Snodgrass and Vanderwart (1980) corpus; and (v) a meta-analysis exploring category specific naming in AD using the Snodgrass and Vanderwart (1980) corpus versus other corpora. Taken together, the results of these investigations showed, first, that image format interacts with category. For both AD patients and controls, colour is more important for the recognition of living things, with a significant nonliving advantage emerging for the line-drawn images but not for the colour images. Controls benefitted more from the additional surface information than AD patients did, which, as chapter 6 shows, results from low-level visual cortical impairment in AD. For controls, format was also more important for the recognition of low-familiarity, low-frequency items. In addition, the findings show that adequate control data affect the emergence of category specific deficits in AD. Specifically, based on within-group comparisons, chapters 6, 7, and 8 revealed a significant living deficit in AD patients. However, when the patients were compared to controls performing below ceiling, as demonstrated in chapters 7 and 8, this deficit was only significant for the line drawings, showing that the performance observed in AD patients is simply an exaggeration of the norm.
|
128 |
A multi-level machine learning system for attention-based object recognition. Han, Ji Wan. January 2011 (has links)
This thesis develops a trainable object-recognition algorithm that represents objects using their salient features and applies an attention mechanism to speed up feature detection. A trainable component-based object recognition system implementing the developed algorithm has been created. This system has two layers. The first layer contains several individual feature classifiers, which detect the salient features that compose higher-level objects in input images. The second layer judges whether those detected features form a valid object. An object is represented by a feature map which stores the geometrical and hierarchical relations among features and higher-level objects; this map is the input to the second layer. The attention mechanism is applied to improve feature detection speed: once a few features have been detected, it leads the system to areas with a higher likelihood of containing further features, thereby speeding up detection. Two major experiments were conducted, applying the developed system to discriminating faces from non-faces and to discriminating people from backgrounds in thermal images. The results of these experiments show the success of the implemented system. The attention mechanism has a positive effect on feature detection: it saves detection time, especially in terms of classifier calls.
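The attention idea, raising the search priority of regions near already-detected features so fewer classifier calls are wasted on empty areas, can be caricatured on a grid of cells. This is our toy reconstruction under simplified assumptions (a boolean array stands in for the per-cell feature classifier), not the thesis's implementation:

```python
import numpy as np

def attention_scan(is_feature, boost=10.0):
    """Visit grid cells in priority order. Each detection boosts the
    priority of unvisited neighbours, so the scan clusters around
    evidence instead of continuing a blind raster sweep."""
    h, w = is_feature.shape
    # mild raster bias: with no detections the scan is plain left-to-right
    prio = -np.arange(h * w, dtype=float).reshape(h, w) * 1e-3
    visited = np.zeros((h, w), dtype=bool)
    order = []
    for _ in range(h * w):
        idx = np.unravel_index(np.argmax(np.where(visited, -np.inf, prio)), (h, w))
        visited[idx] = True
        order.append((int(idx[0]), int(idx[1])))
        if is_feature[idx]:
            r, c = idx
            prio[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2] += boost
    return order
```

With two vertically adjacent features on a 5x5 grid, the second feature is reached several classifier calls earlier than under a pure raster scan, which is the saving the abstract refers to.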
|
129 |
A cortical model of object perception based on Bayesian networks and belief propagation. Durá-Bernal, Salvador. January 2011 (has links)
Evidence suggests that high-level feedback plays an important role in visual perception by shaping the response in lower cortical levels (Sillito et al. 2006, Angelucci and Bullier 2003, Bullier 2001, Harrison et al. 2007). A notable example of this is reflected by the retinotopic activation of V1 and V2 neurons in response to illusory contours, such as Kanizsa figures, which has been reported in numerous studies (Maertens et al. 2008, Seghier and Vuilleumier 2006, Halgren et al. 2003, Lee 2003, Lee and Nguyen 2001). The illusory contour activity emerges first in lateral occipital cortex (LOC), then in V2 and finally in V1, strongly suggesting that the response is driven by feedback connections. Generative models and Bayesian belief propagation have been suggested to provide a theoretical framework that can account for feedback connectivity, explain psychophysical and physiological results, and map well onto the hierarchical distributed cortical connectivity (Friston and Kiebel 2009, Dayan et al. 1995, Knill and Richards 1996, Geisler and Kersten 2002, Yuille and Kersten 2006, Deneve 2008a, George and Hawkins 2009, Lee and Mumford 2003, Rao 2006, Litvak and Ullman 2009, Steimer et al. 2009). The present study explores the role of feedback in object perception, taking as a starting point the HMAX model, a biologically inspired hierarchical model of object recognition (Riesenhuber and Poggio 1999, Serre et al. 2007b), and extending it to include feedback connectivity. A Bayesian network that captures the structure and properties of the HMAX model is developed, replacing the classical deterministic view with a probabilistic interpretation. The proposed model approximates the selectivity and invariance operations of the HMAX model using the belief propagation algorithm. 
Hence, the model not only achieves successful feedforward recognition invariant to position and size, but is also able to reproduce modulatory effects of higher-level feedback, such as illusory contour completion, attention and mental imagery. Overall, the model provides a biophysiologically plausible interpretation, based on state-of-the-art probabilistic approaches and supported by current experimental evidence, of the interaction between top-down global feedback and bottom-up local evidence in the context of hierarchical object perception.
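The model itself is far richer, but the core sum-product belief-propagation machinery it builds on can be shown on a toy chain MRF, where forward and backward messages combine with local evidence into exact marginals (a generic sketch, not the thesis's network):

```python
import numpy as np

def chain_bp(unaries, pairwise):
    """Sum-product belief propagation on a chain MRF.
    unaries : list of (k,) non-negative potentials, one per node
    pairwise: (k, k) potential shared by every edge
    Returns normalised node marginals (exact on chains/trees)."""
    n = len(unaries)
    fwd = [np.ones_like(unaries[0])]        # message into node i from the left
    for i in range(n - 1):
        m = pairwise.T @ (unaries[i] * fwd[i])
        fwd.append(m / m.sum())
    bwd = [np.ones_like(unaries[0])]        # message into node i from the right
    for i in range(n - 1, 0, -1):
        m = pairwise @ (unaries[i] * bwd[-1])
        bwd.append(m / m.sum())
    bwd = bwd[::-1]
    beliefs = [unaries[i] * fwd[i] * bwd[i] for i in range(n)]
    return [b / b.sum() for b in beliefs]
```

In the cortical reading, the backward messages play the role of top-down feedback (e.g. the LOC-to-V2-to-V1 signals behind illusory contours) modulating the bottom-up evidence carried by the forward messages.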
|
130 |
Ground Object Recognition using Laser Radar Data: Geometric Fitting, Performance Analysis, and Applications. Grönwall, Christina. January 2006 (has links)
This thesis concerns the detection and recognition of ground objects using data from laser radar systems. Typical ground objects are vehicles and land mines. For these objects, the orientation and articulation are unknown. The objects are placed in natural or urban areas where the background is unstructured and complex. The performance of laser radar systems is analyzed to obtain models of the uncertainties in laser radar data. A ground object recognition method is presented. It handles general, noisy 3D point cloud data. The approach is based on the fact that man-made objects on a large scale can be considered to be of rectangular shape, or can be decomposed into a set of rectangles. Several approaches to rectangle fitting are presented and evaluated in Monte Carlo simulations. Errors-in-variables are present, and thus geometric fitting is used. The objects can have parts that are subject to articulation. A modular least squares method with outlier rejection, which can handle articulated objects, is proposed. This method falls within the iterative closest point framework. Recognition when several similar models are available is discussed. The recognition method is applied in a query-based multi-sensor system. The system covers the process from sensor data to the user interface, i.e., from low-level image processing to high-level situation analysis. In object detection and recognition based on laser radar data, the accuracy of the range values is important. A general direct-detection laser radar system applicable to hard-target measurements is modeled. Three time-of-flight estimation algorithms are analyzed: peak detection, constant fraction detection, and matched filtering. The statistical distribution of uncertainties in time-of-flight range estimates is determined. The detection performance for various shape conditions and signal-to-noise ratios is analyzed. These results are used to model the properties of the range estimation error. The detectors' performance is compared with the Cramér-Rao lower bound. The performance of a tool for synthetic generation of scanning laser radar data is evaluated. In the measurement system model, it is possible to add several design parameters, which makes it possible to test an estimation scheme under different types of system design. A parametric method, based on measurement error regression, that estimates an object's size and orientation is described. Validations of both the measurement system model and the measurement error model, with respect to the Cramér-Rao lower bound, are presented.
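The thesis's geometric fitting handles noisy 3D data, outliers, and articulation; a stripped-down 2-D illustration of the underlying rectangle idea (principal axes give the orientation, projected extents give the sides; our sketch, not the thesis's errors-in-variables estimator):

```python
import numpy as np

def fit_rectangle(points):
    """Fit an oriented rectangle to 2-D points: the principal axes give
    the orientation, and the extents of the points projected onto those
    axes give the side lengths. Returns (centre, extents, angle)."""
    centre = points.mean(axis=0)
    centred = points - centre
    # rows of vt are the principal axes, sorted by decreasing variance
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    proj = centred @ vt.T
    extent = proj.max(axis=0) - proj.min(axis=0)
    angle = np.arctan2(vt[0, 1], vt[0, 0])   # orientation of the long axis
    return centre, extent, angle
```

An articulated object would be handled by fitting one such rectangle per rigid part and alternating assignment and fitting, which is where the iterative-closest-point flavour of the thesis's modular least squares method comes in.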
|