1

From visual saliency to video behaviour understanding

Hung, Hayley Shi Wen January 2007 (has links)
In a world of ever-increasing amounts of video data, we are forced to abandon traditional methods of scene interpretation by fully manual means. Under such circumstances, some form of automation is highly desirable, but this can be a very open-ended issue of high complexity. Dealing with such large amounts of data is a non-trivial task that requires efficient selective extraction of parts of a scene which have the potential to develop a higher semantic meaning, alone or in combination with others. In particular, the types of video data that are in need of automated analysis tend to be outdoor scenes with high levels of activity generated by either the foreground or the background. Such dynamic scenes add considerable complexity to the problem since we cannot rely on motion energy alone to detect regions of interest. Furthermore, the behaviour of these regions of motion can differ greatly, while still being highly dependent, both spatially and temporally, on the movement of other objects within the scene. Modelling these dependencies, whilst eliminating as much redundancy as possible from the feature extraction process, is the challenge addressed by this thesis. In the first half, finding the right mechanism to extract and represent meaningful features from dynamic scenes with no prior knowledge is investigated. Meaningful or salient information is treated as the parts of a scene that stand out or seem unusual or interesting to us. The novelty of the work is that it is able to select salient scales in both space and time at which a particular spatio-temporal volume is considered interesting relative to the rest of the scene. By quantifying the temporal saliency values of regions of motion, it is possible to consider their importance in both the long and the short term. Variations in entropy over spatio-temporal scales are used to select a context-dependent measure of the local scene dynamics.
A method of quantifying temporal saliency is devised based on the variation of the entropy of the intensity distribution in a spatio-temporal volume over increasing scales. Entropy is used instead of traditional filter methods since the stability or predictability of the intensity distribution over scales of a local spatio-temporal region can be defined more robustly relative to the context of its neighbourhood, even for regions exhibiting high intensity variation due to strong texture. Results show that it is possible to extract both locally salient features and globally salient temporal features from contrasting scenarios. In the second part of the thesis, focus shifts towards binding these spatio-temporally salient features together so that some semantic meaning can be inferred from their interaction. Interaction, in this sense, refers to any form of temporally correlated behaviour between salient regions of motion in a scene. Feature binding as a mechanism for interactive behaviour understanding is particularly important if we consider that regions of interest may not be treated as particularly significant individually, but represent much more semantically when considered in combination. Temporally correlated behaviour is identified and classified using accumulated co-occurrences of salient features at two levels. Firstly, co-occurrences are accumulated for spatio-temporally proximate salient features to form a local representation. Then, at the next level, the co-occurrences of these locally spatio-temporally bound features are accumulated again in order to discover unusual behaviour in the scene. The novelty of this work is that no assumptions are made about whether interacting regions should be spatially proximate. Furthermore, no prior knowledge of the scene topology is used. Results show that it is possible to detect unusual interactions between regions of motion, which can visually infer higher levels of semantics.
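The entropy-over-scales idea described above can be illustrated with a minimal sketch. This is not the thesis implementation: the histogram binning, the set of scales, and the use of cubic neighbourhoods are assumptions made purely for illustration.

```python
import numpy as np

def patch_entropy(volume, bins=16):
    """Shannon entropy (bits) of the intensity distribution in a spatio-temporal volume."""
    hist, _ = np.histogram(volume, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]  # log2(0) is undefined; zero-probability bins contribute nothing
    return float(-np.sum(p * np.log2(p)))

def entropy_over_scales(video, t, y, x, scales=(2, 4, 8)):
    """Entropy of increasingly large spatio-temporal neighbourhoods around (t, y, x).

    A flat (predictable) entropy profile suggests an uninteresting region;
    large variation across scales marks the location as potentially salient
    relative to its context.
    """
    values = []
    for s in scales:
        t0, t1 = max(t - s, 0), min(t + s + 1, video.shape[0])
        y0, y1 = max(y - s, 0), min(y + s + 1, video.shape[1])
        x0, x1 = max(x - s, 0), min(x + s + 1, video.shape[2])
        values.append(patch_entropy(video[t0:t1, y0:y1, x0:x1]))
    return values
```

A uniform region yields a flat, near-zero entropy profile at every scale, while a highly varying region yields high entropy; it is the *variation* of this profile across scales, not its absolute value, that the thesis uses as the saliency cue.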
In the final part of the thesis, a more specific investigation of human behaviour is addressed through the classification and detection of interactions between two human subjects. Here, further modifications are made to the feature extraction process in order to quantify the spatio-temporal saliency of a region of motion. These features are then grouped to find the people in the scene. A loose pose distribution model is then extracted for each person, and salient correlations between the poses of two interacting people are found using canonical correlation analysis. The resulting canonical factors can be formed into trajectories and used for classification, with the Levenshtein distance used to categorise the features. The novelty of the work is that the interactions do not have to be spatially connected or proximate to be recognised. Furthermore, the data used is outdoors and cluttered, with a non-stationary background. Results show that co-occurrence techniques have the potential to provide a more generalised, compact, and meaningful representation of dynamic interactive scene behaviour.
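The final classification step can be sketched with a standard Levenshtein (edit) distance and a nearest-neighbour rule over symbol sequences. The quantisation of canonical-factor trajectories into discrete symbols is assumed here and not shown; the labels and sequences below are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two symbol sequences, via dynamic programming."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution (or match)
        prev = cur
    return prev[n]

def classify(query, labelled):
    """Nearest-neighbour classification: the label of the closest stored sequence."""
    return min(labelled, key=lambda item: levenshtein(query, item[0]))[1]
```

Because edit distance tolerates insertions and deletions, two interaction sequences of different lengths can still be matched, which suits trajectories of varying duration.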
2

Modèles d'attention visuelle pour l'analyse de scènes dynamiques / Spatio-temporal saliency detection in dynamic scenes using color and texture features

Muddamsetty, Satya Mahesh 07 July 2014 (has links)
Many computer vision applications require the detection, localisation and tracking of regions or objects of interest in an image or image sequence. Numerous visual attention models inspired by human vision, which automatically detect regions of interest in an image or video, have recently been developed and used successfully in various applications. Nevertheless, most existing approaches are limited to the analysis of static scenes, and very few methods exploit the temporal nature of image sequences. The main objective of this thesis is therefore the study of visual attention models for the analysis of complex dynamic scenes. A saliency map is usually obtained by fusing a static map (spatial saliency within a frame) and a dynamic map (temporal saliency across a series of frames). In our work, we model dynamic changes with the LBP-TOP (Local Binary Patterns on Three Orthogonal Planes) texture operator and use colour information for the spatial aspect. The two saliency maps are computed using a discriminant formulation inspired by the human visual system and fused appropriately into a spatio-temporal saliency map. Extensive experiments on public datasets show that our approach achieves results that are better than or comparable to those of approaches in the literature. / Visual saliency is an important research topic in the field of computer vision due to its numerous possible applications. It helps to focus on regions of interest instead of processing the whole image or video data. Detecting visual saliency in still images has been widely addressed in the literature with several formulations. However, visual saliency detection in videos has attracted little attention, and is a more challenging task due to the additional temporal information.
Indeed, a video contains strong spatio-temporal correlation between the regions of consecutive frames, and, furthermore, the motion of foreground objects dramatically changes the importance of the objects in a scene. The main objective of the thesis is to develop a spatio-temporal saliency method that works well for complex dynamic scenes. A spatio-temporal saliency map is usually obtained by the fusion of a static saliency map and a dynamic saliency map. In our work, we model the dynamic textures in a dynamic scene with Local Binary Patterns (LBP-TOP) to compute the dynamic saliency map, and we use color features to compute the static saliency map. Both saliency maps are computed using a bio-inspired mechanism of the Human Visual System (HVS) with a discriminant formulation known as center-surround saliency, and are fused in a proper way. The proposed models have been extensively evaluated on diverse publicly available datasets containing several videos of dynamic scenes. The evaluation is performed in two parts. First, we evaluate the method on locating interesting foreground objects in complex scenes. Secondly, we evaluate our model on the task of predicting human observers' fixations. The proposed method is also compared against state-of-the-art methods, and the results show that the proposed approach achieves competitive results. In this thesis we also evaluate the performance of different fusion techniques, because fusion plays a critical role in the accuracy of the spatio-temporal saliency map. We evaluate the performance of different fusion techniques on a large and diverse complex dataset, and the results show that a fusion method must be selected depending on the characteristics, in terms of color and motion contrasts, of a sequence. Overall, fusion techniques which take the best of each saliency map (static and dynamic) into the final spatio-temporal map achieve the best results.
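The fusion of a static and a dynamic saliency map can be sketched as below. The three fusion modes shown are common illustrative choices for combining normalised maps, not necessarily the specific techniques evaluated in the thesis.

```python
import numpy as np

def normalise(m):
    """Rescale a saliency map to [0, 1] so the two cues are comparable."""
    m = m.astype(float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def fuse(static_map, dynamic_map, mode="mean"):
    """Combine a static and a dynamic saliency map into one spatio-temporal map."""
    s, d = normalise(static_map), normalise(dynamic_map)
    if mode == "mean":
        return 0.5 * (s + d)       # balanced compromise between the two cues
    if mode == "max":
        return np.maximum(s, d)    # keeps the strongest cue at each pixel
    if mode == "product":
        return s * d               # keeps only locations salient in both maps
    raise ValueError(f"unknown fusion mode: {mode}")
```

The contrast between these modes mirrors the abstract's conclusion: which fusion works best depends on whether a sequence's saliency is carried mainly by colour contrast, by motion contrast, or by both.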
3

Contribution of colour in guiding visual attention and in a computational model of visual saliency / Contribution de la couleur dans l'attention visuelle et un modèle de saillance visuelle

Talebzadeh Shahrbabaki, Shahrbanoo 16 October 2015 (has links)
The studies conducted in this thesis examine the role of colour in visual attention. We sought to understand the influence of colour information in videos on eye movements, in order to integrate colour attributes into a model of visual saliency. To this end, we analysed various characteristics of the eye movements of observers freely viewing videos under two conditions: colour and greyscale. We also compared the main regions of gaze on colour videos with those on greyscale videos. It appeared that colour information only slightly modifies eye-movement characteristics such as eye position and fixation duration. However, we found that colour increases the number of gaze regions. Moreover, this influence of colour grows over time. Building on these results, we proposed a method for computing colour saliency maps, and we integrated these maps into an existing saliency model. / The studies conducted in this thesis focus on the role of colour in visual attention. We tried to understand the influence of colour information on eye movements while observing videos, in order to incorporate colour information into a model of visual saliency. For this, we analysed different characteristics of the eye movements of observers while they freely watched videos in two conditions: colour and greyscale. We also compared the main regions of gaze in colour videos with those in greyscale videos. We observed that colour information only moderately influences eye-movement characteristics such as gaze position and fixation duration. However, we found that colour increases the number of regions of interest in video stimuli. Moreover, this influence varies across time. Based on these observations, we proposed a method to compute colour saliency maps for videos, and we incorporated these maps into an existing model of saliency.
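As a purely illustrative example of what a colour saliency map can look like, here is a simple global-contrast heuristic (each pixel's distance from the mean image colour). This is not the method proposed in the thesis, only a minimal stand-in to make the notion concrete.

```python
import numpy as np

def colour_saliency(image):
    """Per-pixel distance from the mean image colour, rescaled to [0, 1].

    `image` is an (H, W, 3) float array; pixels whose colour deviates most
    from the global average receive the highest saliency.
    """
    mean = image.reshape(-1, 3).mean(axis=0)
    dist = np.linalg.norm(image - mean, axis=2)
    span = dist.max() - dist.min()
    return (dist - dist.min()) / span if span > 0 else np.zeros(image.shape[:2])
```

On a mostly grey frame containing one red object, such a map peaks on the red object, which is exactly the kind of cue a greyscale-only model would miss.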
