11

From visual saliency to video behaviour understanding

Hung, Hayley Shi Wen, January 2007
In a world of ever-increasing amounts of video data, we are forced to abandon traditional, fully manual methods of scene interpretation. Under such circumstances some form of automation is highly desirable, but this is an open-ended problem of high complexity. Dealing with such large amounts of data is a non-trivial task that requires efficient, selective extraction of the parts of a scene which have the potential to develop a higher semantic meaning, alone or in combination with others. In particular, the types of video data in need of automated analysis tend to be outdoor scenes with high levels of activity generated by either foreground or background. Such dynamic scenes add considerable complexity to the problem, since we cannot rely on motion energy alone to detect regions of interest. Furthermore, the behaviour of these regions of motion can differ greatly while still being highly dependent, both spatially and temporally, on the movement of other objects within the scene. Modelling these dependencies, whilst eliminating as much redundancy as possible from the feature extraction process, is the challenge addressed by this thesis.

In the first half, finding the right mechanism to extract and represent meaningful features from dynamic scenes with no prior knowledge is investigated. Meaningful or salient information is treated as the parts of a scene that stand out or seem unusual or interesting to us. The novelty of the work is that it is able to select salient scales in both space and time at which a particular spatio-temporal volume is considered interesting relative to the rest of the scene. By quantifying the temporal saliency values of regions of motion, it is possible to consider their importance in both the long and the short term. Variations in entropy over spatio-temporal scales are used to select a context-dependent measure of the local scene dynamics. A method of quantifying temporal saliency is devised, based on the variation of the entropy of the intensity distribution in a spatio-temporal volume over increasing scales. Entropy is used in preference to traditional filter methods because the stability or predictability of the intensity distribution over scales of a local spatio-temporal region can be defined more robustly relative to the context of its neighbourhood, even for regions exhibiting high intensity variation due to strong texture. Results show that it is possible to extract both locally salient and globally salient temporal features from contrasting scenarios.

In the second part of the thesis, focus shifts towards binding these spatio-temporally salient features together so that some semantic meaning can be inferred from their interaction. Interaction in this sense refers to any form of temporally correlated behaviour between salient regions of motion in a scene. Feature binding as a mechanism for interactive behaviour understanding is particularly important if we consider that regions of interest may not be especially significant individually, but represent much more semantically when considered in combination. Temporally correlated behaviour is identified and classified using accumulated co-occurrences of salient features at two levels. Firstly, co-occurrences are accumulated for spatio-temporally proximate salient features to form a local representation. Then, at the next level, the co-occurrences of these locally bound features are accumulated again in order to discover unusual behaviour in the scene. The novelty of this work is that no assumptions are made about whether interacting regions should be spatially proximate, and no prior knowledge of the scene topology is used. Results show that it is possible to detect unusual interactions between regions of motion, which can visually infer higher levels of semantics.

In the final part of the thesis, a more specific investigation of human behaviour is addressed through classification and detection of interactions between two human subjects. Here, further modifications are made to the feature extraction process in order to quantify the spatio-temporal saliency of a region of motion. These features are grouped to find the people in the scene, and a loose pose distribution model is extracted for each person; canonical correlation analysis is then used to find salient correlations between the poses of two interacting people. The resulting canonical factors can be formed into trajectories and used for classification, with the Levenshtein distance categorising the features. The novelty of the work is that interactions do not have to be spatially connected or proximate to be recognised, and the data used is outdoor footage cluttered with non-stationary background. Results show that co-occurrence techniques have the potential to provide a more generalised, compact, and meaningful representation of dynamic interactive scene behaviour.
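
As a rough illustration of the entropy-over-scale idea, here is a minimal Python sketch: it measures the Shannon entropy of the intensity distribution inside spatio-temporal cuboids of increasing extent around a point, and flags the scale at which that entropy changes most. The function names, the scale set, and the argmax-of-differences selection rule are illustrative assumptions, not the thesis's actual formulation.

```python
# A minimal sketch of entropy-over-scale saliency on a video volume,
# assuming a grayscale clip stored as a (time, height, width) numpy array
# with intensities in [0, 1]. Names and scales here are illustrative.
import numpy as np

def patch_entropy(volume, bins=32):
    """Shannon entropy of the intensity distribution in a spatio-temporal volume."""
    hist, _ = np.histogram(volume, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def entropy_scale_profile(clip, t, y, x, scales=(2, 4, 8, 16)):
    """Entropy of cuboids of increasing spatio-temporal extent around (t, y, x)."""
    profile = []
    for s in scales:
        t0, t1 = max(0, t - s), min(clip.shape[0], t + s)
        y0, y1 = max(0, y - s), min(clip.shape[1], y + s)
        x0, x1 = max(0, x - s), min(clip.shape[2], x + s)
        profile.append(patch_entropy(clip[t0:t1, y0:y1, x0:x1]))
    return np.array(profile)

# A point is treated as salient at the scale where entropy varies most,
# i.e. where the local intensity statistics are least predictable.
rng = np.random.default_rng(0)
clip = rng.random((40, 64, 64))
profile = entropy_scale_profile(clip, t=20, y=32, x=32)
salient_scale = int(np.argmax(np.abs(np.diff(profile))))
print(profile, salient_scale)
```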
12

Bidirectional long short-term memory network for proto-object representation

Zhou, Quan, 09 October 2018
Researchers have developed many visual saliency models in order to advance the technology in computer vision. Neural networks, Convolutional Neural Networks (CNNs) in particular, have successfully differentiated objects in images through feature extraction. Meanwhile, Cummings et al. have proposed a proto-object image saliency (POIS) model showing that perceptual objects or shapes can be modelled through a bottom-up saliency algorithm. Inspired by their work, this research aims to explore the features embedded in proto-object representations and to utilize artificial neural networks (ANNs) to capture and predict the saliency output of POIS. A combination of a CNN and a bidirectional long short-term memory (BLSTM) neural network is proposed for this saliency model, as a machine learning alternative to the border-ownership and grouping mechanisms in POIS. As ANNs become more efficient at visual saliency tasks, this work would extend their application in computer vision through a successful implementation of proto-object based saliency.
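
A minimal PyTorch sketch of the CNN-plus-BLSTM pairing the abstract describes might look as follows, with the BLSTM scanning CNN feature maps row by row so that each position sees context from both directions. The class name, layer sizes, and row-wise sequence layout are illustrative assumptions, not details taken from the thesis or from POIS.

```python
# A hypothetical CNN + bidirectional-LSTM saliency predictor; all sizes
# and the name ProtoSaliencyNet are illustrative stand-ins.
import torch
import torch.nn as nn

class ProtoSaliencyNet(nn.Module):
    def __init__(self, channels=16, hidden=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Bidirectional LSTM over image rows, so each row sees context both ways.
        self.blstm = nn.LSTM(input_size=channels, hidden_size=hidden,
                             batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # per-pixel saliency logit

    def forward(self, x):                      # x: (batch, 1, H, W)
        f = self.features(x)                   # (batch, C, H, W)
        b, c, h, w = f.shape
        seq = f.permute(0, 2, 3, 1).reshape(b * h, w, c)  # one sequence per row
        out, _ = self.blstm(seq)               # (b*h, w, 2*hidden)
        sal = self.head(out).reshape(b, 1, h, w)
        return torch.sigmoid(sal)              # saliency map in [0, 1]

model = ProtoSaliencyNet()
saliency = model(torch.rand(2, 1, 32, 32))
print(saliency.shape)  # torch.Size([2, 1, 32, 32])
```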
13

Saliency Maps using Channel Representations / Saliency-kartor utifrån kanalrepresentationer

Tuttle, Alexander, January 2010
In this thesis, an algorithm for producing saliency maps was developed, together with an algorithm for detecting salient regions based on the saliency map. The saliency values are computed as center-surround differences, and a local descriptor called the region p-channel is used to represent the center and surround respectively. An integral image representation called the integral p-channel is used to speed up extraction of the local descriptor for any given image region. The center-surround difference is calculated as either a histogram or a p-channel dissimilarity.

Ground truth was collected from human subjects, and the algorithm's ability to detect salient regions was evaluated against this ground truth. The algorithm was also compared to another saliency algorithm.

Two different center-surround interpretations are tested, as well as several p-channel and histogram dissimilarity measures. The results show that, for all tested settings, the best-performing dissimilarity measure is the so-called diffusion distance. The performance comparison showed that the algorithm developed in this thesis outperforms the algorithm against which it was compared, both with respect to region detection and to saliency ranking of regions. It can be concluded that the algorithm shows promising results, and further investigation is recommended. A list of suggested approaches for further research is provided.
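
The diffusion distance singled out as the best-performing dissimilarity can be sketched as follows: the difference between two histograms is repeatedly smoothed and downsampled, and the L1 norms of all pyramid layers are summed. The smoothing width and number of layers below are illustrative choices, not the thesis's settings.

```python
# A minimal sketch of the diffusion distance between two normalized histograms,
# used here as a center-surround dissimilarity; sigma and layers are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def diffusion_distance(h1, h2, sigma=1.0, layers=4):
    """Sum of L1 norms of the histogram difference over a Gaussian pyramid."""
    d = np.asarray(h1, dtype=float) - np.asarray(h2, dtype=float)
    total = np.abs(d).sum()
    for _ in range(layers):
        d = gaussian_filter1d(d, sigma)[::2]   # smooth, then downsample by 2
        total += np.abs(d).sum()
    return total

# Center-surround saliency: compare the center region's histogram with its surround.
center = np.histogram(np.random.default_rng(1).normal(0.3, 0.1, 500), bins=32, range=(0, 1))[0]
surround = np.histogram(np.random.default_rng(2).normal(0.6, 0.1, 500), bins=32, range=(0, 1))[0]
print(diffusion_distance(center / center.sum(), surround / surround.sum()))
```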
14

Data and Model-Driven Selection Using Color Regions

Syeda-Mahmood, Tanveer Fathima, 01 February 1992
A key problem in model-based object recognition is selection, namely, the problem of determining which regions in the image are likely to come from a single object. In this paper we present an approach that extracts and uses color region information to perform selection either based solely on image data (data-driven) or based on knowledge of the color description of the model (model-driven). The paper presents a method of perceptual color specification by color categories to extract perceptual color regions. It also discusses the utility of color-based selection in reducing the search involved in recognition.
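
A minimal sketch of the data-driven step, assuming pixels have already been mapped to a hue axis: pixels are binned into coarse perceptual color categories, and connected components within a category become candidate regions. The category boundaries and function names below are hypothetical, not the paper's specification.

```python
# Hypothetical color-category region extraction; boundaries are illustrative.
import numpy as np
from scipy import ndimage

# Coarse hue categories on a [0, 1) hue axis (assumed, not from the paper).
CATEGORIES = {"red": (0.95, 0.05), "yellow": (0.10, 0.20),
              "green": (0.25, 0.45), "blue": (0.55, 0.70)}

def color_regions(hue, category):
    lo, hi = CATEGORIES[category]
    # Handle the wrap-around case (e.g. red straddles hue = 0).
    mask = ((hue >= lo) & (hue < hi)) if lo < hi else ((hue >= lo) | (hue < hi))
    labels, n = ndimage.label(mask)        # connected components = color regions
    return labels, n

hue = np.random.default_rng(0).random((64, 64))
labels, n = color_regions(hue, "green")
print(f"{n} green regions")  # model-driven selection would keep only matching regions
```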
15

Contextual Influences on Saliency

Torralba, Antonio, 14 April 2004
This article describes a model for including scene/context priors in attention guidance. In the proposed scheme, visual context information can be made available early in the visual processing chain in order to modulate the saliency of image regions, providing an efficient shortcut for object detection and recognition. The scene is represented by means of a low-dimensional global description obtained from low-level features. These global scene features are then used to predict the probability of presence of the target object in the scene, along with its likely location and scale, before the image is explored.
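
The modulation step can be sketched in a few lines: a bottom-up saliency map is weighted pointwise by a location prior that, in the full model, would be predicted from the global scene features. The Gaussian vertical prior below is an illustrative stand-in for that learned prior.

```python
# A minimal sketch of context-modulated saliency; the prior is a stand-in
# for one predicted from global scene features, not the article's model.
import numpy as np

def contextual_saliency(saliency, prior):
    """Pointwise modulation: regions score highly if salient AND plausible for the target."""
    s = saliency * prior
    return s / s.max()

h, w = 48, 64
rng = np.random.default_rng(0)
saliency = rng.random((h, w))                      # stand-in bottom-up map
rows = np.arange(h)[:, None]
prior = np.exp(-0.5 * ((rows - 30) / 6.0) ** 2) * np.ones((h, w))
print(contextual_saliency(saliency, prior).shape)  # (48, 64), peaks near row 30
```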
16

Intelligent Ad Resizing

Badali, Anthony Paul, 15 December 2009
Currently, online advertisements are created at specific dimensions and must be laboriously modified by advertisers to support different aspect ratios. In addition, publishers are constrained to design web pages around this limited set of sizes. As an alternative, we present a framework for automatically generating visual banners at arbitrary sizes from individual prototype ads. This technique can be used to create flexible visual ads that can be resized to accommodate various aspect ratios. In the proposed framework, image and text data are stored separately. Resizing involves selecting a sub-region of the original image and updating text parameters (size and position). This problem is posed within an optimization framework that encourages solutions which maintain important structural properties of the original ad. The method can be applied to advertisements containing a wide variety of imagery and provides significantly more flexibility than existing solutions.
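
As a sketch of the image side of this optimization, assuming a precomputed saliency map for the prototype ad: candidate sub-regions of the target size are scored by the saliency mass they retain. This stands in for the paper's fuller structural objective, which also repositions and rescales the text.

```python
# A hypothetical crop-selection step: scan sub-regions of the target size
# and keep the one retaining the most saliency mass.
import numpy as np

def best_crop(saliency, target_w, target_h, stride=4):
    h, w = saliency.shape
    best, best_score = None, -np.inf
    for y in range(0, h - target_h + 1, stride):
        for x in range(0, w - target_w + 1, stride):
            score = saliency[y:y + target_h, x:x + target_w].sum()
            if score > best_score:
                best, best_score = (y, x), score
    return best, best_score

sal = np.random.default_rng(0).random((60, 120))   # stand-in ad saliency map
(y, x), score = best_crop(sal, target_w=40, target_h=50)
print(y, x, round(score, 2))
```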
18

The Relationship Between Bottom-Up Saliency and Gaze Behaviour During Audiovisual Speech Perception

Everdell, Ian, 12 January 2009
Face-to-face communication is one of the most natural forms of interaction between humans. Speech perception is an important part of this interaction. While speech could be said to be primarily auditory in nature, visual information can play a significant role in influencing perception. It is not well understood what visual information is important or how that information is collected. Previous studies have documented the preference to gaze at the eyes, nose, and mouth of a talking face, but physical saliency, i.e., the unique low-level features of the stimulus, has not been explicitly examined. Two eye-tracking experiments are presented to investigate the role of physical saliency in the guidance of gaze fixations during audiovisual speech perception. Experiment 1 quantified the physical saliency of a talking face and examined its relationship with the gaze behaviour of participants performing an audiovisual speech perception task and an emotion judgment task. The majority of fixations were made to locations on the face that exhibited high relative saliency, but not necessarily the maximally salient location. The addition of acoustic background noise resulted in a change in gaze behaviour and a decrease in correspondence between saliency and gaze behaviour, whereas changing the task did not alter this correspondence despite changes in gaze behaviour. Experiment 2 manipulated the visual information available to the viewer by using animated full-feature and point-light talking faces. Removing static information, such as colour, intensity, and orientation, from the stimuli elicited both a change in gaze behaviour and a decrease in correspondence between saliency and gaze behaviour. Removing dynamic information, particularly head motion, resulted in a decrease in correspondence between saliency and gaze behaviour without any change in gaze behaviour. The results of these experiments show that, while physical saliency is correlated with gaze behaviour, it cannot be the single factor determining the selection of gaze fixations. Interactions within and between bottom-up and top-down processing are suggested to guide the selection of gaze fixations during audiovisual speech perception. / Thesis (Master, Neuroscience Studies), Queen's University, 2008.
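
The abstract does not name the exact correspondence measure used, but a standard way to score how well fixations align with a saliency map is normalized scanpath saliency (NSS), sketched below as an illustrative stand-in for the thesis's analysis.

```python
# A minimal NSS sketch (an assumed measure, not necessarily the thesis's own).
import numpy as np

def nss(saliency, fixations):
    """Mean saliency at fixated pixels, in units of the map's standard deviation."""
    s = (saliency - saliency.mean()) / saliency.std()
    return np.mean([s[y, x] for y, x in fixations])

sal = np.random.default_rng(0).random((48, 64))
fixations = [(10, 20), (12, 22), (30, 40)]   # (row, col) gaze samples
print(nss(sal, fixations))  # > 0 means fixations tend to land on salient regions
```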
19

De-Emphasis of Distracting Image Regions Using Texture Power Maps

Su, Sara L., Durand, Frédo, Agrawala, Maneesh, 01 1900
We present a post-processing technique that selectively reduces the salience of distracting regions in an image. Computational models of attention predict that texture variation influences bottom-up attention mechanisms. Our method reduces the spatial variation of texture using power maps, high-order features describing local frequency content in an image. Modification of power maps results in effective regional de-emphasis. We validate our results quantitatively via a human subject search experiment and qualitatively with eye tracking data. / Singapore-MIT Alliance (SMA)
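
A loose sketch of the underlying idea, with a single Gabor-energy band standing in for the paper's higher-order, multi-band power features: estimate local frequency content, then damp texture variation inside a distractor mask by blending toward a smoothed version of the image. The band frequency, smoothing widths, and blending rule are all illustrative assumptions, not the paper's method.

```python
# Hypothetical power-map-style de-emphasis; all parameters are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def local_power(image, freq=0.2, sigma=3.0):
    """Energy of one oriented frequency band, smoothed into a local power map."""
    y, x = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    even = gaussian_filter(image * np.cos(2 * np.pi * freq * x), sigma)
    odd = gaussian_filter(image * np.sin(2 * np.pi * freq * x), sigma)
    return even ** 2 + odd ** 2               # quadrature energy = local power

def de_emphasize(image, mask, strength=0.7):
    """Blend the masked region toward a smoothed copy to damp texture variation."""
    smooth = gaussian_filter(image, 5.0)
    return np.where(mask, (1 - strength) * image + strength * smooth, image)

img = np.random.default_rng(0).random((64, 64))
mask = np.zeros_like(img, dtype=bool)
mask[20:40, 20:40] = True                     # hypothetical distractor region
print(local_power(img).shape, de_emphasize(img, mask).mean())
```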
