The incredible ability of human beings to quickly detect the prominent or salient regions in an image is often taken for granted. Reproducing this ability in computer vision systems remains a considerable challenge. It is of paramount importance to perception and image understanding because it accelerates image analysis, giving higher vision processes such as recognition a focus of attention. In addition, human eye fixation points occurring during the early stages of visual processing often correspond to the loci of salient image regions. These regions help us determine the interesting parts of an image and support our ability to discriminate between different objects in a scene. Salient regions attract our immediate attention without requiring an exhaustive scan of a scene. In essence, saliency can be defined as the quality of an image region that enables it to stand out in relation to its neighbors.
Saliency is often approached in one of two ways. The bottom-up approach refers to mechanisms that are image-driven and independent of knowledge about the image content, whereas the top-down approach refers to mechanisms that are task-oriented and make use of prior knowledge about a scene. In this thesis, we present a bottom-up measure of saliency based on the relationships exhibited among image features. The perceived structure in an image is determined more by the relationships among features than by the individual feature attributes. From this standpoint, we aim to capture the organization within an image by employing relational distributions derived from distance and gradient direction relationships exhibited between image primitives. The Rényi entropy of the relational distribution tends to be lower if saliency is exhibited for some image region in the local pixel neighborhood over which the distribution is defined. This notion forms the foundation of our measure.
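The idea of scoring a neighborhood by the Rényi entropy of its relational distribution can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the entropy order `alpha=2`, the bin counts, and the representation of image primitives as 2-D points with one gradient direction each are all assumptions made for illustration. The sketch builds a 2-D histogram over pairwise distance and gradient direction difference, then computes its Rényi entropy, H_alpha = log(sum_i p_i^alpha) / (1 - alpha).

```python
import numpy as np

def renyi_entropy(p, alpha=2.0):
    """Rényi entropy (in nats) of a discrete distribution, for alpha != 1."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                      # empty bins contribute nothing
    if p.size == 0:
        return 0.0                    # assumption: empty histogram scores 0
    p = p / p.sum()                   # normalize counts to probabilities
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def relational_histogram(points, angles, max_dist,
                         n_dist_bins=8, n_angle_bins=8):
    """Histogram of (distance, direction difference) over all primitive pairs.

    points: (N, 2) array of primitive positions (hypothetical representation).
    angles: (N,) array of gradient directions in radians.
    """
    points = np.asarray(points, dtype=float)
    angles = np.asarray(angles, dtype=float)
    i, j = np.triu_indices(len(points), k=1)      # each unordered pair once
    dist = np.linalg.norm(points[i] - points[j], axis=1)
    diff = np.abs(angles[i] - angles[j]) % (2 * np.pi)
    diff = np.minimum(diff, 2 * np.pi - diff)     # fold into [0, pi]
    hist, _, _ = np.histogram2d(
        dist, diff,
        bins=(n_dist_bins, n_angle_bins),
        range=((0.0, max_dist), (0.0, np.pi)),
    )
    return hist

def neighborhood_entropy(points, angles, max_dist, alpha=2.0):
    """Lower values suggest more organized (potentially salient) structure."""
    return renyi_entropy(relational_histogram(points, angles, max_dist), alpha)
```

Under these assumptions, a neighborhood whose primitives share a common direction (for example, edge pixels along a straight contour) concentrates its histogram mass in few bins and so yields a lower entropy than a neighborhood of randomly placed, randomly oriented primitives.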
Results of our measure are presented in the form of a saliency map that highlights salient image regions. We show results on a variety of real images from various datasets. We evaluate the performance of our measure in relation to a dominant saliency model and obtain comparable results. We also investigate the biological plausibility of our method by comparing our results to human fixation maps. In an effort to derive meaningful information from an image, we investigate the significance of scale relative to our saliency measure and attempt to determine optimal scales for image analysis. In addition, we extend a perceptual grouping framework by using our measure as an optimization criterion for determining the organizational strength of edge groupings. As a result, the use of ground truth images is circumvented.
Identifier | oai:union.ndltd.org:USF/oai:scholarcommons.usf.edu:etd-2619 |
Date | 07 May 2010 |
Creators | Duncan, Kester |
Publisher | Scholar Commons |
Source Sets | University of South Florida |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Graduate Theses and Dissertations |
Rights | default |