201 |
The Role Of Familiarity On Change Perception
Karacan, Hacer 01 July 2007
In this study, the mechanisms that control attention in natural scenes were examined. It was explored whether familiarity with the environment makes participants more sensitive to changes or novel events in the scene. Previous investigation of this issue has been based on viewing 2D pictures/images of simple objects or of natural scenes, a situation which does not accurately reflect the challenges of natural vision. In order to examine this issue, as well as the differences between 2D and 3D environments, two experiments were designed in which the general task demands could be manipulated. The results revealed that familiarity with the environment significantly increased the time spent fixating regions in the scene where a change had occurred. The results support the hypothesis that we learn the structure of natural scenes over time, and that attention is attracted by deviations from the stored scene representation. Such a mechanism would allow attention to be drawn to objects or events that were not explicitly on the current cognitive agenda.
|
202 |
Robust Extraction Of Sparse 3D Points From Image Sequences
Vural, Elif 01 September 2008
In this thesis, the extraction of sparse 3D points from calibrated image sequences is studied. The presented method for sparse 3D reconstruction is examined in two steps: the first addresses the problem of two-view reconstruction, and the second extends the two-view reconstruction algorithm to multiple views. The examined two-view reconstruction method consists of basic building blocks such as feature detection and matching, epipolar geometry estimation, and the reconstruction of cameras and scene structure. Feature detection and matching are achieved by the Scale Invariant Feature Transform (SIFT) method. For the estimation of epipolar geometry, the 7-point and 8-point algorithms are examined for fundamental matrix (F-matrix) computation, while RANSAC and PROSAC are utilized for robust and accurate model estimation. In the final stage of two-view reconstruction, the camera projection matrices are computed from the F-matrix, and the locations of 3D scene points are estimated by triangulation, hence determining the scene structure and cameras up to a projective transformation. The extension of the two-view reconstruction to multiple views is achieved by estimating the camera projection matrix of each additional view from the already reconstructed matches, and then adding new points to the scene structure by triangulating the unreconstructed matches. Finally, the reconstruction is upgraded from projective to metric by a rectifying homography computed from the camera calibration information. In order to obtain a refined reconstruction, two different methods are suggested for the removal of erroneous points from the scene structure. In addition to examining the solution to the reconstruction problem, experiments have been conducted that compare the performance of competing algorithms used in various stages of reconstruction.
In connection with sparse reconstruction, a rate-distortion efficient piecewise planar scene representation algorithm that generates mesh models of scenes from reconstructed point clouds is examined, and its performance is evaluated through experiments.
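The robust estimation step mentioned in this abstract (RANSAC) can be illustrated with a minimal sketch. As a loudly stated assumption, the example below fits a 2D line instead of a fundamental matrix, because the consensus loop is the same and a line needs only two points per hypothesis. The function names, the threshold, the iteration count, and the toy data are all hypothetical and are not taken from the thesis.

```python
import math
import random


def fit_line(p, q):
    """Return a unit-normal line (a, b, c) with a*x + b*y + c = 0 through p and q."""
    a = q[1] - p[1]
    b = p[0] - q[0]
    norm = math.hypot(a, b)
    if norm == 0.0:
        return None  # degenerate sample: both points coincide
    return a / norm, b / norm, -(a * p[0] + b * p[1]) / norm


def ransac_line(points, threshold=0.1, iterations=200, seed=0):
    """Keep the two-point hypothesis that gathers the largest consensus set."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)  # minimal sample for a line
        model = fit_line(p, q)
        if model is None:
            continue
        a, b, c = model
        # With a unit normal, |a*x + b*y + c| is the point-to-line distance.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Toy data (hypothetical): 20 points on y = 2x + 1 plus 8 gross outliers.
inlier_pts = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outlier_pts = [(1.0, 30.0), (3.0, -12.0), (5.0, 40.0), (8.0, -5.0),
               (11.0, 2.0), (14.0, 55.0), (16.0, 10.0), (18.0, -20.0)]
model, inliers = ransac_line(inlier_pts + outlier_pts)
```

In an F-matrix setting, the two-point line fit would be replaced by a 7-point or 8-point solver and the point-to-line distance by an epipolar error; those substitutions are not shown here.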
|
203 |
Blind image and video quality assessment using natural scene and motion models
Saad, Michele Antoine 05 November 2013
We tackle the problems of no-reference/blind image and video quality evaluation. The approach we take is that of modeling the statistical characteristics of natural images and videos, and utilizing deviations from those natural statistics as indicators of perceived quality. We propose a probabilistic model of natural scenes and a probabilistic model of natural videos to drive our image and video quality assessment (I/VQA) algorithms respectively. The VQA problem is considerably different from the IQA problem since it imposes a number of challenges on top of those faced in the IQA problem, namely the challenges arising from the temporal dimension in video, which plays an important role in influencing human perception of quality. We compare our IQA approach to the state of the art in blind, reduced-reference and full-reference methods, and we show that it is top-performing. We compare our VQA approach to the state of the art in reduced- and full-reference methods (no blind VQA methods that perform reliably well exist), and show that our algorithm performs as well as the top-performing full- and reduced-reference algorithms in predicting human judgments of quality.
|
204 |
Utilizing natural scene statistics and blind image quality analysis of infrared imagery
Kaser, Jennifer Yvonne 09 December 2013
With the increasing number and affordability of image capture devices, there is an increasing demand to objectively analyze and compare the quality of images. Image quality can also be used as an indicator to determine if the source image is of high enough quality to perform analysis on. When applied to real-world scenarios, use of a blind algorithm is essential since a flawless reference image typically is unavailable. Recent research has shown promising results in no-reference image quality assessment utilizing natural scene statistics in the visual image space. Research has also shown that although the statistical profiles vary slightly, there are statistical regularities in IR images as well, which would indicate that natural scene statistical models may be applicable. In this project, I will analyze BRISQUE quality features of IR images and determine if the algorithm can successfully be applied to IR images. Additionally, in order to validate the usefulness of these techniques, the BRISQUE quality features are analyzed using a detection algorithm to determine if they can be used to predict conditions which may cause missed detections.
|
205 |
Applied statistical modeling of three-dimensional natural scene data
Su, Che-Chun 27 June 2014
Natural scene statistics (NSS) have played an increasingly important role both in our understanding of the function and evolution of the human vision system and in the development of modern image processing applications. Because depth/range, i.e., egocentric distance, is arguably the most important thing a visual system must compute (from an evolutionary perspective), the joint statistics between natural image and depth/range information are of particular interest. However, while there exist regular and reliable statistical models of two-dimensional (2D) natural images, there has been little work done on statistical modeling of natural luminance/chrominance and depth/disparity, and of their mutual relationships. One major reason is the dearth of high-quality three-dimensional (3D) image and depth/range databases. To facilitate research progress on 3D natural scene statistics, this dissertation first presents a high-quality database of color images and accurately co-registered depth/range maps acquired using an advanced laser range scanner mounted with a high-end digital single-lens reflex camera. By utilizing this high-resolution, high-quality database, this dissertation performs reliable and robust statistical modeling of natural image and depth/disparity information, including new bivariate and spatial oriented correlation models. In particular, these new statistical models capture higher-order dependencies embedded in spatially adjacent bandpass responses projected from natural environments, which have not yet been well understood or explored in the literature. To demonstrate the efficacy and effectiveness of the advanced NSS models, this dissertation addresses two challenging yet very important problems: depth estimation from monocular images and no-reference stereoscopic/3D (S3D) image quality assessment.
A Bayesian depth estimation framework is proposed to consider the canonical depth/range patterns in natural scenes, and it forms priors and likelihoods using both univariate and bivariate NSS features. The no-reference S3D image quality index proposed in this dissertation exploits new bivariate and correlation NSS features to quantify different types of stereoscopic distortions. Experimental results show that the proposed framework and index achieve superior performance to state-of-the-art algorithms in both disciplines.
|
206 |
Learning Statistical Features of Scene Images
Lee, Wooyoung 01 September 2014
Scene perception is a fundamental aspect of vision. Humans are capable of analyzing behaviorally-relevant scene properties such as spatial layouts or scene categories very quickly, even from low resolution versions of scenes. Although humans perform these tasks effortlessly, they are very challenging for machines. Developing methods that well capture the properties of the representation used by the visual system will be useful for building computational models that are more consistent with perception. While it is common to use hand-engineered features that extract information from predefined dimensions, they require careful tuning of parameters and do not generalize well to other tasks or larger datasets. This thesis is driven by the hypothesis that the perceptual representations are adapted to the statistical properties of natural visual scenes. For developing statistical features for global-scale structures (low spatial frequency information that encompasses entire scenes), I propose to train hierarchical probabilistic models on whole scene images. I first investigate statistical clusters of scene images by training a mixture model under the assumption that each image can be decoded by sparse and independent coefficients. Each cluster discovered by the unsupervised classifier is consistent with the high-level semantic categories (such as indoor, outdoor-natural and outdoor-manmade) as well as perceptual layout properties (mean depth, openness and perspective). To address the limitation of mixture models in their assumptions of a discrete number of underlying clusters, I further investigate a continuous representation for the distributions of whole scenes. The model parameters optimized for natural visual scenes reveal a compact representation that encodes their global-scale structures. I develop a probabilistic similarity measure based on the model and demonstrate its consistency with the perceptual similarities. 
Lastly, to learn representations that better encode the manifold structures in general high-dimensional image space, I develop an image normalization process to find a set of canonical images that anchor the probabilistic distributions around the real data manifolds. The canonical images are employed as the centers of the conditional multivariate Gaussian distributions. This approach allows more detailed structures of the local manifolds to be learned, resulting in an improved representation of the high-level properties of scene images.
|
207 |
Natural scene statistics based blind image quality assessment in spatial domain
Mittal, Anish 05 August 2011
We propose a natural scene statistic based quality assessment model, the Referenceless Image Spatial QUality Evaluator (RISQUE), which extracts marginal statistics of local normalized luminance signals and measures the 'un-naturalness' of the distorted image based on their measured deviation. We also model the distribution of pairwise products of adjacent normalized luminance signals, which provides orientation distortion information. Although multi-scale, the model is defined in the spatial domain, avoiding costly frequency or wavelet transforms. The framework is simple, fast, and human perception based, and is shown to perform statistically better than other proposed no-reference algorithms and the full-reference structural similarity index (SSIM).
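The two quantities named in this abstract, locally normalized luminance and pairwise products of adjacent normalized values, can be sketched as follows. This is an illustration under stated assumptions rather than the author's implementation: the local mean and deviation are taken over a plain square (box) window with clamped borders, whereas the abstract does not specify the window, and the names `mscn`, `horizontal_products`, `radius`, and `c` are hypothetical.

```python
import math


def mscn(image, radius=1, c=1.0):
    """Locally normalize luminance: (I - local mean) / (local std + c).

    `image` is a list of rows of floats. A box window of side 2*radius + 1
    with clamped borders is an assumption made for this sketch.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [
                image[min(max(i + di, 0), rows - 1)][min(max(j + dj, 0), cols - 1)]
                for di in range(-radius, radius + 1)
                for dj in range(-radius, radius + 1)
            ]
            mean = sum(window) / len(window)
            var = sum((v - mean) ** 2 for v in window) / len(window)
            # The constant c keeps the division stable in flat regions.
            out[i][j] = (image[i][j] - mean) / (math.sqrt(var) + c)
    return out


def horizontal_products(coeffs):
    """Products of horizontally adjacent normalized coefficients."""
    return [[row[j] * row[j + 1] for j in range(len(row) - 1)] for row in coeffs]


# Toy input (hypothetical): a vertical step edge in a 3 x 4 image.
step = [[0.0, 0.0, 10.0, 10.0] for _ in range(3)]
normalized = mscn(step)
products = horizontal_products(normalized)
```

A quality model of the kind described would then fit distributions to histograms of `normalized` and `products`; that fitting stage is outside the scope of this sketch.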
|
208 |
Top-Down Bayesian Modeling and Inference for Indoor Scenes
Del Pero, Luca January 2013
People can understand the content of an image without effort. We can easily identify the objects in it, and figure out where they are in the 3D world. Automating these abilities is critical for many applications, like robotics, autonomous driving and surveillance. Unfortunately, despite recent advancements, fully automated vision systems for image understanding do not exist. In this work, we present progress restricted to the domain of images of indoor scenes, such as bedrooms and kitchens. These environments typically have the "Manhattan" property that most surfaces are parallel to three principal ones. Further, the 3D geometry of a room and the objects within it can be approximated with simple geometric primitives, such as 3D blocks. Our goal is to reconstruct the 3D geometry of an indoor environment while also understanding its semantic meaning, by identifying the objects in the scene, such as beds and couches. We separately model the 3D geometry, the camera, and an image likelihood, to provide a generative statistical model for image data. Our representation captures the rich structure of an indoor scene by explicitly modeling the contextual relationships among its elements, such as the typical size of objects and their arrangement in the room, and simple physical constraints, such as the fact that 3D objects do not intersect. This ensures that the predicted image interpretation will be globally coherent geometrically and semantically, which allows tackling the ambiguities caused by projecting a 3D scene onto an image, such as occlusions and foreshortening. We fit this model to images using MCMC sampling. Our inference method combines bottom-up evidence from the data and top-down knowledge from the 3D world, in order to explore the vast output space efficiently. Comprehensive evaluation confirms our intuition that global inference of the entire scene is more effective than estimating its individual elements independently.
Further, our experiments show that our approach is competitive and often exceeds the results of state-of-the-art methods.
|
209 |
Bayesian Data Association for Temporal Scene Understanding
Brau Avila, Ernesto January 2013
Understanding the content of a video sequence is not a particularly difficult problem for humans. We can easily identify objects, such as people, and track their position and pose within the 3D world. A computer system that could understand the world through videos would be extremely beneficial in applications such as surveillance, robotics, and biology. Despite significant advances in areas like tracking and, more recently, 3D static scene understanding, such a vision system does not yet exist. In this work, I present progress on this problem, restricted to videos of objects that move smoothly and which are relatively easily detected, such as people. Our goal is to identify all the moving objects in the scene and track their physical state (e.g., their 3D position or pose) in the world throughout the video. We develop a Bayesian generative model of a temporal scene, where we separately model data association, the 3D scene and imaging system, and the likelihood function. Under this model, the video data is the result of capturing the scene with the imaging system, and noisily detecting video features. This formulation is very general, and can be used to model a wide variety of scenarios, including videos of people walking, and time-lapse images of pollen tubes growing in vitro. Importantly, we model the scene in world coordinates and units, as opposed to pixels, allowing us to reason about the world in a natural way, e.g., explaining occlusion and perspective distortion. We use Gaussian processes to model motion, and propose that they are a general and effective way to characterize smooth, but otherwise arbitrary, trajectories. We perform inference using MCMC sampling, where we fit our model of the temporal scene to data extracted from the videos. We address the problem of variable dimensionality by estimating data association and integrating out all scene variables.
Our experiments show our approach is competitive, producing results which are comparable to state-of-the-art methods.
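To make the idea of a Gaussian process prior over smooth trajectories concrete, the sketch below draws one 1D trajectory from a zero-mean GP. It rests on assumptions the abstract does not confirm: a squared-exponential kernel, a small diagonal jitter for numerical stability, and the hypothetical names `se_kernel`, `cholesky`, and `sample_trajectory`.

```python
import math
import random


def se_kernel(t1, t2, length_scale=2.0, variance=1.0):
    """Squared-exponential covariance between two time stamps (an assumed kernel)."""
    return variance * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular factor L with L * L^T = matrix (symmetric positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_trajectory(times, length_scale=2.0, jitter=1e-6, seed=0):
    """Draw one coordinate of a smooth trajectory from the GP prior."""
    rng = random.Random(seed)
    n = len(times)
    cov = [[se_kernel(times[i], times[j], length_scale)
            + (jitter if i == j else 0.0)  # jitter keeps the factorization stable
            for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlate independent normals: x = L z has covariance L L^T = cov.
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


# Ten frames, one unit apart (hypothetical time stamps).
frame_times = [float(t) for t in range(10)]
positions = sample_trajectory(frame_times)
```

A larger `length_scale` yields slower, smoother motion. A 3D trajectory could be obtained by drawing each coordinate independently, which is a further simplifying assumption.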
|
210 |
Predictive eyes precede retrieval : visual recognition as hypothesis testing
Holm, Linus January 2007
Does visual recognition entail verifying an idea about what is perceived? This question was addressed in the three studies of this thesis. The main hypothesis underlying the investigation was that visual recognition is an active process involving hypothesis testing. Recognition of faces (Study 1), scenes (Study 2) and objects (Study 3) was investigated using eye movement registration as a window on the recognition process. In Study 1, a functional relationship between eye movements and face recognition was established. Restricting the eye movements reduced recognition performance. In addition, perceptual reinstatement as indicated by eye movement consistency across study and test was related to recollective experience at test. Specifically, explicit recollection was related to higher eye movement consistency than familiarity-based recognition and false rejections (Studies 1-2). Furthermore, valid expectations about a forthcoming stimulus scene produced eye movements which were more similar to those of an earlier study episode, compared to invalid expectations (Study 2). In Study 3, participants recognized fragmented objects embedded in nonsense fragments. Around 8 seconds prior to explicit recognition, participants began to fixate the object region rather than a similar control region in the stimulus pictures. Before participants indicated awareness of the object, they fixated it with an average of 9 consecutive fixations. Hence, participants were looking at the object as if they had recognized it before they became aware of its identity. Furthermore, prior object information affected eye movement sampling of the stimulus, suggesting that semantic memory was involved in guiding the eyes during object recognition even before the participants were aware of its presence.
Collectively, the studies support the view that gaze control is instrumental to visual recognition performance and that visual recognition is an interactive process between memory representation and information sampling.
|