About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Subjective and Objective Evaluation of Visual Attention Models

January 2016 (has links)
Abstract: Visual attention (VA) is the study of mechanisms that allow the human visual system (HVS) to selectively process relevant visual information. This work focuses on the subjective and objective evaluation of computational VA models, both for distortion-free images and in the presence of image distortions. Existing VA models are traditionally evaluated with VA metrics that quantify the match between predicted saliency and fixation data obtained from eye-tracking experiments on human observers. Although a considerable number of objective VA metrics exist, no prior study validates that these metrics are adequate for evaluating VA models. This work constructs a VA Quality (VAQ) Database by subjectively assessing the prediction performance of VA models on distortion-free images. Shortcomings in existing metrics are discussed through illustrative examples, and a new metric is proposed that uses local weights based on fixation density and overcomes these flaws. The proposed VA metric outperforms all other popular existing metrics in terms of correlation with subjective ratings. In practice, image quality is affected by many factors at several stages of the image processing pipeline, such as acquisition, compression, and transmission; however, no existing study has addressed the subjective and objective evaluation of visual saliency models in the presence of distortion. In this work, a Distortion-based Visual Attention Quality (DVAQ) subjective database is constructed to evaluate the quality of VA maps for distorted images. To create this database, saliency maps obtained from images subjected to various types of distortions (including blur, noise, and compression) at varying severity levels are rated by human observers in terms of their visual resemblance to the corresponding ground-truth fixation density maps. The performance of traditionally used as well as recently proposed VA metrics is evaluated by correlating their scores with the human subjective ratings. In addition, an objective evaluation of 20 state-of-the-art VA models is performed using the top-performing VA metrics, together with a study of how the VA models' prediction performance changes with different types and levels of distortion. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2016
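The fixation-density-weighted comparison described in this abstract can be made concrete with a short sketch. The Python code below blurs fixation points into a density map and computes a density-weighted correlation with a predicted saliency map; the weighting rule and function names are illustrative assumptions, not the dissertation's actual VAQ/DVAQ metric.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_density_map(fixations, shape, sigma=25.0):
    """Blur binary fixation locations into a continuous fixation density map."""
    fmap = np.zeros(shape, dtype=float)
    for y, x in fixations:                  # fixations given as (row, col) pixel coordinates
        fmap[int(y), int(x)] += 1.0
    fmap = gaussian_filter(fmap, sigma)
    return fmap / (fmap.sum() + 1e-12)      # normalize to a probability map

def density_weighted_similarity(saliency, density):
    """Correlation between saliency and fixation density, with each pixel's
    contribution weighted by the local fixation density (illustrative only)."""
    s = (saliency - saliency.mean()) / (saliency.std() + 1e-12)
    d = (density - density.mean()) / (density.std() + 1e-12)
    w = density / (density.max() + 1e-12)   # local weights emphasize fixated regions
    return float(np.sum(w * s * d) / (np.sum(w) + 1e-12))
```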
2

Visual Saliency Analysis on Fashion Images Using Image Processing and Deep Learning Approaches

Neupane, Aashish 01 December 2020 (has links)
ABSTRACT: AASHISH NEUPANE, for the Master of Science degree in BIOMEDICAL ENGINEERING, presented in July 2020 at Southern Illinois University Carbondale. TITLE: VISUAL SALIENCY ANALYSIS ON FASHION IMAGES USING IMAGE PROCESSING AND DEEP LEARNING APPROACHES. MAJOR PROFESSOR: Dr. Jun Qin. State-of-the-art computer vision technologies have been applied to fashion in multiple ways, and saliency modeling is one of those applications. In computer vision, a saliency map is a 2D topological map that indicates the probabilistic distribution of visual attention priorities. This study focuses on the analysis of visual saliency on fashion images using multiple saliency models, evaluated with several evaluation metrics. A human subject study was conducted to collect people's visual attention on 75 fashion images. Binary ground-truth fixation maps for these images were created from the experimentally collected visual attention data using a Gaussian blurring function. Saliency maps for the 75 fashion images were generated using multiple conventional saliency models as well as state-of-the-art deep-feature-based models. DeepFeat was studied extensively, with 44 sets of saliency maps exploiting the features extracted from GoogLeNet and ResNet50. Seven other saliency models were also used to predict saliency maps on these images. The results were compared over five evaluation metrics: AUC, CC, KL divergence, NSS, and SIM. The performance of all eight saliency models in predicting visual attention on fashion images was comparable to the benchmarked scores across all five metrics. Furthermore, the models performed consistently well over multiple evaluation metrics, indicating that saliency models could in fact be applied to effectively predict salient regions in fashion advertisement images.
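Two of the metrics named in this abstract, NSS and CC, have standard definitions that a short sketch can make concrete; the code below follows those standard definitions and is not taken from the thesis.

```python
import numpy as np

def nss(saliency, fixation_binary):
    """Normalized Scanpath Saliency: mean of the z-scored saliency map at fixated pixels."""
    s = (saliency - saliency.mean()) / (saliency.std() + 1e-12)
    return float(s[fixation_binary.astype(bool)].mean())

def cc(saliency, density):
    """Pearson correlation coefficient between a saliency map and a fixation density map."""
    s = saliency - saliency.mean()
    d = density - density.mean()
    return float((s * d).sum() / (np.sqrt((s ** 2).sum() * (d ** 2).sum()) + 1e-12))
```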
3

Tree-Based Deep Mixture of Experts with Applications to Visual Saliency Prediction and Quality Robust Visual Recognition

January 2018 (has links)
Abstract: Mixture of experts is a machine learning ensemble approach consisting of individual models trained to be "experts" on subsets of the data, and a gating network that provides weights for combining the expert predictions. Mixture of experts models do not currently see wide use due to the difficulty of training diverse experts and their high computational requirements. This work presents modifications of the mixture of experts formulation that use domain knowledge to improve training and incorporate parameter sharing among experts to reduce computational requirements. First, this work presents an application of mixture of experts models to quality-robust visual recognition. It is shown that human subjects outperform deep neural networks on the classification of distorted images, and a model, MixQualNet, is then proposed that is more robust to distortions. The proposed model consists of "experts" that are each trained on a particular type of image distortion. The final output of the model is a weighted sum of the expert models, where the weights are determined by a separate gating network. The proposed model also incorporates weight sharing to reduce the number of parameters as well as increase performance. Second, an application of mixture of experts to visual saliency prediction is presented. A computational saliency model attempts to predict where humans will look in an image. In the proposed model, each expert network is trained to predict saliency for a set of closely related images. The final saliency map is computed as a weighted mixture of the expert networks' outputs, with weights determined by a separate gating network. The proposed model achieves better performance than several other visual saliency models and a baseline non-mixture model. Finally, this work introduces a saliency model that is a weighted mixture of models trained for different levels of saliency. Levels of saliency include high saliency, which corresponds to regions where almost all subjects look, and low saliency, which corresponds to regions where some, but not all, subjects look. The weighted mixture shows improved performance compared with baseline models because of the diversity of the individual model predictions. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2018
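The combination rule described here (expert outputs weighted by a separate gating network) can be sketched generically; the PyTorch code below is a minimal mixture-of-experts layer under assumed input shapes, not the MixQualNet or saliency architecture from the dissertation.

```python
import torch
import torch.nn as nn

class SimpleMixtureOfExperts(nn.Module):
    """Generic mixture of experts: a gating network produces weights that combine
    the outputs of several expert networks (illustrative sketch only)."""
    def __init__(self, experts, gate_in_features):
        super().__init__()
        self.experts = nn.ModuleList(experts)               # each expert maps x -> prediction
        self.gate = nn.Linear(gate_in_features, len(experts))

    def forward(self, x):
        # Gating weights sum to 1 over experts for every sample in the batch.
        weights = torch.softmax(self.gate(x.flatten(1)), dim=1)                  # (batch, n_experts)
        outputs = torch.stack([expert(x) for expert in self.experts], dim=1)     # (batch, n_experts, ...)
        while weights.dim() < outputs.dim():
            weights = weights.unsqueeze(-1)                 # broadcast weights over output dimensions
        return (weights * outputs).sum(dim=1)               # weighted sum of expert outputs
```

Here `experts` is any list of modules with matching output shapes and `gate_in_features` must equal the flattened input size; both are assumptions made for this sketch.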
4

Modelando a atenção seletiva e a saliência visual através de redes complexas / Modeling the selective attention and visual saliency using complex networks

Rigo, Gustavo Vrech 22 July 2010 (has links)
Selective attention is a central feature of the human visual system, since the entire brain is optimized to perceive the information around it as quickly as possible. In general, however, work in this area only identifies which regions attract selective attention most frequently, giving little consideration to its mechanics. This dissertation proposes a model that represents selective attention as a complex network, naturally combining the areas of complex networks, Markov chains, image analysis, selective attention, and visual salience into a biologically plausible model for simulating selective attention. The model proposes that important points of the image, the salient points, are represented as vertices of the complex network, and that edges are weighted according to the probability of a shift of attention between two vertices. Thus, the mechanics of selective attention are simulated by the mechanics of the corresponding complex network. Grayscale images corresponding to the observed scene were studied. The probability of switching between two regions, i.e., the edges of the network, was defined using several methods of composing visual saliency, and the resulting networks were compared with complex networks obtained from a prototype experiment. Based on this experiment, refinements to the original model were proposed, bringing the model's mechanics as close as possible to the mechanics of human selective attention.
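As a rough illustration of the model class described in this abstract (salient points as vertices, attention shifts as a Markov chain over weighted edges), the sketch below builds a transition matrix from saliency values and estimates the long-run attention each point receives; the specific transition rule is an assumption made for illustration, not the dissertation's formulation.

```python
import numpy as np

def attention_markov_chain(saliency_values):
    """Build a Markov transition matrix over salient points, where the probability of
    shifting attention to a point is proportional to that point's saliency.
    Assumes positive saliency values and at least two salient points."""
    s = np.asarray(saliency_values, dtype=float)
    n = len(s)
    P = np.tile(s / s.sum(), (n, 1))        # assumed rule: P[i, j] proportional to saliency of j
    np.fill_diagonal(P, 0.0)                # discourage staying on the same point
    P /= P.sum(axis=1, keepdims=True)       # re-normalize rows to valid probabilities
    return P

def stationary_distribution(P, steps=1000):
    """Approximate the long-run share of attention each salient point receives."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(steps):
        pi = pi @ P                          # power iteration on the transition matrix
    return pi
```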
5

On Visual Attention in Natural Images

Tavakoli, Fatemeh January 2015 (has links)
Through the process of visual attention, biological and machine vision systems are able to select the most relevant regions of a scene. Relevance is determined either by top-down factors, driven by the task, or by bottom-up factors, i.e., visual saliency, which distinguishes scene regions that differ from their surroundings. Over the past 20 years, numerous research efforts have aimed to model bottom-up visual saliency, with many successful applications in computer vision and robotics. In this thesis we compare a state-of-the-art saliency model with subjective tests (human eye tracking), using different evaluation methods over three generated datasets of synthetic patterns and natural images. Our results show that the objective model is only partially valid and is highly center-biased. Using empirical data obtained from the subjective experiments, we propose a special function, the Probability of Characteristic Radially Dependency Function, to model the lateral distribution of the visual attention process.
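The center bias and radially dependent distribution of attention mentioned in this abstract can be quantified by histogramming fixation distances from the image center; the sketch below is a minimal illustration of that idea and does not reproduce the thesis's proposed function.

```python
import numpy as np

def radial_fixation_distribution(fixations, image_shape, n_bins=20):
    """Histogram of fixation distances from the image center, normalized to a probability
    distribution; a strong peak near zero indicates center bias."""
    h, w = image_shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    pts = np.asarray(fixations, dtype=float)                 # rows of (row, col) coordinates
    r = np.hypot(pts[:, 0] - cy, pts[:, 1] - cx)
    r_max = np.hypot(cy, cx)                                 # farthest possible distance (image corner)
    hist, edges = np.histogram(r, bins=n_bins, range=(0.0, r_max))
    return hist / hist.sum(), edges
```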
6

Face perception in videos : contributions to a visual saliency model and its implementation on GPUs / La perception des visages en vidéos : contributions à un modèle saillance visuelle et son application sur les GPU

Rahman, Anis Ur 12 April 2013 (has links)
The studies conducted in this thesis concern the role of faces in visual attention. We sought to better understand the influence of faces in videos on eye movements, in order to propose a visual saliency model for predicting gaze direction. To this end, we analyzed the effect of faces on the eye fixations of observers freely viewing videos (with no instruction or particular task), and studied the impact of the number, location, and size of faces. It is clear that faces in a dynamic scene, as in still images, strongly modify eye movements. Building on these results, we propose a visual saliency model that combines classical low-level features (spatial orientations and frequencies, object motion amplitude) with faces as an important higher-level feature, extracted in parallel alongside the other features. Finally, to bring processing closer to real time, we developed a parallel implementation of this visual saliency model on a multi-GPU platform, achieving a speedup of roughly 130 times over a multithreaded CPU implementation.
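The fusion step described here (classical low-level feature maps combined with a face map) is commonly implemented as a weighted combination of normalized maps; the sketch below shows that generic pattern with assumed, illustrative weights rather than the thesis's exact fusion rule.

```python
import numpy as np

def fuse_saliency_maps(static_map, dynamic_map, face_map, weights=(1.0, 1.0, 2.0)):
    """Combine normalized feature maps into a master saliency map.
    The weights are illustrative; a higher weight on the face channel reflects
    the strong influence of faces reported in the abstract above."""
    def normalize(m):
        m = m.astype(float)
        rng = m.max() - m.min()
        return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

    maps = [normalize(static_map), normalize(dynamic_map), normalize(face_map)]
    fused = sum(w * m for w, m in zip(weights, maps))
    return fused / (fused.max() + 1e-12)    # rescale to [0, 1]
```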
7

Describing and retrieving visual content using natural language

Ramanishka, Vasili 11 February 2021 (has links)
Modern deep learning methods have boosted research progress in visual recognition and text understanding, but uniting advances from the two disciplines is a non-trivial task. In this thesis, we develop models and techniques that connect natural language and visual content, enabling automatic video subtitling, visual grounding, and text-based image search. Such models could be useful in a wide range of applications in robotics and human-computer interaction, bridging the gap between vision and language understanding. First, we develop a model that generates natural language descriptions of the main activities and scenes depicted in short videos. While previous methods were constrained to a predefined list of objects, actions, or attributes, our model learns to generate descriptions directly from raw pixels. The model exploits available audio information and the video's category (e.g., cooking, movie, education) to generate more relevant and coherent sentences. Then, we introduce a technique for visual grounding of generated sentences using the same video description model. Our approach explains the model's predictions by localizing salient video regions for the corresponding words in the generated sentence. Lastly, we address the problem of image retrieval. Existing cross-modal retrieval methods learn a common embedding space for different modalities using parallel data such as images and their accompanying descriptions. Instead, we focus on the case when images are connected by relative annotations: given the context set as an image and its metadata, the user can specify desired semantic changes using natural language instructions. The model needs to capture the distinctive visual differences between image pairs as described by the user. Our approach enables interactive image search in which natural language feedback significantly improves the efficacy of image retrieval. We show that the proposed methods advance the state of the art for video captioning and image retrieval tasks in terms of both accuracy and interpretability.
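The cross-modal retrieval setup mentioned in this abstract (a common embedding space scored by similarity) can be illustrated with a generic cosine-similarity ranking; the sketch below assumes image and text embeddings produced by already-trained encoders and is not the thesis's model.

```python
import numpy as np

def rank_images_by_text(text_embedding, image_embeddings):
    """Rank images by cosine similarity to a text query in a shared embedding space.
    Inputs are assumed to come from already-trained encoders (not shown here)."""
    t = text_embedding / (np.linalg.norm(text_embedding) + 1e-12)
    imgs = image_embeddings / (np.linalg.norm(image_embeddings, axis=1, keepdims=True) + 1e-12)
    scores = imgs @ t                       # cosine similarity of each image to the query
    order = np.argsort(-scores)             # best match first
    return order, scores[order]
```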
8

Police Car 'Visibility': The Relationship between Detection, Categorization and Visual Saliency

Thomas, Mark Dewayne 12 May 2012 (has links)
Perceptual categorization involves integrating bottom-up sensory information with top-down knowledge based on prior experience. Bottom-up information comes from the external world; visual saliency is a type of bottom-up information computed from the differences between the visual characteristics of adjacent spatial locations. There is currently a related debate in municipal law enforcement communities about which are more 'visible': white police cars or black-and-white police cars. Municipalities do not want police cars to be hit by motorists, and they also want police cars to be seen in order to promote a public presence. The present study used three behavioral experiments to investigate the effects of visual saliency on object detection and categorization. Importantly, the results indicated that so-called 'object detection' is not a valid construct. Rather than identifying objectness or objecthood prior to categorization, object categorization is an obligatory process, and object detection is a post-categorization decision, with higher-salience objects being categorized more easily than lower-salience objects. An additional experiment examined the features that constitute a police car. Based on salience alone, black-and-white police cars were categorized better than white police cars, and light bars were slightly more important than markings as police-car-defining components.
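Saliency computed from differences between the visual characteristics of adjacent spatial locations is often approximated by center-surround contrast; the sketch below computes a simple luminance-contrast saliency map in that spirit, as an illustrative approximation rather than the measure used in this study.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_contrast_saliency(gray_image, neighborhood=15):
    """Simple bottom-up saliency: absolute difference between each pixel's luminance
    and the mean luminance of its surrounding neighborhood."""
    img = gray_image.astype(float)
    surround = uniform_filter(img, size=neighborhood)    # local mean (the 'surround')
    saliency = np.abs(img - surround)
    return saliency / (saliency.max() + 1e-12)           # normalize to [0, 1]
```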
9

Real and predicted influence of image manipulations on eye movements during scene recognition

Harding, Glen, Bloj, Marina January 2010 (has links)
In this paper, we investigate how controlled changes to image properties and orientation affect eye movements during repeated viewings of images of natural scenes. We modify images by manipulating low-level image content (such as luminance or chromaticity) and/or inverting the image. We measure the effects of these manipulations on human scanpaths (the spatial and chronological path of fixations) and compare them to the effects predicted by a widely used saliency model (L. Itti & C. Koch, 2000). First, we find that repeated viewing of a natural image does not significantly modify the previously reported repeatability (S. A. Brandt & L. W. Stark, 1997; D. Noton & L. Stark, 1971) of scanpaths. Second, we find that manipulating image features does not necessarily change the repeatability of scanpaths, but the removal of luminance information has a measurable effect. We also find that image inversion appears to affect scene perception and recognition and may alter fixation selection (although we only find an effect on scanpaths with the additional removal of luminance information). Additionally, we confirm that visual saliency as defined by L. Itti and C. Koch's (2000) model is a poor predictor of real observer scanpaths and does not predict the small effects of our image manipulations on scanpaths.
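Scanpath repeatability of the kind discussed in this abstract is often scored by spatially comparing fixation sequences; the sketch below computes a simple mean distance between two aligned fixation sequences as a rough similarity measure, an illustrative stand-in for the paper's actual comparison methods.

```python
import numpy as np

def mean_aligned_fixation_distance(scanpath_a, scanpath_b):
    """Compare two scanpaths (arrays of (x, y) fixation coordinates) by the mean
    Euclidean distance between fixations at the same ordinal position.
    Trailing fixations of the longer scanpath are ignored."""
    a = np.asarray(scanpath_a, dtype=float)
    b = np.asarray(scanpath_b, dtype=float)
    n = min(len(a), len(b))
    if n == 0:
        return float("nan")
    return float(np.linalg.norm(a[:n] - b[:n], axis=1).mean())
```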
