Global ETD Search

51	Semantic Sparse Learning in Images and Videos January 2014 (has links) abstract: Many learning models have been proposed for various tasks in visual computing. Popular examples include hidden Markov models and support vector machines. Recently, sparse-representation-based learning methods have attracted a lot of attention in the computer vision field, largely because of their impressive performance in many applications. In the literature, many such sparse learning methods focus on designing or applying learning techniques for a particular feature space without explicitly considering the possible interaction between the underlying semantics of the visual data and the learning technique employed. Rich semantic information in most visual data, if properly incorporated into algorithm design, should help achieve improved performance while delivering intuitive interpretations of the algorithmic outcomes. My study addresses the problem of how to explicitly consider the semantic information of the visual data in sparse learning algorithms. In this work, we identify four problems that are of great importance and broad interest to the community. 
Specifically, a novel approach is proposed that incorporates label information to learn a dictionary that is not only reconstructive but also discriminative; considering the formation process of face images, a novel image decomposition approach for an ensemble of correlated images is proposed, where a subspace is built from the decomposition and applied to face recognition; based on the observation that the foreground (or salient) objects are sparse in the input domain and the background is sparse in the frequency domain, a novel and efficient spatio-temporal saliency detection algorithm is proposed to identify the salient regions in video; and a novel hidden Markov model learning approach is proposed that uses a sparse set of pairwise comparisons among the data, which in many scenarios (e.g., evaluating motion skills in surgical simulations) are easier to obtain and more meaningful and consistent than traditional labels. In these four problems, different types of semantic information are modeled and incorporated into the design of sparse learning algorithms for the corresponding visual computing tasks. Several real-world applications are selected to demonstrate the effectiveness of the proposed methods, including face recognition, spatio-temporal saliency detection, abnormality detection, spatio-temporal interest point detection, motion analysis and emotion recognition. These applications involve data of different modalities, ranging from audio signals and images to video. Experiments on large-scale real-world data, with comparisons against state-of-the-art methods, confirm that the proposed approaches deliver clear advantages, showing that adding such semantic information dramatically improves the performance of general sparse learning methods. / Dissertation/Thesis / Ph.D. Computer Science 2014 Computer science dictionary learning face recognition motion analysis saliency detection semantic sparse learning
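The observation above has a well-known single-frame counterpart: salient foreground is sparse in the spatial domain, while the background is sparse in the frequency domain. That counterpart is the "image signature" of Hou et al., which keeps only the sign of an image's DCT. The following is a minimal sketch assuming a grayscale NumPy array and SciPy. It stands in for illustration only and is not the spatio-temporal algorithm proposed in this thesis. The function name and the smoothing width `sigma` are our own choices.

```python
import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import gaussian_filter


def image_signature_saliency(gray, sigma=3.0):
    """Single-frame saliency map in [0, 1] from the sign of the 2-D DCT."""
    x = np.asarray(gray, dtype=np.float64)
    # Keep only the sign of each DCT coefficient. A background that is sparse in
    # the frequency domain owns few coefficients, so it barely changes the signs.
    signature = np.sign(dctn(x, norm="ortho"))
    # Back in the spatial domain, the energy of the sign pattern concentrates
    # around the spatially sparse foreground.
    recon = idctn(signature, norm="ortho")
    # Square and smooth the reconstruction to obtain a dense map.
    sal = gaussian_filter(recon * recon, sigma=sigma)
    peak = sal.max()
    return sal / peak if peak > 0 else sal


# Toy frame: horizontal stripes (a single DCT basis image) plus a small square.
n = np.arange(64)
stripes = np.cos(np.pi * (2 * n + 1) * 5 / (2 * 64))
frame = np.outer(stripes, np.ones(64))
frame[20:26, 30:36] += 1.0
sal = image_signature_saliency(frame)
print(sal[17:29, 27:39].mean(), sal[45:, :15].mean())
```

Because the striped background occupies a single DCT coefficient, taking signs leaves the spectrum of the square almost unchanged. The reconstruction therefore lights up the square while the stripes are suppressed.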
52	Modelando a atenção seletiva e a saliência visual através de redes complexas / Modeling the selective attention and visual saliency using complex networks Gustavo Vrech Rigo 22 July 2010 (has links) Selective attention is a central feature of the human visual system, since the entire brain is optimized to perceive the surrounding information as quickly as possible. However, work in this area generally only determines which regions attract selective attention most often, giving little consideration to its mechanics. This dissertation proposes a model that represents selective attention as a complex network, naturally combining complex networks, Markov chains, image analysis, selective attention and visual saliency in a biologically plausible model that simulates selective attention. In the model, the important points of the image (salient points) are the vertices of the complex network, and the edges are distributed according to the probability of attention shifting between two vertices. The mechanics of selective attention are thus simulated by the mechanics of the corresponding complex network. Grayscale images corresponding to the observed scene were studied. The probabilities of shifting between two regions, i.e., the edges of the network, were defined using various methods of visual saliency composition, and the resulting networks were compared with complex networks obtained from a prototype experiment. Based on this experiment, refinements to the original model were proposed, bringing the model's mechanics as close as possible to the mechanics of human selective attention. Complex networks Selective attention Visual saliency
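The model can be made concrete in three parts:
- salient points become vertices;
- edge weights encode the probability of an attention shift;
- the stationary distribution of the resulting Markov chain gives the long-run share of fixations each point would receive.

The sketch below is hypothetical and assumes NumPy. Its weighting is our assumption: target saliency times a Gaussian of the distance, with width `sigma`. It is not one of the saliency composition methods evaluated in the dissertation.

```python
import numpy as np


def attention_network(points, saliency, sigma=50.0):
    """Row-stochastic matrix: P[i, j] = probability that attention shifts from i to j.

    Hypothetical weighting: a target attracts attention in proportion to its
    saliency, discounted by a Gaussian of its distance from the current point.
    Assumes at least two points and positive saliency values.
    """
    pts = np.asarray(points, dtype=float)
    s = np.asarray(saliency, dtype=float)
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    w = s[None, :] * np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(w, 0.0)  # no self-loops: attention always moves on
    return w / w.sum(axis=1, keepdims=True)


def stationary_distribution(P):
    """Solve pi @ P = pi with sum(pi) = 1: the long-run share of fixations per vertex."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


points = [(0, 0), (40, 0), (0, 40), (200, 200)]  # salient points (x, y) in pixels
saliency = [1.0, 0.9, 0.8, 0.1]
P = attention_network(points, saliency, sigma=60.0)
print(np.round(stationary_distribution(P), 3))
```

With this symmetric distance kernel the chain is reversible. The stationary share of vertex i is therefore proportional to s_i · Σ_j s_j exp(-d_ij²/2σ²). Solving the linear system directly avoids the slow convergence that power iteration shows when two nearby points keep trading attention back and forth.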
53	The effect of colour use on the quality of websites Grijseels, Dorieke January 2016 (has links) The design of a website is important for the success of a company, and colours play an important part in it. The goal of this thesis is to find out how the use of colour in websites relates to their quality. Different aspects are studied. First, it was found that the harmony of a colour palette correlates only weakly with the quality of a website; this correlation increases when only darker colour palettes are used. Next, a method was proposed to extract the colour palette from a website. This novel method takes the saliency of the pixels in a website into account. Lastly, the palettes extracted with this method were used to propose a model of the relation between colour use and website quality. Sixty-one different features were tested using three different feature selection methods. The accuracy achieved by the best model was low. Future work should focus on identifying more relevant features and on training the model with a better database. colour websites colour harmony saliency machine learning feature selection Human Computer Interaction
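One simple reading of a saliency-aware palette extractor is weighted k-means over pixel colours. Each pixel's saliency serves as its weight, so large but unremarkable backgrounds contribute little. This minimal sketch assumes:
- NumPy;
- RGB pixels already flattened to an (n, 3) array;
- a precomputed non-negative saliency value per pixel, with at least `k` of them non-zero.

The function `salient_palette` is hypothetical and is not the thesis's extraction method.

```python
import numpy as np


def salient_palette(pixels, saliency, k=5, iters=20, seed=0):
    """Saliency-weighted k-means over RGB colours.

    pixels: (n, 3) array; saliency: (n,) non-negative weights, at least k non-zero.
    Returns the k centre colours and their saliency-weighted shares, largest first.
    """
    x = np.asarray(pixels, dtype=float)
    w = np.asarray(saliency, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centres at pixels drawn in proportion to their saliency.
    centres = x[rng.choice(len(x), size=k, replace=False, p=w / w.sum())]

    def assign():
        d2 = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        return d2.argmin(axis=1)

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            m = labels == c
            if w[m].sum() > 0:  # an empty (or zero-weight) cluster keeps its centre
                centres[c] = (w[m][:, None] * x[m]).sum(axis=0) / w[m].sum()
    labels = assign()
    share = np.bincount(labels, weights=w, minlength=k) / w.sum()
    order = np.argsort(share)[::-1]
    return centres[order], share[order]


# Blue covers 90% of the pixels but is barely salient; red is small but salient.
pixels = np.array([[255, 0, 0]] * 100 + [[0, 0, 255]] * 900)
weights = np.array([1.0] * 100 + [0.01] * 900)
palette, share = salient_palette(pixels, weights, k=2)
print(palette, share)
```

A plain, unweighted palette would rank blue first here. With saliency weights, red dominates the palette.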
54	Saliency Maps using Channel Representations / Saliency-kartor utifrån kanalrepresentationer Tuttle, Alexander January 2010 (has links) In this thesis, an algorithm for producing saliency maps and an algorithm for detecting salient regions based on the saliency map were developed. The saliency values are computed as center-surround differences, and a local descriptor called the region p-channel is used to represent the center and the surround, respectively. An integral image representation called the integral p-channel is used to speed up extraction of the local descriptor for any given image region. The center-surround difference is calculated as either a histogram or a p-channel dissimilarity. Ground truth was collected from human subjects, and the algorithm’s ability to detect salient regions was evaluated against it. The algorithm was also compared with another saliency algorithm. Two different center-surround interpretations were tested, as well as several p-channel and histogram dissimilarity measures. The results show that for all tested settings the best-performing dissimilarity measure is the so-called diffusion distance. The comparison showed that the algorithm developed in this thesis outperforms the algorithm it was compared against, both in region detection and in the saliency ranking of regions. It can be concluded that the algorithm shows promising results, and further investigation of it is recommended. A list of suggested approaches for further research is provided. computer vision saliency maps p-channels
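The diffusion distance treats the difference of two histograms as a signal. It accumulates the signal's L1 norm while the difference is repeatedly smoothed and downsampled. As a result, mass that has merely shifted to a neighbouring bin is penalised less than mass that moved far. Below is a minimal 1-D sketch assuming NumPy. The 3-tap binomial kernel and the pyramid depth (down to one bin) are our simplifications. The thesis applies the measure to p-channel and histogram descriptors of centre and surround regions, not to toy histograms like these.

```python
import numpy as np


def diffusion_distance(h1, h2):
    """Diffusion distance between two 1-D histograms of equal length.

    The bin-wise difference is repeatedly smoothed and downsampled by 2;
    the L1 norms of all pyramid levels are summed.
    """
    d = np.asarray(h1, dtype=float) - np.asarray(h2, dtype=float)
    kernel = np.array([1.0, 2.0, 1.0]) / 4.0  # binomial approximation of a Gaussian
    total = np.abs(d).sum()
    while d.size > 1:
        # Zero-pad by one bin so smoothing keeps the length, then keep every other bin.
        d = np.convolve(np.pad(d, 1), kernel, mode="valid")[::2]
        total += np.abs(d).sum()
    return total


h, near, far = np.zeros(32), np.zeros(32), np.zeros(32)
h[10], near[11], far[25] = 1.0, 1.0, 1.0  # same mass, shifted by 1 and by 15 bins
print(np.abs(h - near).sum(), np.abs(h - far).sum())  # bin-wise L1 is 2.0 for both
print(diffusion_distance(h, near), diffusion_distance(h, far))
```

Bin-by-bin L1 cannot tell the two shifts apart. The diffusion distance ranks the one-bin shift as closer.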
55	Functional neuroimaging of pathophysiological mesolimbic dopamine system and aberrant motivational salience in schizophrenia Richter, Anja 02 April 2017 (has links) No description available. 570 fMRI schizophrenia saliency mesolimbic dopamine system functional activity and connectivity Biology (PPN619462639)
56	Visual saliency and eye movement: modeling and applications Rezazadegan Tavakoli, H. (Hamed) 04 November 2014 (has links) Abstract Humans are capable of focusing on the highlights of visual information in an instant, which lets them handle an enormous mass of data. Like humans, computers have to deal with a tremendous amount of visual information. To replicate such a focusing mechanism, computer vision relies on techniques that filter out redundant information. Consequently, saliency has recently become a popular subject in the computer vision community, although it has long been studied in the cognitive sciences. The popularity of saliency techniques, particularly in computer vision, is largely due to their inexpensive and fast computation, which facilitates their use in many computer vision applications such as image/video compression, object recognition and tracking. This study investigates visual saliency modeling, that is, the transformation of an image into a salience map such that the identified conspicuousness agrees with the statistics of human eye movements. It explores how image and video processing can be used to develop saliency techniques suitable for computer vision; for example, it adopts a sparse sampling scheme and kernel density estimation to introduce a saliency measure for images. It also studies the role of eye movements in salience modeling. To this end, it introduces a particle-filter-based framework for saccade generation incorporated into a salience model. Moreover, eye movements and salience are exploited in several applications. The contributions of this study lie in the proposal of a number of salience models for image and video stimuli, a framework that incorporates a model of eye movement generation into salience modeling, and the investigation of the application of salience models and eye movements to tracking, background subtraction, scene recognition and valence recognition. computer vision pattern recognition saliency map vision system visual attention
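"Sparse sampling plus kernel density estimation" can be read as a rarity measure. The idea is to estimate how typical each pixel's feature is from a small random sample of the image's own features, and to call atypical pixels salient. The sketch below assumes:
- NumPy;
- an (H, W, D) feature array, for example colour values;
- a Gaussian kernel with a fixed, hand-picked `bandwidth`.

It is a hypothetical illustration, not the saliency measure defined in the thesis.

```python
import numpy as np


def kde_rarity_saliency(features, n_samples=200, bandwidth=0.1, seed=0):
    """Saliency = -log p(feature), where p is a Gaussian KDE fitted on a
    sparse random sample of the image's own per-pixel features.

    features: (H, W, D) array, e.g. colour values scaled to [0, 1].
    """
    f = np.asarray(features, dtype=float)
    h, w, dim = f.shape
    flat = f.reshape(-1, dim)
    rng = np.random.default_rng(seed)
    # Sparse sampling: a small random subset stands in for "what is typical here".
    idx = rng.choice(len(flat), size=min(n_samples, len(flat)), replace=False)
    samples = flat[idx]
    d2 = ((flat[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)
    norm = (2.0 * np.pi * bandwidth**2) ** (dim / 2.0)
    density = np.exp(-d2 / (2.0 * bandwidth**2)).mean(axis=1) / norm
    # Rare features have low density and therefore high saliency.
    return -np.log(density + 1e-300).reshape(h, w)


image = np.full((32, 32, 1), 0.2)  # uniform background
image[10:13, 20:23, 0] = 0.9       # small patch with a rare value
sal = kde_rarity_saliency(image)
print(sal[11, 21], sal[0, 0])
```

The pairwise-distance array grows with (pixels × samples × D), so large frames would need to be processed in chunks.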
57	Semantic-oriented Object Segmentation / Segmentation d'objet pour l'interprétation sémantique Zou, Wenbin 13 March 2014 (has links) This thesis focuses on the problems of object segmentation and semantic segmentation, which aim at separating objects from the background or at assigning a specific semantic label to each pixel in an image. We propose two approaches for object segmentation and one approach for semantic segmentation. The first proposed approach for object segmentation is based on saliency detection. Motivated by our ultimate goal of object segmentation, a novel saliency detection model is proposed. This model is formulated in the low-rank matrix recovery framework, taking image structure information derived from bottom-up segmentation as an important constraint. The object segmentation is built in an iterative and mutual optimization framework, which simultaneously performs object segmentation based on the saliency map resulting from saliency detection, and saliency quality boosting based on the segmentation. The optimal saliency map and the final segmentation are obtained after several iterations. The second proposed approach for object segmentation is based on exemplar images. The underlying idea is to transfer the segmentation labels of globally and locally similar exemplar images to the query image. To find the best-matching exemplars, we propose a novel high-level image representation called the object-oriented descriptor, which captures both global and local information of the image. Then, a discriminative predictor is learned online from the retrieved exemplars; this predictor assigns a probabilistic foreground score to each region of the query image. The predicted scores are then integrated into a segmentation scheme based on Markov random field (MRF) energy optimization, and iteratively minimizing the MRF energy yields the final segmentation. For semantic segmentation, we propose an approach based on a region bank and sparse coding. The region bank is a set of regions generated by multi-level segmentations, motivated by the observation that some objects may be captured only at certain levels of a hierarchical segmentation. For region description, we propose a sparse coding method that represents each local feature descriptor with several basis vectors of a learned visual dictionary, and describes all local feature descriptors within a region by a single sparse histogram. With this sparse representation, a support vector machine with multiple kernel learning is used for semantic inference. The proposed approaches have been extensively evaluated on several challenging and widely used datasets, and the experiments show that they outperform state-of-the-art methods. For example, compared with the best result in the literature, the proposed exemplar-based object segmentation approach improves the F-score from 63% to 68.7% on the Pascal VOC 2011 dataset. Object segmentation Semantic segmentation Saliency detection 621.367
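The F-score quoted above combines the precision and recall of the predicted foreground pixels. Below is a minimal sketch for binary masks, assuming NumPy. The abstract does not say which variant was used (some saliency and segmentation benchmarks weight precision with β² = 0.3), so `beta` is left as a parameter defaulting to the balanced F1.

```python
import numpy as np


def f_score(pred, gt, beta=1.0):
    """F-measure of a binary foreground mask against a ground-truth mask."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.logical_and(pred, gt).sum()  # correctly predicted foreground pixels
    if tp == 0:
        return 0.0
    precision = tp / pred.sum()
    recall = tp / gt.sum()
    b2 = beta**2
    return float((1 + b2) * precision * recall / (b2 * precision + recall))


gt = np.zeros((4, 4), dtype=bool)
gt[:, :2] = True    # 8 foreground pixels
pred = np.zeros((4, 4), dtype=bool)
pred[:, :3] = True  # 12 predicted, 8 of them correct: precision 2/3, recall 1
print(round(f_score(pred, gt), 3))  # 0.8
```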
58	Exploitation de la multimodalité pour l'analyse de la saillance et l'évaluation de la qualité audiovisuelle / Exploitation of multimodality for saliency analysis and audiovisual quality assessment Sidaty, Naty 11 December 2015 (has links) Audiovisual data are part of our daily lives, whether for professional needs or simply for leisure. The sheer quantity of such data requires compression for both storage and transmission, which may degrade audiovisual quality if perceptual aspects are not taken into account. The literature on saliency and quality is very rich, but it often ignores the audio component, which plays an important role in the visual scanpath and in the quality of experience. This thesis aims to help fill the gap in multimodal approaches by following a dedicated experimental procedure. The work is twofold: audiovisual attention and multimodal quality evaluation. First, to better understand and analyze the influence of audio on human eye movements, we ran several eye-tracking experiments involving a panel of observers and a video dataset constructed for this context. The importance of faces was confirmed, particularly for talking faces, which have increased saliency. Following these results, we proposed an audiovisual saliency model based on speaker detection in video and relying on low-level spatial and temporal features. Afterward, the influence of audio on multimodal and multi-device quality was studied. To this end, psychovisual experiments were conducted with the aim of quantifying multimodal quality in video streaming applications where various display devices could be used. Audiovisual saliency Eye-tracking experiments Multimodal quality Compression 006.42
59	Describing and retrieving visual content using natural language Ramanishka, Vasili 11 February 2021 (has links) Modern deep learning methods have boosted research progress in visual recognition and text understanding, but uniting these advances from both disciplines is a non-trivial task. In this thesis, we develop models and techniques that connect natural language and visual content, enabling automatic video captioning, visual grounding, and text-based image search. Such models could be useful in a wide range of applications in robotics and human-computer interaction, bridging the gap between vision and language understanding. First, we develop a model that generates natural language descriptions of the main activities and scenes depicted in short videos. While previous methods were constrained to a predefined list of objects, actions, or attributes, our model learns to generate descriptions directly from raw pixels. The model exploits available audio information and the video’s category (e.g., cooking, movie, education) to generate more relevant and coherent sentences. Then, we introduce a technique for visual grounding of generated sentences using the same video description model. Our approach explains the model’s predictions by localizing the salient video regions for the corresponding words in the generated sentence. Lastly, we address the problem of image retrieval. Existing cross-modal retrieval methods work by learning a common embedding space for different modalities using parallel data, such as images and their accompanying descriptions. Instead, we focus on the case in which images are connected by relative annotations: given a context consisting of an image and its metadata, the user can specify the desired semantic changes using natural language instructions. The model needs to capture the distinctive visual differences between image pairs as described by the user. 
Our approach enables interactive image search in which natural language feedback significantly improves the efficacy of image retrieval. We show that the proposed methods advance the state of the art for video captioning and image retrieval tasks in terms of both accuracy and interpretability. Computer science Automatic video captioning Computer vision Cross-modal retrieval Natural language processing Visual saliency
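A common embedding space reduces retrieval to a nearest-neighbour search. Once images and text are mapped into the same vector space, gallery items are ranked by cosine similarity to the query. Below is a minimal sketch assuming NumPy and precomputed embeddings. The additive `compose_query`, which nudges a reference image's embedding towards a text instruction, is a hypothetical stand-in for the learned model described in this thesis.

```python
import numpy as np


def l2_normalize(x):
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def retrieve(query_emb, gallery_embs, top_k=3):
    """Rank gallery items by cosine similarity to the query in the shared space."""
    sims = l2_normalize(gallery_embs) @ l2_normalize(query_emb)
    order = np.argsort(-sims, kind="stable")
    return order[:top_k], sims


def compose_query(image_emb, text_emb, alpha=1.0):
    """Hypothetical: move the reference image's embedding towards the text feedback."""
    return l2_normalize(l2_normalize(image_emb) + alpha * l2_normalize(text_emb))


gallery = np.eye(3)                   # toy embeddings of three gallery images
reference = gallery[0]                # the image the user starts from
feedback = np.array([0.0, 1.0, 0.0])  # toy embedding of a text instruction
print(retrieve(reference, gallery, top_k=1)[0])  # [0]
print(retrieve(compose_query(reference, feedback, alpha=2.0), gallery, top_k=1)[0])  # [1]
```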
60	Comparing Human Reasoning and Explainable AI Helgstrand, Carl Johan, Hultin, Niklas January 2022 (has links) Explainable AI (XAI) is a research field dedicated to finding ways to open up the black-box nature of many of today’s machine learning models. As society finds new ways of applying these models in everyday life, certain risk thresholds are crossed when human decision-making is replaced by autonomous systems. How can we trust the algorithms to make sound judgements when all we provide is an input and all they provide is an output? XAI methods examine different data points in the machine learning process to determine which factors influenced the decision. While these post-hoc explanation methods may provide certain insights, previous studies of XAI have found that their designs are often biased towards the designers and do not draw on the interdisciplinary fields needed to improve user understanding. In this thesis, we look at animal classification and at which features in animal images humans found important. We use a novel approach: participants first create their own post-hoc explanations and are then asked to evaluate real XAI explanations as well as a pre-made human explanation generated from a test group. The results show strong cohesion in the participants' answers and can provide guidelines for designing XAI explanations that are more closely aligned with human reasoning. The data also indicate a preference for human-like explanations within the context of this study. Additionally, a potential bias was identified: participants preferred explanations that marked large portions of an image as important, even when many of the marked areas coincided with what the participants themselves considered unimportant. While the sample pool and data-gathering tools are limited, the results point toward a need for further research comparing human reasoning with XAI explanations, and into how this may affect the evaluation of, and bias towards, explanation methods. Explainable AI XAI Visual Explanations Saliency Maps Artificial Intelligence Computer Sciences
