Spelling suggestions: "subject:"salience detection"" "subject:"salient detection""
1 |
Video Deinterlacing using Control Grid Interpolation FrameworksJanuary 2012 (has links)
abstract: Video deinterlacing is a key technique in digital video processing, particularly with the widespread usage of LCD and plasma TVs. This thesis proposes a novel spatio-temporal, non-linear video deinterlacing technique that adaptively chooses between the results from one dimensional control grid interpolation (1DCGI), vertical temporal filter (VTF) and temporal line averaging (LA). The proposed method performs better than several popular benchmarking methods in terms of both visual quality and peak signal to noise ratio (PSNR). The algorithm performs better than existing approaches like edge-based line averaging (ELA) and spatio-temporal edge-based median filtering (STELA) on fine moving edges and semi-static regions of videos, which are recognized as particularly challenging deinterlacing cases. The proposed approach also performs better than the state-of-the-art content adaptive vertical temporal filtering (CAVTF) approach. Along with the main approach several spin-off approaches are also proposed each with its own characteristics. / Dissertation/Thesis / M.S. Electrical Engineering 2012
|
2 |
Visual saliency computation for image analysisZhang, Jianming 08 December 2016 (has links)
Visual saliency computation is about detecting and understanding salient regions and elements in a visual scene. Algorithms for visual saliency computation can give clues to where people will look in images, what objects are visually prominent in a scene, etc. Such algorithms could be useful in a wide range of applications in computer vision and graphics. In this thesis, we study the following visual saliency computation problems. 1) Eye Fixation Prediction. Eye fixation prediction aims to predict where people look in a visual scene. For this problem, we propose a Boolean Map Saliency (BMS) model which leverages the global surroundedness cue using a Boolean map representation. We draw a theoretic connection between BMS and the Minimum Barrier Distance (MBD) transform to provide insight into our algorithm. Experiment results show that BMS compares favorably with state-of-the-art methods on seven benchmark datasets. 2) Salient Region Detection. Salient region detection entails computing a saliency map that highlights the regions of dominant objects in a scene. We propose a salient region detection method based on the Minimum Barrier Distance (MBD) transform. We present a fast approximate MBD transform algorithm with an error bound analysis. Powered by this fast MBD transform algorithm, our method can run at about 80 FPS and achieve state-of-the-art performance on four benchmark datasets. 3) Salient Object Detection. Salient object detection targets at localizing each salient object instance in an image. We propose a method using a Convolutional Neural Network (CNN) model for proposal generation and a novel subset optimization formulation for bounding box filtering. In experiments, our subset optimization formulation consistently outperforms heuristic bounding box filtering baselines, such as Non-maximum Suppression, and our method substantially outperforms previous methods on three challenging datasets. 4) Salient Object Subitizing. We propose a new visual saliency computation task, called Salient Object Subitizing, which is to predict the existence and the number of salient objects in an image using holistic cues. To this end, we present an image dataset of about 14K everyday images which are annotated using an online crowdsourcing marketplace. We show that an end-to-end trained CNN subitizing model can achieve promising performance without requiring any localization process. A method is proposed to further improve the training of the CNN subitizing model by leveraging synthetic images. 5) Top-down Saliency Detection. Unlike the aforementioned tasks, top-down saliency detection entails generating task-specific saliency maps. We propose a weakly supervised top-down saliency detection approach by modeling the top-down attention of a CNN image classifier. We propose Excitation Backprop and the concept of contrastive attention to generate highly discriminative top-down saliency maps. Our top-down saliency detection method achieves superior performance in weakly supervised localization tasks on challenging datasets. The usefulness of our method is further validated in the text-to-region association task, where our method provides state-of-the-art performance using only weakly labeled web images for training.
|
3 |
Semantic Sparse Learning in Images and VideosJanuary 2014 (has links)
abstract: Many learning models have been proposed for various tasks in visual computing. Popular examples include hidden Markov models and support vector machines. Recently, sparse-representation-based learning methods have attracted a lot of attention in the computer vision field, largely because of their impressive performance in many applications. In the literature, many of such sparse learning methods focus on designing or application of some learning techniques for certain feature space without much explicit consideration on possible interaction between the underlying semantics of the visual data and the employed learning technique. Rich semantic information in most visual data, if properly incorporated into algorithm design, should help achieving improved performance while delivering intuitive interpretation of the algorithmic outcomes. My study addresses the problem of how to explicitly consider the semantic information of the visual data in the sparse learning algorithms. In this work, we identify four problems which are of great importance and broad interest to the community. Specifically, a novel approach is proposed to incorporate label information to learn a dictionary which is not only reconstructive but also discriminative; considering the formation process of face images, a novel image decomposition approach for an ensemble of correlated images is proposed, where a subspace is built from the decomposition and applied to face recognition; based on the observation that, the foreground (or salient) objects are sparse in input domain and the background is sparse in frequency domain, a novel and efficient spatio-temporal saliency detection algorithm is proposed to identify the salient regions in video; and a novel hidden Markov model learning approach is proposed by utilizing a sparse set of pairwise comparisons among the data, which is easier to obtain and more meaningful, consistent than tradition labels, in many scenarios, e.g., evaluating motion skills in surgical simulations. In those four problems, different types of semantic information are modeled and incorporated in designing sparse learning algorithms for the corresponding visual computing tasks. Several real world applications are selected to demonstrate the effectiveness of the proposed methods, including, face recognition, spatio-temporal saliency detection, abnormality detection, spatio-temporal interest point detection, motion analysis and emotion recognition. In those applications, data of different modalities are involved, ranging from audio signal, image to video. Experiments on large scale real world data with comparisons to state-of-art methods confirm the proposed approaches deliver salient advantages, showing adding those semantic information dramatically improve the performances of the general sparse learning methods. / Dissertation/Thesis / Ph.D. Computer Science 2014
|
4 |
Semantic-oriented Object Segmentation / Segmentation d'objet pour l'interprétation sémantiqueZou, Wenbin 13 March 2014 (has links)
Cette thèse porte sur les problèmes de segmentation d’objets et la segmentation sémantique qui visent soit à séparer des objets du fond, soit à l’attribution d’une étiquette sémantique spécifique à chaque pixel de l’image. Nous proposons deux approches pour la segmentation d’objets, et une approche pour la segmentation sémantique. La première approche est basée sur la détection de saillance. Motivés par notre but de segmentation d’objets, un nouveau modèle de détection de saillance est proposé. Cette approche se formule dans le modèle de récupération de la matrice de faible rang en exploitant les informations de structure de l’image provenant d’une segmentation ascendante comme contrainte importante. La segmentation construite à l’aide d’un schéma d’optimisation itératif et conjoint, effectue simultanément, d’une part, une segmentation d’objets basée sur la carte de saillance résultant de sa détection et, d’autre part, une amélioration de la qualité de la saillance à l’aide de la segmentation. Une carte de saillance optimale et la segmentation finale sont obtenues après plusieurs itérations. La deuxième approche proposée pour la segmentation d’objets se fonde sur des images exemples. L’idée sous-jacente est de transférer les étiquettes de segmentation d’exemples similaires, globalement et localement, à l’image requête. Pour l’obtention des exemples les mieux assortis, nous proposons une représentation nouvelle de haut niveau de l’image, à savoir le descripteur orienté objet, qui reflète à la fois l’information globale et locale de l’image. Ensuite, un prédicteur discriminant apprend en ligne à l’aide les exemples récupérés pour attribuer à chaque région de l’image requête un score d’appartenance au premier plan. Ensuite, ces scores sont intégrés dans un schéma de segmentation du champ de Markov (MRF) itératif qui minimise l’énergie. La segmentation sémantique se fonde sur une banque de régions et la représentation parcimonieuse. La banque des régions est un ensemble de régions générées par segmentations multi-niveaux. Ceci est motivé par l’observation que certains objets peuvent être capturés à certains niveaux dans une segmentation hiérarchique. Pour la description de la région, nous proposons la méthode de codage parcimonieux qui représente chaque caractéristique locale avec plusieurs vecteurs de base du dictionnaire visuel appris, et décrit toutes les caractéristiques locales d’une région par un seul histogramme parcimonieux. Une machine à support de vecteurs (SVM) avec apprentissage de noyaux multiple est utilisée pour l’inférence sémantique. Les approches proposées sont largement évaluées sur plusieurs ensembles de données. Des expériences montrent que les approches proposées surpassent les méthodes de l’état de l’art. Ainsi, par rapport au meilleur résultat de la littérature, l’approche proposée de segmentation d’objets améliore la mesure d F-score de 63% à 68,7% sur l’ensemble de données Pascal VOC 2011. / This thesis focuses on the problems of object segmentation and semantic segmentation which aim at separating objects from background or assigning a specific semantic label to each pixel in an image. We propose two approaches for the object segmentation and one approach for semantic segmentation. The first proposed approach for object segmentation is based on saliency detection. Motivated by our ultimate goal for object segmentation, a novel saliency detection model is proposed. This model is formulated in the low-rank matrix recovery model by taking the information of image structure derived from bottom-up segmentation as an important constraint. The object segmentation is built in an iterative and mutual optimization framework, which simultaneously performs object segmentation based on the saliency map resulting from saliency detection, and saliency quality boosting based on the segmentation. The optimal saliency map and the final segmentation are achieved after several iterations. The second proposed approach for object segmentation is based on exemplar images. The underlying idea is to transfer segmentation labels of globally and locally similar exemplar images to the query image. For the purpose of finding the most matching exemplars, we propose a novel high-level image representation method called object-oriented descriptor, which captures both global and local information of image. Then, a discriminative predictor is learned online by using the retrieved exemplars. This predictor assigns a probabilistic score of foreground to each region of the query image. After that, the predicted scores are integrated into the segmentation scheme of Markov random field (MRF) energy optimization. Iteratively finding minimum energy of MRF leads the final segmentation. For semantic segmentation, we propose an approach based on region bank and sparse coding. Region bank is a set of regions generated by multi-level segmentations. This is motivated by the observation that some objects might be captured at certain levels in a hierarchical segmentation. For region description, we propose sparse coding method which represents each local feature descriptor with several basic vectors in the learned visual dictionary, and describes all local feature descriptors within a region by a single sparse histogram. With the sparse representation, support vector machine with multiple kernel learning is employed for semantic inference. The proposed approaches have been extensively evaluated on several challenging and widely used datasets. Experiments demonstrated the proposed approaches outperform the stateofthe- art methods. Such as, compared to the best result in the literature, the proposed object segmentation approach based on exemplar images improves the F-score from 63% to 68.7% on Pascal VOC 2011 dataset.
|
5 |
ANSWER : A Cognitively-Inspired System for the Unsupervised Detection of Semantically Salient Words in TextsCandadai Vasu, Madhavun 16 October 2015 (has links)
No description available.
|
6 |
Self-calibrating eye tracker using imagesaliency : Självkalibrerande ögonspårare medhjälp av image saliency / Självkalibrerande ögonspårare medhjälp av image saliency : Self-calibrating eye tracker using imagesaliencyVega, Gabriel January 2022 (has links)
Self-calibrating eye tracker using image saliency. / Självkalibrerande ögonspårare med hjälp av image saliency.
|
7 |
A computational model of visual attentionChilukamari, Jayachandra January 2017 (has links)
Visual attention is a process by which the Human Visual System (HVS) selects most important information from a scene. Visual attention models are computational or mathematical models developed to predict this information. The performance of the state-of-the-art visual attention models is limited in terms of prediction accuracy and computational complexity. In spite of significant amount of active research in this area, modelling visual attention is still an open research challenge. This thesis proposes a novel computational model of visual attention that achieves higher prediction accuracy with low computational complexity. A new bottom-up visual attention model based on in-focus regions is proposed. To develop the model, an image dataset is created by capturing images with in-focus and out-of-focus regions. The Discrete Cosine Transform (DCT) spectrum of these images is investigated qualitatively and quantitatively to discover the key frequency coefficients that correspond to the in-focus regions. The model detects these key coefficients by formulating a novel relation between the in-focus and out-of-focus regions in the frequency domain. These frequency coefficients are used to detect the salient in-focus regions. The simulation results show that this attention model achieves good prediction accuracy with low complexity. The prediction accuracy of the proposed in-focus visual attention model is further improved by incorporating sensitivity of the HVS towards the image centre and the human faces. Moreover, the computational complexity is further reduced by using Integer Cosine Transform (ICT). The model is parameter tuned using the hill climbing approach to optimise the accuracy. The performance has been analysed qualitatively and quantitatively using two large image datasets with eye tracking fixation ground truth. The results show that the model achieves higher prediction accuracy with a lower computational complexity compared to the state-of-the-art visual attention models. The proposed model is useful in predicting human fixations in computationally constrained environments. Mainly it is useful in applications such as perceptual video coding, image quality assessment, object recognition and image segmentation.
|
8 |
Traitement des objets 3D et images par les méthodes numériques sur graphes / 3D object processing and Image processing by numerical methodsEl Sayed, Abdul Rahman 24 October 2018 (has links)
La détection de peau consiste à détecter les pixels correspondant à une peau humaine dans une image couleur. Les visages constituent une catégorie de stimulus importante par la richesse des informations qu’ils véhiculent car avant de reconnaître n’importe quelle personne il est indispensable de localiser et reconnaître son visage. La plupart des applications liées à la sécurité et à la biométrie reposent sur la détection de régions de peau telles que la détection de visages, le filtrage d'objets 3D pour adultes et la reconnaissance de gestes. En outre, la détection de la saillance des mailles 3D est une phase de prétraitement importante pour de nombreuses applications de vision par ordinateur. La segmentation d'objets 3D basée sur des régions saillantes a été largement utilisée dans de nombreuses applications de vision par ordinateur telles que la correspondance de formes 3D, les alignements d'objets, le lissage de nuages de points 3D, la recherche des images sur le web, l’indexation des images par le contenu, la segmentation de la vidéo et la détection et la reconnaissance de visages. La détection de peau est une tâche très difficile pour différentes raisons liées en général à la variabilité de la forme et la couleur à détecter (teintes différentes d’une personne à une autre, orientation et tailles quelconques, conditions d’éclairage) et surtout pour les images issues du web capturées sous différentes conditions de lumière. Il existe plusieurs approches connues pour la détection de peau : les approches basées sur la géométrie et l’extraction de traits caractéristiques, les approches basées sur le mouvement (la soustraction de l’arrière-plan (SAP), différence entre deux images consécutives, calcul du flot optique) et les approches basées sur la couleur. Dans cette thèse, nous proposons des méthodes d'optimisation numérique pour la détection de régions de couleurs de peaux et de régions saillantes sur des maillages 3D et des nuages de points 3D en utilisant un graphe pondéré. En se basant sur ces méthodes, nous proposons des approches de détection de visage 3D à l'aide de la programmation linéaire et de fouille de données (Data Mining). En outre, nous avons adapté nos méthodes proposées pour résoudre le problème de la simplification des nuages de points 3D et de la correspondance des objets 3D. En plus, nous montrons la robustesse et l’efficacité de nos méthodes proposées à travers de différents résultats expérimentaux réalisés. Enfin, nous montrons la stabilité et la robustesse de nos méthodes par rapport au bruit. / Skin detection involves detecting pixels corresponding to human skin in a color image. The faces constitute a category of stimulus important by the wealth of information that they convey because before recognizing any person it is essential to locate and recognize his face. Most security and biometrics applications rely on the detection of skin regions such as face detection, 3D adult object filtering, and gesture recognition. In addition, saliency detection of 3D mesh is an important pretreatment phase for many computer vision applications. 3D segmentation based on salient regions has been widely used in many computer vision applications such as 3D shape matching, object alignments, 3D point-point smoothing, searching images on the web, image indexing by content, video segmentation and face detection and recognition. The detection of skin is a very difficult task for various reasons generally related to the variability of the shape and the color to be detected (different hues from one person to another, orientation and different sizes, lighting conditions) and especially for images from the web captured under different light conditions. There are several known approaches to skin detection: approaches based on geometry and feature extraction, motion-based approaches (background subtraction (SAP), difference between two consecutive images, optical flow calculation) and color-based approaches. In this thesis, we propose numerical optimization methods for the detection of skins color and salient regions on 3D meshes and 3D point clouds using a weighted graph. Based on these methods, we provide 3D face detection approaches using Linear Programming and Data Mining. In addition, we adapted our proposed methods to solve the problem of simplifying 3D point clouds and matching 3D objects. In addition, we show the robustness and efficiency of our proposed methods through different experimental results. Finally, we show the stability and robustness of our methods with respect to noise.
|
9 |
Détection des émotions à partir de vidéos dans un environnement non contrôlé / Detection of emotions from video in non-controlled environmentKhan, Rizwan Ahmed 14 November 2013 (has links)
Dans notre communication quotidienne avec les autres, nous avons autant de considération pour l’interlocuteur lui-même que pour l’information transmise. En permanence coexistent en effet deux modes de transmission : le verbal et le non-verbal. Sur ce dernier thème intervient principalement l’expression faciale avec laquelle l’interlocuteur peut révéler d’autres émotions et intentions. Habituellement, un processus de reconnaissance d’émotions faciales repose sur 3 étapes : le suivi du visage, l’extraction de caractéristiques puis la classification de l’expression faciale. Pour obtenir un processus robuste apte à fournir des résultats fiables et exploitables, il est primordial d’extraire des caractéristiques avec de forts pouvoirs discriminants (selon les zones du visage concernées). Les avancées récentes de l’état de l’art ont conduit aujourd’hui à diverses approches souvent bridées par des temps de traitement trop couteux compte-tenu de l’extraction de descripteurs sur le visage complet ou sur des heuristiques mathématiques et/ou géométriques.En fait, aucune réponse bio-inspirée n’exploite la perception humaine dans cette tâche qu’elle opère pourtant régulièrement. Au cours de ces travaux de thèse, la base de notre approche fut ainsi de singer le modèle visuel pour focaliser le calcul de nos descripteurs sur les seules régions du visage essentielles pour la reconnaissance d’émotions. Cette approche nous a permis de concevoir un processus plus naturel basé sur ces seules régions émergentes au regard de la perception humaine. Ce manuscrit présente les différentes méthodologies bio-inspirées mises en place pour aboutir à des résultats qui améliorent généralement l’état de l’art sur les bases de référence. Ensuite, compte-tenu du fait qu’elles se focalisent sur les seules parties émergentes du visage, elles améliorent les temps de calcul et la complexité des algorithmes mis en jeu conduisant à une utilisation possible pour des applications temps réel. / Communication in any form i.e. verbal or non-verbal is vital to complete various daily routine tasks and plays a significant role inlife. Facial expression is the most effective form of non-verbal communication and it provides a clue about emotional state, mindset and intention. Generally automatic facial expression recognition framework consists of three step: face tracking, feature extraction and expression classification. In order to built robust facial expression recognition framework that is capable of producing reliable results, it is necessary to extract features (from the appropriate facial regions) that have strong discriminative abilities. Recently different methods for automatic facial expression recognition have been proposed, but invariably they all are computationally expensive and spend computational time on whole face image or divides the facial image based on some mathematical or geometrical heuristic for features extraction. None of them take inspiration from the human visual system in completing the same task. In this research thesis we took inspiration from the human visual system in order to find from where (facial region) to extract features. We argue that the task of expression analysis and recognition could be done in more conducive manner, if only some regions are selected for further processing (i.e.salient regions) as it happens in human visual system. In this research thesis we have proposed different frameworks for automatic recognition of expressions, all getting inspiration from the human vision. Every subsequently proposed addresses the shortcomings of the previously proposed framework. Our proposed frameworks in general, achieve results that exceeds state-of-the-artmethods for expression recognition. Secondly, they are computationally efficient and simple as they process only perceptually salient region(s) of face for feature extraction. By processing only perceptually salient region(s) of the face, reduction in feature vector dimensionality and reduction in computational time for feature extraction is achieved. Thus making them suitable for real-time applications.
|
10 |
Traffic analysis of low and ultra-low frame-rate videos / Analyse de trafic routier à partir de vidéos à faible débitLuo, Zhiming January 2017 (has links)
Abstract: Nowadays, traffic analysis are relying on data collected from various traffic sensors. Among the various traffic surveillance techniques, video surveillance systems are often used for monitoring and characterizing traffic load. In this thesis, we focused on two aspects of traffic analysis without using motion features in low frame-rate videos: Traffic density flow analysis and Vehicle detection and classification. Traffic density flow analysis}: Knowing in real time when the traffic is fluid or when it jams is a key information to help authorities re-route vehicles and reduce congestion. Accurate and timely traffic flow information is strongly needed by individual travelers, the business sectors and government agencies. In this part, we investigated the possibility of monitoring highway traffic based on videos whose frame rate is too low to accurately estimate motion features. As we are focusing on analyzing traffic images and low frame-rate videos, traffic density is defined as the percentage of road being occupied by vehicles. In our previous work, we validated that traffic status is highly correlated to its texture features and that Convolutional Neural Networks (CNN) has the superiority of extracting discriminative texture features. We proposed several CNN models to segment traffic images into three different classes (road, car and background), classify traffic images into different categories (empty, fluid, heavy, jam) and predict traffic density without using any motion features. In order to generalize the model trained on a specific dataset to analyze new traffic scenes, we also proposed a novel transfer learning framework to do model adaptation. Vehicle detection and classification: The detection of vehicles pictured by traffic cameras is often the very first step of video surveillance systems, such as vehicle counting, tracking and retrieval. In this part, we explore different deep learning methods applied to vehicle detection and classification. Firstly, realizing the importance of large dataset for traffic analysis, we built and released the largest traffic dataset (MIO-TCD) in the world for vehicle localization and classification in collaboration with colleagues from Miovision inc. (Waterloo, On). With this dataset, we organized the Traffic Surveillance Workshop and Challenge in conjunction with CVPR 2017. Secondly, we evaluated several state-of-the-art deep learning methods for the classification and localization task on the MIO-TCD dataset. In light of the results, we may conclude that state-of-the-art deep learning methods exhibit a capacity to localize and recognize vehicle from a single video frame. While with a deep analysis of the results, we also identify scenarios for which state-of-the-art methods are still failing and propose concrete ideas for future work. Lastly, as saliency detection aims to highlight the most relevant objects in an image (e.g. vehicles in traffic scenes), we proposed a multi-resolution 4*5 grid CNN model for the salient object detection. The model enables near real-time high performance saliency detection. We also extend this model to do traffic analysis, experiment results show that our model can precisely segment foreground vehicles in traffic scenes. / De nos jours, l’analyse de trafic routier est de plus en plus automatisée et s’appuie sur des données issues de senseurs en tout genre. Parmi les approches d’analyse de trafic routier figurent les méthodes à base de vidéo. Les méthodes à base de vidéo ont pour but d’identifier et de reconnaître les objets en mouvement (généralement des voitures et des piétons) et de comprendre leur dynamique. Un des défis parmi les plus difficile à résoudre est d’analyser des séquences vidéo dont le nombre d’images par seconde est très faible. Ce type de situation est pourtant fréquent considérant qu’il est très difficile (voir impossible) de transmettre et de stocker sur un serveur un très grand nombre d’images issues de plusieurs caméras. Dans ce cas, les méthodes issues de l’état de l’art échouent car un faible nombre d’images par seconde ne permet pas d’extraire les caractéristiques vidéos utilisées par ces méthodes tels le flux optique, la détection de mouvement et le suivi de véhicules. Au cours de cette thèse, nous nous sommes concentré sur l’analyse de trafic routier à partir de séquences vidéo contenant un très faible nombre d’images par seconde. Plus particulièrement, nous nous sommes concentrés sur les problème d’estimation de la densité du trafic routier et de la classification de véhicules. Pour ce faire, nous avons proposé différents modèles à base de réseaux de neurones profonds (plus particulièrement des réseaux à convolution) ainsi que de nouvelles bases de données permettant d’entraîner les dits modèles. Parmi ces bases de données figure « MIO-TCD », la plus grosse base de données annotées au monde faite pour l’analyse de trafic routier.
|
Page generated in 0.0927 seconds