Global ETD Search

1	Saliency Maps using Channel Representations / Saliency-kartor utifrån kanalrepresentationer Tuttle, Alexander January 2010 In this thesis an algorithm for producing saliency maps as well as an algorithm for detecting salient regions based on the saliency map was developed. The saliency values are computed as center-surround differences and a local descriptor called the region p-channel is used to represent center and surround respectively. An integral image representation called the integral p-channel is used to speed up extraction of the local descriptor for any given image region. The center-surround difference is calculated as either histogram or p-channel dissimilarities. Ground truth was collected using human subjects and the algorithm’s ability to detect salient regions was evaluated against this ground truth. The algorithm was also compared to another saliency algorithm. Two different center-surround interpretations are tested, as well as several p-channel and histogram dissimilarity measures. The results show that for all tested settings the best performing dissimilarity measure is the so-called diffusion distance. The performance comparison showed that the algorithm developed in this thesis outperforms the algorithm against which it was compared, both with respect to region detection and saliency ranking of regions. It can be concluded that the algorithm shows promising results and further investigation of the algorithm is recommended. A list of suggested approaches for further research is provided. computer vision saliency maps p-channels Image analysis Bildanalys
2	Saliency Maps using Channel Representations / Saliency-kartor utifrån kanalrepresentationer Tuttle, Alexander January 2010 In this thesis an algorithm for producing saliency maps as well as an algorithm for detecting salient regions based on the saliency map was developed. The saliency values are computed as center-surround differences and a local descriptor called the region p-channel is used to represent center and surround respectively. An integral image representation called the integral p-channel is used to speed up extraction of the local descriptor for any given image region. The center-surround difference is calculated as either histogram or p-channel dissimilarities. Ground truth was collected using human subjects and the algorithm’s ability to detect salient regions was evaluated against this ground truth. The algorithm was also compared to another saliency algorithm. Two different center-surround interpretations are tested, as well as several p-channel and histogram dissimilarity measures. The results show that for all tested settings the best performing dissimilarity measure is the so-called diffusion distance. The performance comparison showed that the algorithm developed in this thesis outperforms the algorithm against which it was compared, both with respect to region detection and saliency ranking of regions. It can be concluded that the algorithm shows promising results and further investigation of the algorithm is recommended. A list of suggested approaches for further research is provided. computer vision saliency maps p-channels
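The abstract above describes center-surround saliency computed from region descriptors that are extracted quickly through an integral image. The following is a minimal sketch of that general idea only, not the thesis's method: it swaps in a plain integral histogram for the integral p-channel and an L1 distance for the diffusion distance, and it assumes the surround is the ring around the center (the abstract mentions two center-surround interpretations without detailing them). All function names and parameters are hypothetical.

```python
def integral_histogram(image, bins, max_value=256):
    """Build ih where ih[y][x][b] counts pixels of bin b in rows < y, columns < x."""
    h, w = len(image), len(image[0])
    ih = [[[0] * bins for _ in range(w + 1)] for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            b = image[y][x] * bins // max_value  # bin index of this pixel
            for k in range(bins):
                ih[y + 1][x + 1][k] = (
                    ih[y][x + 1][k] + ih[y + 1][x][k] - ih[y][x][k]
                )
            ih[y + 1][x + 1][b] += 1
    return ih


def region_histogram(ih, top, left, bottom, right):
    """Histogram of rows top..bottom-1, columns left..right-1 from four lookups."""
    return [
        a - b - c + d
        for a, b, c, d in zip(
            ih[bottom][right], ih[top][right], ih[bottom][left], ih[top][left]
        )
    ]


def center_surround(ih, center, surround):
    """L1 distance between normalized center and surround-ring histograms.

    Assumption: the surround is the outer rectangle minus the center, and
    both regions are non-empty.
    """
    hc = region_histogram(ih, *center)
    hs = [s - c for s, c in zip(region_histogram(ih, *surround), hc)]
    nc, ns = sum(hc), sum(hs)
    return sum(abs(c / nc - s / ns) for c, s in zip(hc, hs))


# Usage: a bright 2x2 patch on a dark 4x4 background.
image = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
ih = integral_histogram(image, bins=2)
saliency = center_surround(ih, center=(1, 1, 3, 3), surround=(0, 0, 4, 4))
```

Once the integral histogram is built, each region descriptor costs four lookups per bin regardless of region size, which is the speed-up an integral representation is meant to provide.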
3	Comparing Human Reasoning and Explainable AI Helgstrand, Carl Johan, Hultin, Niklas January 2022 Explainable AI (XAI) is a research field dedicated to formulating avenues of breaching the black box nature of many of today’s machine learning models. As society finds new ways of applying these models in everyday life, certain risk thresholds are crossed when society replaces human decision making with autonomous systems. How can we trust the algorithms to make sound judgement when all we provide is input and all they provide is an output? XAI methods examine different data points in the machine learning process to determine what factors influenced the decision making. While these methods of post-hoc explanation may provide certain insights, previous studies into XAI have found that the designs are often biased towards their designers and do not incorporate the interdisciplinary fields needed to improve user understanding. In this thesis, we look at animal classification and what features in animal images were found to be important by humans. We use a novel approach of first letting the participants create their own post-hoc explanations, before asking them to evaluate real XAI explanations as well as a pre-made human explanation generated from a test group. The results show strong cohesion in the participants' answers and can provide guidelines for designing XAI explanations more closely related to human reasoning. The data also indicate a preference for human-like explanations within the context of this study. Additionally, a potential bias was identified as participants preferred explanations marking large portions of an image as important, even if many of the marked areas coincided with what the participants themselves considered to be unimportant. 
While the sample pool and data gathering tools are limiting, the results point toward a need for additional research into comparisons of human reasoning and XAI explanations and how they may affect the evaluation of, and bias towards, explanation methods. Explainable AI XAI Visual Explanations Saliency Maps Artificial Intelligence Computer Sciences Datavetenskap (datalogi)
4	Mesure sans référence de la qualité des vidéos haute définition diffusées avec des pertes de transmission / No-Reference Video Quality Assessment of High Definition Video Streams Delivered with Losses Boujut, Hugo 24 September 2012 The objectives of this thesis were, first, to automatically detect frozen frames in broadcast video and, second, to measure the quality of broadcast video (IP and DVB-T) without a reference. The work was carried out as part of a research project conducted jointly by LaBRI and the company Audemat WorldCast Systems. For frozen frame detection, three methods were proposed: MV (based on motion vectors), DC (based on the DC coefficients of the DCT) and SURF (based on SURF feature points). The first two methods require only partial decoding of the video stream. The second objective was to measure the quality of broadcast video (IP and DVB-T) without a reference. A metric was developed to measure perceived quality when the video stream has been degraded by transmission losses. This metric, the "Weighted Macro-Block Error Rate" (WMBER), is based on measuring visual saliency and detecting damaged macro-blocks. The role of visual saliency is to weight the importance of the detected errors. Several improvements were made to the construction of spatio-temporal saliency maps. In particular, the fusion of the spatial and temporal saliency maps was improved relative to the state of the art. In addition, several studies have shown that the semantics of a visual scene influence the behavior of the human visual system. It appears that human faces in particular attract the gaze. For this reason we added a semantic dimension to the spatio-temporal saliency maps. This semantic dimension is essentially based on the Viola-Jones face detector. 
To predict the quality perceived by users, we used a supervised learning method. This method makes it possible to predict the subjective "Mean Opinion Score" (MOS) metric from objective measures such as WMBER, PSNR or SSIM. A psycho-visual experiment was conducted with 50 subjects to evaluate this work. This high-definition video database is being transferred to the COST Qualinet action. The work was also evaluated on another video database (in standard definition) from IRCCyN. / The goal of this Ph.D. thesis is to design a no-reference video quality assessment method for lossy networks. The thesis was conducted in collaboration with the Audemat Worldcast Systems company. Our first no-reference video quality assessment indicator is frozen frame detection. Frozen frame detection has been well studied in past decades. However, the challenge is to embed a frozen frame detection method in the GoldenEagle Audemat equipment. This equipment has limited computational resources, which do not allow real-time HD video decoding. Two methods are proposed: one based on the motion vectors of the compressed video stream (MV method) and another based on the DC coefficients of the DCT (DC method). Both methods require only partial decoding of the compressed video stream, which allows real-time analysis on the GoldenEagle equipment. The evaluation shows that the results are better than those of the frame-difference baseline method. Nevertheless, the MV and DC methods are only suitable for MPEG2 and H.264 video streams, so a third method based on SURF points is proposed. As a second step on the way to a no-reference video quality assessment metric, we are interested in the visual perception of transmission impairments. We propose a full-reference metric based on saliency maps. This metric, Weighted Mean Squared Error (WMSE), is the MSE metric weighted by the saliency map. 
The role of the saliency map is to distinguish between noticeable and unnoticeable transmission impairments. This spatio-temporal saliency map is therefore computed on the impaired frame, so that each pixel difference in the MSE computation is emphasized or diminished according to the pixel's saliency. Several improvements over the state of the art are made to the saliency map computation process. In particular, new spatio-temporal saliency map fusion strategies are designed. After our successful attempt to assess video quality with saliency maps, we develop a no-reference quality metric. This metric, Weighted Macro-Block Error Rate (WMBER), relies on the saliency map and on macro-block error detection. The macro-block error detection provides the locations of the impaired macro-blocks in the frame. However, the impaired macro-blocks are concealed with more or less success during the decoding process, so the saliency map provides the user-perceived impairment strength for each macro-block. Several psycho-visual studies have shown that semantics play an important role in visual scene perception. These studies conclude that faces and text are the most attractive. To improve the spatio-temporal saliency model, a semantic dimension is added. This semantic saliency is based on the Viola & Jones face detector. To predict the Mean Opinion Score (MOS) from objective metric values such as WMBER, WMSE, PSNR or SSIM, we propose a supervised learning approach called Similarity Weighted Average (SWA). Several improvements are made to the original SWA. For the evaluation of the metrics, a psycho-visual experiment with 50 subjects was carried out. To measure the accuracy of the saliency map models, a psycho-visual experiment with an eye-tracker was also carried out. These two experiments were conducted in collaboration with Ben Gurion University, Israel. The performance of WMBER and WMSE is compared with reference metrics such as SSIM and PSNR. 
The proposed metrics are also tested on a database provided by the IRCCyN research laboratory. Qualité vidéo Sans référence H.264 Haute-Définition Carte de saillance Image gelée Apprentissage supervisé Video quality assessment No reference H.264 High Definition Saliency maps Frozen frames Supervised learning
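The abstract above describes WMSE as the MSE metric weighted by the saliency map, so that pixel differences count more where saliency is high. A minimal sketch of such a saliency-weighted MSE follows. The normalization by the sum of the weights is an assumption (the abstract does not state how the weights are normalized), and the function name and the list-of-rows frame layout are hypothetical.

```python
def weighted_mse(reference, impaired, saliency):
    """Saliency-weighted mean squared error between two frames.

    Frames and the saliency map are lists of rows with identical shapes.
    Assumption: the weighted squared errors are divided by the sum of the
    saliency weights.
    """
    weighted_error = 0.0
    total_weight = 0.0
    for ref_row, imp_row, sal_row in zip(reference, impaired, saliency):
        for ref, imp, weight in zip(ref_row, imp_row, sal_row):
            weighted_error += weight * (ref - imp) ** 2
            total_weight += weight
    if total_weight == 0:
        raise ValueError("saliency map sums to zero")
    return weighted_error / total_weight


# Usage: one impaired pixel, scored under three saliency maps.
reference = [[10, 10], [10, 10]]
impaired = [[10, 14], [10, 10]]
uniform = weighted_mse(reference, impaired, [[1, 1], [1, 1]])  # plain MSE
salient = weighted_mse(reference, impaired, [[0, 1], [0, 0]])  # error is salient
ignored = weighted_mse(reference, impaired, [[1, 0], [1, 1]])  # error is not salient
```

With a uniform saliency map the value reduces to the ordinary MSE; concentrating the saliency on the impaired pixel raises the score, and placing zero saliency there removes the error from the score.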
