1

Closed-Loop Learning of Visual Control Policies

Jodogne, Sébastien 05 December 2006 (has links)
In this dissertation, I introduce a general, flexible framework for learning direct mappings from images to actions in an agent that interacts with its surrounding environment. This work is motivated by the paradigm of purposive vision. The original contributions consist of the design of reinforcement learning algorithms that are applicable to visual spaces. Inspired by the paradigm of local-appearance vision, these algorithms exploit specialized visual features that can be detected in the visual signal. Two different ways to use the visual features are described. Firstly, I introduce adaptive-resolution methods for discretizing the visual space into a manageable number of perceptual classes. To this end, a percept classifier that tests the presence or absence of a few highly informative visual features is incrementally refined. New discriminant visual features are selected in a sequence of attempts to remove perceptual aliasing. Any standard reinforcement learning algorithm can then be used to extract an optimal visual control policy. The resulting algorithm is called "Reinforcement Learning of Visual Classes." Secondly, I propose to exploit the raw content of the visual features, without ever considering an equivalence relation on the visual feature space. Technically, feature regression models that associate visual features with a real-valued utility are introduced within the Approximate Policy Iteration architecture. This is done by means of a general, abstract version of Approximate Policy Iteration. This results in the "Visual Approximate Policy Iteration" algorithm. Another major contribution of this dissertation is the design of adaptive-resolution techniques that can be applied to complex, high-dimensional and/or continuous action spaces, simultaneously with visual spaces. The "Reinforcement Learning of Joint Classes" algorithm produces a non-uniform discretization of the joint space of percepts and actions. This is a new, general approach to adaptive-resolution methods in reinforcement learning that can deal with arbitrary, hybrid state-action spaces. Throughout this dissertation, emphasis is also put on the design of general algorithms that can be used in non-visual (e.g. continuous) perceptual spaces. The applicability of the proposed algorithms is demonstrated by solving several visual navigation tasks.
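The adaptive-resolution idea behind "Reinforcement Learning of Visual Classes" can be illustrated with a minimal sketch: a percept classifier organized as a binary tree that tests the presence of informative visual features, refined when a class shows perceptual aliasing, with a standard tabular RL update running on the induced classes. This is an illustrative reconstruction, not the thesis implementation; all names and the Q-learning parameters are assumptions.

```python
import numpy as np

class PerceptClassifier:
    """Binary tree mapping the set of visual features detected in an image
    to a perceptual class, in the spirit of adaptive-resolution
    discretization. Illustrative only -- not the thesis implementation."""

    def __init__(self):
        self.feature = None      # feature tested at this node (None => leaf)
        self.children = None     # (absent_subtree, present_subtree)
        self.class_id = 0        # class label when this node is a leaf

    def classify(self, features):
        """features: set of visual-feature ids detected in one image."""
        node = self
        while node.feature is not None:
            node = node.children[node.feature in features]
        return node.class_id

    def refine(self, leaf_class, new_feature, next_class_id):
        """Split the leaf for `leaf_class` on a newly selected discriminant
        feature, e.g. when that class exhibits perceptual aliasing."""
        node = self._find_leaf(leaf_class)
        node.feature = new_feature
        absent, present = PerceptClassifier(), PerceptClassifier()
        absent.class_id, present.class_id = leaf_class, next_class_id
        node.children = (absent, present)

    def _find_leaf(self, class_id):
        if self.feature is None:
            return self if self.class_id == class_id else None
        for child in self.children:
            found = child._find_leaf(class_id)
            if found is not None:
                return found
        return None

# Any standard RL algorithm can then run on the induced discrete classes,
# e.g. one tabular Q-learning update per interaction step:
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```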
2

Learning Statistical Features of Scene Images

Lee, Wooyoung 01 September 2014 (has links)
Scene perception is a fundamental aspect of vision. Humans are capable of analyzing behaviorally-relevant scene properties such as spatial layouts or scene categories very quickly, even from low resolution versions of scenes. Although humans perform these tasks effortlessly, they are very challenging for machines. Developing methods that accurately capture the properties of the representation used by the visual system will be useful for building computational models that are more consistent with perception. While it is common to use hand-engineered features that extract information from predefined dimensions, they require careful tuning of parameters and do not generalize well to other tasks or larger datasets. This thesis is driven by the hypothesis that the perceptual representations are adapted to the statistical properties of natural visual scenes. For developing statistical features for global-scale structures (low spatial frequency information that encompasses entire scenes), I propose to train hierarchical probabilistic models on whole scene images. I first investigate statistical clusters of scene images by training a mixture model under the assumption that each image can be decoded by sparse and independent coefficients. Each cluster discovered by the unsupervised classifier is consistent with the high-level semantic categories (such as indoor, outdoor-natural and outdoor-manmade) as well as perceptual layout properties (mean depth, openness and perspective). To address the limitation of mixture models in their assumptions of a discrete number of underlying clusters, I further investigate a continuous representation for the distributions of whole scenes. The model parameters optimized for natural visual scenes reveal a compact representation that encodes their global-scale structures. I develop a probabilistic similarity measure based on the model and demonstrate its consistency with perceptual similarities. Lastly, to learn representations that better encode the manifold structures in general high-dimensional image space, I develop an image normalization process that finds a set of canonical images anchoring the probabilistic distributions around the real data manifolds. The canonical images are employed as the centers of the conditional multivariate Gaussian distributions. This approach makes it possible to learn more detailed structures of the local manifolds, resulting in an improved representation of the high-level properties of scene images.
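The clustering-then-similarity pipeline described above can be sketched with an ordinary Gaussian mixture standing in for the thesis's mixture of sparse-coding models; the stand-in descriptors, component count, and posterior-overlap similarity below are illustrative assumptions, not the author's method.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in for low-spatial-frequency scene descriptors (e.g. downsampled
# scenes flattened to vectors); the real work trains on whole scene images.
X = rng.normal(size=(500, 64))

# Unsupervised clustering of scenes; each component plays the role of a
# statistical scene cluster (indoor / outdoor-natural / outdoor-manmade).
gmm = GaussianMixture(n_components=3, covariance_type='diag', random_state=0)
gmm.fit(X)

def similarity(x, y, model):
    """A simple probabilistic similarity: overlap of the posterior
    cluster-membership distributions of two images."""
    px = model.predict_proba(x[None])[0]
    py = model.predict_proba(y[None])[0]
    return float(np.minimum(px, py).sum())   # 1.0 = identical posteriors

print(similarity(X[0], X[1], gmm))
```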
3

The Effects of Color on Depth Perception in Virtual Reality: A Case Study

Wallin, Linus, Norström, Vilhelm January 2023 (has links)
Determining whether color affects depth perception in virtual reality (VR) is important: if, for example, surgeons are to be trained in VR environments as preparation for surgery on real patients, they must perceive depth correctly, and if color influences perceived depth, producers of such simulations have to take their color choices into account. Previous research has shown that luminosity and hue can affect depth perception, and that depth underestimation is prevalent in VR. It is also important to discern whether it is the color of the focal object or of the background that affects depth perception. This study therefore examines what effect different color attributes of a focal object and of the background have on depth perception in a VR environment, through a case study performed in a VR environment built in Unity. The tests were set up to emulate a catheter piercing a plane, where the user pressed a button the moment the plane was pierced. To test different colors of the focal object (in this case a plane), the background was assigned a neutral color (white or black); while testing the background, the plane had a neutral color (white). Results from the study show that colors have a small effect, up to a 13.2 mm error (for the yellow hue with high luminosity and high saturation), on users' depth perception in VR. No single attribute was better than another, but on the object, the blue hue gave the largest error while the red hue gave the smallest. For the background there was more variation in the data, but the green and blue hues gave the smallest errors and red and yellow the largest. In sum, color has differing effects on depth perception in VR depending on whether it is applied to the background or to an object. Red gave the most accurate depth perception when applied to the object; for color applied to the background, green with high luminosity and blue with low luminosity resulted in the most accurate depth perception.
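The study's headline numbers are per-condition depth errors, and an analysis of this kind reduces to grouping signed errors by color condition. A minimal sketch with entirely hypothetical trial data (the real measurements are not reproduced here):

```python
import numpy as np

# Hypothetical trial records: (hue, luminosity, signed depth error in mm).
# The 13.2 mm figure quoted in the abstract is this kind of summary.
trials = [
    ("yellow", "high", 13.0), ("yellow", "high", 13.4),
    ("red",    "high",  2.1), ("red",    "high",  1.7),
    ("blue",   "low",   6.3), ("blue",   "low",   5.9),
]

conditions = {}
for hue, lum, err in trials:
    conditions.setdefault((hue, lum), []).append(err)

for (hue, lum), errs in sorted(conditions.items()):
    print(f"{hue:6s}/{lum}: mean error {np.mean(errs):5.1f} mm (n={len(errs)})")
```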
4

Investigating Gaze Attraction to Bottom-Up Visual Features for Visual Aids in Games

Sjöblom, Mattias January 2016 (has links)
Context. Video games usually have visual aids guiding the players in 3D environments. The designers need to know which visual feature is the most effective in attracting a player's gaze and which features are preferred by players as visual aids. Objectives. This study investigates which feature of the bottom-up visual attention process attracts the gaze fastest. Methods. With the use of the Tobii T60 eye tracking system, a user study with 32 participants was conducted in a controlled environment. An experiment was created where each participant looked at a slideshow consisting of 18 pictures with 8 objects on each picture. One object per picture had a bottom-up visual feature applied that made it stand out as different. Video games often have a goal or a task, and to connect the experiment to video games a goal was set: to find the object with the visual feature applied. The eye tracker measured the gaze while the participant was trying to find the object. A survey to determine which visual feature was preferred by the players was also conducted. Results. The results showed that colour was the visual feature with the shortest time to attract attention, closely followed by intensity, motion and a pulsating highlight. Small size had the longest attraction time. Results also showed that the visual feature preferred as a visual aid by the players was intensity, and the least preferred was orientation. Conclusions. The results show that visual features with contrast changes in the texture seem to draw attention faster than changes to the object itself, with colour the fastest. These features were also the most preferred as visual aids by the players, with intensity the most preferred. If this study were done on a larger scale within a 3D environment, it could show promise in helping designers make decisions regarding visual aids in video games.
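The central eye-tracking measure in a study like this is the time until the gaze first enters the target object's area of interest. A minimal sketch of that computation, assuming gaze samples at the Tobii T60's 60 Hz rate and a rectangular area of interest (coordinates and data are made up):

```python
import numpy as np

def time_to_first_fixation(gaze_xy, aoi, rate_hz=60.0):
    """Time (s) until the gaze first enters the target's area of interest.
    gaze_xy: (N, 2) screen coordinates sampled at rate_hz (the Tobii T60
    samples at 60 Hz); aoi: (xmin, ymin, xmax, ymax). Illustrative only."""
    xmin, ymin, xmax, ymax = aoi
    inside = ((gaze_xy[:, 0] >= xmin) & (gaze_xy[:, 0] <= xmax) &
              (gaze_xy[:, 1] >= ymin) & (gaze_xy[:, 1] <= ymax))
    hits = np.flatnonzero(inside)
    return hits[0] / rate_hz if hits.size else None

rng = np.random.default_rng(1)
gaze = rng.uniform(0, 1024, size=(120, 2))        # 2 s of fake gaze data
print(time_to_first_fixation(gaze, (400, 300, 500, 400)))
```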
5

Contribution to automatic adjustments of vertebrae landmarks on x-ray images for 3D reconstruction and quantification of clinical indices / Contribution aux ajustements automatiques de points anatomiques des vertèbres pour la reconstruction 3D et la quantification d’indices cliniques

Ebrahimi, Shahin 12 December 2017 (has links)
Exploitation of spine radiographs, in particular for 3D spine shape reconstruction of scoliotic patients, is a prerequisite for personalized modelling. Current methods, even though robust enough to be used in clinical routine, still rely on tedious manual adjustments. In this context, this PhD thesis aims toward automated detection of specific vertebrae landmarks in spine radiographs, enabling automated adjustments. In the first part, we developed an original Random Forest based framework for vertebrae corner localization, applied to sagittal radiographs of both cervical and lumbar spine regions. A rigorous evaluation confirms the robustness and high accuracy of the proposed method. In the second part, we developed an algorithm for the clinically important task of pedicle localization in the thoracolumbar region on frontal radiographs. The proposed algorithm compares favourably to similar methods from the literature while relying on less manual supervision. The last part of this PhD tackled the scarcely-studied task of joint detection, identification and segmentation of spinous processes of cervical vertebrae in sagittal radiographs, again with high precision. All three algorithmic solutions were designed around a generic framework exploiting dedicated visual feature descriptors and multi-class Random Forest classifiers, proposing a novel localization and segmentation approach whose computational and manual-supervision burdens are low enough to aim for translation into clinical use. Overall, the presented frameworks suggest great potential for integration into the 3D spine reconstruction frameworks used in daily clinical routine.
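The generic framework described — dedicated visual descriptors scored by a multi-class Random Forest — can be sketched as follows; the descriptor dimensionality, class labels, and scoring-by-maximum-probability step are illustrative assumptions rather than the thesis pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Stand-ins for visual feature descriptors of image patches (e.g. HOG-like
# vectors) labeled 0 = background, 1 = vertebra corner, 2 = pedicle.
X_train = rng.normal(size=(1000, 32))
y_train = rng.integers(0, 3, size=1000)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# At detection time, every candidate patch in a radiograph is scored and
# the most confident patch per landmark class is kept as its location.
X_test = rng.normal(size=(200, 32))          # descriptors of 200 candidates
proba = clf.predict_proba(X_test)            # (200, 3) class probabilities
for cls in (1, 2):
    best = int(np.argmax(proba[:, cls]))
    print(f"class {cls}: candidate {best}, p={proba[best, cls]:.2f}")
```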
6

Modeling Spatiotemporal Pedestrian-Environment Interactions for Predicting Pedestrian Crossing Intention from the Ego-View

Chen Chen (11014800) 06 August 2021 (has links)
For pedestrians and autonomous vehicles (AVs) to co-exist harmoniously and safely in the real world, AVs will need to not only react to pedestrian actions, but also anticipate their intentions. In this thesis, we propose to use rich visual and pedestrian-environment interaction features to improve pedestrian crossing intention prediction from the ego-view. We do so by combining visual feature extraction, graph modeling of scene objects and their relationships, and feature encoding as comprehensive inputs for an LSTM encoder-decoder network.

Pedestrians react and make decisions based on their surrounding environment and the behaviors of other road users around them. The human-human social relationship has already been explored for pedestrian trajectory prediction from the bird's-eye view in stationary cameras. However, context and pedestrian-environment relationships are often missing in current research into pedestrian trajectory and intention prediction from the ego-view. To map the pedestrian's relationship to its surrounding objects, we use a star graph with the pedestrian in the center, connected to all other road objects/agents in the scene. The pedestrian and road objects/agents are represented in the graph through visual features extracted using state-of-the-art deep learning algorithms. We use graph convolutional networks and graph autoencoders to encode the star graphs in a lower dimension. Using the graph encodings, pedestrian bounding boxes, and human pose estimation, we propose a novel model that predicts pedestrian crossing intention using not only the pedestrian's action behaviors (bounding box and pose estimation), but also their relationship to their environment.

Through tuning hyperparameters and experimenting with different graph convolutions for our graph autoencoder, we are able to improve on the state-of-the-art results. Our context-driven method outperforms the current state of the art on the benchmark Pedestrian Intention Estimation (PIE) dataset. The state of the art predicts pedestrian crossing intention with a balanced accuracy (to account for dataset imbalance) score of 0.61, while our best-performing model has a balanced accuracy score of 0.79. Our model especially outperforms in no-crossing-intention scenarios, with an F1 score of 0.56 compared to the state of the art's 0.36. Additionally, we also experiment with training the state-of-the-art model and our model to predict pedestrian crossing action and intention jointly. While jointly predicting crossing action does not help improve crossing intention prediction, it is an important distinction to make between predicting crossing action versus intention.
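The star-graph construction and its graph-convolutional encoding can be sketched compactly. This is a simplified single-layer illustration (symmetric normalization, mean pooling), with dimensions and pooling chosen for the example rather than taken from the thesis.

```python
import numpy as np

def star_adjacency(n_objects):
    """Adjacency of a star graph: node 0 is the pedestrian, connected to
    every other scene object/agent; self-loops added as usual for a GCN."""
    A = np.zeros((n_objects + 1, n_objects + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A += np.eye(n_objects + 1)
    return A

def gcn_layer(A, X, W):
    """One graph-convolution layer: symmetric normalization, then ReLU."""
    d = A.sum(axis=1)
    A_hat = A / np.sqrt(np.outer(d, d))
    return np.maximum(A_hat @ X @ W, 0.0)

rng = np.random.default_rng(0)
n_objects, feat_dim, hidden = 5, 128, 32
X = rng.normal(size=(n_objects + 1, feat_dim))   # visual features per node
W = rng.normal(size=(feat_dim, hidden)) * 0.1

H = gcn_layer(star_adjacency(n_objects), X, W)
frame_encoding = H.mean(axis=0)   # pooled graph encoding for one frame;
# one such vector per frame would feed the LSTM encoder-decoder.
print(frame_encoding.shape)       # (32,)
```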
7

Localization of autonomous ground vehicles in dense urban environments

Himstedt, Marian 03 March 2014 (has links) (PDF)
The localization of autonomous ground vehicles in dense urban environments poses a challenge. Applications in classical outdoor robotics rely on the availability of GPS systems in order to estimate the position. However, the presence of complex building structures in dense urban environments hampers a reliable localization based on GPS. Alternative approaches have to be applied in order to tackle this problem. This thesis proposes an approach which combines observations of a single perspective camera and odometry in a probabilistic framework. In particular, the localization in the space of appearance is addressed. First, a topological map of reference places in the environment is built. Each reference place is associated with a set of visual features. A feature selection is carried out in order to obtain distinctive reference places. The topological map is extended to a hybrid representation by the use of metric information from Geographic Information Systems (GIS) and satellite images. The localization is solved in terms of the recognition of reference places. A particle filter implementation incorporating this and the vehicle's odometry is presented. The proposed system is evaluated based on multiple experiments in exemplary urban environments characterized by high building structures and a multitude of dynamic objects.
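The probabilistic framework described — odometry in the motion model, appearance-based place recognition in the measurement model — is a particle filter. A minimal sketch under simplified assumptions (2-D position only, Gaussian motion noise, a distance-based likelihood around the recognized reference place); the map and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
particles = rng.uniform(0, 100, size=(N, 2))     # (x, y) hypotheses in m
weights = np.full(N, 1.0 / N)

# Metric positions of reference places (e.g. from GIS/satellite imagery).
ref_places = np.array([[20.0, 30.0], [60.0, 75.0], [85.0, 10.0]])

def predict(particles, odom_delta, noise=0.5):
    """Motion update: shift every particle by the odometry increment
    plus Gaussian noise."""
    return particles + odom_delta + rng.normal(scale=noise,
                                               size=particles.shape)

def update(particles, weights, recognized_place, sigma=5.0):
    """Measurement update: weight particles by how well they agree with
    the recognized reference place (appearance-based observation)."""
    d2 = ((particles - ref_places[recognized_place]) ** 2).sum(axis=1)
    weights = weights * np.exp(-d2 / (2 * sigma ** 2))
    return weights / weights.sum()   # resampling would follow in practice

particles = predict(particles, odom_delta=np.array([1.0, 0.2]))
weights = update(particles, weights, recognized_place=1)
print("pose estimate:", (weights[:, None] * particles).sum(axis=0))
```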
8

Logo detection, recognition and spotting in context by matching local visual features / Détection, reconnaissance et localisation de logo dans un contexte avec appariement de caractéristiques visuelles locales

Le, Viet Phuong 08 December 2015 (has links)
This thesis presents a logo spotting framework applied to spotting logo images on document images, focused on document categorization and document retrieval problems. We also present three key-point matching methods: simple key-point matching with the nearest neighbor, matching by the 2-nearest-neighbor matching rule, and matching by two local descriptors at different matching stages; the last two methods are improvements of the first. In addition, using a density-based clustering method to group the matches in our proposed spotting framework helps not only segment the candidate logo region but also reject incorrect matches as outliers. Moreover, to maximize performance and to locate logos, a two-stage algorithm is proposed for geometric verification based on homography with RANSAC. Since key-point-based approaches are computationally costly, we have also invested in optimizing our proposed framework. The problems of text/graphics separation are studied: we propose a method for segmenting text and non-text in document images based on a set of powerful connected-component features.
We applied dimensionality reduction techniques to reduce the high-dimensional vectors of local descriptors, and approximate nearest neighbor (ANN) search algorithms, to optimize our proposed framework. We also conducted experiments for a document retrieval system on the segmented text and non-text documents with the ANN algorithm. The results show that the computation time of the system decreases sharply by 56% while its accuracy decreases only slightly, by nearly 2.5%. Overall, we have proposed an effective and efficient approach to the problem of logo spotting in document images, designed to be flexible for future improvements by us and by other researchers. We believe this work can be considered a step toward solving the problem of complete analysis and understanding of document images.
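The matching pipeline described — 2-nearest-neighbor ratio matching, density-based clustering of the matches, and homography-based geometric verification with RANSAC — can be sketched with standard OpenCV and scikit-learn building blocks. The SIFT features, ratio threshold, and DBSCAN parameters are illustrative assumptions, and the file names in the usage comment are hypothetical.

```python
import cv2
import numpy as np
from sklearn.cluster import DBSCAN

def spot_logo(logo_gray, page_gray, ratio=0.75):
    """Spot a logo on a document page image: 2-NN ratio matching of SIFT
    key-points, DBSCAN grouping of the matches, RANSAC homography check.
    Returns a 3x3 homography, or None if no consistent region is found."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(logo_gray, None)
    kp2, des2 = sift.detectAndCompute(page_gray, None)
    if des1 is None or des2 is None:
        return None

    # 2-nearest-neighbour matching rule: keep a match only when the best
    # distance is clearly smaller than the second best (ratio test).
    good = []
    for pair in cv2.BFMatcher().knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    if len(good) < 4:
        return None

    # Density-based clustering both segments the candidate logo region and
    # rejects isolated incorrect matches as outliers (label -1).
    pts_page = np.float32([kp2[m.trainIdx].pt for m in good])
    labels = DBSCAN(eps=30.0, min_samples=4).fit_predict(pts_page)
    keep = labels != -1
    if keep.sum() < 4:
        return None

    # Geometric verification: homography estimated with RANSAC.
    src = np.float32([kp1[m.queryIdx].pt for m, k in zip(good, keep) if k])
    H, _ = cv2.findHomography(src, pts_page[keep], cv2.RANSAC, 5.0)
    return H

# Usage (file names are hypothetical):
#   H = spot_logo(cv2.imread("logo.png", 0), cv2.imread("page.png", 0))
```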
9

Visipedia - Embedding-driven Visual Feature Extraction and Learning

Jakeš, Jan January 2014 (has links)
Multidimensional indexing is an effective tool for capturing similarities between objects without the need for their explicit categorization. In recent years, this method has been widely used for object annotation and has formed a significant part of the publications associated with the Visipedia project. This thesis analyzes the possibilities of machine learning from multidimensionally indexed images based on their visual features, and presents methods for predicting the multidimensional coordinates of previously unseen images. The work surveys the relevant feature extraction algorithms, analyzes applicable machine learning methods, and describes the whole process of developing such a system. The resulting system is then tested on two different datasets, and the experiments performed present the first results for a task of this kind.
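Predicting multidimensional coordinates for previously unseen images reduces to multi-output regression from visual features to embedding coordinates. A minimal sketch with stand-in data and ridge regression as the (assumed) learner:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Stand-ins: visual feature vectors of training images and their known
# coordinates in a multidimensional (similarity) embedding.
features = rng.normal(size=(300, 128))
embedding = rng.normal(size=(300, 2))

# Multi-output regression predicts embedding coordinates directly from
# image features, so a previously unseen image can be placed in the map.
model = Ridge(alpha=1.0).fit(features, embedding)

new_image_features = rng.normal(size=(1, 128))
print(model.predict(new_image_features))    # predicted 2-D coordinates
```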
10

Porovnávání dokumentů na základě vizuálních rysů / Document Comparison Based on Visual Features

Milička, Martin January 2011 (has links)
This thesis proposes a method for comparing web pages based on visual features. It first describes possible approaches to document comparison with respect to their applications, then presents comparison approaches focused on the visual appearance of documents: methods that compare a rendered image of the document, and an approach based on the source code. The thesis then focuses in detail on extracting visual features from the document's source code. A new method for comparing documents based on visual features, using a structural description of the document, is proposed. The implementation of the application and the results achieved are also described. The conclusion discusses possible extensions of the proposed method and future work.
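One way to realize comparison from a structural description — as opposed to comparing rendered screenshots — is to condense each page's layout boxes into a fixed-length visual feature vector and compare the vectors. A minimal sketch, with an entirely illustrative feature choice:

```python
import numpy as np

def visual_feature_vector(boxes):
    """Condense a structural description of a rendered page -- a list of
    (x, y, width, height, font_size) boxes -- into a fixed-length vector
    of simple visual statistics. Illustrative feature choice only."""
    b = np.asarray(boxes, dtype=float)
    widths, heights, fonts = b[:, 2], b[:, 3], b[:, 4]
    return np.array([len(b), widths.mean(), heights.mean(),
                     fonts.mean(), fonts.std(),
                     (widths * heights).sum()])

def similarity(v1, v2):
    """Cosine similarity between two page feature vectors."""
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

page_a = [(0, 0, 800, 100, 24), (0, 120, 380, 500, 12),
          (420, 120, 380, 500, 12)]
page_b = [(0, 0, 800, 90, 22), (0, 110, 800, 520, 12)]
print(similarity(visual_feature_vector(page_a), visual_feature_vector(page_b)))
```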
