1 | Gesture passwords: concepts, methods and challenges
Wu, Jonathan, 21 June 2016
Biometrics are a convenient alternative to traditional forms of access control, such as passwords and pass-cards, because they rely solely on user-specific traits. Unlike alphanumeric passwords, biometrics cannot be given or told to another person, and unlike pass-cards, they are always “on-hand.” Perhaps the best-known biometrics with these properties are face, speech, iris, and gait. This dissertation proposes a new biometric modality: gestures.
A gesture is a short body motion that contains static anatomical information and changing behavioral (dynamic) information. This work considers both full-body gestures, such as a large wave of the arms, and hand gestures, such as a subtle curl of the fingers and palm. For access control, a specific gesture can be selected as a “password” and used to identify and authenticate a user. If this particular motion were somehow compromised, the user could readily select a new motion as a “password,” effectively changing and renewing the behavioral aspect of the biometric.
This thesis describes a novel framework for acquiring, representing, and evaluating gesture passwords for the purpose of general access control. The framework uses depth sensors, such as the Kinect, to record gesture information, from which depth maps or pose features are estimated. First, various distance measures, such as the log-Euclidean distance between feature covariance matrices and distances based on feature sequence alignment via dynamic time warping, are used to compare two gestures and to train a classifier that either authenticates or identifies a user. In authentication, this framework yields an equal error rate on the order of 1-2% for body and hand gestures in non-adversarial scenarios. Next, through a novel decomposition of gestures into posture, build, and dynamic components, the relative importance of each component is studied. The dynamic portion of a gesture is shown to have the largest impact on biometric performance, with its removal causing a significant increase in error. In addition, the effects of two types of threats are investigated: one due to self-induced degradations (personal effects and the passage of time) and the other due to spoof attacks. For body gestures, both spoof attacks (with only the dynamic component) and self-induced degradations increase the equal error rate, as expected. Further, the benefits of adding sensor viewpoints to this modality are empirically evaluated. Finally, a novel framework that leverages deep convolutional neural networks to learn a user-specific “style” representation from a set of known gestures is proposed and compared to a similar representation for gesture recognition. This deep convolutional neural network yields significantly improved performance over prior methods.
A byproduct of this work is the creation and release of multiple publicly available, user-centric (as opposed to gesture-centric) datasets based on both body and hand gestures.
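The distance measures named above are compact enough to illustrate directly. The following sketch is a minimal example, not the dissertation's code, assuming a gesture arrives as a frames-by-features NumPy array: it computes the log-Euclidean distance between feature covariance matrices and a basic dynamic time warping (DTW) alignment cost.

```python
import numpy as np

def spd_log(C):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def log_euclidean_distance(X, Y, eps=1e-6):
    """Log-Euclidean distance between the feature covariance matrices
    of two gesture sequences X and Y (each frames x features)."""
    d = X.shape[1]
    Cx = np.cov(X, rowvar=False) + eps * np.eye(d)  # regularize to SPD
    Cy = np.cov(Y, rowvar=False) + eps * np.eye(d)
    return np.linalg.norm(spd_log(Cx) - spd_log(Cy), ord="fro")

def dtw_distance(X, Y):
    """Classic O(n*m) dynamic time warping cost between two sequences."""
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two synthetic "gestures": 60 and 75 frames of 20 pose features each.
rng = np.random.default_rng(0)
g1, g2 = rng.normal(size=(60, 20)), rng.normal(size=(75, 20))
print(log_euclidean_distance(g1, g2), dtw_distance(g1, g2))
```

Either distance can then drive a threshold-based verifier or a nearest-neighbour identifier, as the abstract describes.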
2 | Indoor 3D Scene Understanding Using Depth Sensors
Lahoud, Jean, 09 1900
One of the main goals in computer vision is to achieve a human-like understanding of images. Nevertheless, image understanding has mainly been studied in the 2D image plane, so additional information is needed to relate images to the 3D world. With the emergence of 3D sensors (e.g., the Microsoft Kinect), which provide depth along with color information, the task of propagating 2D knowledge into 3D becomes more attainable and enables interaction between a machine (e.g., a robot) and its environment. This dissertation focuses on three aspects of indoor 3D scene understanding: (1) 2D-driven 3D object detection for single-frame scenes with inherent 2D information, (2) 3D object instance segmentation for 3D reconstructed scenes, and (3) using room and floor orientation for automatic labeling of indoor scenes, which can support self-supervised object segmentation. These methods capture the physical extents of 3D objects, such as their sizes and actual locations within a scene.
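The first of these aspects hinges on a simple geometric step: a 2D detection plus per-pixel depth can be lifted into a 3D proposal through the camera intrinsics. The sketch below shows that step only, not the thesis pipeline; the intrinsics and box are illustrative placeholders.

```python
import numpy as np

def backproject_box(depth, box, fx, fy, cx, cy):
    """Back-project the valid depth pixels inside a 2D box (u0, v0, u1, v1)
    into camera-space 3D points and return their axis-aligned 3D extent."""
    u0, v0, u1, v1 = box
    patch = depth[v0:v1, u0:u1]
    vs, us = np.nonzero(patch > 0)           # keep valid depth only
    z = patch[vs, us]
    x = (us + u0 - cx) * z / fx              # pinhole camera model
    y = (vs + v0 - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)
    return pts.min(axis=0), pts.max(axis=0)

# Example: a synthetic 2 m planar depth map with Kinect-like intrinsics.
depth = np.full((480, 640), 2.0)
lo, hi = backproject_box(depth, (300, 200, 340, 260), 525.0, 525.0, 319.5, 239.5)
print(lo, hi)    # 3D extent of the lifted box proposal, in metres
```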
3 | Mixed Reality for Gripen Flight Simulators
Olsson, Tobias; Ullberg, Oscar, January 2021
This thesis evaluates how different mixed reality solutions can be built and whether they could be used for flight simulators. A simulator prototype was implemented in Unreal Engine 4 with Varjo's Unreal Engine plugin, providing the foundation for evaluations conducted through user studies. Three user studies were performed: one testing subjective latency with the Varjo XR-1 in a mixed reality environment, one testing hand-eye coordination with the Varjo XR-1 in a video see-through (VST) environment, and one comparing the sense of immersion between an IR depth sensor and a chroma-key flight simulator prototype. The evaluation took several perspectives: how a mixed reality solution compares to an existing dome projector solution in terms of latency, how well masking can be done using either chroma keying or IR depth sensors, and which of the two evaluated mixed reality techniques is preferred in terms of immersion and usability. The investigation showed that using a mixed reality environment had minimal impact on system latency compared to a monitor setup. However, hand-eye coordination in VST mode showed decreased interaction accuracy while conducting tasks. The comparison between the two mixed reality techniques identified the areas where each technique excels and where it falls short; a decision therefore needs to be made about what matters most for each individual use case when developing a mixed reality simulator.
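As a rough illustration of the two masking strategies compared here, and in no way the authors' Unreal Engine/Varjo implementation, the sketch below builds a compositing mask first by chroma keying and then by depth thresholding; the key colour and thresholds are invented for the example.

```python
import numpy as np

def chroma_key_mask(rgb, key=(0, 255, 0), tol=80.0):
    """True where a pixel is close to the key colour; virtual imagery
    is composited in at those pixels."""
    diff = rgb.astype(np.float32) - np.asarray(key, np.float32)
    return np.linalg.norm(diff, axis=-1) < tol

def depth_mask(depth_m, near=1.2):
    """True where measured depth lies beyond the physical cockpit shell,
    i.e. where virtual out-the-window imagery should appear."""
    return depth_m > near

# Composite: virtual content where the mask holds, live video elsewhere.
rgb = np.zeros((480, 640, 3), np.uint8)
rgb[:240] = (10, 250, 10)                     # green screen in upper half
virtual = np.full_like(rgb, 128)              # stand-in rendered imagery
mask = chroma_key_mask(rgb)
composite = np.where(mask[..., None], virtual, rgb)
```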
4 | Computational Multimedia for Video Self Modeling
Shen, Ju, 01 January 2014
Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of oneself. This is the idea behind the psychological theory of self-efficacy: you can learn to perform certain tasks because you see yourself doing them, which provides the most ideal form of behavior modeling. The effectiveness of VSM has been demonstrated for many different types of disabilities and behavioral problems, ranging from stuttering, inappropriate social behaviors, autism, and selective mutism to sports training. However, there is an inherent difficulty associated with producing VSM material: prolonged and persistent video recording is required to capture the rare, if not nonexistent, snippets that can be strung together to form novel video sequences of the target skill. To solve this problem, in this dissertation, we use computational multimedia techniques to facilitate the creation of synthetic visual content for self-modeling that can be used by a learner and his/her therapist with a minimal amount of training data. There are three major technical contributions in my research. First, I developed an Adaptive Video Re-sampling algorithm to synthesize realistic lip-synchronized video with minimal motion jitter. Second, to denoise and complete the depth maps captured by structured-light sensing systems, I introduced a layer-based probabilistic model to account for various types of uncertainty in the depth measurement. Third, I developed a simple and robust bundle-adjustment-based framework for calibrating a network of multiple wide-baseline RGB and depth cameras.
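The second contribution concerns depth denoising and completion. The thesis uses a layer-based probabilistic model; as a much simpler stand-in that only conveys the task, the sketch below fills the zero-valued holes typical of structured-light sensors with the nearest valid measurement and median-filters the result.

```python
import numpy as np
from scipy import ndimage

def complete_depth(depth):
    """Fill holes (depth == 0) with the nearest valid depth value,
    then smooth with a small median filter."""
    invalid = depth == 0
    # Indices of the nearest valid pixel for every location.
    idx = ndimage.distance_transform_edt(invalid, return_distances=False,
                                         return_indices=True)
    filled = depth[tuple(idx)]
    return ndimage.median_filter(filled, size=5)

# Example: a flat 1.5 m wall with a square hole punched into the map.
d = np.full((120, 160), 1.5, np.float32)
d[40:60, 50:80] = 0.0
print(complete_depth(d)[50, 60])  # -> 1.5, the hole is filled
```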
5 | Real-time Software Hand Pose Recognition using Single View Depth Images
Alberts, Stefan Francois, 04 1900
Thesis (MEng)--Stellenbosch University, 2014.
ENGLISH ABSTRACT: The fairly recent introduction of low-cost depth sensors such as Microsoft's Xbox Kinect has encouraged a large amount of research on the use of depth sensors for many common Computer Vision problems. Depth images are advantageous over normal colour images because of how easily objects in a scene can be segmented in real-time. Microsoft used the depth images from the Kinect to successfully separate multiple users and track various larger body joints, but had difficulty tracking smaller joints such as those of the fingers. This is a result of the low resolution and noisy nature of the depth images produced by the Kinect.
The objective of this project is to use the depth images produced by the Kinect to remotely track the user's hands and to recognise static hand poses in real-time. Such a system would make it possible to control an electronic device from a distance without the use of a remote control. It can be used to control computer systems during computer-aided presentations, to translate sign language, and to provide more hygienic control devices in clean rooms such as operating theatres and electronics laboratories.
The proposed system uses the open-source OpenNI framework to retrieve the depth images from the Kinect and to track the user's hands. Random Decision Forests are trained using computer-generated depth images of various hand poses and are used to classify the hand regions in a depth image. The region images are processed using a Mean-Shift based joint estimator to find the 3D joint coordinates. These coordinates are finally used to classify the static hand pose using a Support Vector Machine trained with the libSVM library. The system achieves a final accuracy of 95.61% when tested against synthetic data and 81.35% when tested against real-world data.
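The pipeline reads as three stages: per-pixel part classification, joint estimation, and pose classification. The sketch below mirrors that structure with scikit-learn stand-ins; the thesis itself uses depth-difference features, a Mean-Shift joint estimator and libSVM, and all data, feature dimensions and class counts below are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# 1) Random forest: per-pixel depth features -> hand-part label.
pixel_feats = rng.normal(size=(5000, 16))
part_labels = rng.integers(0, 6, size=5000)      # 6 hand parts
forest = RandomForestClassifier(n_estimators=50).fit(pixel_feats, part_labels)

# 2) Joint estimate: the mode of each part's 3D pixels (the thesis uses
#    Mean-Shift; the per-part mean is used here for brevity).
pixels_3d = rng.normal(size=(5000, 3))
parts = forest.predict(pixel_feats)
joints = np.stack([pixels_3d[parts == p].mean(axis=0) for p in range(6)])

# 3) SVM: flattened joint coordinates -> static hand pose class.
pose_feats = rng.normal(size=(200, 18))          # 6 joints x 3 coords
pose_labels = rng.integers(0, 4, size=200)       # 4 example poses
svm = SVC(kernel="rbf").fit(pose_feats, pose_labels)
print(svm.predict(joints.reshape(1, -1)))
```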
6 | Analyse de scène temps réel pour l'interaction 3D / Real-time scene analysis for 3D interaction
Kaiser, Adrien, 01 July 2019
This PhD thesis focuses on the visual analysis of indoor scenes captured by commodity depth sensors, with the goal of converting their data into high-level understanding of the scene. It explores the use of 3D geometry analysis tools on visual depth data in terms of enhancement, registration and consolidation. In particular, it aims to show how shape abstraction can generate lightweight representations of the data for fast analysis with low hardware requirements. This last property matters because one of our goals is to design algorithms suitable for live embedded operation in, e.g., wearable devices, smartphones or mobile robots. The context of this thesis is the live operation of 3D interaction on a mobile device, which raises numerous issues, including placing 3D interaction zones in relation to real surrounding objects, tracking the interaction zones in space as the sensor moves, and providing a meaningful and understandable experience to non-expert users. Towards solving these problems, we make contributions in which scene abstraction leads to fast and robust sensor localization as well as efficient frame data representation, enhancement and consolidation. While simple geometric surface shapes are not as faithful as dense point sets or volumes for representing observed scenes, we show that they are an acceptable approximation, and their light weight gives them a good balance between accuracy and performance.
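The kind of shape abstraction described here can be conveyed with a small example: replacing a dense depth point cloud with a few planar proxies via a basic RANSAC loop. This is an illustration of the idea only, not the thesis's abstraction pipeline.

```python
import numpy as np

def ransac_plane(points, iters=200, thresh=0.02, rng=None):
    """Fit one plane (n, d) with n.p + d = 0 to a point cloud, returning
    the plane parameters and the inlier mask."""
    rng = rng or np.random.default_rng(0)
    best_inliers, best_plane = None, None
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-9:          # degenerate sample
            continue
        n = n / np.linalg.norm(n)
        dist = np.abs((points - p0) @ n)      # point-to-plane distance
        inliers = dist < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (n, -n @ p0)
    return best_plane, best_inliers

# Example: a noisy floor plane with some clutter above it.
rng = np.random.default_rng(2)
floor = np.c_[rng.uniform(0, 4, (900, 2)), rng.normal(0, 0.005, 900)]
clutter = rng.uniform(0, 4, (100, 3))
plane, inliers = ransac_plane(np.vstack([floor, clutter]))
print(plane[0], inliers.sum())   # normal near (0, 0, +/-1), ~900 inliers
```

A handful of such proxies is far cheaper to store, track and match than the raw point cloud, which is the balance between accuracy and performance the abstract argues for.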