• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 121
  • 10
  • 7
  • 5
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 182
  • 182
  • 94
  • 69
  • 45
  • 33
  • 33
  • 30
  • 28
  • 28
  • 27
  • 26
  • 25
  • 24
  • 23
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Advancing human pose and gesture recognition

Pfister, Tomas January 2015 (has links)
This thesis presents new methods in two closely related areas of computer vision: human pose estimation, and gesture recognition in videos. In human pose estimation, we show that random forests can be used to estimate human pose in monocular videos. To this end, we propose a co-segmentation algorithm for segmenting humans out of videos, and an evaluator that predicts whether the estimated poses are correct or not. We further extend this pose estimator to new domains (with a transfer learning approach), and enhance its predictions by predicting the joint positions sequentially (rather than independently) in an image, and using temporal information in the videos (rather than predicting the poses from a single frame). Finally, we go beyond random forests, and show that convolutional neural networks can be used to estimate human pose even more accurately and efficiently. We propose two new convolutional neural network architectures, and show how optical flow can be employed in convolutional nets to further improve the predictions. In gesture recognition, we explore the idea of using weak supervision to learn gestures. We show that we can learn sign language automatically from signed TV broadcasts with subtitles by letting algorithms 'watch' the TV broadcasts and 'match' the signs with the subtitles. We further show that if even a small amount of strong supervision is available (as there is for sign language, in the form of sign language video dictionaries), this strong supervision can be combined with weak supervision to learn even better models.
82

[en] AN EVALUATION OF AUTOMATIC FACE RECOGNITION METHODS FOR SURVEILLANCE / [pt] ESTUDO DE MÉTODOS AUTOMÁTICOS DE RECONHECIMENTO FACIAL PARA VÍDEO MONITORAMENTO

VICTOR HUGO AYMA QUIRITA 26 March 2015 (has links)
[pt] Esta dissertação teve por objetivo comparar o desempenho de diversos algoritmos que representam o estado da arte em reconhecimento facial a imagens de sequências de vídeo. Três objetivos específicos foram perseguidos: desenvolver um método para determinar quando uma face está em posição frontal com respeito à câmera (detector de face frontal); avaliar a acurácia dos algoritmos de reconhecimento com base nas imagens faciais obtidas com ajuda do detector de face frontal; e, finalmente, identificar o algoritmo com melhor desempenho quando aplicado a tarefas de verificação e identificação. A comparação dos métodos de reconhecimento foi realizada adotando a seguinte metodologia: primeiro, foi criado um detector de face frontal que permitiu o captura das imagens faciais frontais; segundo, os algoritmos foram treinados e testados com a ajuda do facereclib, uma biblioteca desenvolvida pelo Grupo de Biometria no Instituto de Pesquisa IDIAP; terceiro, baseando-se nas curvas ROC e CMC como métricas, compararam-se os algoritmos de reconhecimento; e por ultimo, as análises dos resultados foram realizadas e as conclusões estão relatadas neste trabalho. Experimentos realizados sobre os bancos de vídeo: MOBIO, ChokePOINT, VidTIMIT, HONDA, e quatro fragmentos de diversos filmes, indicam que o Inter Session Variability Modeling e Gaussian Mixture Model são os algoritmos que fornecem a melhor acurácia quando são usados em tarefas tanto de verificação quanto de identificação, o que os indica como técnicas de reconhecimento viáveis para o vídeo monitoramento automático em vídeo. / [en] This dissertation aimed to compare the performance of state-of-the-arte face recognition algorithms in facial images captured from multiple video sequences. Three specific objectives were pursued: to develop a method for determining when a face is in frontal position with respect to the camera (frontal face detector); to evaluate the accuracy for recognition algorithms based on the facial images obtained with the help of the frontal face detector; and finally, to identify the algorithm with better performance when applied to verification and identification tasks in video surveillance systems. The comparison of the recognition methods was performed adopting the following approach: first, a frontal face detector, which allowed the capture of facial images was created; second, the algorithms were trained and tested with the help of facereclib, a library developed by the Biometrics Group at the IDIAP Research Institute; third, ROC and CMC curves were used as metrics to compare the recognition algorithms; and finally, the results were analyzed and the conclusions were reported in this manuscript. Experiments conducted on the video datasets: MOBIO, ChokePOINT, VidTIMIT, HONDA, and four fragments of several films, indicate that the Inter-Session Variability Modelling and Gaussian Mixture Model algorithms provide the best accuracy on classification when the algorithms are used in verification and identification tasks, which indicates them as a good automatic recognition techniques for video surveillance applications.
83

Avancements dans l'estimation de pose et la reconstruction 3D de scènes à 2 et 3 vues / Advances on Pose Estimation and 3D Resconstruction of 2 and 3-View Scenes

Fernandez Julia, Laura 13 December 2018 (has links)
L'étude des caméras et des images a été un sujet prédominant depuis le début de la vision par ordinateur, l'un des principaux axes étant l'estimation de la pose et la reconstruction 3D. Le but de cette thèse est d'aborder et d'étudier certains problèmes et méthodes spécifiques du pipeline de la structure-from-motion afin d'améliorer la précision, de réaliser de vastes études pour comprendre les avantages et les inconvénients des modèles existants et de créer des outils mis à la disposition du public. Plus spécifiquement, nous concentrons notre attention sur les pairs stéréoscopiques et les triplets d'images et nous explorons certaines des méthodes et modèles capables de fournir une estimation de la pose et une reconstruction 3D de la scène.Tout d'abord, nous abordons la tâche d'estimation de la profondeur pour les pairs stéréoscopiques à l'aide de la correspondance de blocs. Cette approche suppose implicitement que tous les pixels du patch ont la même profondeur, ce qui produit l'artefact commun dénommé "foreground-fattening effect". Afin de trouver un support plus approprié, Yoon et Kweon ont introduit l'utilisation de poids basés sur la similarité des couleurs et la distance spatiale, analogues à ceux utilisés dans le filtre bilatéral. Nous présentons la théorie de cette méthode et l'implémentation que nous avons développée avec quelques améliorations. Nous discutons de quelques variantes de la méthode et analysons ses paramètres et ses performances.Deuxièmement, nous considérons l'ajout d'une troisième vue et étudions le tenseur trifocal, qui décrit les contraintes géométriques reliant les trois vues. Nous explorons les avantages offerts par cet opérateur dans la tâche d'estimation de pose d'un triplet de caméras par opposition au calcul des poses relatives paire par paire en utilisant la matrice fondamentale. De plus, nous présentons une étude et l’implémentation de plusieurs paramétrisations du tenseur. Nous montrons que l'amélioration initiale de la précision du tenseur trifocal n'est pas suffisante pour avoir un impact remarquable sur l'estimation de la pose après ajustement de faisceau et que l'utilisation de la matrice fondamentale avec des triplets d'image reste pertinente.Enfin, nous proposons d'utiliser un modèle de projection différent de celui de la caméra à sténopé pour l'estimation de la pose des caméras en perspective. Nous présentons une méthode basée sur la factorisation matricielle due à Tomasi et Kanade qui repose sur la projection orthographique. Cette méthode peut être utilisée dans des configurations où d'autres méthodes échouent, en particulier lorsque l'on utilise des caméras avec des objectifs à longue distance focale. La performance de notre implémentation de cette méthode est comparée à celle des méthodes basées sur la perspective, nous considérons que l'exactitude obtenue et la robustesse démontré en font un élément à considérer dans toute procédure de la SfM / The study of cameras and images has been a prominent subject since the beginning of computer vision, one of the main focus being the pose estimation and 3D reconstruction. The goal of this thesis is to tackle and study some specific problems and methods of the structure-from-motion pipeline in order to provide improvements in accuracy, broad studies to comprehend the advantages and disadvantages of the state-of-the-art models and useful implementations made available to the public. More specifically, we center our attention to stereo pairs and triplets of images and discuss some of the methods and models able to provide pose estimation and 3D reconstruction of the scene.First, we address the depth estimation task for stereo pairs using block-matching. This approach implicitly assumes that all pixels in the patch have the same depth producing the common artifact known as the ``foreground fattening effect''. In order to find a more appropriate support, Yoon and Kweon introduced the use of weights based on color similarity and spatial distance, analogous to those used in the bilateral filter. We present the theory of this method and the implementation we have developed with some improvements. We discuss some variants of the method and analyze its parameters and performance.Secondly, we consider the addition of a third view and study the trifocal tensor, which describes the geometric constraints linking the three views. We explore the advantages offered by this operator in the pose estimation task of a triplet of cameras as opposed to computing the relative poses pair by pair using the fundamental matrix. In addition, we present a study and implementation of several parameterizations of the tensor. We show that the initial improvement in accuracy of the trifocal tensor is not enough to have a remarkable impact on the pose estimation after bundle adjustment and that using the fundamental matrix with image triplets remains relevant.Finally, we propose using a different projection model than the pinhole camera for the pose estimation of perspective cameras. We present a method based on the matrix factorization due to Tomasi and Kanade that relies on the orthographic projection. This method can be used in configurations where other methods fail, in particular, when using cameras with long focal length lenses. The performance of our implementation of this method is compared to that given by the perspective-based methods, we consider that the accuracy achieved and its robustness make it worth considering in any SfM procedure
84

Local pose estimation of feature points for object based augmented reality. / Detecção de poses locais de pontos de interesse para realidade aumentada baseadas em objetos.

Tokunaga, Daniel Makoto 27 June 2016 (has links)
Usage of real objects as links between real and virtual information is one key aspect in augmented reality. A central issue to achieve this link is the estimation of the visuospatial information of the observed object, or in other words, estimating the object pose. Different objects can have different behaviors when used for interaction. This not only encompasses changes in position, but also folding or deformations. Traditional researches in the area solve those pose estimation problems using different approaches, depending on the type of the object. Additionally, some researches are based only on positional information of observed feature points, simplifying the object information. In this work, we explore the pose estimation of different objects by gathering more information from the observed feature points, and obtaining the local poses of such points, which are not explored in other researches. We apply this local pose estimation idea in two different capturing scenarios, reaching two novel approaches of pose estimation: one based on RGB-D cameras, and another based on RGB and machine learning methods. In the RGB-D based approach, we use the feature point orientation and near surface to obtain its normal; then, find the local 6 degrees-of-freedom (DoF) pose. This approach gives us not only the rigid object pose, but also the approximated pose of deformed objects. On the other hand, our RGB based approach explores machine learning with local appearance changes. Unlike other RGB based works, we replace the complex non-linear systems solvers with a fast and robust method, reaching local rotation of the observed feature points, as well as, full 6 DoF rigid object pose with dramatically lower real-time calculation demands. Both approaches show us that gathering local poses can bring information for the pose estimation of different types of objects. / O uso de objetos reais como meio de conexão entre informações reais e virtuais é um aspecto chave dentro da realidade aumentada. Uma questão central para tal conexão é a estimativa de informações visuo-espaciais do objeto, ou em outras palavras, a detecção da pose do objeto. Diferentes objetos podem ter diferentes comportamentos quando utilizados em interações. Não somente incluindo a mudança de posição, mas também sendo dobradas ou deformadas. Pesquisas tradicionais solucionam tais problemas de detecção usando diferentes abordagens, dependendo do tipo de objeto. Adicionalmente, algumas pesquisas se baseiam somente na informação posicional dos pontos de interesse, simplificando a informação do objeto. Neste trabalho, a detecção de pose de diferente objetos é explorada coletando-se mais informações dos pontos de interesse observados e, por sua vez, obtendo as poses locais de tais pontos, poses que não são exploradas em outras pesquisas. Este conceito da detecção de pose locais é aplicada em dois ambientes de capturas, estendendo-se em duas abordagens inovadoras: uma baseada em câmeras RGB-D, e outra baseada em câmeras RGB e métodos de aprendizado de maquinas. Na abordagem baseada em RGB-D, a orientação e superfície ao redor do ponto de interesse são utilizadas para obter a normal do ponto. Através de tais informações a pose local é obtida. Esta abordagem não só permite a obtenção de poses de objetos rígidos, mas também a pose aproximada de objetos deformáveis. Por outro lado, a abordagem baseada em RGB explora o aprendizado de máquina aplicado em alterações das aparências locais. Diferentemente de outros trabalhos baseados em câmeras RGB, esta abordagem substitui solucionadores não lineares complexos com um método rápido e robusto, permitindo a obtenção de rotações locais dos pontos de interesse, assim como, a pose completa (com 6 graus-de-liberdade) de objetos rígidos, com uma demanda computacional muito menor para cálculos em tempo-real. Ambas as abordagens mostram que a coleta de poses locais podem gerar informações para a detecção de poses de diferentes tipos de objetos.
85

Discriminative hand-object pose estimation from depth images using convolutional neural networks

Goudie, Duncan January 2018 (has links)
This thesis investigates the task of estimating the pose of a hand interacting with an object from a depth image. The main contribution of this thesis is the development of our discriminative one-shot hand-object pose estimation system. To the best of our knowledge, this is the first attempt at a one-shot hand-object pose estimation system. It is a two stage system consisting of convolutional neural networks. The first stage segments the object out of the hand from the depth image. This hand-minus-object depth image is combined with the original input depth image to form a 2-channel image for use in the second stage, pose estimation. We show that using this 2-channel image produces better pose estimation performance than a single stage pose estimation system taking just the input depth map as input. We also believe that we are amongst the first to research hand-object segmentation. We use fully convolutional neural networks to perform hand-object segmentation from a depth image. We show that this is a superior approach to random decision forests for this task. Datasets were created to train our hand-object pose estimator stage and hand-object segmentation stage. The hand-object pose labels were estimated semi-automatically with a combined manual annotation and generative approach. The segmentation labels were inferred automatically with colour thresholding. To the best of our knowledge, there were no public datasets for these two tasks when we were developing our system. These datasets have been or are in the process of being publicly released.
86

Local pose estimation of feature points for object based augmented reality. / Detecção de poses locais de pontos de interesse para realidade aumentada baseadas em objetos.

Daniel Makoto Tokunaga 27 June 2016 (has links)
Usage of real objects as links between real and virtual information is one key aspect in augmented reality. A central issue to achieve this link is the estimation of the visuospatial information of the observed object, or in other words, estimating the object pose. Different objects can have different behaviors when used for interaction. This not only encompasses changes in position, but also folding or deformations. Traditional researches in the area solve those pose estimation problems using different approaches, depending on the type of the object. Additionally, some researches are based only on positional information of observed feature points, simplifying the object information. In this work, we explore the pose estimation of different objects by gathering more information from the observed feature points, and obtaining the local poses of such points, which are not explored in other researches. We apply this local pose estimation idea in two different capturing scenarios, reaching two novel approaches of pose estimation: one based on RGB-D cameras, and another based on RGB and machine learning methods. In the RGB-D based approach, we use the feature point orientation and near surface to obtain its normal; then, find the local 6 degrees-of-freedom (DoF) pose. This approach gives us not only the rigid object pose, but also the approximated pose of deformed objects. On the other hand, our RGB based approach explores machine learning with local appearance changes. Unlike other RGB based works, we replace the complex non-linear systems solvers with a fast and robust method, reaching local rotation of the observed feature points, as well as, full 6 DoF rigid object pose with dramatically lower real-time calculation demands. Both approaches show us that gathering local poses can bring information for the pose estimation of different types of objects. / O uso de objetos reais como meio de conexão entre informações reais e virtuais é um aspecto chave dentro da realidade aumentada. Uma questão central para tal conexão é a estimativa de informações visuo-espaciais do objeto, ou em outras palavras, a detecção da pose do objeto. Diferentes objetos podem ter diferentes comportamentos quando utilizados em interações. Não somente incluindo a mudança de posição, mas também sendo dobradas ou deformadas. Pesquisas tradicionais solucionam tais problemas de detecção usando diferentes abordagens, dependendo do tipo de objeto. Adicionalmente, algumas pesquisas se baseiam somente na informação posicional dos pontos de interesse, simplificando a informação do objeto. Neste trabalho, a detecção de pose de diferente objetos é explorada coletando-se mais informações dos pontos de interesse observados e, por sua vez, obtendo as poses locais de tais pontos, poses que não são exploradas em outras pesquisas. Este conceito da detecção de pose locais é aplicada em dois ambientes de capturas, estendendo-se em duas abordagens inovadoras: uma baseada em câmeras RGB-D, e outra baseada em câmeras RGB e métodos de aprendizado de maquinas. Na abordagem baseada em RGB-D, a orientação e superfície ao redor do ponto de interesse são utilizadas para obter a normal do ponto. Através de tais informações a pose local é obtida. Esta abordagem não só permite a obtenção de poses de objetos rígidos, mas também a pose aproximada de objetos deformáveis. Por outro lado, a abordagem baseada em RGB explora o aprendizado de máquina aplicado em alterações das aparências locais. Diferentemente de outros trabalhos baseados em câmeras RGB, esta abordagem substitui solucionadores não lineares complexos com um método rápido e robusto, permitindo a obtenção de rotações locais dos pontos de interesse, assim como, a pose completa (com 6 graus-de-liberdade) de objetos rígidos, com uma demanda computacional muito menor para cálculos em tempo-real. Ambas as abordagens mostram que a coleta de poses locais podem gerar informações para a detecção de poses de diferentes tipos de objetos.
87

Scene-Dependent Human Intention Recognition for an Assistive Robotic System

Duncan, Kester 17 January 2014 (has links)
In order for assistive robots to collaborate effectively with humans for completing everyday tasks, they must be endowed with the ability to effectively perceive scenes and more importantly, recognize human intentions. As a result, we present in this dissertation a novel scene-dependent human-robot collaborative system capable of recognizing and learning human intentions based on scene objects, the actions that can be performed on them, and human interaction history. The aim of this system is to reduce the amount of human interactions necessary for communicating tasks to a robot. Accordingly, the system is partitioned into scene understanding and intention recognition modules. For scene understanding, the system is responsible for segmenting objects from captured RGB-D data, determining their positions and orientations in space, and acquiring their category labels. This information is fed into our intention recognition component where the most likely object and action pair that the user desires is determined. Our contributions to the state of the art are manifold. We propose an intention recognition framework that is appropriate for persons with limited physical capabilities, whereby we do not observe human physical actions for inferring intentions as is commonplace, but rather we only observe the scene. At the core of this framework is our novel probabilistic graphical model formulation entitled Object-Action Intention Networks. These networks are undirected graphical models where the nodes are comprised of object, action, and object feature variables, and the links between them indicate some form of direct probabilistic interaction. This setup, in tandem with a recursive Bayesian learning paradigm, enables our system to adapt to a user's preferences. We also propose an algorithm for the rapid estimation of position and orientation values of scene objects from single-view 3D point cloud data using a multi-scale superquadric fitting approach. Additionally, we leverage recent advances in computer vision for an RGB-D object categorization procedure that balances discrimination and generalization as well as a depth segmentation procedure that acquires candidate objects from tabletops. We demonstrate the feasibility of the collaborative system presented herein by conducting evaluations on multiple scenes comprised of objects from 11 categories, along with 7 possible actions, and 36 possible intentions. We achieve approximately 81% reduction in interactions overall after learning despite changes to scene structure.
88

AFFECT-PRESERVING VISUAL PRIVACY PROTECTION

Xu, Wanxin 01 January 2018 (has links)
The prevalence of wireless networks and the convenience of mobile cameras enable many new video applications other than security and entertainment. From behavioral diagnosis to wellness monitoring, cameras are increasing used for observations in various educational and medical settings. Videos collected for such applications are considered protected health information under privacy laws in many countries. Visual privacy protection techniques, such as blurring or object removal, can be used to mitigate privacy concern, but they also obliterate important visual cues of affect and social behaviors that are crucial for the target applications. In this dissertation, we propose to balance the privacy protection and the utility of the data by preserving the privacy-insensitive information, such as pose and expression, which is useful in many applications involving visual understanding. The Intellectual Merits of the dissertation include a novel framework for visual privacy protection by manipulating facial image and body shape of individuals, which: (1) is able to conceal the identity of individuals; (2) provide a way to preserve the utility of the data, such as expression and pose information; (3) balance the utility of the data and capacity of the privacy protection. The Broader Impacts of the dissertation focus on the significance of privacy protection on visual data, and the inadequacy of current privacy enhancing technologies in preserving affect and behavioral attributes of the visual content, which are highly useful for behavior observation in educational and medical settings. This work in this dissertation represents one of the first attempts in achieving both goals simultaneously.
89

Pose AR: Assessing Pose Based Input in an AR Context

Jakub, Nilsson January 2019 (has links)
Despite the rapidly growing adoption of augmented reality (AR) applications, existing methods for interacting with AR content are rated poorly, with surveyors of the area calling for better means of interaction, while researchers strive to create more natural input methods, mainly focusing on gesture input. This thesis aims to contribute to the aforementioned efforts by recognizing that technologies for consumer-grade smartphone-based pose estimation have been rapidly improving in recent years and due to their increased accuracy may have untapped potential ready to be utilized for user input. To this end, a rudimentary system for pose based input is integrated into prototype applications, which are constructed with both pose based input and touch input in mind. In this work, pose, pose estimation, and posed based input refer to using the distance and orientation of the user (or more precisely, the distance and orientation of their device) in relation to the AR content. Using said prototypes within a user interaction study allowed the identification of user preferences which indicate the approaches that future efforts into utilizing pose for input in an AR context ought to adopt. By comparing questionnaire answers and logged positional data across four prototype scenarios, it can be clearly identified that to perceive pose input as intuitive, the AR experiences shouldn’t employ a scale which is so large that it requires substantial shifts in the position of the user, as opposed to merely shifts in the position of the user’s device.
90

Compact Representations and Multi-cue Integration for Robotics

Söderberg, Robert January 2005 (has links)
<p>This thesis presents methods useful in a bin picking application, such as detection and representation of local features, pose estimation and multi-cue integration.</p><p>The scene tensor is a representation of multiple line or edge segments and was first introduced by Nordberg in [30]. A method for estimating scene tensors from gray-scale images is presented. The method is based on orientation tensors, where the scene tensor can be estimated by correlations of the elements in the orientation tensor with a number of 1<em>D</em> filters. Mechanisms for analyzing the scene tensor are described and an algorithm for detecting interest points and estimating feature parameters is presented. It is shown that the algorithm works on a wide spectrum of images with good result.</p><p>Representations that are invariant with respect to a set of transformations are useful in many applications, such as pose estimation, tracking and wide baseline stereo. The scene tensor itself is not invariant and three different methods for implementing an invariant representation based on the scene tensor is presented. One is based on a non-linear transformation of the scene tensor and is invariant to perspective transformations. Two versions of a tensor doublet is presented, which is based on a geometry of two interest points and is invariant to translation, rotation and scaling. The tensor doublet is used in a framework for view centered pose estimation of 3<em>D</em> objects. It is shown that the pose estimation algorithm has good performance even though the object is occluded and has a different scale compared to the training situation.</p><p>An industrial implementation of a bin picking application have to cope with several different types of objects. All pose estimation algorithms use some kind of model and there is yet no model that can cope with all kinds of situations and objects. This thesis presents a method for integrating cues from several pose estimation algorithms for increasing the system stability. It is also shown that the same framework can also be used for increasing the accuracy of the system by using cues from several views of the object. An extensive test with several different objects, lighting conditions and backgrounds shows that multi-cue integration makes the system more robust and increases the accuracy.</p><p>Finally, a system for bin picking is presented, built from the previous parts of this thesis. An eye in hand setup is used with a standard industrial robot arm. It is shown that the system works for real bin-picking situations with a positioning error below 1 mm and an orientation error below 1<sup>o</sup> degree for most of the different situations.</p> / Report code: LiU-TEK-LIC-2005:15.

Page generated in 0.1081 seconds