• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 124
  • 10
  • 7
  • 5
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 189
  • 189
  • 98
  • 72
  • 50
  • 36
  • 33
  • 33
  • 30
  • 29
  • 29
  • 28
  • 26
  • 25
  • 25
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Repousser les limites de l'identification faciale en contexte de vidéo-surveillance / Breaking the limits of facial identification in video-surveillance context.

Fiche, Cécile 31 January 2012 (has links)
Les systèmes d'identification de personnes basés sur le visage deviennent de plus en plus répandus et trouvent des applications très variées, en particulier dans le domaine de la vidéosurveillance. Or, dans ce contexte, les performances des algorithmes de reconnaissance faciale dépendent largement des conditions d'acquisition des images, en particulier lorsque la pose varie mais également parce que les méthodes d'acquisition elles mêmes peuvent introduire des artéfacts. On parle principalement ici de maladresse de mise au point pouvant entraîner du flou sur l'image ou bien d'erreurs liées à la compression et faisant apparaître des effets de blocs. Le travail réalisé au cours de la thèse porte donc sur la reconnaissance de visages à partir d'images acquises à l'aide de caméras de vidéosurveillance, présentant des artéfacts de flou ou de bloc ou bien des visages avec des poses variables. Nous proposons dans un premier temps une nouvelle approche permettant d'améliorer de façon significative la reconnaissance des visages avec un niveau de flou élevé ou présentant de forts effets de bloc. La méthode, à l'aide de métriques spécifiques, permet d'évaluer la qualité de l'image d'entrée et d'adapter en conséquence la base d'apprentissage des algorithmes de reconnaissance. Dans un second temps, nous nous sommes focalisés sur l'estimation de la pose du visage. En effet, il est généralement très difficile de reconnaître un visage lorsque celui-ci n'est pas de face et la plupart des algorithmes d'identification de visages considérés comme peu sensibles à ce paramètre nécessitent de connaître la pose pour atteindre un taux de reconnaissance intéressant en un temps relativement court. Nous avons donc développé une méthode d'estimation de la pose en nous basant sur des méthodes de reconnaissance récentes afin d'obtenir une estimation rapide et suffisante de ce paramètre. / The person identification systems based on face recognition are becoming increasingly widespread and are being used in very diverse applications, particularly in the field of video surveillance. In this context, the performance of the facial recognition algorithms largely depends on the image acquisition context, especially because the pose can vary, but also because the acquisition methods themselves can introduce artifacts. The main issues are focus imprecision, which can lead to blurred images, or the errors related to compression, which can introduce the block artifact. The work done during the thesis focuses on facial recognition in images taken by video surveillance cameras, in cases where the images contain blur or block artifacts or show various poses. First, we are proposing a new approach that allows to significantly improve facial recognition in images with high blur levels or with strong block artifacts. The method, which makes use of specific noreference metrics, starts with the evaluation of the quality level of the input image and then adapts the training database of the recognition algorithms accordingly. Second, we have focused on the facial pose estimation. Normally, it is very difficult to recognize a face in an image taken from another viewpoint than the frontal one and the majority of facial identification algorithms which are robust to pose variation need to know the pose in order to achieve a satisfying recognition rate in a relatively short time. We have therefore developed a fast and satisfying pose estimation method based on recent recognition techniques.
82

Real-Time Head Pose Estimation in Low-Resolution Football Footage / Realtidsestimering av huvudets vridning i lågupplösta videosekvenser från fotbollsmatcher

Launila, Andreas January 2009 (has links)
This report examines the problem of real-time head pose estimation in low-resolution football footage. A method is presented for inferring the head pose using a combination of footage and knowledge of the locations of the football and players. An ensemble of randomized ferns is compared with a support vector machine for processing the footage, while a support vector machine performs pattern recognition on the location data. Combining the two sources of information outperforms either in isolation. The location of the football turns out to be an important piece of information. / QC 20100707 / Capturing and Visualizing Large scale Human Action (ACTVIS)
83

Vision-Based Techniques for Cognitive and Motor Skill Assessments

Floyd, Beatrice K. 24 August 2012 (has links)
No description available.
84

Advancing human pose and gesture recognition

Pfister, Tomas January 2015 (has links)
This thesis presents new methods in two closely related areas of computer vision: human pose estimation, and gesture recognition in videos. In human pose estimation, we show that random forests can be used to estimate human pose in monocular videos. To this end, we propose a co-segmentation algorithm for segmenting humans out of videos, and an evaluator that predicts whether the estimated poses are correct or not. We further extend this pose estimator to new domains (with a transfer learning approach), and enhance its predictions by predicting the joint positions sequentially (rather than independently) in an image, and using temporal information in the videos (rather than predicting the poses from a single frame). Finally, we go beyond random forests, and show that convolutional neural networks can be used to estimate human pose even more accurately and efficiently. We propose two new convolutional neural network architectures, and show how optical flow can be employed in convolutional nets to further improve the predictions. In gesture recognition, we explore the idea of using weak supervision to learn gestures. We show that we can learn sign language automatically from signed TV broadcasts with subtitles by letting algorithms 'watch' the TV broadcasts and 'match' the signs with the subtitles. We further show that if even a small amount of strong supervision is available (as there is for sign language, in the form of sign language video dictionaries), this strong supervision can be combined with weak supervision to learn even better models.
85

[en] AN EVALUATION OF AUTOMATIC FACE RECOGNITION METHODS FOR SURVEILLANCE / [pt] ESTUDO DE MÉTODOS AUTOMÁTICOS DE RECONHECIMENTO FACIAL PARA VÍDEO MONITORAMENTO

VICTOR HUGO AYMA QUIRITA 26 March 2015 (has links)
[pt] Esta dissertação teve por objetivo comparar o desempenho de diversos algoritmos que representam o estado da arte em reconhecimento facial a imagens de sequências de vídeo. Três objetivos específicos foram perseguidos: desenvolver um método para determinar quando uma face está em posição frontal com respeito à câmera (detector de face frontal); avaliar a acurácia dos algoritmos de reconhecimento com base nas imagens faciais obtidas com ajuda do detector de face frontal; e, finalmente, identificar o algoritmo com melhor desempenho quando aplicado a tarefas de verificação e identificação. A comparação dos métodos de reconhecimento foi realizada adotando a seguinte metodologia: primeiro, foi criado um detector de face frontal que permitiu o captura das imagens faciais frontais; segundo, os algoritmos foram treinados e testados com a ajuda do facereclib, uma biblioteca desenvolvida pelo Grupo de Biometria no Instituto de Pesquisa IDIAP; terceiro, baseando-se nas curvas ROC e CMC como métricas, compararam-se os algoritmos de reconhecimento; e por ultimo, as análises dos resultados foram realizadas e as conclusões estão relatadas neste trabalho. Experimentos realizados sobre os bancos de vídeo: MOBIO, ChokePOINT, VidTIMIT, HONDA, e quatro fragmentos de diversos filmes, indicam que o Inter Session Variability Modeling e Gaussian Mixture Model são os algoritmos que fornecem a melhor acurácia quando são usados em tarefas tanto de verificação quanto de identificação, o que os indica como técnicas de reconhecimento viáveis para o vídeo monitoramento automático em vídeo. / [en] This dissertation aimed to compare the performance of state-of-the-arte face recognition algorithms in facial images captured from multiple video sequences. Three specific objectives were pursued: to develop a method for determining when a face is in frontal position with respect to the camera (frontal face detector); to evaluate the accuracy for recognition algorithms based on the facial images obtained with the help of the frontal face detector; and finally, to identify the algorithm with better performance when applied to verification and identification tasks in video surveillance systems. The comparison of the recognition methods was performed adopting the following approach: first, a frontal face detector, which allowed the capture of facial images was created; second, the algorithms were trained and tested with the help of facereclib, a library developed by the Biometrics Group at the IDIAP Research Institute; third, ROC and CMC curves were used as metrics to compare the recognition algorithms; and finally, the results were analyzed and the conclusions were reported in this manuscript. Experiments conducted on the video datasets: MOBIO, ChokePOINT, VidTIMIT, HONDA, and four fragments of several films, indicate that the Inter-Session Variability Modelling and Gaussian Mixture Model algorithms provide the best accuracy on classification when the algorithms are used in verification and identification tasks, which indicates them as a good automatic recognition techniques for video surveillance applications.
86

Avancements dans l'estimation de pose et la reconstruction 3D de scènes à 2 et 3 vues / Advances on Pose Estimation and 3D Resconstruction of 2 and 3-View Scenes

Fernandez Julia, Laura 13 December 2018 (has links)
L'étude des caméras et des images a été un sujet prédominant depuis le début de la vision par ordinateur, l'un des principaux axes étant l'estimation de la pose et la reconstruction 3D. Le but de cette thèse est d'aborder et d'étudier certains problèmes et méthodes spécifiques du pipeline de la structure-from-motion afin d'améliorer la précision, de réaliser de vastes études pour comprendre les avantages et les inconvénients des modèles existants et de créer des outils mis à la disposition du public. Plus spécifiquement, nous concentrons notre attention sur les pairs stéréoscopiques et les triplets d'images et nous explorons certaines des méthodes et modèles capables de fournir une estimation de la pose et une reconstruction 3D de la scène.Tout d'abord, nous abordons la tâche d'estimation de la profondeur pour les pairs stéréoscopiques à l'aide de la correspondance de blocs. Cette approche suppose implicitement que tous les pixels du patch ont la même profondeur, ce qui produit l'artefact commun dénommé "foreground-fattening effect". Afin de trouver un support plus approprié, Yoon et Kweon ont introduit l'utilisation de poids basés sur la similarité des couleurs et la distance spatiale, analogues à ceux utilisés dans le filtre bilatéral. Nous présentons la théorie de cette méthode et l'implémentation que nous avons développée avec quelques améliorations. Nous discutons de quelques variantes de la méthode et analysons ses paramètres et ses performances.Deuxièmement, nous considérons l'ajout d'une troisième vue et étudions le tenseur trifocal, qui décrit les contraintes géométriques reliant les trois vues. Nous explorons les avantages offerts par cet opérateur dans la tâche d'estimation de pose d'un triplet de caméras par opposition au calcul des poses relatives paire par paire en utilisant la matrice fondamentale. De plus, nous présentons une étude et l’implémentation de plusieurs paramétrisations du tenseur. Nous montrons que l'amélioration initiale de la précision du tenseur trifocal n'est pas suffisante pour avoir un impact remarquable sur l'estimation de la pose après ajustement de faisceau et que l'utilisation de la matrice fondamentale avec des triplets d'image reste pertinente.Enfin, nous proposons d'utiliser un modèle de projection différent de celui de la caméra à sténopé pour l'estimation de la pose des caméras en perspective. Nous présentons une méthode basée sur la factorisation matricielle due à Tomasi et Kanade qui repose sur la projection orthographique. Cette méthode peut être utilisée dans des configurations où d'autres méthodes échouent, en particulier lorsque l'on utilise des caméras avec des objectifs à longue distance focale. La performance de notre implémentation de cette méthode est comparée à celle des méthodes basées sur la perspective, nous considérons que l'exactitude obtenue et la robustesse démontré en font un élément à considérer dans toute procédure de la SfM / The study of cameras and images has been a prominent subject since the beginning of computer vision, one of the main focus being the pose estimation and 3D reconstruction. The goal of this thesis is to tackle and study some specific problems and methods of the structure-from-motion pipeline in order to provide improvements in accuracy, broad studies to comprehend the advantages and disadvantages of the state-of-the-art models and useful implementations made available to the public. More specifically, we center our attention to stereo pairs and triplets of images and discuss some of the methods and models able to provide pose estimation and 3D reconstruction of the scene.First, we address the depth estimation task for stereo pairs using block-matching. This approach implicitly assumes that all pixels in the patch have the same depth producing the common artifact known as the ``foreground fattening effect''. In order to find a more appropriate support, Yoon and Kweon introduced the use of weights based on color similarity and spatial distance, analogous to those used in the bilateral filter. We present the theory of this method and the implementation we have developed with some improvements. We discuss some variants of the method and analyze its parameters and performance.Secondly, we consider the addition of a third view and study the trifocal tensor, which describes the geometric constraints linking the three views. We explore the advantages offered by this operator in the pose estimation task of a triplet of cameras as opposed to computing the relative poses pair by pair using the fundamental matrix. In addition, we present a study and implementation of several parameterizations of the tensor. We show that the initial improvement in accuracy of the trifocal tensor is not enough to have a remarkable impact on the pose estimation after bundle adjustment and that using the fundamental matrix with image triplets remains relevant.Finally, we propose using a different projection model than the pinhole camera for the pose estimation of perspective cameras. We present a method based on the matrix factorization due to Tomasi and Kanade that relies on the orthographic projection. This method can be used in configurations where other methods fail, in particular, when using cameras with long focal length lenses. The performance of our implementation of this method is compared to that given by the perspective-based methods, we consider that the accuracy achieved and its robustness make it worth considering in any SfM procedure
87

Local pose estimation of feature points for object based augmented reality. / Detecção de poses locais de pontos de interesse para realidade aumentada baseadas em objetos.

Tokunaga, Daniel Makoto 27 June 2016 (has links)
Usage of real objects as links between real and virtual information is one key aspect in augmented reality. A central issue to achieve this link is the estimation of the visuospatial information of the observed object, or in other words, estimating the object pose. Different objects can have different behaviors when used for interaction. This not only encompasses changes in position, but also folding or deformations. Traditional researches in the area solve those pose estimation problems using different approaches, depending on the type of the object. Additionally, some researches are based only on positional information of observed feature points, simplifying the object information. In this work, we explore the pose estimation of different objects by gathering more information from the observed feature points, and obtaining the local poses of such points, which are not explored in other researches. We apply this local pose estimation idea in two different capturing scenarios, reaching two novel approaches of pose estimation: one based on RGB-D cameras, and another based on RGB and machine learning methods. In the RGB-D based approach, we use the feature point orientation and near surface to obtain its normal; then, find the local 6 degrees-of-freedom (DoF) pose. This approach gives us not only the rigid object pose, but also the approximated pose of deformed objects. On the other hand, our RGB based approach explores machine learning with local appearance changes. Unlike other RGB based works, we replace the complex non-linear systems solvers with a fast and robust method, reaching local rotation of the observed feature points, as well as, full 6 DoF rigid object pose with dramatically lower real-time calculation demands. Both approaches show us that gathering local poses can bring information for the pose estimation of different types of objects. / O uso de objetos reais como meio de conexão entre informações reais e virtuais é um aspecto chave dentro da realidade aumentada. Uma questão central para tal conexão é a estimativa de informações visuo-espaciais do objeto, ou em outras palavras, a detecção da pose do objeto. Diferentes objetos podem ter diferentes comportamentos quando utilizados em interações. Não somente incluindo a mudança de posição, mas também sendo dobradas ou deformadas. Pesquisas tradicionais solucionam tais problemas de detecção usando diferentes abordagens, dependendo do tipo de objeto. Adicionalmente, algumas pesquisas se baseiam somente na informação posicional dos pontos de interesse, simplificando a informação do objeto. Neste trabalho, a detecção de pose de diferente objetos é explorada coletando-se mais informações dos pontos de interesse observados e, por sua vez, obtendo as poses locais de tais pontos, poses que não são exploradas em outras pesquisas. Este conceito da detecção de pose locais é aplicada em dois ambientes de capturas, estendendo-se em duas abordagens inovadoras: uma baseada em câmeras RGB-D, e outra baseada em câmeras RGB e métodos de aprendizado de maquinas. Na abordagem baseada em RGB-D, a orientação e superfície ao redor do ponto de interesse são utilizadas para obter a normal do ponto. Através de tais informações a pose local é obtida. Esta abordagem não só permite a obtenção de poses de objetos rígidos, mas também a pose aproximada de objetos deformáveis. Por outro lado, a abordagem baseada em RGB explora o aprendizado de máquina aplicado em alterações das aparências locais. Diferentemente de outros trabalhos baseados em câmeras RGB, esta abordagem substitui solucionadores não lineares complexos com um método rápido e robusto, permitindo a obtenção de rotações locais dos pontos de interesse, assim como, a pose completa (com 6 graus-de-liberdade) de objetos rígidos, com uma demanda computacional muito menor para cálculos em tempo-real. Ambas as abordagens mostram que a coleta de poses locais podem gerar informações para a detecção de poses de diferentes tipos de objetos.
88

Discriminative hand-object pose estimation from depth images using convolutional neural networks

Goudie, Duncan January 2018 (has links)
This thesis investigates the task of estimating the pose of a hand interacting with an object from a depth image. The main contribution of this thesis is the development of our discriminative one-shot hand-object pose estimation system. To the best of our knowledge, this is the first attempt at a one-shot hand-object pose estimation system. It is a two stage system consisting of convolutional neural networks. The first stage segments the object out of the hand from the depth image. This hand-minus-object depth image is combined with the original input depth image to form a 2-channel image for use in the second stage, pose estimation. We show that using this 2-channel image produces better pose estimation performance than a single stage pose estimation system taking just the input depth map as input. We also believe that we are amongst the first to research hand-object segmentation. We use fully convolutional neural networks to perform hand-object segmentation from a depth image. We show that this is a superior approach to random decision forests for this task. Datasets were created to train our hand-object pose estimator stage and hand-object segmentation stage. The hand-object pose labels were estimated semi-automatically with a combined manual annotation and generative approach. The segmentation labels were inferred automatically with colour thresholding. To the best of our knowledge, there were no public datasets for these two tasks when we were developing our system. These datasets have been or are in the process of being publicly released.
89

Local pose estimation of feature points for object based augmented reality. / Detecção de poses locais de pontos de interesse para realidade aumentada baseadas em objetos.

Daniel Makoto Tokunaga 27 June 2016 (has links)
Usage of real objects as links between real and virtual information is one key aspect in augmented reality. A central issue to achieve this link is the estimation of the visuospatial information of the observed object, or in other words, estimating the object pose. Different objects can have different behaviors when used for interaction. This not only encompasses changes in position, but also folding or deformations. Traditional researches in the area solve those pose estimation problems using different approaches, depending on the type of the object. Additionally, some researches are based only on positional information of observed feature points, simplifying the object information. In this work, we explore the pose estimation of different objects by gathering more information from the observed feature points, and obtaining the local poses of such points, which are not explored in other researches. We apply this local pose estimation idea in two different capturing scenarios, reaching two novel approaches of pose estimation: one based on RGB-D cameras, and another based on RGB and machine learning methods. In the RGB-D based approach, we use the feature point orientation and near surface to obtain its normal; then, find the local 6 degrees-of-freedom (DoF) pose. This approach gives us not only the rigid object pose, but also the approximated pose of deformed objects. On the other hand, our RGB based approach explores machine learning with local appearance changes. Unlike other RGB based works, we replace the complex non-linear systems solvers with a fast and robust method, reaching local rotation of the observed feature points, as well as, full 6 DoF rigid object pose with dramatically lower real-time calculation demands. Both approaches show us that gathering local poses can bring information for the pose estimation of different types of objects. / O uso de objetos reais como meio de conexão entre informações reais e virtuais é um aspecto chave dentro da realidade aumentada. Uma questão central para tal conexão é a estimativa de informações visuo-espaciais do objeto, ou em outras palavras, a detecção da pose do objeto. Diferentes objetos podem ter diferentes comportamentos quando utilizados em interações. Não somente incluindo a mudança de posição, mas também sendo dobradas ou deformadas. Pesquisas tradicionais solucionam tais problemas de detecção usando diferentes abordagens, dependendo do tipo de objeto. Adicionalmente, algumas pesquisas se baseiam somente na informação posicional dos pontos de interesse, simplificando a informação do objeto. Neste trabalho, a detecção de pose de diferente objetos é explorada coletando-se mais informações dos pontos de interesse observados e, por sua vez, obtendo as poses locais de tais pontos, poses que não são exploradas em outras pesquisas. Este conceito da detecção de pose locais é aplicada em dois ambientes de capturas, estendendo-se em duas abordagens inovadoras: uma baseada em câmeras RGB-D, e outra baseada em câmeras RGB e métodos de aprendizado de maquinas. Na abordagem baseada em RGB-D, a orientação e superfície ao redor do ponto de interesse são utilizadas para obter a normal do ponto. Através de tais informações a pose local é obtida. Esta abordagem não só permite a obtenção de poses de objetos rígidos, mas também a pose aproximada de objetos deformáveis. Por outro lado, a abordagem baseada em RGB explora o aprendizado de máquina aplicado em alterações das aparências locais. Diferentemente de outros trabalhos baseados em câmeras RGB, esta abordagem substitui solucionadores não lineares complexos com um método rápido e robusto, permitindo a obtenção de rotações locais dos pontos de interesse, assim como, a pose completa (com 6 graus-de-liberdade) de objetos rígidos, com uma demanda computacional muito menor para cálculos em tempo-real. Ambas as abordagens mostram que a coleta de poses locais podem gerar informações para a detecção de poses de diferentes tipos de objetos.
90

Scene-Dependent Human Intention Recognition for an Assistive Robotic System

Duncan, Kester 17 January 2014 (has links)
In order for assistive robots to collaborate effectively with humans for completing everyday tasks, they must be endowed with the ability to effectively perceive scenes and more importantly, recognize human intentions. As a result, we present in this dissertation a novel scene-dependent human-robot collaborative system capable of recognizing and learning human intentions based on scene objects, the actions that can be performed on them, and human interaction history. The aim of this system is to reduce the amount of human interactions necessary for communicating tasks to a robot. Accordingly, the system is partitioned into scene understanding and intention recognition modules. For scene understanding, the system is responsible for segmenting objects from captured RGB-D data, determining their positions and orientations in space, and acquiring their category labels. This information is fed into our intention recognition component where the most likely object and action pair that the user desires is determined. Our contributions to the state of the art are manifold. We propose an intention recognition framework that is appropriate for persons with limited physical capabilities, whereby we do not observe human physical actions for inferring intentions as is commonplace, but rather we only observe the scene. At the core of this framework is our novel probabilistic graphical model formulation entitled Object-Action Intention Networks. These networks are undirected graphical models where the nodes are comprised of object, action, and object feature variables, and the links between them indicate some form of direct probabilistic interaction. This setup, in tandem with a recursive Bayesian learning paradigm, enables our system to adapt to a user's preferences. We also propose an algorithm for the rapid estimation of position and orientation values of scene objects from single-view 3D point cloud data using a multi-scale superquadric fitting approach. Additionally, we leverage recent advances in computer vision for an RGB-D object categorization procedure that balances discrimination and generalization as well as a depth segmentation procedure that acquires candidate objects from tabletops. We demonstrate the feasibility of the collaborative system presented herein by conducting evaluations on multiple scenes comprised of objects from 11 categories, along with 7 possible actions, and 36 possible intentions. We achieve approximately 81% reduction in interactions overall after learning despite changes to scene structure.

Page generated in 0.1097 seconds