201 |
Advancing human pose and gesture recognition. Pfister, Tomas. January 2015.
This thesis presents new methods in two closely related areas of computer vision: human pose estimation, and gesture recognition in videos. In human pose estimation, we show that random forests can be used to estimate human pose in monocular videos. To this end, we propose a co-segmentation algorithm for segmenting humans out of videos, and an evaluator that predicts whether the estimated poses are correct or not. We further extend this pose estimator to new domains (with a transfer learning approach), and enhance its predictions by predicting the joint positions sequentially (rather than independently) in an image, and using temporal information in the videos (rather than predicting the poses from a single frame). Finally, we go beyond random forests, and show that convolutional neural networks can be used to estimate human pose even more accurately and efficiently. We propose two new convolutional neural network architectures, and show how optical flow can be employed in convolutional nets to further improve the predictions. In gesture recognition, we explore the idea of using weak supervision to learn gestures. We show that we can learn sign language automatically from signed TV broadcasts with subtitles by letting algorithms 'watch' the TV broadcasts and 'match' the signs with the subtitles. We further show that if even a small amount of strong supervision is available (as there is for sign language, in the form of sign language video dictionaries), this strong supervision can be combined with weak supervision to learn even better models.
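The flow-based idea in the last contribution lends itself to a compact illustration. Below is a minimal numpy sketch (hypothetical function names; a simplification of the general idea, not the thesis's architecture) of how per-frame joint heatmaps from neighbouring frames can be warped into the current frame with optical flow and averaged to stabilize the prediction:

import numpy as np

def warp_heatmap(heatmap, flow):
    """Warp a joint-confidence heatmap into the reference frame.
    Convention assumed here: flow[y, x] = (dx, dy) points from a
    reference-frame pixel to the corresponding source-frame pixel."""
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return heatmap[src_y, src_x]  # nearest-neighbour backward warp

def fuse_heatmaps(heatmaps, flows_to_ref):
    """Average neighbouring-frame heatmaps after warping them into the
    reference frame; pass the reference frame itself with a zero flow."""
    warped = [warp_heatmap(hm, fl) for hm, fl in zip(heatmaps, flows_to_ref)]
    return np.mean(warped, axis=0)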
|
202 |
Local pose estimation of feature points for object based augmented reality. Tokunaga, Daniel Makoto. 27 June 2016.
Usage of real objects as links between real and virtual information is a key aspect of augmented reality. A central issue in achieving this link is estimating the visuospatial information of the observed object, or in other words, estimating the object pose. Different objects can behave differently when used for interaction: this encompasses not only changes in position, but also folding or deformation. Traditional research in the area solves these pose estimation problems with different approaches depending on the type of object. Additionally, some approaches rely only on the positional information of the observed feature points, simplifying the object information. In this work, we explore the pose estimation of different objects by gathering more information from the observed feature points and obtaining the local poses of those points, which other research does not explore. We apply this local pose estimation idea in two different capture scenarios, arriving at two novel pose estimation approaches: one based on RGB-D cameras, and another based on RGB cameras and machine learning methods. In the RGB-D based approach, we use the feature point's orientation and the nearby surface to obtain its normal, and from these find the local 6 degrees-of-freedom (DoF) pose. This approach yields not only the rigid object pose, but also the approximate pose of deformed objects. Our RGB based approach, on the other hand, explores machine learning applied to local appearance changes. Unlike other RGB based work, we replace complex non-linear system solvers with a fast and robust method, recovering the local rotation of the observed feature points as well as the full 6-DoF rigid object pose with dramatically lower real-time computation demands. Both approaches show that gathering local poses can contribute information to the pose estimation of different types of objects.
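As a rough sketch of the RGB-D branch (assumed conventions and hypothetical names; the thesis's actual pipeline is more involved): the normal at a feature point can be estimated from the surrounding surface points, e.g. via the direction of least variance of their local covariance, and combined with the feature's 2D orientation to assemble a local 6-DoF frame:

import numpy as np

def local_pose(points, feature_idx, orientation_2d):
    """Estimate a local 6-DoF pose (R, t) at one feature point.
    points: (N, 3) neighbourhood of the feature in camera coordinates.
    orientation_2d: the feature's orientation angle from its 2D descriptor."""
    t = points[feature_idx]
    centered = points - points.mean(axis=0)
    # Normal = direction of least variance of the local surface patch.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    if normal[2] > 0:          # orient the normal toward the camera (-z)
        normal = -normal
    # In-plane reference direction from the feature's 2D orientation.
    # Degenerate when this direction is (near) parallel to the normal.
    ref = np.array([np.cos(orientation_2d), np.sin(orientation_2d), 0.0])
    x_axis = ref - ref.dot(normal) * normal    # project onto tangent plane
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(normal, x_axis)
    R = np.stack([x_axis, y_axis, normal], axis=1)  # columns = local axes
    return R, t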
|
203 |
Discriminative hand-object pose estimation from depth images using convolutional neural networks. Goudie, Duncan. January 2018.
This thesis investigates the task of estimating the pose of a hand interacting with an object from a depth image. The main contribution of this thesis is the development of our discriminative one-shot hand-object pose estimation system. To the best of our knowledge, this is the first attempt at a one-shot hand-object pose estimation system. It is a two-stage system consisting of convolutional neural networks. The first stage segments the object away from the hand in the depth image. This hand-minus-object depth image is combined with the original input depth image to form a 2-channel image for use in the second stage, pose estimation. We show that using this 2-channel image produces better pose estimation performance than a single-stage pose estimation system that takes only the input depth map. We also believe that we are amongst the first to research hand-object segmentation. We use fully convolutional neural networks to perform hand-object segmentation from a depth image, and show that this approach is superior to random decision forests for the task. Datasets were created to train our hand-object pose estimation and hand-object segmentation stages. The hand-object pose labels were estimated semi-automatically with a combined manual-annotation and generative approach. The segmentation labels were inferred automatically with colour thresholding. To the best of our knowledge, there were no public datasets for these two tasks when we were developing our system. These datasets have been, or are in the process of being, publicly released.
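The 2-channel construction is simple enough to sketch (hypothetical names; a sketch of the idea as described, not the thesis code):

import numpy as np

def make_two_channel_input(depth, object_mask):
    """Stack the raw depth map with a hand-minus-object version of it.
    depth: (H, W) float depth image.
    object_mask: (H, W) bool mask of object pixels from the segmentation stage."""
    hand_only = depth.copy()
    hand_only[object_mask] = 0.0          # blank out the object pixels
    return np.stack([depth, hand_only])   # shape (2, H, W), channels-first

Keeping the raw depth as the first channel lets the pose network recover from segmentation mistakes in the second channel, which is one plausible reading of why the 2-channel input outperforms the single-stage baseline.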
|
205 |
Scene-Dependent Human Intention Recognition for an Assistive Robotic System. Duncan, Kester. 17 January 2014.
In order for assistive robots to collaborate effectively with humans in completing everyday tasks, they must be endowed with the ability to perceive scenes effectively and, more importantly, recognize human intentions. To this end, we present in this dissertation a novel scene-dependent human-robot collaborative system capable of recognizing and learning human intentions based on scene objects, the actions that can be performed on them, and human interaction history. The aim of this system is to reduce the number of human interactions necessary for communicating tasks to a robot. Accordingly, the system is partitioned into scene understanding and intention recognition modules. For scene understanding, the system is responsible for segmenting objects from captured RGB-D data, determining their positions and orientations in space, and acquiring their category labels. This information is fed into our intention recognition component, where the most likely object and action pair that the user desires is determined.
Our contributions to the state of the art are manifold. We propose an intention recognition framework that is appropriate for persons with limited physical capabilities, whereby we do not observe human physical actions to infer intentions, as is commonplace, but only the scene. At the core of this framework is our novel probabilistic graphical model formulation entitled Object-Action Intention Networks. These networks are undirected graphical models whose nodes comprise object, action, and object-feature variables, and whose links indicate some form of direct probabilistic interaction. This setup, in tandem with a recursive Bayesian learning paradigm, enables our system to adapt to a user's preferences. We also propose an algorithm for the rapid estimation of position and orientation values of scene objects from single-view 3D point cloud data using a multi-scale superquadric fitting approach. Additionally, we leverage recent advances in computer vision for an RGB-D object categorization procedure that balances discrimination and generalization, as well as a depth segmentation procedure that acquires candidate objects from tabletops. We demonstrate the feasibility of the collaborative system presented herein by conducting evaluations on multiple scenes comprised of objects from 11 categories, along with 7 possible actions and 36 possible intentions. We achieve an approximately 81% reduction in interactions overall after learning, despite changes to scene structure.
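As a toy illustration of the recursive Bayesian learning paradigm (assumed variables and hypothetical names; the actual Object-Action Intention Networks are richer undirected models): maintain a belief over object-action pairs and fold in each new observation, so the posterior after one interaction becomes the prior for the next:

import numpy as np

class IntentionPosterior:
    """Recursive Bayesian belief over (object, action) intention pairs."""

    def __init__(self, n_objects, n_actions):
        # Uniform prior over all object-action pairs.
        self.belief = np.full((n_objects, n_actions),
                              1.0 / (n_objects * n_actions))

    def update(self, likelihood):
        """likelihood[o, a] = P(current scene observation | intention (o, a)).
        The stored belief acts as the prior, making the update recursive."""
        posterior = self.belief * likelihood
        self.belief = posterior / posterior.sum()
        return self.belief

    def most_likely(self):
        o, a = np.unravel_index(np.argmax(self.belief), self.belief.shape)
        return o, a

Because the belief persists across interactions, frequently confirmed pairs accumulate probability mass, which is the mechanism by which such a system adapts to a user's preferences.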
|
206 |
AFFECT-PRESERVING VISUAL PRIVACY PROTECTION. Xu, Wanxin. 01 January 2018.
The prevalence of wireless networks and the convenience of mobile cameras enable many new video applications beyond security and entertainment. From behavioral diagnosis to wellness monitoring, cameras are increasingly used for observation in various educational and medical settings. Videos collected for such applications are considered protected health information under privacy laws in many countries. Visual privacy protection techniques, such as blurring or object removal, can be used to mitigate privacy concerns, but they also obliterate important visual cues of affect and social behavior that are crucial for the target applications. In this dissertation, we propose to balance privacy protection and the utility of the data by preserving privacy-insensitive information, such as pose and expression, which is useful in many applications involving visual understanding.
The Intellectual Merits of the dissertation include a novel framework for visual privacy protection that manipulates the facial image and body shape of individuals, and thereby: (1) conceals the identity of individuals; (2) preserves the utility of the data, such as expression and pose information; and (3) balances the utility of the data against the capacity of the privacy protection.
The Broader Impacts of the dissertation concern the significance of privacy protection for visual data and the inadequacy of current privacy-enhancing technologies in preserving the affect and behavioral attributes of visual content, which are highly useful for behavior observation in educational and medical settings. The work in this dissertation represents one of the first attempts at achieving both goals simultaneously.
|
207 |
Pose AR: Assessing Pose Based Input in an AR Context. Nilsson, Jakub. January 2019.
Despite the rapidly growing adoption of augmented reality (AR) applications, existing methods for interacting with AR content are rated poorly: surveys of the area call for better means of interaction, while researchers strive to create more natural input methods, mainly focusing on gesture input. This thesis aims to contribute to those efforts by recognizing that consumer-grade smartphone-based pose estimation has improved rapidly in recent years and, thanks to its increased accuracy, may have untapped potential for user input. To this end, a rudimentary system for pose based input is integrated into prototype applications designed with both pose based input and touch input in mind. In this work, pose, pose estimation, and pose based input refer to using the distance and orientation of the user (or more precisely, of their device) relative to the AR content. Using these prototypes in a user interaction study allowed the identification of user preferences that indicate which approaches future efforts to utilize pose for input in an AR context ought to adopt. A comparison of questionnaire answers and logged positional data across four prototype scenarios clearly shows that, for pose input to be perceived as intuitive, AR experiences should not employ a scale so large that it demands substantial shifts in the position of the user rather than merely shifts in the position of the user's device.
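A minimal sketch of the quantities such a system treats as input (assumed camera conventions and hypothetical names, not the prototype's code): the device's distance and viewing angle relative to an AR anchor, read off the tracked camera-to-world matrix:

import numpy as np

def pose_input(camera_to_world, anchor_position):
    """Derive pose-based input from a tracked 4x4 camera-to-world matrix.
    Returns (distance to anchor, angle between the camera's forward axis
    and the direction to the anchor)."""
    cam_pos = camera_to_world[:3, 3]
    # Convention assumed here: the camera looks down its local -z axis.
    forward = -camera_to_world[:3, 2]
    to_anchor = anchor_position - cam_pos
    distance = np.linalg.norm(to_anchor)
    cos_angle = forward.dot(to_anchor) / (np.linalg.norm(forward) * distance)
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return distance, angle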
|
208 |
Vision based pose estimation for autonomous helicopter landing. Saläng, Björn; Salomonsson, Henrik. January 2008.
<p>The market for unmanned aerial vehicles (UAVs) is growing rapidly. In order to meet the demand for marine applications CybAero AB has recently started a project named Mobile Automatic Launch and Landing Station (MALLS). MALLS enables the uav to land on moving targets such as ships. This thesis studies a system that estimates the pose of a helicopter in order to land on a moving target.</p><p>The main focus has been on developing a pose estimation system using computer vision. Two different methods for estimating the pose have been studied, one using homography and one using an Extended Kalman Filter (ekf). Both methods have been tested on real flight data from a camera mounted on a RC-helicopter. The accuracy of the pose estimation system has been verified with data from a test with the camera mounted on an industrial robot. The test results show that the ekf-based system is less sensitive to noise than the homography-based system. The ekf-based system however requires initial values which the homography-based system does not. The accuracy of both systems is found to be good enough for the purpose.</p><p>A novel control system with control rules for performing an autonomous landing on a moving target has been developed. The control system has not been tested during flight.</p> / <p>Marknaden for obemannade autonoma luftburna farkoster (UAV:er) växer snabbt. För att möta behovet av marina tillämpningar har CybAero AB nyligen startat ett projekt som kallas Mobil Automatisk Start- och Landningsstation (MALLS). Syftet med malls är att möjliggöra autonom start och landning på objekt i rörelse, som till exempel ett fartyg. I det här examensarbetet studeras ett system för att bestämma position och attityd för en helikopter relativt en helikopterplatta, för att möjliggöra landning på ett ojekt i rörelse.</p><p>Fokus har främst legat på att utveckla ett positionerings- och attitydbestämningssystem. Ett datorseende positionerings- och attitydbestämningssystem har utvecklats. Två olika metoder har undersökts, ett system som bygger på homografi och ett annat som bygger på Extended Kalman Filter (EKF). Båda metoderna har testats med verklig data från en kamera monterad på en RC helikopter. Noggrannheten i positionsbestämmelsen har undersökts med hjälp av data från en industrirobot. Testresultaten visar att det EKF-baserade systemet är mindre bruskänsligt än det homografibaserade systemet. En nackdel med det ekf-baserade systemet är däremot att det kräver initialvillkor vilket det homografibaserade systemet inte gör. Noggrannheten på båda systemen finner vi tillfredsställande för syftet.</p><p>Ett enkelt styrsystem med styrlagar för att genomföra landningar på ett rörligtobjekt har utvecklats. Styrsystemet har dock inte testats under verklig flygning.</p>
|
209 |
Compact Representations and Multi-cue Integration for Robotics. Söderberg, Robert. January 2005.
<p>This thesis presents methods useful in a bin picking application, such as detection and representation of local features, pose estimation and multi-cue integration.</p><p>The scene tensor is a representation of multiple line or edge segments and was first introduced by Nordberg in [30]. A method for estimating scene tensors from gray-scale images is presented. The method is based on orientation tensors, where the scene tensor can be estimated by correlations of the elements in the orientation tensor with a number of 1<em>D</em> filters. Mechanisms for analyzing the scene tensor are described and an algorithm for detecting interest points and estimating feature parameters is presented. It is shown that the algorithm works on a wide spectrum of images with good result.</p><p>Representations that are invariant with respect to a set of transformations are useful in many applications, such as pose estimation, tracking and wide baseline stereo. The scene tensor itself is not invariant and three different methods for implementing an invariant representation based on the scene tensor is presented. One is based on a non-linear transformation of the scene tensor and is invariant to perspective transformations. Two versions of a tensor doublet is presented, which is based on a geometry of two interest points and is invariant to translation, rotation and scaling. The tensor doublet is used in a framework for view centered pose estimation of 3<em>D</em> objects. It is shown that the pose estimation algorithm has good performance even though the object is occluded and has a different scale compared to the training situation.</p><p>An industrial implementation of a bin picking application have to cope with several different types of objects. All pose estimation algorithms use some kind of model and there is yet no model that can cope with all kinds of situations and objects. This thesis presents a method for integrating cues from several pose estimation algorithms for increasing the system stability. It is also shown that the same framework can also be used for increasing the accuracy of the system by using cues from several views of the object. An extensive test with several different objects, lighting conditions and backgrounds shows that multi-cue integration makes the system more robust and increases the accuracy.</p><p>Finally, a system for bin picking is presented, built from the previous parts of this thesis. An eye in hand setup is used with a standard industrial robot arm. It is shown that the system works for real bin-picking situations with a positioning error below 1 mm and an orientation error below 1<sup>o</sup> degree for most of the different situations.</p> / Report code: LiU-TEK-LIC-2005:15.
|
210 |
Triangulation Based Fusion of Sonar Data with Application in Mobile Robot Mapping and Localization. Wijk, Olle. January 2001.
No description available.
|