61

Pose Classification of Horse Behavior in Video: A deep learning approach for classifying equine poses based on 2D keypoints

Söderström, Michaela January 2021 (has links)
This thesis investigates whether computer vision can be a useful tool for interpreting the behavior of monitored horses. In recent years, research in computer vision has primarily focused on people, where pose estimation and action recognition are popular research areas. The thesis presents a pose classification network whose input features are estimated 2D keypoints of horse body parts. The network classifies three poses: 'Head above the wither', 'Head aligned with the wither' and 'Head below the wither'. The 2D keypoints are obtained by applying DeepLabCut to raw video surveillance data of a single horse. The estimated keypoints are then fed into a multi-layer perceptron, which is trained to classify the three classes. The network shows promising results with good performance. We found label noise when we spot-checked random samples of predicted poses against the ground truth: some of the labeled data consisted of false ground-truth samples. Despite this, the conclusion is that satisfactory results are achieved with our method. In particular, the keypoint estimates were sufficient for the model to classify a hold-out set of poses successfully.
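As an illustration of the pipeline described above, the following sketch feeds normalized 2D keypoints to a small multi-layer perceptron that predicts one of the three head poses. The array shapes, the normalization, and the network size are assumptions for the example; the thesis' actual DeepLabCut output format and training setup are not reproduced here.

```python
# Minimal sketch (not the thesis code): classify head poses from 2D keypoints.
# Assumes `keypoints` is an (N, K, 2) array of DeepLabCut-style detections and
# `labels` holds 0/1/2 for head above / aligned with / below the wither.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
N, K = 1000, 8                          # hypothetical number of frames and keypoints
keypoints = rng.uniform(0, 1, (N, K, 2))
labels = rng.integers(0, 3, N)          # placeholder labels for the three classes

# Normalize each frame: center on the mean keypoint and scale by its spread,
# so the classifier sees pose shape rather than image position.
centered = keypoints - keypoints.mean(axis=1, keepdims=True)
scale = np.linalg.norm(centered, axis=(1, 2), keepdims=True) + 1e-8
features = (centered / scale).reshape(N, -1)

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print("hold-out accuracy:", clf.score(X_test, y_test))
```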
62

A Comparison of Two-Dimensional Pose Estimation Algorithms Based on Natural Features

Korte, Christopher M. 23 September 2011 (has links)
No description available.
63

3D Deep Learning for Object-Centric Geometric Perception

Li, Xiaolong 30 June 2022 (has links)
Object-centric geometric perception aims at extracting the geometric attributes of 3D objects. These attributes include the shape, pose, and motion of the target objects, which enable fine-grained object-level understanding for various tasks in graphics, computer vision, and robotics. With the growth of 3D geometry data and 3D deep learning methods, it becomes increasingly feasible to achieve such tasks directly from 3D input data. Among different 3D representations, a 3D point cloud is a simple, common, and memory-efficient representation that can be retrieved directly from multi-view images, depth scans, or LiDAR range images. Different challenges exist in achieving object-centric geometric perception, such as obtaining a fine-grained geometric understanding of common articulated objects with multiple rigid parts, learning disentangled shape and pose representations with fewer labels, or tackling dynamic and sequential geometric input in an end-to-end fashion. Here we identify and solve these challenges from a 3D deep learning perspective by designing effective and generalizable 3D representations, architectures, and pipelines. We propose the first deep pose estimation method for common articulated objects by designing a novel hierarchical invariant representation. To push the boundary of 6D pose estimation for common rigid objects, a simple yet effective self-supervised framework is designed to handle unlabeled partial segmented scans. We further contribute a novel 4D convolutional neural network called PointMotionNet to learn spatio-temporal features for 3D point cloud sequences. All these works advance the domain of object-centric geometric perception from a unique 3D deep learning perspective. / Doctor of Philosophy / 3D sensors are now widely deployed on mobile devices, such as the depth camera on an iPhone or the laser LiDAR sensors on an autonomous vehicle. These 3D sensing techniques give us accurate measurements of the 3D world. For machine intelligence, we also want to build intelligent systems and algorithms that learn useful information and understand the 3D world better. Human beings have an incredible ability to sense and understand this 3D world through our visual and tactile systems. For example, we can infer the geometric structure and arrangement of furniture in a room without seeing the full room, track a 3D object regardless of changes in its appearance, shape and scale, and predict the future motion of multiple objects from sequential observations and complex reasoning. My work designs frameworks that learn such 3D information from geometric data represented as large sets of 3D points, achieving fine-grained geometric understanding of individual objects, so that machines can tell a target object's geometry, state, and dynamics. The work in this dissertation serves as a building block towards a better understanding of this dynamic world.
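The deep architectures in this dissertation cannot be reproduced in a few lines, but the underlying notion of a rigid 6D pose can be illustrated with the classical closed-form (Kabsch/SVD) alignment of corresponding 3D points, sketched below under the assumption that exact correspondences are available. This is a generic illustration, not the dissertation's learned methods.

```python
# Minimal sketch: recover a rigid 6D pose (rotation R, translation t) that aligns
# corresponding 3D points, via the closed-form SVD (Kabsch) solution.
import numpy as np

def rigid_pose_from_correspondences(src, dst):
    """src, dst: (N, 3) corresponding points; returns R (3x3), t (3,) with dst ~ R @ src + t."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # reflection guard
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Tiny self-check with a known ground-truth pose.
rng = np.random.default_rng(1)
pts = rng.normal(size=(100, 3))
angle = 0.7
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
t_true = np.array([0.5, -0.2, 1.0])
R_est, t_est = rigid_pose_from_correspondences(pts, pts @ R_true.T + t_true)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```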
64

An Evaluation of Automatic Face Recognition Methods for Surveillance

VICTOR HUGO AYMA QUIRITA 26 March 2015 (has links)
This dissertation aimed to compare the performance of state-of-the-art face recognition algorithms on facial images captured from multiple video sequences. Three specific objectives were pursued: to develop a method for determining when a face is in a frontal position with respect to the camera (frontal face detector); to evaluate the accuracy of the recognition algorithms on the facial images obtained with the help of the frontal face detector; and finally, to identify the algorithm with the best performance when applied to verification and identification tasks in video surveillance systems. The comparison of the recognition methods followed this approach: first, a frontal face detector, which allowed the capture of facial images, was created; second, the algorithms were trained and tested with the help of facereclib, a library developed by the Biometrics Group at the IDIAP Research Institute; third, ROC and CMC curves were used as metrics to compare the recognition algorithms; and finally, the results were analyzed and the conclusions reported in this manuscript. Experiments conducted on the video datasets MOBIO, ChokePOINT, VidTIMIT, HONDA, and four fragments of several films indicate that the Inter-Session Variability Modelling and Gaussian Mixture Model algorithms provide the best classification accuracy in both verification and identification tasks, which makes them viable recognition techniques for automatic video surveillance applications.
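The CMC metric mentioned in this abstract can be sketched as follows: given a probe-versus-gallery similarity matrix, the rank-k identification rate counts how often the true identity appears among the top k gallery matches. The similarity scores below are synthetic; this is not the facereclib evaluation code.

```python
# Minimal sketch of a CMC curve: rank-k identification rates from a
# probe-vs-gallery similarity matrix (values here are synthetic).
import numpy as np

def cmc_curve(similarity, gallery_ids, probe_ids, max_rank=10):
    """similarity: (num_probes, num_gallery); higher means more similar."""
    num_probes = similarity.shape[0]
    hits = np.zeros(max_rank)
    for i in range(num_probes):
        order = np.argsort(-similarity[i])            # gallery indices, best first
        ranked_ids = gallery_ids[order]
        match_rank = np.where(ranked_ids == probe_ids[i])[0]
        if match_rank.size and match_rank[0] < max_rank:
            hits[match_rank[0]:] += 1                  # counted for this rank and above
    return hits / num_probes                           # identification rate per rank

rng = np.random.default_rng(0)
gallery_ids = np.arange(50)
probe_ids = rng.permutation(50)
similarity = rng.normal(size=(50, 50))
similarity[np.arange(50), probe_ids] += 2.0            # make true matches score higher
print(cmc_curve(similarity, gallery_ids, probe_ids, max_rank=5))
```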
65

Advances on Pose Estimation and 3D Reconstruction of 2- and 3-View Scenes

Fernandez Julia, Laura 13 December 2018 (has links)
The study of cameras and images has been a prominent subject since the beginning of computer vision, with pose estimation and 3D reconstruction among its main focuses. The goal of this thesis is to tackle and study some specific problems and methods of the structure-from-motion pipeline in order to provide improvements in accuracy, broad studies to understand the advantages and disadvantages of the state-of-the-art models, and useful implementations made available to the public. More specifically, we center our attention on stereo pairs and triplets of images and discuss some of the methods and models able to provide pose estimation and 3D reconstruction of the scene. First, we address the depth estimation task for stereo pairs using block matching. This approach implicitly assumes that all pixels in the patch have the same depth, producing the common artifact known as the "foreground-fattening effect". In order to find a more appropriate support, Yoon and Kweon introduced the use of weights based on color similarity and spatial distance, analogous to those used in the bilateral filter. We present the theory of this method and the implementation we have developed with some improvements; we discuss some variants of the method and analyze its parameters and performance (see the sketch after this abstract). Secondly, we consider the addition of a third view and study the trifocal tensor, which describes the geometric constraints linking the three views. We explore the advantages offered by this operator in the pose estimation task for a triplet of cameras, as opposed to computing the relative poses pair by pair using the fundamental matrix. In addition, we present a study and implementation of several parameterizations of the tensor. We show that the initial improvement in accuracy of the trifocal tensor is not enough to have a remarkable impact on the pose estimation after bundle adjustment, and that using the fundamental matrix with image triplets remains relevant. Finally, we propose using a different projection model than the pinhole camera for the pose estimation of perspective cameras. We present a method based on the matrix factorization due to Tomasi and Kanade that relies on the orthographic projection. This method can be used in configurations where other methods fail, in particular when using cameras with long-focal-length lenses. The performance of our implementation of this method is compared to that of perspective-based methods; we consider that the accuracy achieved and its robustness make it worth considering in any SfM procedure.
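The adaptive-support-weight idea attributed to Yoon and Kweon above can be sketched as below: each pixel in the matching window is weighted by its colour similarity and spatial distance to the window centre, so pixels that likely lie at a different depth contribute less to the aggregated cost. The window size and gamma parameters are illustrative, not the thesis settings.

```python
# Minimal sketch of adaptive support weights for stereo block matching.
import numpy as np

def support_weights(patch, gamma_c=10.0, gamma_s=9.0):
    """patch: (H, W, 3) colour window; returns per-pixel weights w.r.t. the centre pixel."""
    h, w, _ = patch.shape
    cy, cx = h // 2, w // 2
    colour_dist = np.linalg.norm(patch - patch[cy, cx], axis=2)   # Euclidean colour difference
    ys, xs = np.mgrid[0:h, 0:w]
    spatial_dist = np.hypot(ys - cy, xs - cx)
    return np.exp(-(colour_dist / gamma_c + spatial_dist / gamma_s))

def weighted_cost(left_patch, right_patch):
    """Aggregate absolute colour differences with combined left/right support weights."""
    w = support_weights(left_patch) * support_weights(right_patch)
    e = np.abs(left_patch - right_patch).sum(axis=2)              # raw per-pixel cost
    return (w * e).sum() / w.sum()

rng = np.random.default_rng(0)
left = rng.uniform(0, 255, (35, 35, 3))
right = left + rng.normal(0, 2, left.shape)                       # slightly perturbed window
print(weighted_cost(left, right))
```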
66

Cancelling facial expressions for reliable 2D face recognition

Chu, Baptiste 02 March 2015 (has links)
Expression and pose variations are major challenges for reliable face recognition (FR) in 2D. In this thesis, we aim to endow state-of-the-art face recognition SDKs with robustness to simultaneous facial expression variations and pose changes by using an extended 3D Morphable Model (3DMM) which isolates identity variations from those due to facial expressions. Specifically, given a probe with expression, a novel view of the face is generated in which the pose is rectified and the expression neutralized. We present two methods of expression neutralization. The first uses prior knowledge to infer the neutral expression from an input image. The second, specifically designed for verification, is based on the transfer of the gallery face expression to the probe. Experiments using rectified and neutralized views with a standard commercial FR SDK on two 2D face databases show a significant performance improvement and demonstrate the effectiveness of the proposed approach. We then aim to endow the state-of-the-art FR SDKs with the capability to recognize faces in videos. Finally, we present different methods for improving biometric performance in specific cases, and thereby the overall performance obtained with our method.
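The neutralization idea described above can be illustrated with a generic linear morphable model in which shape = mean + A_id·alpha + A_exp·beta, and neutralization simply discards the fitted expression coefficients. The bases below are random placeholders, not an actual 3DMM, and the fit ignores pose and camera projection; this is only an illustration of the principle, not the thesis' extended 3DMM pipeline.

```python
# Minimal sketch of expression neutralization with a generic linear morphable model.
import numpy as np

rng = np.random.default_rng(0)
n_vertices, n_id, n_exp = 500, 40, 20
mean_shape = rng.normal(size=3 * n_vertices)
A_id = rng.normal(size=(3 * n_vertices, n_id))        # hypothetical identity basis
A_exp = rng.normal(size=(3 * n_vertices, n_exp))      # hypothetical expression basis

def fit_coefficients(observed, lam=1e-3):
    """Regularized least-squares fit of identity and expression coefficients."""
    A = np.hstack([A_id, A_exp])
    x = np.linalg.solve(A.T @ A + lam * np.eye(n_id + n_exp), A.T @ (observed - mean_shape))
    return x[:n_id], x[n_id:]

def neutralise(observed):
    """Re-synthesise the face with the expression coefficients set to zero."""
    alpha, _beta = fit_coefficients(observed)
    return mean_shape + A_id @ alpha

probe = mean_shape + A_id @ rng.normal(size=n_id) + A_exp @ rng.normal(size=n_exp)
neutral = neutralise(probe)
print(neutral.shape)
```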
67

Parameter estimation on Lie groups: Application to mapping and localization from a monocular camera

Bourmaud, Guillaume 06 November 2015 (has links)
In this thesis, we derive novel parameter estimation algorithms dedicated to parameters evolving on Lie groups. These algorithms are cast in a Bayesian formalism, which allows us to establish a notion of uncertainty for the estimated parameters. To do so, a generalization of the multivariate normal distribution to Lie groups, called the concentrated normal distribution on Lie groups, is employed. In a first part, we generalize the Continuous-Discrete Extended Kalman Filter (CD-EKF), as well as the Discrete Extended Kalman Filter (D-EKF), to the case where the state and the observations evolve on Lie groups. We obtain two novel algorithms called the Continuous-Discrete Extended Kalman Filter on Lie Groups (CD-LG-EKF) and the Discrete Extended Kalman Filter on Lie Groups (D-LG-EKF). In a second part, we focus on bridging the gap between the formulation of intrinsic nonlinear least-squares criteria and Kalman filtering/smoothing on Lie groups. We propose a generalization of the Euclidean Iterated Extended Kalman Filter (IEKF) to Lie groups, called the LG-IEKF. We also derive a generalization of the Rauch-Tung-Striebel smoother (RTS), also known as the Extended Kalman Smoother, to Lie groups, called the LG-RTS. Finally, the concepts and algorithms presented in the thesis are employed in a series of applications. Firstly, we propose a novel simultaneous localization and mapping approach. Secondly, we develop an indoor camera localization framework. For this latter purpose, we derive a novel Rao-Blackwellized particle smoother on Lie groups, which builds upon the LG-IEKF and the LG-RTS.
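The concentrated normal distribution on a Lie group mentioned above can be illustrated on SO(3): a rotation is written X = mu·Exp(eps) with eps ~ N(0, P) in the tangent space, and uncertainty is propagated to first order through a simple motion model. The sketch below is a generic illustration of that representation, not the thesis' CD-LG-EKF or D-LG-EKF.

```python
# Minimal sketch: concentrated Gaussian on SO(3) and first-order covariance
# propagation for the motion model X_{k+1} = X_k Exp(u + w), w ~ N(0, Q).
import numpy as np

def hat(v):
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def Exp(v):
    """Exponential map so(3) -> SO(3) (Rodrigues' formula)."""
    theta = np.linalg.norm(v)
    if theta < 1e-10:
        return np.eye(3) + hat(v)
    K = hat(v / theta)
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def right_jacobian(v):
    theta = np.linalg.norm(v)
    if theta < 1e-6:
        return np.eye(3) - 0.5 * hat(v)
    V = hat(v)
    return (np.eye(3) - (1 - np.cos(theta)) / theta**2 * V
            + (theta - np.sin(theta)) / theta**3 * (V @ V))

def propagate(mu, P, u, Q):
    """First-order mean/covariance prediction with a right-multiplicative error."""
    mu_next = mu @ Exp(u)
    F = Exp(u).T                      # adjoint of Exp(-u) on SO(3)
    Jr = right_jacobian(u)
    P_next = F @ P @ F.T + Jr @ Q @ Jr.T
    return mu_next, P_next

mu, P = np.eye(3), 1e-4 * np.eye(3)
u, Q = np.array([0.0, 0.0, 0.1]), 1e-5 * np.eye(3)   # body-frame rotation increment and its noise
for _ in range(10):
    mu, P = propagate(mu, P, u, Q)
print(mu, np.diag(P))
```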
68

Localization of a mobile robot using odometry and natural landmarks

Bezerra, Clauber Gomes 08 March 2004 (has links)
Several methods of mobile robot navigation require the measurement of the robot's position and orientation in its workspace. In the case of wheeled mobile robots, techniques based on odometry determine the robot's localization by integrating the incremental displacements of its wheels. However, this technique is subject to errors that accumulate with the distance traveled by the robot, making its exclusive use unfeasible. Other methods are based on the detection of natural or artificial landmarks present in the environment whose locations are known. This technique does not generate cumulative errors, but it can require more processing time than methods based on odometry. Thus, many methods make use of both techniques, such that the odometry errors are periodically corrected through measurements obtained from the landmarks. Following this approach, this work proposes a hybrid localization system for wheeled mobile robots in indoor environments based on odometry and natural landmarks. The landmarks are straight lines defined by the junctions in the environment's floor, forming a two-dimensional grid. Landmark detection from digital images is performed with the Hough transform, combined with heuristics that allow its application in real time. To reduce the landmark search time, we propose to map odometry errors into an area of the captured image that has a high probability of containing the sought landmark.
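The landmark-detection step described above can be sketched with OpenCV's Hough transform applied to a region of interest, with the region standing in for the image area where odometry predicts the landmark. The image below is synthetic and the region is fixed by hand; the thesis' mapping of odometry errors to the image is not reproduced.

```python
# Minimal sketch: detect a floor-junction line with the Hough transform inside a ROI.
import cv2
import numpy as np

# Synthetic "floor" image with a dark line where a tile junction might appear.
img = np.full((240, 320), 200, dtype=np.uint8)
cv2.line(img, (0, 150), (319, 170), color=60, thickness=2)

# Region of interest standing in for the area predicted by odometry.
x0, y0, x1, y1 = 0, 120, 320, 200
roi = img[y0:y1, x0:x1]

edges = cv2.Canny(roi, 50, 150)
lines = cv2.HoughLines(edges, 1, np.pi / 180, 80)

if lines is not None:
    for rho, theta in lines[:, 0]:
        # rho/theta are in the ROI frame; the ROI offset would be added to map
        # the detection back into full-image coordinates.
        print("line: rho=%.1f (ROI frame), theta=%.2f rad" % (rho, theta))
```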
69

Synthetic Data Generation for 6D Object Pose and Grasping Estimation

Martínez González, Pablo 16 March 2023 (has links)
Teaching a robot how to behave so that it becomes completely autonomous is not a simple task. When robotic systems become truly intelligent, interactions with them will feel natural and easy, but nothing could be further from the truth. Making a robot understand its surroundings is a huge task that the computer vision field tries to address, and deep learning techniques are bringing us closer, but at the cost of data. Synthetic data generation is the process of generating artificial data to train machine learning models. This data is generated using computer algorithms and simulations and is designed to resemble real-world data as closely as possible. The use of synthetic data has become increasingly popular in recent years, particularly in deep learning, due to the shortage of high-quality annotated real-world data and the high cost of collecting it. For that reason, in this thesis we address the task of facilitating the generation of synthetic data by creating a framework that leverages advances in modern rendering engines. The generated synthetic data can then be used to train models for tasks such as 6D object pose estimation or grasp estimation. 6D object pose estimation refers to the problem of determining the position and orientation of an object in 3D space, while grasp estimation involves predicting the position and orientation of a robotic hand or gripper that can be used to pick up and manipulate the object. These are important tasks in robotics and computer vision, as they enable robots to perform complex manipulation and grasping tasks. In this work we propose a way of extracting grasping information from hand-object interactions in virtual reality, so that synthetic data can also boost research in that area. Finally, we use the synthetically generated data to test the proposal of applying 6D object pose estimation architectures to grasping region estimation, an idea based on the fact that both problems share several underlying concepts, such as object detection and orientation. / This thesis has been funded by the Spanish Ministry of Education [FPU17/00166]
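A common way to score a 6D object-pose estimate, mentioned here only as background, is the ADD metric: the average distance between object-model points transformed by the estimated pose and by the ground-truth pose. The sketch below is a generic illustration with synthetic data, not the thesis' evaluation code.

```python
# Minimal sketch of the ADD metric for 6D object pose evaluation.
import numpy as np

def add_metric(model_points, R_gt, t_gt, R_est, t_est):
    """model_points: (N, 3) points on the object model; returns mean point distance."""
    gt = model_points @ R_gt.T + t_gt
    est = model_points @ R_est.T + t_est
    return np.linalg.norm(gt - est, axis=1).mean()

rng = np.random.default_rng(0)
model = rng.uniform(-0.05, 0.05, (500, 3))            # hypothetical 10 cm object
R_gt, t_gt = np.eye(3), np.array([0.0, 0.0, 0.5])
angle = 0.05                                          # small rotation error around z
R_est = np.array([[np.cos(angle), -np.sin(angle), 0],
                  [np.sin(angle),  np.cos(angle), 0],
                  [0, 0, 1]])
t_est = t_gt + np.array([0.002, 0.0, 0.0])
print("ADD [m]:", add_metric(model, R_gt, t_gt, R_est, t_est))
```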
70

Locally Tuned Nonlinear Manifold for Person Independent Head Pose Estimation

Foytik, Jacob D. 22 August 2011 (has links)
No description available.
