41 |
Adaptive registration using 2D and 3D features for indoor scene reconstruction / Registro adaptativo usando características 2D e 3D para reconstrução de cenas em ambientes internos. Juan Carlos Perafán Villota, 27 October 2016 (has links)
Pairwise alignment between point clouds is an important task in building 3D maps of indoor environments with partial information. The combination of 2D local features with the depth information provided by RGB-D cameras is often used to improve such alignment. However, under varying lighting or low visual texture, indoor pairwise frame registration with sparse 2D local features is not particularly robust. In these conditions, features are hard to detect, leading to misalignment between consecutive pairs of frames. The use of 3D local features can be a solution, as such features come from the 3D points themselves and are resistant to variations in visual texture and illumination. Because varying conditions in real indoor scenes are unavoidable, we propose a new framework to improve pairwise frame alignment using an adaptive combination of sparse 2D and 3D features, based on the levels of geometric structure and visual texture contained in each scene. Experiments with datasets including unrestricted RGB-D camera motion and natural changes in illumination show that the proposed framework convincingly outperforms methods using 2D or 3D features separately, as reflected in a higher level of alignment accuracy. / O alinhamento entre pares de nuvens de pontos é uma tarefa importante na construção de mapas de ambientes em 3D. A combinação de características locais 2D com informação de profundidade fornecida por câmeras RGB-D é frequentemente utilizada para melhorar tais alinhamentos. No entanto, em ambientes internos com baixa iluminação ou pouca textura visual, o método usando somente características locais 2D não é particularmente robusto. Nessas condições, as características 2D são difíceis de serem detectadas, conduzindo a um desalinhamento entre pares de quadros consecutivos.
A utilização de características 3D locais pode ser uma solução uma vez que tais características são extraídas diretamente de pontos 3D e são resistentes a variações na textura visual e na iluminação. Como situações de variações em cenas reais em ambientes internos são inevitáveis, essa tese apresenta um novo sistema desenvolvido com o objetivo de melhorar o alinhamento entre pares de quadros usando uma combinação adaptativa de características esparsas 2D e 3D. Tal combinação está baseada nos níveis de estrutura geométrica e de textura visual contidos em cada cena. Esse sistema foi testado com conjuntos de dados RGB-D, incluindo vídeos com movimentos irrestritos da câmera e mudanças naturais na iluminação. Os resultados experimentais mostram que a nossa proposta supera aqueles métodos que usam características 2D ou 3D separadamente, obtendo uma melhora da precisão no alinhamento de cenas em ambientes internos reais.
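The adaptive 2D/3D combination described above can be illustrated with a small sketch. This is not the thesis's actual algorithm: the texture and structure scores below (mean image-gradient and depth-gradient magnitudes) are simplified stand-ins for whatever per-scene measures the author uses, and the single blending weight is an invented illustration.

```python
import numpy as np

def texture_score(gray):
    """Mean gradient magnitude of a grayscale image: a crude proxy for
    the amount of visual texture available to 2D feature detectors."""
    gy, gx = np.gradient(gray.astype(float))
    return float(np.mean(np.hypot(gx, gy)))

def structure_score(depth):
    """Mean depth-gradient magnitude: a crude proxy for the geometric
    structure available to 3D feature descriptors."""
    gy, gx = np.gradient(depth.astype(float))
    return float(np.mean(np.hypot(gx, gy)))

def adaptive_weight(gray, depth, eps=1e-9):
    """Return w in [0, 1]: the fraction of trust placed in 2D feature
    correspondences; the remaining (1 - w) goes to 3D features."""
    t, s = texture_score(gray), structure_score(depth)
    return t / (t + s + eps)
```

On a textureless frame with strong geometry the weight collapses toward the 3D side, and vice versa, which is the behavior the abstract describes.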
42 |
Registro de imagens por correlação de fase para geração de imagens coloridas em retinógrafos digitais utilizando câmera CCD monocromática / Image registration using phase correlation to generate color images in digital fundus cameras using a monochromatic CCD camera. José Augusto Stuchi, 10 June 2013 (has links)
A análise da retina permite o diagnóstico de muitas patologias relacionadas ao olho humano. A qualidade da imagem é um fator importante, já que o médico normalmente examina os pequenos vasos da retina e a sua coloração. O equipamento normalmente utilizado para a visualização da retina é o retinógrafo digital, que utiliza sensor colorido com filtro de Bayer e luz (flash) branca. No entanto, esse filtro causa perda na resolução espacial, uma vez que é necessário um processo de interpolação matemática para a formação da imagem. Com o objetivo de melhorar a qualidade da imagem da retina, um retinógrafo com câmera CCD monocromática de alta resolução foi desenvolvido. Nele, as imagens coloridas são geradas pela combinação dos canais monocromáticos R (vermelho), G (verde) e B (azul), adquiridos com o chaveamento da iluminação do olho com LED vermelho, verde e azul, respectivamente. Entretanto, o pequeno período entre os flashes pode causar desalinhamento entre os canais devido a pequenos movimentos do olho. Assim, este trabalho apresenta uma técnica de registro de imagens, baseada em correlação de fase no domínio da frequência, para realizar precisamente o alinhamento dos canais RGB no processo de geração de imagens coloridas da retina. A validação do método foi realizada com um olho mecânico (phantom) para a geração de 50 imagens desalinhadas, que foram corrigidas pelo método proposto e comparadas com as imagens alinhadas obtidas como referência (ground-truth). Os resultados mostraram que o retinógrafo com câmera monocromática e o método de registro proposto nesse trabalho podem produzir imagens coloridas da retina com alta resolução espacial, sem a perda de qualidade intrínseca às câmeras CCD coloridas que utilizam o filtro de Bayer. / The analysis of the retina allows the diagnosis of several pathologies related to the human eye. Image quality is an important factor, since the physician often examines the small vessels of the retina and their color.
The device usually used to observe the retina is the fundus camera, which uses a color sensor with a Bayer filter and white light. However, this filter causes a loss of spatial resolution, since a mathematical interpolation process is necessary to create the final image. Aiming at improving retina image quality, a fundus camera with a monochromatic CCD camera was developed. In this device, color images are generated by combining the monochromatic channels R (red), G (green) and B (blue), acquired by switching the eye illumination with red, green and blue light, respectively. However, the short period between the flashes may cause misalignment among the channels because of small movements of the eye. Thus, this work presents an image registration technique, based on phase correlation in the frequency domain, for accurately aligning the RGB channels in the process of generating retina color images. Validation of the method was performed by using a mechanical eye (phantom) to generate 50 misaligned images, which were aligned by the proposed method and compared to aligned images obtained as references (ground truth). Results showed that the fundus camera with a monochromatic camera and the registration method proposed in this work can produce color images of the retina with high spatial resolution, without the loss of quality intrinsic to color CCD cameras that use a Bayer filter.
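Phase correlation itself is a standard frequency-domain technique and can be sketched in a few lines of NumPy: the normalized cross-power spectrum of two images is an impulse at the translation between them. This toy version recovers integer circular shifts only; the thesis presumably also deals with sub-pixel accuracy and border effects, which are omitted here.

```python
import numpy as np

def phase_correlation_shift(ref, moved):
    """Estimate the (dy, dx) translation such that rolling `ref` by
    (dy, dx) reproduces `moved`, via phase correlation."""
    F_ref = np.fft.fft2(ref)
    F_mov = np.fft.fft2(moved)
    cross = F_mov * np.conj(F_ref)
    cross /= np.abs(cross) + 1e-12        # keep only the phase
    corr = np.fft.ifft2(cross).real       # impulse at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    if dy > h // 2:                       # unwrap to negative shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

In the thesis's setting, `ref` and `moved` would be two of the R/G/B channel images, and the recovered shift is applied before merging the channels.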
43 |
Unconstrained Gaze Estimation Using RGB-D Camera / Estimation du regard avec une caméra RGB-D dans des environnements utilisateur non-contraints. Kacete, Amine, 15 December 2016 (has links)
Dans ce travail, nous avons abordé le problème d’estimation automatique du regard dans des environnements utilisateur sans contraintes. Ce travail s’inscrit dans la vision par ordinateur appliquée à l’analyse automatique du comportement humain. Plusieurs solutions industrielles sont aujourd’hui commercialisées et donnent des estimations précises du regard. Certaines ont des spécifications matérielles très complexes (des caméras embarquées sur un casque ou sur des lunettes qui filment le mouvement des yeux) et présentent un niveau d’intrusivité important ; ces solutions sont souvent non accessibles au grand public. Cette thèse vise à produire un système d’estimation automatique du regard capable d’augmenter la liberté de mouvement de l’utilisateur par rapport à la caméra (mouvement de la tête, distance utilisateur-capteur) et de réduire la complexité du système en utilisant des capteurs relativement simples et accessibles au grand public. Dans ce travail, nous avons exploré plusieurs paradigmes utilisés par les systèmes d’estimation automatique du regard. Dans un premier temps, nous avons mis au point deux systèmes basés sur deux approches classiques : le premier basé sur les caractéristiques et le deuxième basé sur la semi-apparence. L’inconvénient majeur de ces paradigmes réside dans la conception des systèmes d’estimation du regard, qui supposent une indépendance totale entre l’image d’apparence des yeux et la pose de la tête. Pour corriger cette limitation, nous avons convergé vers un nouveau paradigme qui unifie les deux blocs précédents en construisant un espace regard global ; nous avons exploré deux directions en utilisant des données réelles et synthétiques respectivement. / In this thesis, we tackled the automatic gaze estimation problem in unconstrained user environments. This work takes place in the computer vision research field applied to the perception of humans and their behaviors.
Many existing industrial solutions are commercialized and provide acceptable accuracy in gaze estimation. These solutions often use complex hardware, such as infrared cameras (embedded in a head-mounted or remote system), making them intrusive, very constrained by the user's environment and inappropriate for large-scale public use. We focus on estimating gaze using cheap, low-resolution and non-intrusive devices like the Kinect sensor. We develop new methods to address challenging conditions such as head pose changes, illumination conditions and large user-sensor distances. In this work we investigated different gaze estimation paradigms. We first developed two automatic gaze estimation systems following two classical approaches: feature-based and semi-appearance-based. The major limitation of such paradigms lies in their way of designing gaze systems, which assumes a total independence between the eye appearance and head pose blocks. To overcome this limitation, we converged to a novel paradigm which aims at unifying the two previous components and building a global gaze manifold; we explored two global approaches across the experiments by using synthetic and real RGB-D gaze samples.
44 |
Laboratorní pracoviště pro měření věrnosti barev ve videotechnice / Laboratory site for color measurement in video technology. Melo, Jan, January 2009 (has links)
The diploma thesis is divided into four parts. The first part describes basic terms in video technology (luminance, hue, the CIE diagrams). The second part covers the colour spaces RGB, HSV, CMY(K), YUV, YCbCr and YIQ. In the third and fourth parts, these theoretical findings are used to design laboratory exercises. The laboratory exercise examines the colour rendition of colours in video technology. In the Matlab software, a user environment was developed for working with the measured values. The software is capable of recalculating chromaticity coordinates between different colour spaces, of plotting colours in the CIE diagrams and of showing colour vectors. The device used for measuring was the Konica Minolta CS-100A chroma meter, for which a manual was created. The laboratory exercise was carried out and documented in the form of a laboratory protocol.
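The recalculations the abstract mentions can be illustrated with the standard library and the usual transforms: `colorsys` handles RGB↔HSV, and chromaticity coordinates for the CIE xy diagram follow from tristimulus XYZ values. The matrix below is the common linear-sRGB-to-XYZ matrix for a D65 white point; the exact transforms used by the thesis's Matlab tool are not specified, so this is only a representative sketch.

```python
import colorsys

# Linear sRGB -> CIE XYZ, D65 white point (standard sRGB matrix)
M = [[0.4124, 0.3576, 0.1805],
     [0.2126, 0.7152, 0.0722],
     [0.0193, 0.1192, 0.9505]]

def linear_rgb_to_xyz(r, g, b):
    """Tristimulus values from linear (not gamma-encoded) RGB in [0, 1]."""
    return tuple(row[0] * r + row[1] * g + row[2] * b for row in M)

def xyz_to_xy(X, Y, Z):
    """Chromaticity coordinates for plotting in the CIE xy diagram."""
    total = X + Y + Z
    return X / total, Y / total

# RGB -> HSV with the standard library (all channels in [0, 1])
h, s, v = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)   # pure red -> h=0, s=1, v=1
```

Linear RGB white (1, 1, 1) maps to xy ≈ (0.3127, 0.3290), the D65 white point, which is a quick sanity check on the matrix.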
45 |
Inpainting de modèles 3D pour la réalité diminuée : "couper/coller" réaliste pour l'aménagement d'intérieur / Inpainting of 3D models applied to Diminished Reality: realistic "Cut/Paste" for indoor arrangement. Fayer, Julien, 19 April 2019 (has links)
Par opposition à la Réalité Augmentée qui consiste à ajouter des éléments virtuels à un environnement réel, la Réalité Diminuée consiste à supprimer des éléments réels d'un environnement. Le but est d'effectuer un rendu visuel d'une scène 3D où les éléments "effacés" ne sont plus présents : la difficulté consiste à créer une image de sorte que la diminution ne soit pas perceptible par l'utilisateur. Il faut donc venir compléter la scène initialement cachée par ces éléments, en effectuant une opération d'inpainting qui prenne en compte la géométrie de la pièce, sa texture (structurée ou non), et la luminosité ambiante de l'environnement. Par exemple, l’œil humain est sensible à la régularité d'une texture. L'un des objectifs d'Innersense, entreprise spécialisée dans l'aménagement virtuel d’intérieurs, est de développer un produit capable d'enlever des éléments présents dans une pièce d'intérieur. Une fois la suppression virtuelle des meubles existants effectuée, il sera alors possible d'ajouter des meubles virtuels dans l'espace laissé vacant. L'objectif de cette thèse CIFRE est donc de mettre en place un scénario de réalité diminuée pouvant être exécuté sur un système mobile (tablette iOS ou Android) qui génère des images photo-réalistes de la scène diminuée. Pour cela, à partir d’un modèle géométrique de la pièce d'intérieur que l'on veut altérer, nous adaptons et améliorons des procédures d'effacement d'éléments d'une image appelées inpainting dans une image 2D. Ensuite, nous appliquons ces techniques dans le contexte 3D intérieur pour tenir compte de la géométrie de la scène. Enfin, nous analysons la luminosité pour augmenter le réalisme des zones complétées. Dans cette thèse, nous rappelons d'abord les différents travaux académiques et les solutions industrielles existantes. Nous évoquons leurs avantages et leurs limites.
Nous abordons ensuite les différentes techniques d'inpainting existantes pour introduire notre première contribution, qui propose d'adapter une des méthodes de l’état de l’art pour prendre en compte la structure du motif de la texture. La problématique de la luminosité est ensuite abordée en proposant un processus qui traite séparément la texture et la variation de la luminosité. Nous présentons ensuite une troisième contribution qui propose un critère de confiance basé sur des considérations radiométriques pour sélectionner une information selon sa qualité dans le processus d'inpainting. Nous proposons une dernière contribution basée sur la complétion de texture de modèles 3D non planaires reconstruits à partir de peu d’images et donc présentant une texture incomplète. Enfin, nous montrons les applications développées grâce à ces travaux dans le contexte des scènes d'intérieur considérées par Innersense. / In contrast to Augmented Reality, which consists in adding virtual elements to a real environment, Diminished Reality consists in removing real elements from an environment. The goal is to visually render a 3D scene where the "deleted" elements are no longer present: the difficulty is to create an image so that the processing is not perceptible to the user. It is therefore necessary to complete the scene initially hidden by these elements, by performing an inpainting operation that takes into account the geometry of the room, its texture (structured or not), and the ambient brightness of the environment. For example, the human eye is sensitive to the regularity of a texture. One of the objectives of Innersense, a company specializing in virtual interior design, is to develop a product that can remove elements from an interior room. Once the virtual removal of existing furniture has been completed, it will then be possible to add virtual furniture in the vacant space.
The objective of this CIFRE thesis is therefore to set up a diminished reality scenario that can be executed on a mobile system (iOS or Android tablet) and generates photorealistic images of the diminished scene. To do this, based on a geometric model of the room that we want to alter, we adapt and improve procedures for erasing elements of a 2D image, called inpainting. Then, we apply these techniques in the 3D indoor context to take into account the geometry of the scene. Finally, we analyze the brightness to increase the realism of the completed areas. In this thesis, we first review the various academic works and existing industrial solutions. We discuss their advantages and limitations. We then discuss the different existing inpainting techniques to introduce our first contribution, which proposes to adapt one of the state-of-the-art methods to take into account the structure of the texture pattern. The problem of brightness is then addressed by proposing a process that deals separately with texture and variation of brightness. We then present a third contribution that proposes a confidence criterion based on radiometric considerations to select information according to its quality in the inpainting process. We propose a last contribution based on the texture completion of non-planar 3D models reconstructed from few images and therefore presenting an incomplete texture. Finally, we show the applications developed through this work in the context of the interior scenes considered by Innersense.
46 |
[en] GRAPH OPTIMIZATION AND PROBABILISTIC SLAM OF MOBILE ROBOTS USING AN RGB-D SENSOR / [pt] OTIMIZAÇÃO DE GRAFOS E SLAM PROBABILÍSTICO DE ROBÔS MÓVEIS USANDO UM SENSOR RGB-D. 23 March 2021 (has links)
[pt] Robôs móveis têm uma grande gama de aplicações, incluindo veículos autônomos, robôs industriais e veículos aéreos não tripulados. Navegação móvel autônoma é um assunto desafiador devido à alta incerteza e não-linearidade inerentes a ambientes não estruturados, à locomoção e às medições de sensores. Para executar navegação autônoma, um robô precisa de um mapa do ambiente e de uma estimativa de sua própria localização e orientação em relação ao sistema de referência global. No entanto, geralmente o robô não possui informações prévias sobre o ambiente e deve criar o mapa usando informações de sensores e se localizar ao mesmo tempo, um problema chamado Mapeamento e Localização Simultâneos (SLAM). As formulações de SLAM usam algoritmos probabilísticos para lidar com as incertezas do problema, e a abordagem baseada em grafos é uma das soluções estado-da-arte para SLAM. Por muitos anos os sensores LRF (laser range finders) eram as escolhas mais populares de sensores para SLAM. No entanto, sensores RGB-D são uma alternativa interessante, devido ao baixo custo. Este trabalho apresenta uma implementação de RGB-D SLAM com uma abordagem baseada em grafos. A metodologia proposta usa o Sistema Operacional de Robôs (ROS) como middleware do sistema. A implementação é testada num robô de baixo custo e com um conjunto de dados reais obtidos na literatura. Também é apresentada a implementação de uma ferramenta de otimização de grafos para MATLAB. / [en] Mobile robots have a wide range of applications, including autonomous vehicles, industrial robots and unmanned aerial vehicles. Autonomous mobile navigation is a challenging subject due to the high uncertainty and non-linearity inherent to unstructured environments, robot motion and sensor measurements. To perform autonomous navigation, a robot needs a map of the environment and an estimate of its own pose with respect to the global coordinate system. However, the robot usually has no prior knowledge about the environment, and has to create a map using sensor information and localize itself at the same time, a problem called Simultaneous Localization and Mapping (SLAM). SLAM formulations use probabilistic algorithms to handle the uncertainties of the problem, and the graph-based approach is one of the state-of-the-art solutions for SLAM. For many years, LRFs (laser range finders) were the most popular sensor choice for SLAM. However, RGB-D sensors are an interesting alternative due to their low cost. This work presents an RGB-D SLAM implementation with a graph-based probabilistic approach. The proposed methodology uses the Robot Operating System (ROS) as middleware. The implementation is tested on a low-cost robot and with real-world datasets from the literature. The implementation of a pose-graph optimization tool for MATLAB is also presented.
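The pose-graph optimization at the heart of graph-based SLAM reduces, in the simplest one-dimensional linear case, to a sparse least-squares problem: each edge of the graph constrains the difference between two poses. The toy sketch below is only illustrative — real SLAM back ends work on SE(2)/SE(3) poses, weight edges by their information matrices, and iterate, all of which is omitted here.

```python
import numpy as np

def optimize_pose_graph_1d(n, constraints, anchor=0.0):
    """Least-squares solution of a 1-D pose graph.

    n           -- number of poses x[0..n-1]
    constraints -- list of (i, j, z) meaning x[j] - x[i] should equal z
                   (odometry edges and loop closures alike)
    The first pose is anchored at `anchor` to remove the gauge freedom.
    """
    rows = len(constraints) + 1
    A = np.zeros((rows, n))
    b = np.zeros(rows)
    for r, (i, j, z) in enumerate(constraints):
        A[r, i], A[r, j], b[r] = -1.0, 1.0, z
    A[-1, 0], b[-1] = 1.0, anchor            # anchor x[0]
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```

With two odometry edges (0→1: 1.0, 1→2: 1.1) and a loop closure (0→2: 2.0), the solver spreads the 0.1 inconsistency over the edges instead of accumulating it, which is exactly what distinguishes graph SLAM from dead reckoning.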
47 |
Improving Visual Question Answering by Leveraging Depth and Adapting Explainability / Förbättring av Visual Question Answering (VQA) genom utnyttjandet av djup och anpassandet av förklaringsförmågan. Panesar, Amrita Kaur, January 2022 (has links)
To produce smooth human-robot interactions, it is important for robots to be able to answer users’ questions accurately and provide a suitable explanation for why they arrive to the answer they provide. However, in the wild, the user may ask the robot questions relating to aspects of the scene that the robot is unfamiliar with and hence be unable to answer correctly all of the time. In order to gain trust in the robot and resolve failure cases where an incorrect answer is provided, we propose a method that uses Grad-CAM explainability on RGB-D data. Depth is a critical component in producing more intelligent robots that can respond correctly most of the time as some questions might rely on spatial relations within the scene, for which 2D RGB data alone would be insufficient. To our knowledge, this work is the first of its kind to leverage depth and an explainability module to produce an explainable Visual Question Answering (VQA) system. Furthermore, we introduce a new dataset for the task of VQA on RGB-D data, VQA-SUNRGBD. We evaluate our explainability method against Grad-CAM on RGB data and find that ours produces better visual explanations. When we compare our proposed model on RGB-D data against the baseline VQN network on RGB data alone, we show that ours outperforms, particularly in questions relating to depth such as asking about the proximity of objects and relative positions of objects to one another. / För att skapa smidiga interaktioner mellan människa och robot är det viktigt för robotar att kunna svara på användarnas frågor korrekt och ge en lämplig förklaring till varför de kommer fram till det svar de ger. Men i det vilda kan användaren ställa frågor till roboten som rör aspekter av miljön som roboten är obekant med och därmed inte kunna svara korrekt hela tiden. För att få förtroende för roboten och lösa de misslyckade fall där ett felaktigt svar ges, föreslår vi en metod som använder Grad-CAM-förklarbarhet på RGB-D-data. 
Djup är en kritisk komponent för att producera mer intelligenta robotar som kan svara korrekt för det mesta, eftersom vissa frågor kan förlita sig på rumsliga relationer inom scenen, för vilka enbart 2D RGB-data skulle vara otillräcklig. Såvitt vi vet är detta arbete det första i sitt slag som utnyttjar djup och en förklaringsmodul för att producera ett förklarbart Visual Question Answering (VQA)-system. Dessutom introducerar vi ett nytt dataset för uppgiften VQA på RGB-D-data, VQA-SUNRGBD. Vi utvärderar vår förklaringsmetod mot Grad-CAM på RGB-data och finner att vår modell ger bättre visuella förklaringar. När vi jämför vår föreslagna modell för RGB-D-data mot baslinje-VQN-nätverket på enbart RGB-data visar vi att vår modell presterar bättre, särskilt i frågor som rör djup, som att fråga om objekts närhet och relativa positioner för objekt gentemot varandra.
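The core Grad-CAM computation referenced above is compact: given the activations of a convolutional layer and the gradients of the answer score with respect to them, the heatmap is a ReLU of a gradient-weighted channel sum. The NumPy sketch below shows only that step — the thesis's full pipeline (the VQA network, the depth branch, upsampling the map to image size) is not reproduced.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from a conv layer's activations (K, H, W) and
    the gradients of the target score w.r.t. those activations.
    Channel weights are the global-average-pooled gradients; the map
    is the ReLU of the weighted sum over channels, scaled to [0, 1]."""
    weights = gradients.mean(axis=(1, 2))              # (K,)
    cam = np.tensordot(weights, activations, axes=1)   # (H, W)
    cam = np.maximum(cam, 0.0)                         # ReLU
    if cam.max() > 0:
        cam /= cam.max()
    return cam
```

Channels whose gradients are negative (evidence against the answer) are suppressed by the ReLU, so the map highlights only regions that support the predicted answer.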
48 |
Eye Tracking Using a Smartphone Camera and Deep Learning / Blickspårning med mobilkamera och djupinlärning. Skowronek, Adam, Kuleshov, Oleksandr, January 2020 (has links)
Tracking eye movements has been a central part in understanding attention and visual processing in the mind. Studying how the eyes move and what they fixate on during specific moments has been considered by some to offer a direct way to measure spatial attention. The underlying technology, known as eye tracking, has been used in order to reliably and accurately measure gaze. Despite the numerous benefits of eye tracking, research and development as well as commercial applications have been limited due to the cost and lack of scalability which the technology usually entails. The purpose and goal of this project is to make eye tracking more available to the common user by implementing and evaluating a new promising technique. The thesis explores the possibility of implementing a gaze tracking prototype using a normal smartphone camera. The hypothesis is to achieve accurate gaze estimation by utilizing deep learning neural networks and personalizing them to fit each individual. The resulting prototype is highly inaccurate in its estimations; however, adjusting a few key components such as the neural network initialization weights may lead to improved results. / Att spåra ögonrörelser har varit en central del i att förstå uppmärksamhet och visuell bearbetning i hjärnan. Att studera hur ögonen rör sig och vad de fokuserar på under specifika moment har av vissa ansetts vara ett sätt att mäta visuell uppmärksamhet. Den bakomliggande tekniken, känd som blickspårning, har använts för att pålitligt och noggrant mäta blickens riktning. Trots de fördelar som finns med blickspårning, har forskning och utveckling samt även kommersiella produkter begränsats av kostnaden och oförmågan till skalbarhet som tekniken ofta medför. Syftet och målet med arbetet är att göra blickspårning mer tillgängligt för vardagliga användare genom att implementera och utvärdera en ny lovande teknik. 
Arbetet undersöker möjligheten att implementera en blickspårningsprototyp genom användning av en vanlig mobilkamera. Hypotesen är att uppnå noggrann blickspårning genom användning av djupinlärning och neuronnät, samt att personalisera dem till att passa den enskilda individen. Den resulterande prototypen är väldigt oprecis i dess uppskattning av blickriktningen, dock kan justeringen av ett fåtal nyckelkomponenter, som initialiseringsvikterna till det neurala nätverket leda till bättre resultat.
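A lightweight form of the per-user personalization mentioned above is a per-user affine correction fitted by least squares on a handful of calibration fixations (dots the user is asked to look at). This is a hedged sketch, not the project's actual method — the report personalizes the network itself, whereas this only post-corrects its outputs.

```python
import numpy as np

def fit_calibration(pred, target):
    """Fit an affine correction  target ≈ pred @ A.T + b  from a few
    calibration points. `pred` and `target` are (n, 2) screen coords."""
    n = len(pred)
    X = np.hstack([pred, np.ones((n, 1))])           # append bias column
    W, *_ = np.linalg.lstsq(X, target, rcond=None)   # (3, 2): [A.T; b]
    return W

def apply_calibration(W, pred):
    """Apply a fitted correction to new gaze predictions."""
    X = np.hstack([pred, np.ones((len(pred), 1))])
    return X @ W
```

Five or six well-spread calibration points are enough to determine the six affine parameters with some redundancy, which is why many gaze systems show a short dot-following sequence before use.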
49 |
Genomskinlig touchsensor för pålitlig styrning av RGB-lysdioder / Transparent touch sensor for reliable control of RGB LEDs. Calderon, Olle, January 2017 (has links)
Many electronic products of today utilize some form of touch technology. Looking at everything from smartphone screens to ticket vending machines, it is obvious that the number of applications is big and the demand is huge. Touch technologies generally require no force to use, which reduces mechanical wear-and-tear and thus increases their lifespan. In this thesis, a touch system was constructed to control RGB LEDs. The sensor surface was made from a white, semi-clear plastic, through which the LEDs’ light should be visible. Since the plastic both needed to transmit visible light and act as a touch surface, a problem arose: how do you construct a transparent touch sensor that can control RGB LEDs in a reliable way? Firstly, this thesis describes and discusses many of the different available touch technologies and their strengths and weaknesses. From this information, a specific sensor technology was chosen, from which a prototype of the transparent touch sensor was built. The sensor prototype was a capacitive sensor, made from a thin metallic mesh, placed on the back of the plastic surface. Using an embedded system, based on a differential capacitance touch IC and a microcontroller, the capacitance of the sensor was measured and converted into signals which controlled the LEDs. In order to ensure the sensor’s reliability, the environmental factors which affected the sensor had to be determined and handled. To do this, measurements were performed on the sensor to see how its capacitance changed with environmental changes. It was concluded that moisture, temperature and frequency had negligible effect on the sensor’s dielectric properties. However, it was discovered that proximity to ground greatly affected the sensor and that the sensor was significantly dependent on its enclosure and grounding. / Många av de elektronikprodukter som produceras idag använder någon form av touchteknik. 
Då den används i allt från skärmar på smartphones till biljettautomater är det tydligt att användningsområdena är många och att efterfrågan är stor. Touchtekniker kräver i regel ingen kraft för att användas, vilket minskar mekaniskt slitage och därför ökar dess livslängd. I detta arbete skulle en touchstyrning till en uppsättning RGB-lysdioder byggas. Problemet var att sensorytan skulle vara en vit, halvgenomskinlig plast, genom vilken lysdioderna skulle lysa. Eftersom plasten både skulle släppa igenom ljus och agera touchyta uppstod problemet: hur konstruerar man en genomskinlig touchsensor som kan styra RGB-lysdioder på ett pålitligt sätt? Denna rapport inleds med att beskriva och diskutera många av de touchtekniker som finns idag samt vilka för- och nackdelar de har. Utifrån denna information valdes en specifik sensorteknik, varifrån en prototyp på den genomskinliga touchsensorn byggdes. Sensorprototypen var en kapacitiv sensor uppbyggd av ett tunt metallnät placerat bakom plastpanelen. Med ett inbyggt system, bestående av en integrerad touchkrets för differentiell kapacitansmätning och en mikrokontroller, mättes sensorns kapacitans och en styrning till lysdioderna implementerades. För att säkerställa sensorns pålitlighet var det viktigt att analysera vilka miljöfaktorer som påverkade sensorn och hur de kunde hanteras. Mätningar utfördes därför på sensorn för att se hur dess kapacitans förändrades med avseende på dessa. Det kunde konstateras att fukt, temperatur och frekvens hade försumbar påverkan på sensorns dielektrikum. Däremot kunde det visas att närhet till jordplan påverkade sensorn avsevärt och att sensorns tillförlitlighet berodde signifikant på dess inkapsling och jordning.
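The drift sensitivity described above (temperature, humidity, grounding) is typically handled in touch firmware with a slowly adapting baseline: slow environmental drift is absorbed by the baseline, while a fast jump in capacitance counts as a touch. The sketch below illustrates that common scheme only — the thesis uses a dedicated touch IC, and the parameter values here are invented for illustration.

```python
def detect_touch(samples, alpha=0.01, threshold=5.0):
    """Classify raw capacitance counts as touch/no-touch.

    alpha     -- baseline tracking rate (small: only slow drift is absorbed)
    threshold -- counts above baseline that register as a touch
    Returns a list of booleans, one per sample."""
    baseline = samples[0]
    events = []
    for s in samples:
        touched = (s - baseline) > threshold
        events.append(touched)
        if not touched:                      # track drift only when idle,
            baseline += alpha * (s - baseline)  # so a held touch never fades
    return events
```

Freezing the baseline while a touch is active prevents a long press from being "learned away", a standard trick in capacitive sensing firmware.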
50 |
En jämförelse av inlärningsbaserade lösningar för mänsklig positionsuppskattning i 3D / A comparison of learning-based solutions for 3D human pose estimation. Lange, Alfons, Lindfors, Erik, January 2019 (has links)
Inom områden som idrottsvetenskap och underhållning kan det finnas behov av att analysera en människas kroppsposition i 3D. Dessa behov kan innefatta att analysera en golfsving eller att möjliggöra mänsklig interaktion med spel. För att tillförlitligt uppskatta kroppspositioner krävs det idag specialiserad hårdvara som ofta är dyr och svårtillgänglig. På senare tid har det även tillkommit inlärningsbaserade lösningar som kan utföra samma uppskattning på vanliga bilder. Syftet med arbetet har varit att identifiera och jämföra populära inlärningsbaserade lösningar samt undersöka om någon av dessa presterar i paritet med en etablerad hårdvarubaserad lösning. För detta har testverktyg utvecklats, positionsuppskattningar genomförts och resultatdata för samtliga tester analyserats. Resultatet har visat att lösningarna inte presterar likvärdigt med Kinect och att de i nuläget inte är tillräckligt välutvecklade för att användas som substitut för specialiserad hårdvara. / In fields such as sports science and entertainment, there’s occasionally a need to analyze a person's body pose in 3D. These needs may include analyzing a golf swing or enabling human interaction with games. Today, in order to reliably perform a human pose estimation, specialized hardware is usually required, which is often expensive and difficult to access. In recent years, multiple learning-based solutions have been developed that can perform the same kind of estimation on ordinary images. The purpose of this report has been to identify and compare popular learning-based solutions and to investigate whether any of these perform on par with an established hardware-based solution. To accomplish this, tools for testing have been developed, pose estimations have been conducted and result data for each test have been analyzed.
The result has shown that the solutions do not perform on par with Kinect and that they are currently not sufficiently well-developed to be used as a substitute for specialized hardware.
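Comparisons like the one above are commonly scored with the mean per-joint position error (MPJPE); the report does not state its exact metric, so the following is only an illustrative sketch of that standard measure.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error between predicted and ground-truth
    3-D joint positions, both of shape (J, 3). The result is in the
    units of the input (e.g. millimetres), lower is better."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))
```

In a study such as this one, `gt` would come from the hardware reference (here, the Kinect) and `pred` from each learning-based solution, giving a single comparable number per solution.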