Global ETD Search

141	6-DOF lokalizace objektů v průmyslových aplikacích / 6-DOF Object Localization in Industrial Applications Macurová, Nela January 2021 The aim of this work is to design a method that localizes objects in a point cloud and estimates, as accurately as possible, the 6D pose of known objects in an industrial scene for bin picking. The design of the solution is inspired by the PoseCNN network. The solution also includes a scene simulator that generates artificial data. The simulator is used to generate a training data set containing 2 objects for training a convolutional neural network. The network is tested on annotated real scenes and achieves a low success rate: only 23.8 % and 31.6 % for estimating translation and rotation for one type of object, and 12.4 % and 21.6 % for the other, where the tolerance for a correct estimate is 5 mm and 15°. However, after applying the ICP algorithm to the estimated results, the success rate rises to 81.5 % for translation and 51.8 % for rotation for the first object, and to 51.9 % and 48.7 % for the second. The contribution of this work is the creation of the generator and the testing of the network's functionality on small objects.
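The 5 mm / 15° tolerance mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that translation error is the Euclidean distance in millimetres and that rotation error is the geodesic angle between two rotation matrices; the abstract does not state the exact metric used in the thesis, and the function names below are hypothetical.

```python
import math

def translation_error_mm(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth positions (mm)."""
    return math.dist(t_est, t_gt)

def rotation_error_deg(R_est, R_gt):
    """Angle of the relative rotation R_est^T R_gt, in degrees.

    trace(R_est^T R_gt) equals the sum of elementwise products of the two
    3x3 matrices, so no explicit matrix multiplication is needed.
    """
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_angle))

def rot_z(deg):
    """Helper (assumption, for the demo only): rotation about the z axis."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def pose_within_tolerance(t_est, R_est, t_gt, R_gt, max_mm=5.0, max_deg=15.0):
    """Return (translation_ok, rotation_ok) for the assumed tolerances."""
    return (translation_error_mm(t_est, t_gt) <= max_mm,
            rotation_error_deg(R_est, R_gt) <= max_deg)

# Usage: an estimate that is 3 mm and 10 degrees away from the ground truth.
print(pose_within_tolerance([3.0, 0.0, 0.0], rot_z(10.0),
                            [0.0, 0.0, 0.0], rot_z(0.0)))
```

Reporting the two checks separately mirrors the abstract, which gives distinct success rates for translation and rotation. Symmetric objects would need a metric that accounts for equivalent orientations, which this sketch does not attempt.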
142	Automated Gait Analysis : Using Deep Metric Learning Engström, Isak January 2021 (has links) Sectors of security, safety, and defence require methods for identifying people on the individual level. Automation of these tasks has the potential of outperforming manual labor, as well as relieving workloads. The ever-extending surveillance camera networks, advances in human pose estimation from monocular cameras, together with the progress of deep learning techniques, pave the way for automated walking gait analysis as an identification method. This thesis investigates the use of 2D kinematic pose sequences to represent gait, monocularly extracted from a limited dataset containing walking individuals captured from five camera views. The sequential information of the gait is captured using recurrent neural networks. Techniques in deep metric learning are applied to evaluate two network models, with contrasting output dimensionalities, against deep-metric-, and non-deep-metric-based embedding spaces. The results indicate that the gait representation, network designs, and network learning structure show promise when identifying individuals, scaling particularly well to unseen individuals. However, with the limited dataset, the network models performed best when the dataset included the labels from both the individuals and the camera views simultaneously, contrary to when the data only contained the labels from the individuals without the information of the camera views. For further investigations, an extension of the data would be required to evaluate the accuracy and effectiveness of these methods, for the re-identification task of each individual. / <p>Examensarbetet är utfört vid Institutionen för teknik och naturvetenskap (ITN) vid Tekniska fakulteten, Linköpings universitet</p> machine learning deep learning deep metric learning artificial neural network recurrent neural network computer vision gait pose estimation Media and Communication Technology Medieteknik
143	Simutaneous real-time object recognition and pose estimation for artificial systems operating in dynamic environments Van Wyk, Frans-Pieter January 2013 Recent advances in technology have increased awareness of the necessity for automated systems in people’s everyday lives. Artificial systems are more frequently being introduced into environments previously thought to be too perilous for humans to operate in. Some robots can be used to extract potentially hazardous materials from sites inaccessible to humans, while others are being developed to aid humans with laborious tasks. A crucial aspect of all artificial systems is the manner in which they interact with their immediate surroundings. Developing such a deceptively simple aspect has proven to be significantly challenging, as it not only entails the methods through which the system perceives its environment, but also its ability to perform critical tasks. These undertakings often involve the coordination of numerous subsystems, each performing its own complex duty. To complicate matters further, it is becoming increasingly important for these artificial systems to be able to perform their tasks in real time. The task of object recognition is typically described as the process of retrieving the object in a database that is most similar to an unknown, or query, object. Pose estimation, on the other hand, involves estimating the position and orientation of an object in three-dimensional space, as seen from an observer’s viewpoint. These two tasks are regarded as vital to many computer vision techniques and regularly serve as input to more complex perception algorithms. An approach is presented which regards the object recognition and pose estimation procedures as mutually dependent. The core idea is that dissimilar objects might appear similar when observed from certain viewpoints.
A feature-based conceptualisation, which makes use of a database, is implemented and used to perform simultaneous object recognition and pose estimation. The design incorporates data compression techniques, originally suggested by the image-processing community, to facilitate fast processing of large databases. System performance is quantified primarily on object recognition, pose estimation and execution time characteristics. These aspects are investigated under ideal conditions by exploiting three-dimensional models of relevant objects. The performance of the system is also analysed for practical scenarios by acquiring input data from a structured light implementation, which resembles that obtained from many commercial range scanners. Practical experiments indicate that the system was capable of performing simultaneous object recognition and pose estimation in approximately 230 ms once a novel object has been sensed. An average object recognition accuracy of approximately 73% was achieved. The pose estimation results were reasonable but prompted further research. The results are comparable to what has been achieved using other suggested approaches such as Viewpoint Feature Histograms and Spin Images. / Dissertation (MEng)--University of Pretoria, 2013. / gm2014 / Electrical, Electronic and Computer Engineering / unrestricted Object recognition Pose estimation Real-time Partial object matching 3D features Free form deformation Data compression Locality sensitive hashing Structured light Intelligent systems UCTD
144	Stéréotomie et vision artificielle pour la construction robotisée de structures maçonnées complexes / Stereotomy and computer vision for robotic construction of complex masonry structures Loing, Vianney 22 January 2019 (has links) Ce travail de thèse s'inscrit dans le contexte du développement de la robotique dans la construction. On s’intéresse ici à la construction robotisée de structures maçonnées complexes en ayant recours à de la vision artificielle. La construction sans cintre étant un enjeu important en ce qui concerne la productivité sur un chantier et la quantité de déchets produits, nous explorons, à cet effet, les possibilités qu'offre la rigidité en flexion inhérente aux maçonneries topologiquement autobloquantes. La génération de ces dernières, classique dans le cas plan, est généralisée ici à la conception de structures courbes, à partir de maillages de quadrangles plans et de manière paramétrique, grâce aux logiciels Rhinoceros 3D / Grasshopper. Pour cela, nous proposons un ensemble d'inégalités à respecter afin que la structure obtenue soit effectivement topologiquement autobloquante. Ces inégalités permettent, par ailleurs, d'introduire un résultat nouveau ; à savoir qu'il est possible d'avoir un assemblage de blocs dans lequel chacun des blocs est topologiquement bloqué en translation, mais un sous-ensemble — constitué de plusieurs de ces blocs — ne l'est pas. Un prototype de maçonnerie à topologie autobloquante est finalement conçu. Sa conception repose sur une découpe des joints d'inclinaison variable qui permet de le construire sans cintre. En parallèle, nous abordons des aspects de vision artificielle robuste pour un environnement chantier, environnement complexe dans lequel les capteurs peuvent subir des chocs, être salis ou déplacés accidentellement. Le problème est d'estimer la position relative d'un bloc de maçonnerie par rapport à un bras robot, à partir de simples caméras 2D ne nécessitant pas d'étape de calibration. 
Notre approche repose sur l'utilisation de réseaux de neurones convolutifs de classification, entraînés à partir de centaines de milliers d'images synthétiques de l’ensemble bras robot + bloc, présentant des variations aléatoires en terme de dimensions et positions du bloc, textures, éclairage, etc, et ce afin que le robot puisse apprendre à repérer le bloc sans trop de biais d’environnement. La génération de ces images est réalisée grâce à Unreal Engine 4. Cette méthode permet la localisation du bloc par rapport au robot avec une précision millimétrique, sans utiliser une seule image réelle pour la phase d'apprentissage ; ce qui constitue un avantage certain puisque l'acquisition de données représentatives pour l'apprentissage est un processus long et fastidieux. Nous avons également construit une base de données riche, constituée d’environ 12000 images réelles contenant un robot et un bloc précisément localisés, permettant d’évaluer quantitativement notre approche et de la rendre comparable aux approches alternatives. Un démonstrateur réel intégrant un bras ABB IRB 120, des blocs parallélépipédiques et trois webcams a été mis en place pour démontrer la faisabilité de la méthode / The context of this thesis work is the development of robotics in the construction industry. We explore the robotic construction of complex masonry structures with the help of computer vision. Construction without the use of formwork is an important issue in relation to both productivity on a construction site and the amount of waste generated. To this end, we study topological interlocking masonries and the possibilities they present. The design of this kind of masonry is standard for planar structures. We generalize it to the design of curved structures in a parametrical way, using PQ meshes and the softwares Rhinoceros 3D and Grasshopper. To achieve this, we introduce a set of inequalities to respect in order to have a topological interlocked structure. 
These inequalities allow us to present a new result. Namely, it is possible to have an assembly of blocks in which each block is interlocked in translation, while having a subset — composed of several of these blocks — that is not interlocked. We also present a prototype of topological interlocking masonry. Its design is based on variable inclination joints, allowing construction without formwork. In parallel, we are studying robust computer vision for unstructured environments like construction sites, in which sensors are vulnerable to dust or could be accidentally jostled. The goal is to estimate the relative pose (position + orientation) of a masonry block with respect to a robot, using only cheap cameras without the need for calibration. Our approach relies on a classification Convolutional Neural Network trained using hundreds of thousands of synthetically rendered scenes with a robot and a block, and randomized parameters such as block dimensions and poses, light, textures, etc, so that the robot can learn to locate the block without being influenced by the environment. The generation of these images is performed with Unreal Engine 4. This method allows us to estimate a block pose very accurately, with only millimetric errors, without using a single real image for training. This is a strong advantage since acquiring representative training data is a long and expensive process. We also built a new rich dataset of real robot images (about 12,000 images) with accurately localized blocks so that we can evaluate our approach and compare it to alternative approaches. 
A real demonstrator, including an ABB IRB 120 robot, cuboid blocks and three webcams, was set up to prove the feasibility of the method. Structure topologiquement autobloquante Vision artificielle Construction robotisée Estimation de pose relative Construction sans cintre Stéréotomie Topological interlocking structures Computer vision Robotic construction Relative pose estimation Building without formwork Stereotomy
145	Towards Color-Based Two-Hand 3D Global Pose Estimation Lin, Fanqing 14 June 2022 (has links) Pose estimation and tracking is essential for applications involving human controls. Specifically, as the primary operating tool for human activities, hand pose estimation plays a significant role in applications such as hand tracking, gesture recognition, human-computer interaction and VR/AR. As the field develops, there has been a trend to utilize deep learning to estimate the 2D/3D hand poses using color-based information without depth data. Within the depth-based as well as color-based approaches, the research community has primarily focused on single-hand scenarios in a localized/normalized coordinate system. Due to the fact that both hands are utilized in most applications, we propose to push the frontier by addressing two-hand pose estimation in the global coordinate system using only color information. Our first chapter introduces the first system capable of estimating global 3D joint locations for both hands via only monocular RGB input images. To enable training and evaluation of the learning-based models, we propose to introduce a large-scale synthetic 3D hand pose dataset Ego3DHands. As knowledge in synthetic data cannot be directly applied to the real-world domain, a natural two-hand pose dataset is necessary for real-world applications. To this end, we present a large-scale RGB-based egocentric hand dataset Ego2Hands in two chapters. In chapter 2, we address the task of two-hand segmentation/detection using images in the wild. In chapter 3, we focus on the task of two-hand 2D/3D pose estimation using real-world data. In addition to research in hand pose estimation, chapter 4 includes our work on interactive refinement that generalizes the backpropagating refinement technique for dense prediction models. 
computer vision deep learning 2D 3D two-hand hand pose estimation synthetic real-world segmentation detection interactive refinement Physical Sciences and Mathematics
146	Human pose estimation in low-resolution images / Estimering av mänskliga poser i lågupplösta bilder Nilsson, Hugo January 2022 (has links) This project explores the understudied, yet important, case of human pose estimation in low-resolution images. This is done in the use-case of images with football players of known scale in the image. Human pose estimation can mainly be done in two different ways, the bottom-up method and the top-down method. This project explores the bottom-up method, which first finds body keypoints and then groups them to get the person, or persons, within the image. This method is generally faster and has been shown to have an advantage when there is occlusion or crowded scenes, but suffers from false positive errors. Low-resolution makes human pose estimation harder, due to the decreased information that can be extracted. Furthermore, the output heatmap risks becoming too small to correctly locate the keypoints. However, low-resolution human pose estimation is needed in many cases, if the camera has a low-resolution sensor or the person occupies a small portion of the image. Several neural networks are evaluated and, in conclusion, there are multiple ways to improve the current state of the art network HigherHRNet for lower resolution human pose estimation. Maintaining large feature maps through the network turns out to be crucial for low-resolution images and can be achieved by modifying the feature extractor in HigherHRNet. Furthermore, as the resolution decreases, the need for sub-pixel accuracy grows. To improve this, various heatmap encoding-decoding methods are investigated, and by using unbiased data processing, both heatmap encoding-decoding and coordinate system transformation can be improved. / Detta projekt utforskar det understuderade, men ändå viktiga, fallet med uppskattning av mänskliga poser i lågupplösta bilder. Detta görs i användningsområdet av bilder med fotbollsspelare av en förutbestämd storlek i bilden. 
Mänskliga poseuppskattningar kan huvudsakligen göras på två olika sätt, nedifrån-och-upp- metoden och uppifrån-och-ned-metoden. Detta projekt utforskar nedifrån-och- upp-metoden, som först hittar kroppsdelar och sedan grupperar dem för att få fram personen, eller personerna, i bilden. Denna metod är generellt sett snabbare och har visat sig vara fördelaktig i scenarion med ocklusion eller mycket folk, men lider av falska positiva felaktigheter. Låg upplösning gör uppskattning av mänskliga poser svårare, på grund av den minskade informationen som kan extraheras. Dessutom riskerar färgdiagramet att bli för liten för att korrekt lokalisera kroppsdelarna. Ändå behövs uppskattning av lågupplöst mänskliga poser i många fall, exempelvis om kameran har en lågupplöst sensor eller om personen upptar en liten del av bilden. Flera neurala nätverk utvärderas och sammanfattningsvis finns flera sätt att förbättra det nuvarande toppklassade nätverket HigherHRNet för uppskattning av mänskliga poser med lägre upplösning. Att bibehålla stora särdragskartor genom nätverket visar sig vara avgörande för lågupplösta bilder och kan uppnås genom att modifiera särdragsextraktorn i HigherHRNet. Dessutom, när upplösningen minskar, ökar behovet av subpixel-noggrannhet. För att förbättra detta undersöktes olika färgdiagram-kodning-avkodningsmetoder, och genom att använda opartisk databehandling kan både färgdiagram-kodning-avkodning och koordinatsystemtransformationen förbättras. Human Pose Estimation Low-Resolution Images Heatmap Decoding Computer Vision Deep Learning Mänsklig Poseuppskattning Lågupplösta Bilder Färgdiagram Avkodning Datorseende Djupinlärning Computer and Information Sciences Data- och informationsvetenskap
147	An Autonomous Intelligent Robotic Wheelchair to Assist People in Need: Standing-up, Turning-around and Sitting-down Papadakis Ktistakis, Iosif January 2018 No description available. Computer Engineering Computer Science Mechanical Engineering Robotics Assistive Robotics Wheelchairs Human Machine Interaction Speech Recognition Pose Estimation Active Participation System Integration Prototype Robotic Arm Decision Making
148	Unsupervised 3D Human Pose Estimation / Oövervakad mänsklig poseuppskattning i 3D Budaraju, Sri Datta January 2021 The thesis proposes an unsupervised representation learning method to predict 3D human pose from a 2D skeleton via a VAE-GAN (Variational Autoencoder Generative Adversarial Network) hybrid network. The method learns to lift poses from 2D to 3D using self-supervision and adversarial learning techniques. The method does not use images, heatmaps, 3D pose annotations, paired/unpaired 2D-to-3D skeletons, 3D priors, synthetic 2D skeletons, multi-view or temporal information in any shape or form. The 2D skeleton input is taken by a VAE that encodes it in a latent space and then decodes that latent representation to a 3D pose. The 3D pose is then reprojected to 2D for a constrained, self-supervised optimization using the input 2D pose. In parallel, the 3D pose is also randomly rotated and reprojected to 2D to generate a ’novel’ 2D view for unconstrained adversarial optimization using a discriminator network. The combination of the optimizations of the original and the novel 2D views of the predicted 3D pose results in a ’realistic’ 3D pose generation. The thesis shows that the encoding and decoding process of the VAE addresses the major challenge of erroneous and incomplete skeletons from 2D detection networks as inputs and that the variance of the VAE can be altered to get various plausible 3D poses for a given 2D input. Additionally, the latent representation could be used for cross-modal training and many downstream applications. The results on the Human3.6M dataset outperform previous unsupervised approaches with less model complexity while addressing more hurdles in scaling the task to the real world. / Uppsatsen föreslår en oövervakad metod för representationslärande för att förutsäga en 3Dpose från ett 2D skelett med hjälp av ett VAE GAN (Variationellt Autoenkodande Generativt Adversariellt Nätverk) hybrid neuralt nätverk. 
Metoden lär sig att utvidga poser från 2D till 3D genom att använda självövervakning och adversariella inlärningstekniker. Metoden använder sig vare sig av bilder, värmekartor, 3D poseannotationer, parade/oparade 2D till 3D skelett, a priori information i 3D, syntetiska 2Dskelett, flera vyer, eller tidsinformation. 2Dskelettindata tas från ett VAE som kodar det i en latent rymd och sedan avkodar den latenta representationen till en 3Dpose. 3D posen är sedan återprojicerad till 2D för att genomgå begränsad, självövervakad optimering med hjälp av den tvådimensionella posen. Parallellt roteras dessutom 3Dposen slumpmässigt och återprojiceras till 2D för att generera en ny 2D vy för obegränsad adversariell optimering med hjälp av ett diskriminatornätverk. Kombinationen av optimeringarna av den ursprungliga och den nya 2Dvyn av den förutsagda 3Dposen resulterar i en realistisk 3Dposegenerering. Resultaten i uppsatsen visar att kodningsoch avkodningsprocessen av VAE adresserar utmaningen med felaktiga och ofullständiga skelett från 2D detekteringsnätverk som indata och att variansen av VAE kan modifieras för att få flera troliga 3D poser för givna 2D indata. Dessutom kan den latenta representationen användas för crossmodal träning och flera nedströmsapplikationer. Resultaten på datamängder från Human3.6M är bättre än tidigare oövervakade metoder med mindre modellkomplexitet samtidigt som de adresserar flera hinder för att skala upp uppgiften till verkliga tillämpningar. Computer Vision Projective Geometry Deep Learning Unsupervised Learning 3D Human Pose Estimation GAN AutoEncoder Hybrid Generative Model Self Supervision Computer and Information Sciences Data- och informationsvetenskap
149	En jämförelse av inlärningsbaserade lösningar för mänsklig positionsuppskattning i 3D / A comparison of learning-based solutions for 3D human pose estimation Lange, Alfons, Lindfors, Erik January 2019 Inom områden som idrottsvetenskap och underhållning kan det finnas behov av att analysera en människas kroppsposition i 3D. Dessa behov kan innefatta att analysera en golfsving eller att möjliggöra mänsklig interaktion med spel. För att tillförlitligt uppskatta kroppspositioner krävs det idag specialiserad hårdvara som ofta är dyr och svårtillgänglig. På senare tid har det även tillkommit inlärningsbaserade lösningar som kan utföra samma uppskattning på vanliga bilder. Syftet med arbetet har varit att identifiera och jämföra populära inlärningsbaserade lösningar samt undersöka om någon av dessa presterar i paritet med en etablerad hårdvarubaserad lösning. För detta har testverktyg utvecklats, positionsuppskattningar genomförts och resultatdata för samtliga tester analyserats. Resultatet har visat att lösningarna inte presterar likvärdigt med Kinect och att de i nuläget inte är tillräckligt välutvecklade för att användas som substitut för specialiserad hårdvara. / In fields such as sports science and entertainment, there’s occasionally a need to analyze a person's body pose in 3D. These needs may include analyzing a golf swing or enabling human interaction with games. Today, in order to reliably perform a human pose estimation, specialized hardware is usually required, which is often expensive and difficult to access. In recent years, multiple learning-based solutions have been developed that can perform the same kind of estimation on ordinary images. The purpose of this report has been to identify and compare popular learning-based solutions and to investigate whether any of these perform on par with an established hardware-based solution. 
To accomplish this, tools for testing have been developed, pose estimations have been conducted and result data for each test have been analyzed. The result has shown that the solutions do not perform on par with Kinect and that they are currently not sufficiently well-developed to be used as a substitute for specialized hardware. pose estimation body pose 3D RGB image Microsoft Kinect positionsuppskattning kroppsposition 3D RGB-bild Microsoft Kinect Other Computer and Information Science Annan data- och informationsvetenskap
150	Conformal Tracking For Virtual Environments Davis, Larry Dennis, Jr. 01 January 2004 (has links) A virtual environment is a set of surroundings that appears to exist to a user through sensory stimuli provided by a computer. By virtual environment, we mean to include environments supporting the full range from VR to pure reality. A necessity for virtual environments is knowledge of the location of objects in the environment. This is referred to as the tracking problem, which points to the need for accurate and precise tracking in virtual environments. Marker-based tracking is a technique which employs fiduciary marks to determine the pose of a tracked object. A collection of markers arranged in a rigid configuration is called a tracking probe. The performance of marker-based tracking systems depends upon the fidelity of the pose estimates provided by tracking probes. The realization that tracking performance is linked to probe performance necessitates investigation into the design of tracking probes for proponents of marker-based tracking. The challenges involved with probe design include prediction of the accuracy and precision of a tracking probe, the creation of arbitrarily-shaped tracking probes, and the assessment of the newly created probes. To address these issues, we present a pioneer framework for designing conformal tracking probes. Conformal in this work means to adapt to the shape of the tracked objects and to the environmental constraints. As part of the framework, the accuracy in position and orientation of a given probe may be predicted given the system noise. The framework is a methodology for designing tracking probes based upon performance goals and environmental constraints. After presenting the conformal tracking framework, the elements used for completing the steps of the framework are discussed. We start with the application of optimization methods for determining the probe geometry. 
Two overall methods for mapping markers on tracking probes are presented, the Intermediary Algorithm and the Viewpoints Algorithm. Next, we examine the method used for pose estimation and present a mathematical model of error propagation used for predicting probe performance in pose estimation. The model uses a first-order error propagation, perturbing the simulated marker locations with Gaussian noise. The marker locations with error are then traced through the pose estimation process and the effects of the noise are analyzed. Moreover, the effects of changing the probe size or the number of markers are discussed. Finally, the conformal tracking framework is validated experimentally. The assessment methods are divided into simulation and post-fabrication methods. Under simulation, we discuss testing of the performance of each probe design. Then, post-fabrication assessment is performed, including accuracy measurements in orientation and position. The framework is validated with four tracking probes. The first probe is a six-marker planar probe. The predicted accuracy of the probe was 0.06 deg and the measured accuracy was 0.083 plus/minus 0.015 deg. The second probe was a pair of concentric, planar tracking probes mounted together. The smaller probe had a predicted accuracy of 0.206 deg and a measured accuracy of 0.282 plus/minus 0.03 deg. The larger probe had a predicted accuracy of 0.039 deg and a measured accuracy of 0.017 plus/minus 0.02 deg. The third tracking probe was a semi-spherical head tracking probe. The predicted accuracy in orientation and position was 0.54 plus/minus 0.24 deg and 0.24 plus/minus 0.1 mm, respectively. The experimental accuracy in orientation and position was 0.60 plus/minus 0.03 deg and 0.225 plus/minus 0.05 mm, respectively. The last probe was an integrated, head-mounted display probe, created using the conformal design process. 
The predicted accuracy of this probe was 0.032 plus/minus 0.02 degrees in orientation and 0.14 plus/minus 0.08 mm in position. The measured accuracy of the probe was 0.028 plus/minus 0.01 degrees in orientation and 0.11 plus/minus 0.01 mm in position. These results constitute an order of magnitude improvement over current marker-based tracking probes in orientation, indicating the benefits of a conformal tracking approach. Also, this result translates to a predicted positional overlay error of a virtual object presented at 1m of less than 0.5 mm, which is well above reported overlay performance in virtual environments. Augmented reality Conformal tracking Marker based tracking Mixed reality Pose error Pose estimation Tracking Virtual environments Virtual reality Electrical and Computer Engineering Engineering
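The abstract does not say how the sub-0.5 mm overlay figure was derived. One plausible reading, offered here only as an assumption, is that an orientation error θ displaces a virtual object at distance d by roughly d·tan(θ). The sketch below applies that relation to the measured orientation accuracy of the last probe (0.028 degrees) at 1 m; the function name is hypothetical.

```python
import math

def overlay_error_mm(orientation_error_deg, distance_mm):
    """Lateral displacement caused by an angular error at a given distance.

    Assumes a pure rotation about the viewpoint and ignores the positional
    error of the probe, so it is a rough estimate only.
    """
    return distance_mm * math.tan(math.radians(orientation_error_deg))

# Measured orientation accuracy of the head-mounted display probe, at 1 m.
error_mm = overlay_error_mm(0.028, 1000.0)
print(f"overlay error at 1 m: {error_mm:.3f} mm")
```

Under this assumption the displacement comes out just under 0.5 mm, which is consistent with the figure quoted in the abstract, although the thesis may have used a different derivation.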
