281

New object grasp synthesis with gripper selection: process development

Legrand, Tanguy January 2022 (has links)
A fundamental aspect to consider in factories is the transportation of items between the different steps of the production process. Conveyor belts do a great job of bringing items from point A to point B, but loading an item onto a workstation can demand a more precise and, in some cases, delicate approach. Nowadays this part is mostly handled by robotic arms. The issue encountered is that a robot arm's extremity, its gripper, cannot instinctively know how to grip an object; it is usually up to a technician to configure how and where the gripper grips an item. The goal of this thesis is to analyse a problem given by a company: finding a way to automate the grasp pose synthesis of a new object with an adapted gripper. This automated process can be separated into two sub-problems: first, how to choose the adapted gripper for a new object; second, how to find a grasp pose on the object with the previously chosen gripper. In the problem given by the company, the computer-aided design (CAD) 3D model of the concerned object is given. Also, the grasp shall always be done vertically, i.e., the gripper approaches the object vertically and does not rotate about the x and y axes. The gripper for a new object is selected between two kinds of grippers: a two-finger parallel-jaw gripper and a three-finger parallel-jaw gripper. No dataset of objects is provided. Object grasping is a well-researched subject, especially for two-finger grippers. However, little research has been done on grasp pose synthesis for three-finger grippers, or on gripper comparison, which are key parts of the studied problem. To answer the sub-problems mentioned above, machine learning is used for the gripper selection and a grasp synthesis method is used for finding the grasp pose.
However, due to the lack of gripper comparison in the related work, a new approach needs to be created, inspired by the findings in the literature about grasp pose synthesis in general. This approach consists of two parts. First, for each gripper and object combination, a number of grasp poses are generated, each associated with a corresponding score. The scores give an idea of the best gripper for an object, the best score for each gripper indicating how good a grasp could be on the object with said gripper. Second, the objects with their associated best score for each gripper are used as training data for a machine learning algorithm that assists in the choice of the gripper. This approach leads to two research questions: "How to generate grasps of satisfying quality for an object with a certain gripper?" and "Is it possible to determine the best gripper for a new object via machine learning?" The first question is answered by using mathematical operations on the point cloud representation of the objects together with a cost function used to attribute a score, while the second is answered using machine learning classification and regression to gain insight into how machine learning can learn to associate object properties with gripper efficiency. The results show that grasp generation with the chosen cost function gives grasp poses similar to those a human operator would choose, but the machine learning models seem unable to assess grasp quality, either with regression or classification.
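The abstract does not give the cost function used to score grasp candidates on the point cloud. As a purely illustrative sketch of that kind of scoring, the following rates a two-finger parallel-jaw grasp by the antipodal alignment of the surface normals at the two contact points; the function name, the width limit, and the score formula are assumptions, not the thesis' method:

```python
import numpy as np

def antipodal_score(p1, n1, p2, n2, max_width=0.08):
    """Score a two-finger grasp candidate on a point cloud.

    p1, p2: contact points; n1, n2: unit surface normals.
    A good parallel-jaw grasp has each normal pointing against
    the closing direction (hypothetical cost, not the thesis'
    actual function).
    """
    axis = p2 - p1
    width = np.linalg.norm(axis)
    if width < 1e-9 or width > max_width:
        return 0.0                      # fingers cannot close on this pair
    axis = axis / width
    # Alignment of each normal with the opposing closing direction.
    align1 = max(0.0, float(np.dot(n1, -axis)))
    align2 = max(0.0, float(np.dot(n2, axis)))
    return align1 * align2

# Two opposing faces 4 cm apart: a near-perfect antipodal grasp.
s = antipodal_score(np.array([0.0, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0]),
                    np.array([0.04, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
```

Ranking the best such score per gripper over many sampled contact pairs would give the per-object, per-gripper quality estimate the approach describes.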
282

Dense Foot Pose Estimation From Images

Sharif, Sharif January 2023 (has links)
There is ongoing research into building dense correspondence between digital images of objects in the world and estimating the 3D pose of these objects. This is a difficult area in which to conduct research due to the limited availability of annotated data: annotating each pixel is too time-consuming. At the time of this writing, current research has managed to use neural networks to establish dense pose estimation of human body parts (feet, chest, legs, etc.). The aim of this thesis is to investigate whether a model can be developed using neural networks to perform dense pose estimation on human feet. The data used in evaluating the model is generated using proprietary tools. Since this thesis uses a custom model and a custom dataset, one model is developed and tested in various experiments to gain an understanding of the different parameters that influence the model's performance. Experiments showed that a model based on DeepLabV3 is able to achieve a dense pose estimation of feet with a mean error of 1.0 cm. The limiting factor for the model's ability to estimate a dense pose is its ability to classify the pixels in an image accurately. It was also shown that discontinuous UV unwrapping greatly reduced the model's dense pose estimation ability. The results from this thesis should be considered preliminary and need to be repeated multiple times to account for the stochastic nature of training neural networks.
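As a minimal illustration of the reported metric (a mean dense-pose error of 1.0 cm), the following sketch computes the mean per-pixel 3D surface error over foreground pixels; the array layout and names are assumed, not taken from the thesis:

```python
import numpy as np

def mean_surface_error(pred_xyz, gt_xyz, fg_mask):
    """Mean per-pixel 3D error (cm) over foreground pixels.

    pred_xyz, gt_xyz: (H, W, 3) surface coordinates in cm, looked up
    from predicted / ground-truth UV maps (hypothetical layout).
    fg_mask: (H, W) boolean foot-pixel mask -- as the abstract notes,
    pixel classification quality bounds this metric.
    """
    d = np.linalg.norm(pred_xyz - gt_xyz, axis=-1)
    return float(d[fg_mask].mean())

gt = np.zeros((4, 4, 3))
pred = gt.copy()
pred[..., 0] = 1.0              # uniform 1 cm offset along x
mask = np.ones((4, 4), bool)
err = mean_surface_error(pred, gt, mask)   # 1.0 cm
```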
283

Deep Visual Inertial-Aided Feature Extraction Network for Visual Odometry : Deep Neural Network training scheme to fuse visual and inertial information for feature extraction / Deep Visual Inertial-stöttat Funktionsextraktionsnätverk för Visuell Odometri : Träningsalgoritm för djupa Neurala Nätverk som sammanför visuell- och tröghetsinformation för särdragsextraktion

Serra, Franco January 2022 (has links)
Feature extraction is an essential part of the Visual Odometry problem. In recent years, with the rise of neural networks, the problem has shifted from a more classical to a deep learning approach. This thesis presents a fine-tuned feature extraction network trained on pose estimation as a proxy task. The architecture aims at integrating inertial information coming from IMU sensor data into the deep local feature extraction paradigm. Specifically, visual features and inertial features are extracted using neural networks. These features are then fused together and further processed to regress the pose of a moving agent. The visual feature extraction network is effectively fine-tuned and is used stand-alone for inference. The approach is validated via a qualitative analysis of the extracted keypoints as well as quantitatively: the feature extraction network is used to perform Visual Odometry on the KITTI dataset, where the ATE for various sequences is reported. As a comparison, the proposed method, the proposed method without IMU, and the original pre-trained feature extraction network are used to extract features for the Visual Odometry task. Their ATE results and relative trajectories show that in sequences with great changes in orientation the proposed system outperforms the original one, while on mostly straight sequences the original system performs slightly better.
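The fusion step described above — visual and inertial features extracted separately, then fused and processed to regress pose — can be sketched at its simplest as concatenation followed by a linear regression head. The shapes and the single-layer head are illustrative assumptions, not the thesis architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_regress(visual_feat, imu_feat, W, b):
    """Minimal fusion head: concatenate visual and inertial features
    and linearly regress a 6-DoF pose (3 translation + 3 rotation
    parameters). In the real system both branches are deep networks
    and the head is learned end-to-end on the proxy task."""
    fused = np.concatenate([visual_feat, imu_feat])
    return W @ fused + b

visual = rng.standard_normal(128)   # e.g. a pooled CNN descriptor
imu = rng.standard_normal(32)       # e.g. an encoded gyro/accel window
W = rng.standard_normal((6, 160)) * 0.01
b = np.zeros(6)
pose = fuse_and_regress(visual, imu, W, b)   # 6-DoF pose vector
```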
284

Real-Time Visual Multi-Target Tracking in Realistic Tracking Environments

White, Jacob Harley 01 May 2019 (has links)
This thesis focuses on visual multiple-target tracking (MTT) from a UAV. Typical state-of-the-art multiple-target trackers rely on an object detector as the primary detection source. However, object detectors usually require a GPU to process images in real-time, which may not be feasible to carry on-board a UAV. Additionally, they often do not produce consistent detections for the small objects typical of UAV imagery. In our method, we instead detect motion to identify objects of interest in the scene. We detect motion at corners in the image using optical flow. We also track points long-term to continue tracking stopped objects. Since our motion detection algorithm generates multiple detections at each time-step, we use a hybrid probabilistic data association filter combined with a single iteration of expectation maximization to improve tracking accuracy. We also present a motion detection algorithm that accounts for parallax in non-planar UAV imagery. We use the essential matrix to distinguish between true object motion and apparent object motion due to parallax. Instead of calculating the essential matrix directly, which can be time-consuming, we design a new algorithm that optimizes the rotation and translation between frames. This new algorithm requires only 4 ms instead of 47 ms per frame of the video sequence. We demonstrate the performance of these algorithms on video data. These algorithms are shown to improve tracking accuracy, reliability, and speed. All these contributions are capable of running in real-time without a GPU.
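The parallax test described above can be illustrated with the epipolar constraint: for a static scene point, normalized correspondences satisfy x2ᵀ E x1 = 0 with E = [t]× R, so a large residual signals true object motion rather than parallax. This is a plain-residual sketch, not the optimized rotation/translation algorithm the thesis proposes:

```python
import numpy as np

def skew(t):
    """Cross-product (skew-symmetric) matrix of a 3-vector."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def moving_mask(x1, x2, R, t, thresh=1e-3):
    """Flag independently moving points.

    x1, x2: (N, 3) normalized homogeneous image points in two frames.
    Static points satisfy x2^T E x1 = 0 with E = [t]_x R, so large
    residuals indicate object motion (illustrative residual and
    threshold, chosen for this example)."""
    E = skew(t / np.linalg.norm(t)) @ R
    res = np.abs(np.einsum('ni,ij,nj->n', x2, E, x1))
    return res > thresh

# Pure x-translation: the first correspondence shifts only in x
# (static scene, parallax); the second also shifts in y (moving).
R, t = np.eye(3), np.array([1.0, 0.0, 0.0])
x1 = np.array([[0.0, 0.2, 1.0], [0.0, 0.2, 1.0]])
x2 = np.array([[-0.2, 0.2, 1.0], [0.0, 0.5, 1.0]])
mask = moving_mask(x1, x2, R, t)   # [False, True]
```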
285

Using pose estimation to support video annotation for linguistic use : Semi-automatic tooling to aid researchers / Användning av poseuppskattning för att stödja videoannoteringsprocessen inom lingvistik : Halvautomatiska verktyg för att underlätta för forskare

Gerholm, Gustav January 2022 (has links)
Video annotation is a lengthy manual process. A previous research project, MINT, produced a few thousand videos of child-parent interactions in a controlled environment in order to study children's language development. These videos were filmed across multiple sessions, tracking the same children from the age of 3 months to 7 years. In order to study the gathered material, all these videos have to be annotated with multiple kinds of annotations, including transcriptions, the gaze of the children, physical distances between parent and child, etc. These annotations are currently far from complete, which is why this project aimed to be a starting point for the development of semi-automatic tooling to aid the process. To do this, state-of-the-art pose estimators were used to process hundreds of videos, creating pseudo-anonymized pose estimations. The pose estimations were then used to gauge the distance between the child and parent and to annotate the corresponding frame of the videos. Everything was packaged as a CLI tool. The results of first applying the CLI and then correcting the automatic annotations manually (compared to manually annotating everything) showed a large decrease in the overall time taken to complete the annotation of the videos. The tool lends itself to further development for more advanced annotations since both the tool and its related libraries are open source.
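One of the produced annotations is whether child and parent are within reach of each other in a given frame. A minimal sketch of such a distance-based labelling step from pose keypoints might look like the following; the threshold, label strings, and keypoint layout are hypothetical:

```python
import numpy as np

def annotate_reach(child_kp, parent_kp, arm_span_px):
    """Label a frame 'within_reach' when the minimum pixel distance
    between any child keypoint and any parent keypoint falls below
    an arm-span threshold (labels and threshold are hypothetical)."""
    # Pairwise distances between every child and parent keypoint.
    d = np.linalg.norm(child_kp[:, None, :] - parent_kp[None, :, :],
                       axis=-1)
    return 'within_reach' if d.min() < arm_span_px else 'out_of_reach'

child = np.array([[100.0, 200.0], [110.0, 210.0]])   # (x, y) keypoints
parent = np.array([[150.0, 200.0]])
label = annotate_reach(child, parent, arm_span_px=60.0)  # 'within_reach'
```

In the real pipeline the pose estimator supplies the keypoints per frame and the label is written into the annotation file for later manual correction.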
286

Feasibility of Mobile Phone-Based 2D Human Pose Estimation for Golf : An analysis of the golf swing focusing on selected joint angles / Lämpligheten av mobiltelefonbaserad 2D mänsklig poseuppskattning i golf : En analys av golfsvingar med fokus på utvalda ledvinklar

Perini, Elisa January 2023 (has links)
Golf is a sport where correct technical execution is important for performance and injury prevention. Existing feedback systems are often cumbersome and not readily available to recreational players. To address this issue, this thesis explores the potential of using 2D Human Pose Estimation as a mobile phone-based swing analysis tool. The developed system makes it possible to identify three events in the swing movement (toe-up, top, and impact) and to measure specific angles during these events using an algorithmic approach. The system focuses on quantifying the knee flexion and primary spine angle during the address, and the lateral bending at the top of the swing. Using only the wrist coordinates in the vertical direction, the developed system identified 37% of the investigated events, independently of whether the swing was filmed in the frontal or sagittal plane. Within five frames, 95% of the events were correctly identified. Using additional joint coordinates and the event data obtained by the above-mentioned event identification algorithm, the knee flexion at address was correctly assessed in 66% of the cases, with a mean absolute error of 3.7°. The mean absolute error of the primary spine angle measurement at address was 10.5°. The lateral bending angle was correctly identified in 87% of the videos. This system highlights the potential of using 2D Human Pose Estimation for swing analysis. This thesis primarily focused on exploring the feasibility of the approach, and further research is needed to expand the system and improve its accuracy. This work serves as a foundation, providing valuable insights for future advancements in the field of 2D Human Pose Estimation-based swing analysis.
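The angle measurements described above reduce to computing the angle at a joint from three 2D keypoints. A minimal sketch, e.g. knee flexion from hip-knee-ankle coordinates (the keypoint layout is assumed):

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by 2D keypoints a-b-c,
    e.g. hip-knee-ankle for knee flexion."""
    v1, v2 = a - b, c - b
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip to guard against rounding just outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))

hip, knee, ankle = (np.array([0.0, 0.0]), np.array([0.0, 1.0]),
                    np.array([1.0, 2.0]))
angle = joint_angle(hip, knee, ankle)   # 135 degrees
```

The reported per-angle errors (e.g. 3.7° mean absolute error for knee flexion) would then come from comparing such computed angles against a reference measurement.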
287

Deep Image Processing with Spatial Adaptation and Boosted Efficiency & Supervision for Accurate Human Keypoint Detection and Movement Dynamics Tracking

Chao Yang Dai (14709547) 31 May 2023 (has links)
This thesis aims to design and develop a spatial adaptation approach, based on spatial transformers, to improve the accuracy of human keypoint recognition models. We have studied different model types and design choices to gain an accuracy increase over models without spatial transformers, and analyzed how spatial transformers increase the accuracy of predictions. A neural network called Widenet has been leveraged as a specialized network for providing the parameters for the spatial transformer. Further, we have evaluated methods to reduce the model parameters, as well as a strategy to enhance the learning supervision, to further improve the performance of the model. Our experiments and results show that the proposed deep learning framework can effectively detect human keypoints compared with the baseline methods. We have also reduced the model size without significantly impacting the performance, and the enhanced supervision has improved the performance. This study is expected to greatly advance the deep learning of human keypoints and movement dynamics.
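The core spatial-transformer operation — a localization network (here, Widenet) predicts an affine matrix, a sampling grid is generated from it, and the input is resampled — can be sketched as follows. This uses nearest-neighbour sampling for brevity where a real spatial transformer uses differentiable bilinear sampling, and the localization network itself is omitted:

```python
import numpy as np

def affine_grid(theta, H, W):
    """Sampling grid for a 2x3 affine matrix over normalized
    coordinates in [-1, 1] (mirrors the usual STN formulation;
    theta would come from the localization network)."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H),
                         np.linspace(-1, 1, W), indexing='ij')
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3)
    return coords @ theta.T                                 # (H, W, 2)

def sample(img, grid):
    """Nearest-neighbour sampling of img at grid locations
    (a real STN uses bilinear sampling so gradients flow)."""
    H, W = img.shape
    xs = np.clip(np.round((grid[..., 0] + 1) * (W - 1) / 2), 0, W - 1)
    ys = np.clip(np.round((grid[..., 1] + 1) * (H - 1) / 2), 0, H - 1)
    return img[ys.astype(int), xs.astype(int)]

img = np.arange(16, dtype=float).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
out = sample(img, affine_grid(identity, 4, 4))   # reproduces img
```

With a non-identity theta, the same two steps crop, scale, or rotate the feature map so the downstream keypoint head sees a spatially normalized input.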
288

Compact Representations and Multi-cue Integration for Robotics

Söderberg, Robert January 2005 (has links)
This thesis presents methods useful in a bin picking application, such as detection and representation of local features, pose estimation, and multi-cue integration. The scene tensor is a representation of multiple line or edge segments and was first introduced by Nordberg in [30]. A method for estimating scene tensors from gray-scale images is presented. The method is based on orientation tensors, where the scene tensor can be estimated by correlations of the elements in the orientation tensor with a number of 1D filters. Mechanisms for analyzing the scene tensor are described, and an algorithm for detecting interest points and estimating feature parameters is presented. It is shown that the algorithm works on a wide spectrum of images with good results. Representations that are invariant with respect to a set of transformations are useful in many applications, such as pose estimation, tracking, and wide-baseline stereo. The scene tensor itself is not invariant, and three different methods for implementing an invariant representation based on the scene tensor are presented. One is based on a non-linear transformation of the scene tensor and is invariant to perspective transformations. Two versions of a tensor doublet are presented; the tensor doublet is based on the geometry of two interest points and is invariant to translation, rotation, and scaling. The tensor doublet is used in a framework for view-centered pose estimation of 3D objects. It is shown that the pose estimation algorithm performs well even when the object is occluded and has a different scale compared to the training situation. An industrial implementation of a bin picking application has to cope with several different types of objects. All pose estimation algorithms use some kind of model, and there is as yet no model that can cope with all kinds of situations and objects. This thesis presents a method for integrating cues from several pose estimation algorithms to increase the system's stability.
It is also shown that the same framework can be used to increase the accuracy of the system by using cues from several views of the object. An extensive test with several different objects, lighting conditions, and backgrounds shows that multi-cue integration makes the system more robust and increases the accuracy. Finally, a system for bin picking is presented, built from the previous parts of this thesis. An eye-in-hand setup is used with a standard industrial robot arm. It is shown that the system works in real bin-picking situations, with a positioning error below 1 mm and an orientation error below 1° for most of the different situations. / Report code: LiU-TEK-LIC-2005:15.
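A common way to integrate cues from several estimators or views is inverse-variance weighting, where more certain cues receive larger weight. This is shown only as an illustration; the thesis' exact integration scheme may differ:

```python
import numpy as np

def fuse_estimates(poses, variances):
    """Inverse-variance fusion of pose estimates from several cues
    or views: each estimate is weighted by the reciprocal of its
    variance, so confident cues dominate (a standard fusion rule,
    not necessarily the one used in the thesis)."""
    poses = np.asarray(poses, float)
    w = 1.0 / np.asarray(variances, float)
    return (w[:, None] * poses).sum(axis=0) / w.sum()

# Two cues estimating (x, y, yaw); the second is twice as certain.
fused = fuse_estimates([[1.0, 0.0, 10.0], [1.3, 0.0, 13.0]],
                       [2.0, 1.0])
# fused lands closer to the more certain second estimate.
```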
289

Monocular 3D Human Pose Estimation / Monokulär 3D-människans hållningsuppskattning

Rey, Robert January 2023 (has links)
The focus of this work is the task of 3D human pose estimation, more specifically making use of key points located in single monocular images in order to estimate the location of human body joints in 3D space. It was done in association with Tracab, a company based in Stockholm that specialises in advanced sports tracking and analytics solutions. Tracab's core product is their optical tracking system for football, which involves installing multiple high-speed cameras around the sports venue. One of the main benefits of this work will be to reduce the number of cameras required to create the 3D skeletons of the players, hence reducing production costs as well as making the whole process of creating the 3D skeletons much simpler in the future. The main problem we are tackling consists in taking a set of 2D joint locations and lifting them to 3D space, which adds depth information to the joint locations. One difficulty with this task is the limited availability of in-the-wild datasets with corresponding 3D ground truth labels. We tackle this issue by making use of the restricted Human3.6M dataset along with the Tracab dataset in order to achieve adequate results. Since the Tracab dataset is very large, i.e., millions of unique poses and skeletons, we have focused our experiments on a single football game. Although extensive research has been done in the field using architectures such as convolutional neural networks, transformers, spatial-temporal architectures, and more, we tackle this problem with a simple feedforward neural network developed by Martinez et al., which is possible mainly due to the abundance of data available at Tracab.
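The Martinez et al. style lifting network is essentially a fully connected residual network mapping flattened 2D joints to 3D joints. A forward-pass sketch with untrained random weights follows; the layer sizes are typical for this architecture but assumed here, and batch normalization and dropout are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
J = 16                      # joints in a Human3.6M-style skeleton

def lift_2d_to_3d(kp2d, params):
    """Forward pass of a Martinez-style lifting network:
    input linear layer -> one ReLU residual block -> output linear
    layer, mapping flattened 2D joints (2*J,) to 3D joints (J, 3).
    Untrained random weights; shapes illustrative only."""
    W_in, W1, W2, W_out = params
    h = np.maximum(0.0, kp2d @ W_in)                 # embed to hidden
    h = h + np.maximum(0.0, np.maximum(0.0, h @ W1) @ W2)  # residual
    return (h @ W_out).reshape(J, 3)

params = [rng.standard_normal(s) * 0.01 for s in
          [(2 * J, 1024), (1024, 1024), (1024, 1024), (1024, 3 * J)]]
kp3d = lift_2d_to_3d(rng.standard_normal(2 * J), params)
```

Trained on pairs of detected 2D keypoints and 3D ground truth (here Human3.6M plus the Tracab data), the same forward pass yields the depth-augmented skeleton from a single camera's detections.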
290

Spherically-actuated platform manipulator with passive prismatic joints

Nyzen, Ronald A. January 2002 (has links)
No description available.
