41

Multi-site Organ Detection in CT Images using Deep Learning / Regionsoberoende organdetektion i CT-bilder med djupinlärning

Jacobzon, Gustaf January 2020 (has links)
When optimizing a controlled dose in radiotherapy, high-resolution spatial information about healthy organs in close proximity to the malignant cells is necessary in order to mitigate dispersion into these organs-at-risk. This information can be provided by deep volumetric segmentation networks, such as 3D U-Net. However, due to memory limitations in modern graphics processing units, it is not feasible to train a volumetric segmentation network on full image volumes, and subsampling the volume gives too coarse a segmentation. An alternative is to sample a region of interest from the image volume and train an organ-specific network. This approach requires knowledge of which region of the image volume should be sampled, which can be provided by a 3D object detection network. Typically the detection network is also region-specific, albeit for a larger region such as the thorax, and requires human assistance in choosing the appropriate network for a given region of the body. Instead, we propose a multi-site object detection network based on YOLOv3, trained on 43 different organs, which can operate on arbitrarily chosen axial patches of the body. Our model identifies the organs present (whole or truncated) in the image volume and can automatically sample a region from the input and feed it to the appropriate volumetric segmentation network. We train our model on four small (as few as 20 images) site-specific datasets in a weakly supervised manner in order to handle the partially unlabeled nature of site-specific datasets. Our model is able to generate organ-specific regions of interest that enclose 92% of the organs present in the test set.
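Illustrative sketch (not from the thesis): one way the detection-to-segmentation hand-off described above could look, where a detected 3D bounding box is padded and cropped from the CT volume before being resampled for an organ-specific segmentation network. The function name, margin, and box format are assumptions.

```python
import numpy as np

def crop_roi(volume: np.ndarray, box, margin: int = 8) -> np.ndarray:
    """Crop an axis-aligned box (z0, y0, x0, z1, y1, x1) from a CT volume,
    padded by a safety margin and clipped to the volume bounds."""
    z0, y0, x0, z1, y1, x1 = box
    lo = np.maximum([z0 - margin, y0 - margin, x0 - margin], 0)
    hi = np.minimum([z1 + margin, y1 + margin, x1 + margin], volume.shape)
    return volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]

# Hypothetical detection of a liver bounding box in a 256x512x512 scan.
ct = np.random.randn(256, 512, 512).astype(np.float32)   # dummy CT volume
liver_box = (60, 120, 150, 140, 320, 380)
roi = crop_roi(ct, liver_box)
# `roi` would then be resampled to the input size expected by the
# organ-specific volumetric segmentation network (e.g. a 3D U-Net).
print(roi.shape)
```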
42

Data Augmentation for Safe 3D Object Detection for Autonomous Volvo Construction Vehicles

Zhao, Xun January 2021 (has links)
Point cloud data can express the 3D features of objects and is an important data type in the field of 3D object detection. Since point cloud data is more difficult to collect than image data and existing datasets are smaller, point cloud data augmentation is introduced to allow more features to be discovered in existing data. In this thesis, we propose a novel method to enhance point cloud scenes, based on a generative adversarial network (GAN) that augments individual objects and then integrates them into existing scenes. Good fidelity and coverage are achieved between the generated and real samples, with a JSD of 0.027, an MMD of 0.00064, and a coverage of 0.376. In addition, we investigated functional data-annotation tools and completed the data-labeling task. The 3D object detection task is carried out on the point cloud data, and we achieve relatively good detection results with a short processing time of around 22 ms. Quantitative and qualitative analysis is carried out on different models.
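The JSD, MMD and coverage figures quoted above are standard fidelity/diversity metrics for point-cloud generators. As an illustrative sketch (the abstract does not spell out its exact distance function, so the Chamfer distance here is an assumption), MMD and coverage can be computed from pairwise distances between generated and reference clouds:

```python
import numpy as np

def chamfer(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point clouds of shape (N,3), (M,3)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # (N, M)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def mmd_and_coverage(generated, reference):
    """MMD: mean distance from each reference cloud to its closest generated
    cloud. Coverage: fraction of reference clouds that are the nearest
    neighbour of at least one generated cloud."""
    d = np.array([[chamfer(g, r) for r in reference] for g in generated])
    mmd = d.min(axis=0).mean()                          # best match per reference
    cov = len(set(d.argmin(axis=1))) / len(reference)   # matched references
    return mmd, cov

# Toy usage with random clouds standing in for GAN samples and real scans.
rng = np.random.default_rng(0)
fake = [rng.normal(size=(128, 3)) for _ in range(10)]
real = [rng.normal(size=(128, 3)) for _ in range(10)]
print(mmd_and_coverage(fake, real))
```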
43

Meta-Pseudo Labelled Multi-View 3D Shape Recognition / Meta-pseudomärking med Bilder från Flera Kameravinklar för 3D Objektigenkänning

Uçkun, Fehmi Ayberk January 2023 (has links)
The field of computer vision has long pursued the challenge of understanding the three-dimensional world. This endeavour is further fuelled by the increasing demand for technologies that rely on accurate perception of the 3D environment, such as autonomous driving and augmented reality. However, the scarcity of labelled data in the 3D domain continues to be a hindrance to extensive research and development. Semi-supervised learning is a valuable tool for overcoming data scarcity, yet most state-of-the-art methods are primarily developed and tested on two-dimensional vision problems. To address this challenge, there is a need to explore innovative approaches that can bridge the gap between the 2D and 3D domains. In this work, we propose a technique that both leverages the existing abundance of two-dimensional data and makes state-of-the-art semi-supervised learning methods directly applicable to 3D tasks. Multi-View Meta Pseudo Labelling (MV-MPL) combines one of the best-performing architectures in 3D shape recognition, Multi-View Convolutional Neural Networks, with the state-of-the-art semi-supervised method Meta Pseudo Labelling. To evaluate the performance of MV-MPL, comprehensive experiments are conducted on the widely used shape recognition benchmarks ModelNet40, ShapeNetCore-v1, and ShapeNetCore-v2, as well as Objaverse-LVIS. The results demonstrate that MV-MPL achieves competitive accuracy compared to fully supervised models, even when only 10% of the labels are available. Furthermore, the study reveals that the object descriptors extracted from the MV-MPL model exhibit strong performance on shape retrieval tasks, indicating the effectiveness of the approach beyond classification objectives. Further analysis includes the evaluation of MV-MPL under more constrained scenarios, enhancements to the view-aggregation and pseudo-labelling processes, and an exploration of the potential of employing multi-views as augmentations for semi-supervised learning.
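As an illustrative sketch of the teacher-student loop behind Meta Pseudo Labelling (heavily simplified, using a first-order loss-difference feedback signal; the full method also uses consistency losses and supervised student updates, and none of this is the thesis implementation):

```python
import torch
import torch.nn.functional as F

def mpl_step(teacher, student, opt_t, opt_s, x_lab, y_lab, x_unlab):
    """One simplified Meta Pseudo Labels update: the teacher is rewarded by
    how much its pseudo-labels improve the student's loss on labelled data."""
    # 1. Student loss on labelled data before training on pseudo-labels.
    with torch.no_grad():
        loss_before = F.cross_entropy(student(x_lab), y_lab)

    # 2. Teacher produces hard pseudo-labels; student takes one step on them.
    t_logits = teacher(x_unlab)
    pseudo = t_logits.argmax(dim=1)
    opt_s.zero_grad()
    F.cross_entropy(student(x_unlab), pseudo).backward()
    opt_s.step()

    # 3. Student loss after the update: the improvement is the teacher's
    #    scalar feedback signal.
    with torch.no_grad():
        loss_after = F.cross_entropy(student(x_lab), y_lab)
    h = loss_before - loss_after          # > 0 if the pseudo-labels helped

    # 4. Teacher update: reinforce (or discourage) its pseudo-labels.
    opt_t.zero_grad()
    (h * F.cross_entropy(t_logits, pseudo)).backward()
    opt_t.step()

# Hypothetical usage with tiny linear classifiers on 32-dim view features.
teacher, student = torch.nn.Linear(32, 10), torch.nn.Linear(32, 10)
opt_t = torch.optim.SGD(teacher.parameters(), lr=0.1)
opt_s = torch.optim.SGD(student.parameters(), lr=0.1)
mpl_step(teacher, student, opt_t, opt_s,
         torch.randn(16, 32), torch.randint(0, 10, (16,)), torch.randn(64, 32))
```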
44

Inexact graph matching : application to 2D and 3D Pattern Recognition / Appariement inexact de graphes : application à la reconnaissance de formes 2D et 3D

Madi, Kamel 13 December 2016 (has links)
Graphs are powerful mathematical modeling tools used in various fields of computer science, in particular in pattern recognition. Graph matching is the main operation in graph-based pattern recognition. Finding solutions to the graph matching problem that ensure optimality in terms of accuracy and time complexity is a difficult research challenge and a topical issue. In this thesis, we investigate this problem in two fields: 2D and 3D pattern recognition. Firstly, we address the problem of geometric graph matching and its applications to 2D pattern recognition. Kite (archaeological structure) recognition in satellite images is the main application considered in this first part. We present a complete graph-based framework for Kite recognition in satellite images, with two main contributions. The first is an automatic process transforming Kites from real images into graphs, together with a process for randomly generating synthetic Kite graphs; these allow us to construct a benchmark of Kite graphs (real and synthetic) structured in different levels of deformation. The second contribution in this part is a new graph similarity measure adapted to geometric graphs and consequently to Kite graphs. The proposed approach combines graph invariants with a geometric graph edit distance computation. Secondly, we address the problem of recognizing deformable 3D objects represented by graphs, i.e., triangular tessellations. We propose a new decomposition of triangular tessellations into a set of substructures that we call triangle-stars. Based on this decomposition, we propose a new graph matching algorithm to measure the distance between triangular tessellations. The proposed algorithm offers a better similarity measure by ensuring a minimum number of disjoint triangle-stars covering a larger neighbourhood, and uses a set of descriptors that are invariant, or at least tolerant, to the most common deformations. Finally, we propose a more general graph matching approach founded on a new formalization based on the stable marriage problem. The proposed approach is optimal in terms of execution time, i.e., its time complexity is quadratic, O(n²), and flexible in terms of applicability (2D and 3D); it decomposes graphs into substructures and then matches these substructures using the stable marriage algorithm. The analysis of the time complexity of the proposed algorithms and the extensive experiments conducted on the Kite graph datasets (real and synthetic) and on standard datasets (2D and 3D) attest to the effectiveness, high performance and accuracy of the proposed approaches and show that they are extensible and quite general.
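The stable-marriage formalization mentioned above maps naturally onto the Gale-Shapley algorithm, whose proposal loop runs in quadratic time. A generic sketch is given below, with preference lists assumed to be derived from substructure (e.g. triangle-star) dissimilarities; it illustrates the idea rather than reproducing the thesis's algorithm.

```python
import numpy as np

def stable_matching(pref_a, pref_b):
    """Gale-Shapley stable matching. pref_a[i] ranks candidates in graph B for
    substructure i of graph A (most preferred first); pref_b[j] likewise.
    Returns a dict mapping A-indices to B-indices."""
    rank_b = [{a: r for r, a in enumerate(p)} for p in pref_b]
    free = list(range(len(pref_a)))
    next_prop = [0] * len(pref_a)      # next candidate each A-element proposes to
    engaged = {}                       # B-index -> A-index
    while free:
        a = free.pop(0)
        b = pref_a[a][next_prop[a]]
        next_prop[a] += 1
        if b not in engaged:
            engaged[b] = a
        elif rank_b[b][a] < rank_b[b][engaged[b]]:   # b prefers the newcomer
            free.append(engaged[b])
            engaged[b] = a
        else:
            free.append(a)
    return {a: b for b, a in engaged.items()}

# Preferences derived from a (hypothetical) substructure dissimilarity matrix.
cost = np.random.rand(4, 4)
pref_a = [list(np.argsort(cost[i])) for i in range(4)]
pref_b = [list(np.argsort(cost[:, j])) for j in range(4)]
print(stable_matching(pref_a, pref_b))
```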
45

Efficient Feature Extraction for Shape Analysis, Object Detection and Tracking

Solis Montero, Andres January 2016 (has links)
During the course of this thesis, two scenarios are considered. In the first, we contribute to feature extraction algorithms; in the second, we use features to improve object detection and localization. The two scenarios give rise to four thesis sub-goals. First, we present a new shape skeleton pruning algorithm based on contour approximation and the integer medial axis. The algorithm effectively removes unwanted branches, conserves the connectivity of the skeleton and respects the topological properties of the shape. It is robust to significant boundary noise and to rigid shape transformations, and is fast and easy to implement. While shape-based solutions via boundary and skeleton analysis are viable for object detection, keypoint features are important for textured object detection. Second, therefore, we present a keypoint feature-based planar object detection framework for vision-based localization. We demonstrate that our framework is robust against illumination changes, perspective distortion, motion blur, and occlusions. We increase the robustness of the localization scheme in cluttered environments and decrease false detections of targets, and we present an off-line target evaluation strategy and a scheme to improve pose estimates. Third, we extend planar object detection to a real-time approach for 3D object detection using a mobile, uncalibrated camera. We develop our algorithm based on two novel naive Bayes classifiers for viewpoint and feature matching that improve performance and decrease memory usage. Our algorithm exploits the specific structure of various binary descriptors in order to boost feature matching by conserving descriptor properties. Our naive Bayes classifiers require a database with a small memory footprint because we only store efficiently encoded features, and we improve the feature-indexing scheme to speed up the matching process, creating a highly efficient object database. Finally, we present a model-free long-term tracking algorithm based on the Kernelized Correlation Filter. The proposed solution improves on the correlation tracker in precision, success, accuracy and robustness while increasing frame rates. We integrate an adjustable Gaussian window and sparse features for robust scale estimation, creating a better separation of the target and the background. Furthermore, we include fast descriptors and a packed Fourier-spectrum format to boost performance while decreasing the memory footprint. We compare our algorithm with state-of-the-art techniques to validate the results.
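For the tracking contribution, an illustrative single-channel, linear-kernel correlation filter in the MOSSE/KCF family is sketched below (it omits the Gaussian kernel, cosine windowing, multi-channel features, and the adjustable window and sparse features added in the thesis; all names are assumptions):

```python
import numpy as np

def train_filter(patch: np.ndarray, target: np.ndarray, lam: float = 1e-2):
    """Learn a correlation filter in the Fourier domain so that correlating
    `patch` with it approximates `target` (a Gaussian peaked on the object)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)   # conjugate filter H*

def detect(H_conj: np.ndarray, new_patch: np.ndarray) -> np.ndarray:
    """Correlation response map; its argmax is the new target position."""
    Z = np.fft.fft2(new_patch)
    return np.real(np.fft.ifft2(Z * H_conj))

# Toy usage: desired Gaussian response centred in a 64x64 search window.
yy, xx = np.mgrid[0:64, 0:64]
g = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 4.0 ** 2))
patch = np.random.rand(64, 64)
H = train_filter(patch, g)
resp = detect(H, np.roll(patch, (3, 5), axis=(0, 1)))
print(np.unravel_index(resp.argmax(), resp.shape))  # peak follows the shift
```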
46

Recalage hétérogène pour la reconstruction 3D de scènes sous-marines / Heterogeneous Registration for 3D Reconstruction of Underwater Scene

Mahiddine, Amine 30 June 2015 (has links)
The survey and 3D reconstruction of underwater scenes are becoming indispensable given our growing interest in studying the seabed. Most existing work in this area is based on acoustic sensors, with images often serving only as illustration. The objective of this thesis is to develop techniques for fusing heterogeneous data from a photogrammetric system and an acoustic system. The presented work is organized in three parts. The first is devoted to the processing of 2D data to improve the colors of underwater images, in order to increase the repeatability of the feature descriptors at each 2D point. We then propose a system for creating mosaics in order to visualize the scene. In the second part, a 3D reconstruction method from an unordered set of several images is proposed. The computed 3D data are then merged with data from the acoustic system in order to reconstruct the underwater site. In the last part of this thesis, we propose an original 3D registration method that is distinguished by the nature of the descriptor extracted at each point. The proposed descriptor is invariant to isometric transformations (rotation, translation) and addresses the problem of multi-resolution. We validate our approach with a study on synthetic and real data, where we show the limits of the registration methods existing in the literature. Finally, we propose an application of our method to the recognition of 3D objects.
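As context only (the thesis's contribution is the descriptor and matching strategy, not this step): once correspondences between 3D points are established, the final rigid alignment under an isometric transform is commonly computed with the Kabsch/SVD method, sketched here.

```python
import numpy as np

def rigid_align(src: np.ndarray, dst: np.ndarray):
    """Least-squares rotation R and translation t mapping src onto dst
    (both (N,3), rows in correspondence), via the Kabsch/SVD method."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t

# Toy check: recover a known rotation and translation.
rng = np.random.default_rng(1)
pts = rng.normal(size=(100, 3))
c, s = np.cos(0.4), np.sin(0.4)
R_true = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
moved = pts @ R_true.T + np.array([0.5, -1.0, 2.0])
R, t = rigid_align(pts, moved)
print(np.allclose(R, R_true, atol=1e-6), t)
```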
47

3D model vybraného objektu / 3D model of the selected object

Raclavský, David January 2020 (has links)
The result of the diploma thesis is a photogrammetrically evaluated, georeferenced 3D model of an object and its surroundings, located in the AdMaS complex. The work describes in detail all phases of creating the 3D model, from camera selection and calibration to editing the final model. It discusses software and methods for evaluating 3D models, and deals with the optimal settings of the ContextCapture software. The accuracy of the resulting 3D model is tested by the methodology according to ČSN 013410 on the basis of control measurements.
48

3D Object Detection based on Unsupervised Depth Estimation

Manoharan, Shanmugapriyan 25 January 2022 (has links)
Estimating depth and detecting object instances in 3D space are fundamental to autonomous navigation, localization and mapping, robotic object manipulation, and augmented reality. RGB-D images and LiDAR point clouds are the most illustrative formats of depth information. However, depth sensors have many shortcomings, such as low effective spatial resolution and capturing a scene from a single perspective only. This thesis focuses on reproducing a denser and more comprehensive 3D scene structure for given monocular RGB images using depth estimation and 3D object detection. The first contribution of this thesis is a pipeline for depth estimation based on an unsupervised learning framework. We propose two architectures built on structure-from-motion and 3D geometric constraint methods. The proposed architectures are trained and evaluated using only RGB images and no ground-truth depth data, and achieve better results than state-of-the-art methods. The second contribution is the application of the estimated depth map through two algorithms: point cloud generation and collision avoidance. The predicted depth map and the RGB image are used to generate point cloud data with the proposed point cloud algorithm. The collision avoidance algorithm predicts the possibility of collision and provides a collision warning message by decoding the color in the estimated depth map; the design is adaptable to different color maps with slight changes and perceives collision information across a sequence of frames. Our third contribution is a two-stage pipeline to detect 3D objects from a monocular image. The first stage detects 2D objects and crops the corresponding image patches, which are provided as input to the second stage, where a 3D regression network is trained to estimate 3D bounding boxes for the target objects. Two architectures are proposed for this 3D regression network. This approach achieves better average precision than the state of the art for truncation up to 15% or fully visible objects, and lower but comparable results for truncation of more than 30% or partly/fully occluded objects.
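A minimal sketch of the point cloud generation step described above: back-projecting a predicted depth map and its RGB image into a coloured point cloud with the pinhole camera model. The intrinsics below are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth map (H,W) and RGB image (H,W,3) into an (N,6)
    coloured point cloud: X = (u-cx)*Z/fx, Y = (v-cy)*Z/fy, Z = depth."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    valid = pts[:, 2] > 0                      # drop pixels without depth
    return np.hstack([pts[valid], colors[valid]])

# Toy usage with a constant synthetic depth map and KITTI-like intrinsics.
depth = np.full((375, 1242), 10.0)
rgb = np.zeros((375, 1242, 3))
cloud = depth_to_point_cloud(depth, rgb, fx=721.5, fy=721.5, cx=609.6, cy=172.9)
print(cloud.shape)
```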
49

3D Object Manipulation in Volumetric Video Production / 3D-objektmanipulation i volymetrisk videoproduktion

Wang, Xinyi January 2023 (has links)
Remote communication methods have been changing in recent years and have become even more important due to the global pandemic. Holographic communication, often represented by volumetric video, is one of the emerging communication methods. Unfortunately, there is little research on combining 3D objects with volumetric videos. Based on a review of 3D object manipulation methods, two input modalities, a laptop trackpad and a mobile touchscreen, are selected for combination with volumetric video production. This study aims to explore human factors in volumetric video production and to determine the differences between 3D object input modalities in this setting. A prototype of a volumetric video production tool is built and refined. NASA-TLX, a SUS sub-scale, and semi-structured interviews are used in a pilot study and the main study in order to measure the perceived workload and learnability of the prototype. Analysis of the subjective data demonstrates that there are no significant differences between the two input modalities. Based on the results of this study, several implications are raised for the design of, and the research gap in, combining 3D object manipulation with volumetric video production.
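For reference, a sketch of how the full System Usability Scale is scored (the thesis reports a SUS sub-scale, so this is illustrative only):

```python
def sus_score(responses):
    """System Usability Scale: 10 items rated 1-5. Odd-numbered items
    contribute (rating - 1), even-numbered items (5 - rating); the sum is
    scaled by 2.5 onto a 0-100 range."""
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5

print(sus_score([4, 2, 4, 2, 5, 1, 4, 2, 4, 2]))  # -> 80.0
```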
50

Evaluation of probabilistic representations for modeling and understanding shape based on synthetic and real sensory data / Utvärdering av probabilistiska representationer för modellering och förståelse av form baserat på syntetisk och verklig sensordata

Zarzar Gandler, Gabriela January 2017 (has links)
Advancements in robotic perception in recent years have empowered robots to better execute tasks in various environments. The perception of objects in the robot workspace relies significantly on how sensory data is represented. In this context, 3D models of object surfaces have been studied as a means to provide useful insight into object shape and ultimately enhance robotic perception. This involves several challenges, because sensory data generally presents artifacts such as noise and incompleteness. To tackle this problem, we employ Gaussian Process Implicit Surfaces (GPIS), a non-parametric probabilistic reconstruction of object surfaces from 3D data points. This thesis investigates different configurations of GPIS as a means to extract shape information. In our approach we interpret an object's surface as the level set of an underlying sparse Gaussian Process (GP) with a variational formulation. Results show that the variational formulation for the sparse GP enables a reliable approximation to the full GP solution. Experiments are performed on a synthetic and a real sensory data set. We evaluate results by assessing how close the reconstructed surfaces are to the ground-truth correspondences, and how well objects from different categories are clustered based on the obtained representation. Finally, we conclude that the proposed solution derives adequate surface representations to reason about object shape and to discriminate objects based on shape information.
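An illustrative sketch of the GPIS idea (exact full GP regression here rather than the thesis's sparse variational formulation; the kernel, length scale and signed off-surface targets are assumptions): surface points are given target 0 and points offset along the normals carry approximate signed-distance targets, so the zero level set of the posterior mean is the reconstructed surface.

```python
import numpy as np

def rbf(a, b, ell=0.2):
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / ell**2)

def gpis_predict(X, y, Xq, noise=1e-3, ell=0.2):
    """GP regression of an implicit function: the posterior mean's zero level
    set is the surface, and the posterior variance flags regions where the
    sensory data is missing or noisy."""
    K = rbf(X, X, ell) + noise * np.eye(len(X))
    Ks = rbf(Xq, X, ell)
    mean = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, var

# Toy example: points on a unit sphere, plus off-surface points along the
# normals carrying approximate signed-distance targets.
rng = np.random.default_rng(2)
surf = rng.normal(size=(200, 3))
surf /= np.linalg.norm(surf, axis=1, keepdims=True)
X = np.vstack([surf, 0.8 * surf, 1.2 * surf])
y = np.concatenate([np.zeros(200), -0.2 * np.ones(200), 0.2 * np.ones(200)])
query = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.2], [0.0, 0.0, 0.8]])
mean, var = gpis_predict(X, y, query)
print(np.round(mean, 3))   # ~0 on the surface, > 0 outside, < 0 inside
```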
