11

Apprentissage autosupervisé de modèles prédictifs de segmentation à partir de vidéos / Self-supervised learning of predictive segmentation models from video

Luc, Pauline 25 June 2019
Predictive models of the environment hold promise for allowing the transfer of recent reinforcement learning successes to many real-world contexts, by decreasing the number of interactions needed with the real world. Video prediction has been studied in recent years as a particular case of such predictive models, with broad applications in robotics and navigation systems. While RGB frames are easy to acquire and hold a lot of information, they are extremely challenging to predict, and cannot be directly interpreted by downstream applications. Here we introduce the novel tasks of predicting the semantic and instance segmentation of future frames. The abstract feature spaces we consider are better suited for recursive prediction and allow us to develop models which convincingly predict segmentations up to half a second into the future. Predictions are more easily interpretable by downstream algorithms and remain rich, spatially detailed, and easy to obtain, relying on state-of-the-art segmentation methods. We first focus on the task of semantic segmentation, for which we propose a discriminative approach based on adversarial training. Then, we introduce the novel task of predicting future semantic segmentation, and develop an autoregressive convolutional neural network to address it. Finally, we extend our method to the more challenging problem of predicting future instance segmentation, which additionally segments out individual objects. To deal with a varying number of output labels per image, we develop a predictive model in the space of high-level convolutional image features of the Mask R-CNN instance segmentation model. We are able to produce visually pleasing segmentations at high resolution for complex scenes involving a large number of instances, with convincing accuracy up to half a second ahead.
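The core idea above, recursive prediction in a segmentation feature space, can be sketched as follows. This is a minimal PyTorch illustration under assumed shapes and layer choices; it is not the thesis implementation, whose predictor and decoder differ in detail.

```python
import torch
import torch.nn as nn

class FeaturePredictor(nn.Module):
    """Predicts the feature map at time t+1 from the K previous feature maps."""
    def __init__(self, channels: int, context: int = 4):
        super().__init__()
        self.context = context
        self.net = nn.Sequential(
            nn.Conv2d(channels * context, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, channels, kernel_size=3, padding=1),
        )

    def forward(self, feats):
        # feats: list of `context` tensors, each of shape (B, C, H, W)
        return self.net(torch.cat(feats, dim=1))

def predict_future(model, feats, steps):
    """Roll the predictor forward `steps` frames, feeding predictions back in."""
    feats = list(feats)
    for _ in range(steps):
        feats.append(model(feats[-model.context:]))  # autoregressive step
    return feats[-steps:]

# The predicted feature maps would then be decoded by a frozen segmentation
# head (e.g. Mask R-CNN's) to obtain future segmentations.
```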
12

3D Instance Segmentation of Cluttered Scenes : A Comparative Study of 3D Data Representations

Konradsson, Albin, Bohman, Gustav January 2021
This thesis provides a comparison between instance segmentation methods using point clouds and depth images. Specifically, their performance on cluttered scenes of irregular objects in an industrial environment is investigated. Recent work by Wang et al. [1] has suggested potential benefits of a point cloud representation when performing deep learning on data from 3D cameras. However, little work has been done to enable quantifiable comparisons between methods based on different representations, particularly on industrial data. Generating synthetic data provides accurate grayscale, depth map, and point cloud representations for a large number of scenes and can thus be used to compare methods regardless of data type. The datasets in this work are created using a tool provided by SICK. They simulate postal packages on a conveyor belt scanned by a LiDAR, closely resembling a common industry application. Two datasets are generated. One dataset has low complexity, containing only boxes. The other has higher complexity, containing a combination of boxes and multiple types of irregularly shaped parcels. State-of-the-art instance segmentation methods are selected based on their performance on existing benchmarks. We chose PointGroup by Jiang et al. [2], which uses point clouds, and Mask R-CNN by He et al. [3], which uses images. The results suggest that there may be benefits to using a point cloud representation over depth images. PointGroup performs better in terms of the chosen metric on both datasets. On low complexity scenes, the inference times are similar between the two methods tested. However, on higher complexity scenes, Mask R-CNN is significantly faster.
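The two representations compared above are related by a fixed geometric transform: a depth image back-projects into a point cloud given the camera intrinsics. A minimal sketch, with illustrative intrinsic values rather than ones from the thesis:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """depth: (H, W) metric depth image; returns (N, 3) XYZ points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no depth reading

# Example with assumed pinhole intrinsics for a 640x480 sensor:
cloud = depth_to_point_cloud(np.random.rand(480, 640),
                             fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```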
13

Using Mask R-CNN for Instance Segmentation of Eyeglass Lenses / Användning av Mask R-CNN för instanssegmentering av glasögonlinser

Norrman, Marcus, Shihab, Saad January 2021
This thesis investigates the performance of Mask R-CNN when utilizing transfer learning on a small dataset. The aim was to instance segment eyeglass lenses as accurately as possible from self-portrait images. Five different models were trained, where the key difference was the types of eyeglasses the models were trained on. The eyeglasses were grouped into three types: fully rimmed, semi-rimless, and rimless glasses. 1550 images were used for training, validation, and testing. The models' performance was evaluated using TensorBoard training data and mean Intersection over Union (mIoU) scores. No major differences in performance were found among the four models that grouped all three types of glasses into one class; their mIoU scores range from 0.913 to 0.94. The model with one class for each group of glasses performed worse, with an mIoU of 0.85. The thesis shows that strong instance segmentation results can be achieved with a limited dataset when taking advantage of transfer learning.
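The mIoU scores reported above can be computed per image as the intersection over union of the predicted and ground-truth lens masks, averaged over the test set; a minimal sketch:

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU of two boolean masks of equal shape."""
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(pred, gt).sum() / union

def mean_iou(preds, gts):
    return float(np.mean([mask_iou(p, g) for p, g in zip(preds, gts)]))
```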
14

Learning to Measure Invisible Fish

Gustafsson, Stina January 2022
In recent years, the EU has observed a decrease in the stocks of certain fish species due to unrestricted fishing. To combat the problem, many fisheries are investigating how to automatically estimate the catch size and composition using sensors onboard the vessels. Yet, measuring the size of fish in marine imagery is a difficult task. The images generally suffer from complex conditions caused by cluttered fish, motion blur, and dirty sensors. In this thesis, we propose a novel method for automatic measurement of fish size that can enable measuring both visible and occluded fish. We use a Mask R-CNN to segment the visible regions of the fish, and then fill in the shape of the occluded fish using a U-Net. We train the U-Net to perform shape completion in a semi-supervised manner, by simulating occlusions on an open-source fish dataset. Unlike previous shape-completion work, we teach the U-Net when to fill in the shape and when not to, by including a small portion of fully visible fish in the input training data. Our results show that our proposed method succeeds in filling in the shape of the synthetically occluded fish as well as of some of the cluttered fish in real marine imagery. We achieve an mIoU score of 93.9% on 1,000 synthetic test images and present qualitative results on real images captured onboard a fishing vessel. The qualitative results show that the U-Net can fill in the shapes of lightly occluded fish, but struggles when the tail fin is hidden and only parts of the fish body are visible. This task is difficult even for a human, and performance could perhaps be increased by including the fish appearance in the shape-completion task. The simulation-to-reality gap could also be reduced by fine-tuning the U-Net on some real occlusions, which could improve performance on the heavy occlusions in real marine imagery.
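One simple way to realize the occlusion simulation described above is to erase a random region from a complete fish mask, producing (input, target) pairs for the U-Net. The rectangular occluder and the size bounds below are illustrative assumptions, not the thesis's exact scheme:

```python
import numpy as np

def simulate_occlusion(mask: np.ndarray, seed=None, max_frac: float = 0.4):
    """Erase a random rectangle from a complete binary fish mask.

    Returns (occluded, full) -- a U-Net training input and its target.
    """
    rng = np.random.default_rng(seed)
    h, w = mask.shape
    oh = rng.integers(1, max(2, int(h * max_frac)))
    ow = rng.integers(1, max(2, int(w * max_frac)))
    top = rng.integers(0, h - oh)
    left = rng.integers(0, w - ow)
    occluded = mask.copy()
    occluded[top:top + oh, left:left + ow] = 0
    return occluded, mask

# In training, a small portion of fully visible fish would be left
# unoccluded, so the network also learns when *not* to fill anything in.
```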
15

Depth-Aware Deep Learning Networks for Object Detection and Image Segmentation

Dickens, James 01 September 2021
The rise of convolutional neural networks (CNNs) in computer vision has occurred in tandem with the advancement of depth sensing technology. Depth cameras yield two-dimensional arrays storing at each pixel the distance from the sensor to objects and surfaces in the scene, aligned with a regular color image, giving so-called RGBD images. Inspired by prior models in the literature, this work develops a suite of RGBD CNN models to tackle the challenging tasks of object detection, instance segmentation, and semantic segmentation. Prominent architectures for object detection and image segmentation are modified to incorporate dual-backbone approaches that input RGB and depth images, combining features from both modalities through the use of novel fusion modules. For each task, the models developed are competitive with state-of-the-art RGBD architectures. In particular, the proposed RGBD object detection approach achieves 53.5% mAP on the SUN RGBD 19-class object detection benchmark, while the proposed RGBD semantic segmentation architecture yields 69.4% accuracy on the SUN RGBD 37-class semantic segmentation benchmark. An original 13-class RGBD instance segmentation benchmark is introduced for the SUN RGBD dataset, for which the proposed model achieves 38.4% mAP. Additionally, an original depth-aware panoptic segmentation model is developed, trained, and tested on new benchmarks conceived for the NYUDv2 and SUN RGBD datasets. These benchmarks offer researchers a baseline for the task of RGBD panoptic segmentation on these datasets, where the novel depth-aware model outperforms a comparable RGB counterpart.
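The fusion modules mentioned above vary by task, but a common generic form of dual-backbone fusion concatenates RGB and depth feature maps and mixes them with a 1x1 convolution. This is a generic sketch of that pattern, not the thesis's specific module:

```python
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Fuse same-resolution feature maps from an RGB and a depth backbone."""
    def __init__(self, rgb_channels: int, depth_channels: int, out_channels: int):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(rgb_channels + depth_channels, out_channels, kernel_size=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb_feat, depth_feat):
        # Both inputs are (B, C, H, W) with matching H and W.
        return self.mix(torch.cat([rgb_feat, depth_feat], dim=1))
```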
16

Thermal Imaging-Based Instance Segmentation for Automated Health Monitoring of Steel Ladle Refractory Lining / Infraröd-baserad Instanssegmentering för Automatiserad Övervakning av Eldfast Murbruk i Stålskänk

Bråkenhielm, Emil, Drinas, Kastrati January 2022
Equipment and machines can be exposed to very high temperatures in the steel mill industry. One particularly critical part is the ladles used to hold and pour molten iron into moulds. A refractory lining is used as an insulation layer between the outer steel shell and the molten iron to protect the ladle from the hot iron. Over time, or if the lining is not completely cured, the lining wears out or can potentially fail. Such a scenario can lead to a breakout of molten iron, which can cause damage to equipment and, in the worst case, workers. Previous work analyses how critical areas can be identified in a proactive manner. Using thermal imaging, the failing spots on the lining show as high-temperature areas on the outside steel shell; the idea is that the outside temperature corresponds to the thickness of the insulating lining. These spots are detected when temperatures over a given threshold are registered within the thermal camera's field of view. The images must then be manually analyzed over time to follow the progression of a detected spot. The existing solution is also prone to background noise from other hot objects.  This thesis proposes an initial step toward automated monitoring of the health of refractory lining in steel ladles. The report investigates the use of instance segmentation to isolate the ladle from its background, thus reducing false alarms and background noise in an autonomous monitoring setup. Model training is based on Mask R-CNN on our own thermal images, with pre-trained weights from visual images. Detection is done on two classes: open or closed ladle. The model proved reasonably successful on a small dataset of 1000 thermal images. Different models were trained with and without augmentation, pre-trained weights, and multi-phase fine-tuning. The highest mAP of 87.5% was achieved by a pre-trained model with image augmentation and without fine-tuning. Though it was not tested in production, temperature readings could then be extracted from the segmented ladle, decreasing the risk of false alarms from background noise.
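Once the ladle is segmented, the temperature logic only needs to look inside the mask, which is what suppresses background hot objects. A sketch of that final step, with an assumed alarm threshold:

```python
import numpy as np

def ladle_reading(thermal: np.ndarray, mask: np.ndarray,
                  threshold_c: float = 350.0):
    """thermal: (H, W) per-pixel temperatures in C; mask: boolean ladle mask."""
    if not mask.any():
        return {"max_temp": None, "alarm": False}  # no ladle detected
    temps = thermal[mask]  # temperatures inside the segmented ladle only
    return {
        "max_temp": float(temps.max()),
        "alarm": bool((temps > threshold_c).any()),
    }
```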
17

Point clouds in the application of Bin Picking

Anand, Abhijeet January 2023
Automatic bin picking is a well-known problem in industrial automation and computer vision, where a robot picks an object from a bin and places it somewhere else. Research has been ongoing for many years to improve contemporary solutions. With camera technology advancing rapidly and fast computation resources readily available, solving this problem with deep learning has become of interest to several researchers. This thesis leverages current state-of-the-art deep learning methods for 3D instance segmentation and point cloud registration, and combines them to make the bin picking solution more performant and robust. The problem of bin picking becomes complex when the bin contains identical objects with heavy occlusion. To solve this problem, 3D instance segmentation is performed with the Fast Point Cloud Clustering (FPCC) method to detect and locate the objects in the bin. Further, an extraction strategy is proposed to choose one predicted instance at a time. In the next step, a point cloud registration technique based on the PointNetLK method is implemented to estimate the pose of the selected object in the bin. The above implementation is trained, tested, and evaluated on synthetically generated datasets. The synthetic dataset also contains several noisy point clouds to imitate a real situation. Real data captured at the company SICK IVP is also tested with the implemented model. It is observed that the 3D instance segmentation can detect and locate the objects in the bin. In a noisy environment, the performance degrades as the noise level increases; however, the decrease in performance is not significant. Point cloud registration is observed to work best with the full point cloud of the object, compared to point clouds with missing points.
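One plausible form of the "one instance at a time" extraction strategy mentioned above is to pick the least-occluded instance first, approximated here by the highest mean height in the bin. The criterion is an illustrative assumption, not necessarily the one used in the thesis:

```python
import numpy as np

def select_instance(points: np.ndarray, instance_ids: np.ndarray) -> int:
    """points: (N, 3) cloud; instance_ids: (N,) label per point (-1 = noise)."""
    best_id, best_height = -1, -np.inf
    for inst in np.unique(instance_ids):
        if inst < 0:
            continue  # skip unassigned / noise points
        height = points[instance_ids == inst, 2].mean()  # z assumed to point up
        if height > best_height:
            best_id, best_height = inst, height
    return best_id  # this instance's points are then passed to registration
```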
18

Strategies for the Characterization and Virtual Testing of SLM 316L Stainless Steel

Hendrickson, Michael Paul 02 August 2023
The selective laser melting (SLM) process allows for the control of unique part form and function characteristics not achievable with conventional manufacturing methods and has thus gained interest in several industries, such as the aerospace and biomedical fields. The fabrication processing parameters selected to manufacture a given part influence the created material microstructure and the final mechanical performance of the part. Understanding the process-structure and structure-performance relationships is very important for the design and quality assurance of SLM parts. Image-based analysis methods are commonly used to characterize material microstructures, but are very time-consuming, traditionally requiring manual segmentation of imaged features. Two Python-based image analysis tools are developed here to automate the instance segmentation of manufacturing defects and subgranular cell features commonly found in SLM 316L stainless steel (SS) for quantitative analysis. A custom-trained mask region-based convolutional neural network (Mask R-CNN) model is used to segment cell features from scanning electron microscopy (SEM) images with an instance segmentation accuracy nearly identical to that of a human researcher, but about four orders of magnitude faster. The defect segmentation tool uses techniques from the OpenCV Python library to identify and segment defect instances from optical images. A melt-pool structure generation tool is also developed to create custom melt-pool geometries from a few user inputs, with the ability to create functionally graded structures for use in a virtual testing framework. This tool allows for the study of complex melt-pool geometries and graded structures commonly seen in SLM parts and is applied to three finite element analyses to investigate the effects of different melt-pool geometries on part stress concentrations. / Master of Science / Recent advancements in additive manufacturing (AM) processes like the selective laser melting (SLM) process are revolutionizing the way many products are manufactured. The geometric form and material microstructure of SLM parts can be controlled by manufacturing settings, referred to as fabrication processing parameters, in ways not previously possible via conventional manufacturing techniques such as machining and casting. The improved geometric control of SLM parts has enabled more complex part geometries as well as significant manufacturing cost savings for some parts. With improved control over the material microstructure, the mechanical performance of SLM parts can be finely tailored and optimized for a particular application. Complex functionally graded materials (FGM) can also easily be created with the SLM process by varying the fabrication processing parameters spatially within the manufactured part to improve mechanical performance for a desired application. The added control offered by the SLM process has created a need for understanding how changes in the fabrication processing parameters affect the material structure, and in turn, how the produced structure affects the mechanical properties of the part. This study presents three different tools developed for the automated characterization of SLM 316L stainless steel (SS) material structures and the generation of realistic material structures for numerical simulation of mechanical performance.
A defect content tool is presented to automatically identify and create binary segmentations of defects in SLM parts, consisting of small air pockets within the volume of the parts, from digital optical images. A machine learning based instance segmentation tool is also trained on a custom dataset and used to measure the size of nanoscale cell features, unique to 316L SS and some other metal alloys processed with SLM, from scanning electron microscopy (SEM) images. Both tools automate the laborious process of segmenting individual objects of interest from hundreds or thousands of images and are shown to have an accuracy very close to that of manual segmentation by a human. The results are also used to analyze three different samples produced with different fabrication processing parameters, which showed process-structure relationships similar to those reported in other studies. The SLM structure generation tool is developed to create melt-pool structures similar to those seen in SLM parts, arising from the successive melting and solidification of material along the laser scanning path. This structural feature is unique to AM processes such as SLM, and the example test cases investigated in this study show that changes in the melt-pool structure geometry have a measurable effect, slightly above 10% difference, on the stress and strain response of the material when a tensile load is applied. The melt-pool structure generation tool can create complex geometries capable of varying spatially to create FGMs from a few user inputs and, when applied to existing simulation methods for SLM parts, offers improved estimates of the mechanical response of SLM parts.
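The kind of OpenCV pipeline the defect tool describes could, for example, threshold the optical image so dark pores stand out and then label each pore as a separate instance. A sketch with assumed parameter values, not the thesis's exact tool:

```python
import cv2
import numpy as np

def segment_defects(gray: np.ndarray, min_area: int = 20):
    """gray: 8-bit optical image; returns a list of (mask, area) per defect."""
    # Pores appear darker than the surrounding metal: inverted Otsu threshold.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    defects = []
    for i in range(1, n):  # label 0 is the background component
        if stats[i, cv2.CC_STAT_AREA] >= min_area:  # filter out speckle noise
            defects.append((labels == i, int(stats[i, cv2.CC_STAT_AREA])))
    return defects
```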
19

Online Panoptic Mapping of Indoor Environments : A complete panoptic mapping framework / Realtid Panoptisk Kartläggning av Inomhusmiljöer : Ett komplett panoptiskt kartläggningsramverk

G Sneltvedt, Isak January 2024
Replicating a real-world environment is crucial for creating simulations, computer vision, global and local path planning, and localization. While computer-aided design software is a standard tool for such a task, it may not always be practical or effective. An alternative approach is mapping, which uses sensory input and computer vision technologies to reconstruct the environment. However, developing such software requires knowledge of various fields, making it a challenging task. This thesis dives deep into a state-of-the-art mapping framework and explores potential improvements, providing a foundation for an open-source project. The resulting software can replicate a real-world environment while storing panoptic classification data at the voxel level. Through 3D object matching and probability theory, the mapping software is resilient to object misclassifications and remains consistent across the different instances of observed objects. The final software is designed to be easy to use in other projects by substituting the simulation data with a semantic, instance, or panoptic segmentation model. Additionally, the software integrates functionality that facilitates the visualization of different classes or of a particular class instance.
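The resilience to misclassification described above can be reduced, in its simplest form, to a per-voxel vote count over (class, instance) hypotheses that is updated each time the voxel is observed. A minimal illustrative sketch, not the framework's actual update rule:

```python
from collections import defaultdict

class Voxel:
    """Accumulates panoptic observations for one voxel of the map."""
    def __init__(self):
        self.votes = defaultdict(int)  # (class_id, instance_id) -> count

    def integrate(self, class_id: int, instance_id: int):
        """Called once per observation of this voxel in a segmented frame."""
        self.votes[(class_id, instance_id)] += 1

    def label(self):
        """Most observed hypothesis; misclassifications are outvoted over time."""
        if not self.votes:
            return None
        (cls, inst), count = max(self.votes.items(), key=lambda kv: kv[1])
        confidence = count / sum(self.votes.values())
        return cls, inst, confidence
```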
20

Instance segmentation using 2.5D data

Öhrling, Jonathan January 2023
Multi-modality fusion is an area of research that has shown promising results in the domain of 2D and 3D object detection. However, multi-modality fusion methods have largely not been utilized in the domain of instance segmentation. This master's thesis investigated whether multi-modality fusion methods can be applied to deep learning instance segmentation models to improve their performance on multi-modality data. The two multi-modality fusion methods presented, input extension and feature fusion, were applied to a two-stage instance segmentation model, Mask R-CNN, and a single-stage instance segmentation model, RTMDet. Models were trained on different variations of preprocessed RGBD and ToF data provided by SICK IVP, as well as RGBD data from the publicly available NYUDepth dataset. The thesis concludes that the feature fusion method can be applied to the Mask R-CNN model to improve the network's performance by 1.8 percentage points (%pt.) bounding box mAP and 1.6 %pt. segmentation mAP on SICK RGBD, 7.7 %pt. bounding box mAP and 7.4 %pt. segmentation mAP on ToF, and 7.4 %pt. bounding box mAP and 7.4 %pt. segmentation mAP on NYUDepth. The RTMDet model saw little to no improvement from the inclusion of depth, but had a baseline performance similar to that of the improved Mask R-CNN model using feature fusion. The input extension method saw no performance improvements, as it faced technical implementation limitations.
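For context, input extension is commonly implemented by widening a pretrained backbone's first convolution from 3 (RGB) to 4 (RGBD) channels while keeping the pretrained RGB weights. A sketch using torchvision's ResNet-50 as an assumed backbone, not the thesis code:

```python
import torch
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V2")
old = model.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
new = torch.nn.Conv2d(4, old.out_channels, kernel_size=old.kernel_size,
                      stride=old.stride, padding=old.padding, bias=False)
with torch.no_grad():
    new.weight[:, :3] = old.weight  # reuse the pretrained RGB filters
    # Initialize the extra depth channel with the mean of the RGB filters.
    new.weight[:, 3:] = old.weight.mean(dim=1, keepdim=True)
model.conv1 = new  # the backbone now accepts (B, 4, H, W) RGBD input
```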
