1 |
Evaluation of Tree Planting using Computer Vision models YOLO and U-Net
Liszka, Sofie, January 2023
Efficient and environmentally responsible tree planting is crucial to sustainable land management. Tree planting processes involve significant machinery and labor, impacting efficiency and ecosystem health. In response, Södra Skogsägarna introduced the BraSatt initiative to develop an autonomous planting vehicle called E-Beaver. This vehicle aims to simultaneously address efficiency and ecological concerns by autonomously planting saplings in clear-felled areas. BIT ADDICT, partnering with Södra Skogsägarna, is responsible for developing the control system for E-Beaver’s autonomous navigation and perception. In this thesis work, we examine the possibility of using the computer vision models YOLO and U-Net for detecting and segmenting newly planted saplings in a clear-felled area. We also compare the models’ performance with and without dataset augmentation to see whether augmentation yields better-performing models. RGB and RGB-D images were gathered with the ZED 2i stereo camera. Two different models are presented, one for detecting saplings in RGB images taken from a top-down perspective and the other for segmenting sapling trunks from RGB-D images taken from a side perspective. The purpose of this thesis work is to be able to use the models for evaluating the planting of newly planted saplings so that autonomous tree planting can be done. The outcomes of this research show that YOLOv8s has great potential in detecting tree saplings from a top-down perspective and the YOLOv8s-seg models in segmenting sapling trunks. The YOLOv8s-seg models performed significantly better at segmenting the trunks than the U-Net models. The research contributes insights into using computer vision for efficient and ecologically sound tree planting practices, poised to reshape the future of sustainable land management. / BraSatt
|
2 |
Detekce a lokalizace mikrobiálních kolonií pomocí algoritmů hlubokého učení / Detection and localization of microbial colonies by means of deep learning algorithms
Čičatka, Michal, January 2021
Due to the massive expansion of mass spectrometry and the steadily rising cost of human labour, optimizing the preparation of microbial samples has become an open question. This master thesis deals with the design and implementation of a machine learning algorithm for segmenting images of microbial colonies cultivated on Petri dishes. The algorithm is to become part of the control software of the MBT Pathfinder, a device developed by the company Bruker s. r. o. that automates the process of smearing microbial colonies onto MALDI target plates. In this thesis, several neural network models based on the UNet, UNet++ and ENet architectures were implemented. Based on a number of experiments investigating various network configurations and pre-processing of the training dataset, an ENet model with a quadrupled filter count and an additional convolutional block in the encoder, trained on a dataset pre-processed with a round mask, was chosen.
|
3 |
FGSSNet: Applying Feature-Guided Semantic Segmentation on real world floorplans
Norrby, Hugo; Färm, Gabriel, January 2024
This master thesis introduces FGSSNet, a novel multi-headed feature-guided semantic segmentation (FGSS) architecture designed to improve the generalization ability of segmentation models on floorplans by injecting domain-specific information into the latent space, guiding the segmentation process. FGSSNet features a U-Net segmentation backbone with a jointly trained reconstruction head attached to the U-Net decoder, tasked with reconstructing the injected feature maps, forcing their utilization throughout the decoding process. A multi-headed dedicated feature extractor is used to extract the domain-specific feature maps used by FGSSNet while also predicting the wall width used for our novel dynamic scaling algorithm, designed to ensure spatial consistency between the training and real-world floorplans. The results show that the reconstruction head proved redundant, diverting the network's attention away from the segmentation task and ultimately hindering its performance. Instead, the model with the reconstruction head ablated, FGSSNet-NoRec, showed increased performance by utilizing the injected features freely, showcasing their importance. FGSSNet-NoRec slightly improves on the IoU of comparable U-Net models, achieving 79.3% wall IoU on a preprocessed CubiCasa5K dataset, while showing an average IoU increase of 3.0 units (5.3%) on the more challenging real-world floorplans, displaying superior generalization by leveraging the injected domain-specific information.
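The wall IoU reported above is the intersection-over-union between a predicted mask and a ground-truth mask. The following is only a minimal sketch of that metric on small binary masks, not the thesis's evaluation code; the function name `iou` and the toy masks are illustrative assumptions.

```python
def iou(pred, target):
    """Intersection over Union for two binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for row_pred, row_target in zip(pred, target):
        for p, t in zip(row_pred, row_target):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Convention assumed here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 2x3 masks (hypothetical data): 2 overlapping pixels, 4 pixels in the union.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(iou(pred, target))  # 0.5
```

Multiplying the result by 100 gives the percentage form used in the abstract.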
|
4 |
Semantic Segmentation of Iron Ore Pellets in the Cloud
Lindberg, Hampus, January 2021
This master's thesis evaluates data annotation, semantic segmentation and Docker for use in AWS. The data provided has to be annotated and is to be used as a dataset for the creation of a neural network. Different neural network models are then compared based on performance. AWS offers the option of using Docker containers, so that option is examined, and lastly the different tools available in AWS SageMaker are analyzed for bringing a neural network to the cloud. Images were annotated in Ilastik, giving a dataset of 276 images; a neural network was then created in PyTorch using the library Segmentation Models PyTorch, which gave the option of trying different models. This neural network was created in a notebook in Google Colab for a quick setup and easy testing. The dataset was then uploaded to AWS S3 and the notebook was brought from Colab to an AWS instance, where the dataset could be loaded from S3. A Docker container was created and packaged with the necessary packages and libraries as well as the training and inference code, and then pushed to the ECR (Elastic Container Registry). This container could then be used to perform training jobs in SageMaker, which resulted in a trained model stored in S3, and the hyperparameter tuning tool was also examined to get a better-performing model. The two different deployment methods in SageMaker were then investigated to understand the entire machine learning solution. The images annotated in Ilastik were deemed sufficient, as the neural network results were satisfactory. The neural network created was able to use all of the models accessible from Segmentation Models PyTorch, which enabled a lot of options. By using a Docker container, all of the tools available in SageMaker could be used with the created neural network packaged in the container and pushed to the ECR. Training jobs were run in SageMaker by using the container to get a trained model which could be saved to AWS S3.
Hyperparameter tuning gave better results than the manually tested parameters and resulted in the best neural network produced. The model that was deemed the best was Unet++ in combination with the Dpn98 encoder. The two different deployment methods in SageMaker were explored and are believed to be beneficial in different ways, so the choice has to be reconsidered for each project. The analysis deemed the cloud solution the better alternative compared to an in-house solution in all three aspects measured: price, performance and scalability.
|
5 |
Upscaling of pictures using convolutional neural networks
Norée Palm, Caspar; Granström, Hugo, January 2021
The task of upscaling pictures is very ill-posed since it requires the creation of novel data. Any algorithm or model trying to perform this task will have to interpolate and guess the missing pixels in the pictures. Classical algorithms usually result in blurred or pixelated interpolations, especially visible around sharp edges. Neural networks are a promising alternative because they can infer context when upsampling different parts of an image. In this report, a special deep learning structure called U-Net is trained on reconstructing high-resolution images from the Div2k dataset. Multiple loss functions are tested, and a combination of a GAN-based loss function, a simple pixel loss and a Sobel-based edge loss was used to get the best results. On the validation dataset, the proposed model achieved a PSNR of 33.11 dB, compared to 30.23 dB for Lanczos, one of the best classical algorithms.
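To put the two PSNR figures in context, the sketch below computes PSNR from the mean squared error and converts a dB gain into an MSE ratio. It is an illustrative sketch, not the report's code: the function names, the flat pixel lists, and the 8-bit peak value of 255 are assumptions.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally long pixel sequences."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def mse_ratio_from_db_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain (same peak value assumed)."""
    return 10 ** (gain_db / 10)


# The gap between the two reported scores is 33.11 - 30.23 = 2.88 dB,
# which corresponds to an MSE roughly 1.9 times smaller.
print(round(mse_ratio_from_db_gain(33.11 - 30.23), 2))
```

Because PSNR is logarithmic, a gain of about 3 dB means the mean squared error is roughly halved, provided both scores use the same peak value.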
|
6 |
Segmenting the Left Atrium in Cardic CT Images using Deep Learning
Nayak, Aman Kumar, January 2021
Convolutional neural networks have achieved state-of-the-art accuracy for multi-class segmentation in biomedical image science. In this thesis, a 2-Stage binary 2D UNet and MultiResUNet are used to segment 3D cardiac CT volumes. The 3D volumes have been sliced into 2D images. The 2D networks learned to classify the pixels by transforming the information about the segmentation into a latent feature space in a contracting path and upsampling it to a semantic segmentation in an expanding path. The network trained on diastole and systole timestamp volumes will be able to handle much more extreme morphological differences between subjects. Evaluation of the results is based on the Dice coefficient as a segmentation metric. The thesis work also explores the impact of various loss functions on image segmentation for the imbalanced dataset. Results show that the 2-Stage binary UNet has higher performance than MultiResUNet considering segmentation done in all planes. In this work, the prediction uncertainty of the convolutional neural networks is estimated using Monte Carlo dropout, which shows that the 2-Stage binary UNet has lower prediction uncertainty than MultiResUNet.
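The Dice coefficient used as the segmentation metric above is twice the overlap of two masks divided by the sum of their sizes. The following is a minimal sketch on flat binary masks; the function name `dice` and the toy data are illustrative assumptions, not code from the thesis.

```python
def dice(pred, target):
    """Dice coefficient for two flat binary masks (sequences of 0/1)."""
    overlap = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 2 * overlap / size_sum if size_sum else 1.0


# Hypothetical masks: 2 overlapping pixels, 3 foreground pixels in each mask.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(round(dice(pred, target), 3))  # 0.667
```

On an imbalanced dataset the score is driven by the foreground pixels, since background pixels that both masks leave unlabeled never enter the formula.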
|
7 |
Medical Image Segmentation using Attention-Based Deep Neural Networks / Medicinsk bildsegmentering med attention-baserade djupa neurala nätverk
Ahmed, Mohamed, January 2020
During the last few years, segmentation architectures based on deep learning have achieved promising results. Attention networks, on the other hand, were invented years ago and have been used in various tasks, but rarely in medical applications. This thesis investigated four main attention mechanisms for use in medical image segmentation: Squeeze and Excitation, Dual Attention Network, Pyramid Attention Network, and Attention UNet. Different hybrid architectures proposed by the author were also tested. Methods were tested on a kidney tumor dataset against the UNet architecture as a baseline. One version of Squeeze and Excitation attention outperformed the baseline. The original Dual Attention Network and Pyramid Attention Network showed very poor performance, especially for the tumor class. The Attention UNet architecture achieved results close to the baseline, but not better. Two more hybrid architectures achieved better results than the baseline. The first is a modified version of Squeeze and Excitation attention. The second is a combination of Dual Attention Networks and the UNet architecture. The proposed architectures outperformed the baseline by up to 3% in tumor Dice coefficient. The thesis also shows the difference between 2D architectures and their 3D counterparts. 3D architectures achieved a tumor Dice coefficient more than 10% higher than 2D architectures.
|
8 |
DEEP LEARNING-BASED IMAGE RECONSTRUCTION FROM MULTIMODE FIBER: COMPARATIVE EVALUATION OF VARIOUS APPROACHES
Mohammadzadeh, Mohammad, 01 May 2024
This thesis presents three distinct methodologies aimed at exploring pivotal aspects within the domain of fiber optics and piezoelectric materials. The first approach explores three such aspects: the influence of voltage variation on piezoelectric displacement, the effects of bending multimode fiber (MMF) on data transmission, and the performance of an autoencoder in MMF image reconstruction with and without additional noise. To assess the impact of voltage variation on piezoelectric displacement, experiments were conducted by applying varying voltages to a piezoelectric material and meticulously measuring its radial displacement. The results revealed a notable increase in displacement with higher voltage, with implications for fiber stability and overall performance. Additionally, the investigation into the effects of bending MMF on data transmission highlighted that the bending process causes the fiber to become leaky and radiate power radially, potentially affecting data transmission. This crucial insight emphasizes the necessity for further research to optimize data transmission in practical fiber systems. Furthermore, the performance of an autoencoder model was evaluated using a dataset of MMF images in diverse scenarios. The autoencoder exhibited impressive accuracy in reconstructing MMF images with high fidelity. The results underscore the significance of ongoing research in these domains, propelling advancements in fiber optic technology. The second approach of this thesis entails a comparative investigation involving three distinct neural network models to assess their efficacy in improving image quality within optical transmissions through multimode fibers, with a specific focus on mitigating speckle patterns.
Our proposed methodology integrates multimode fibers with a piezoelectric source, deliberately introducing noise into transmitted images to evaluate their performance using an autoencoder neural network. The autoencoder, trained on a dataset augmented with noise and speckle patterns, adeptly eliminates noise and reconstructs images with enhanced fidelity. Comparative analyses conducted with alternative neural network architectures, namely a single hidden layer (SHL) model and a U-Net architecture, reveal that while U-Net demonstrates superior performance in terms of image reconstruction fidelity, the autoencoder exhibits notable advantages in training efficiency. Notably, the autoencoder achieves saturation SSIM in 450 epochs and 24 minutes, surpassing the training durations of both U-Net (210 epochs, 1 hour) and SHL (160 epochs, 3 hours and 25 minutes) models. Impressively, the autoencoder's training time per epoch is six times faster than U-Net and fourteen times faster than SHL. The experimental setup involves the application of varying voltages via a piezoelectric source to induce noise, facilitating adaptation to real-world conditions. Furthermore, the study not only demonstrates the efficacy of the proposed methodology but also conducts comparative analyses with prior works, revealing significant improvements. Compared to Li et al.'s study, our methodology, particularly when utilizing the pre-trained autoencoder, demonstrates an average improvement of 15% for SSIM and 9% for PSNR in the worst-case scenario. Additionally, when compared to Lai et al.'s study employing a generative adversarial network for image reconstruction, our methodology achieves slightly superior SSIM outcomes in certain scenarios, reaching 96%. The versatility of the proposed method is underscored by its consistent performance across varying voltage scenarios, showcasing its potential applications in medical procedures and industrial inspections. 
This research not only presents a comprehensive and innovative approach to addressing challenges in optical image reconstruction but also signifies significant advancements compared to prior works. The final approach of this study entails employing Hermite-Gaussian functions of varying orders as activation functions within a U-Net model architecture, aiming to evaluate their effectiveness in image reconstruction. The performance of the model is rigorously assessed across five distinct voltage scenarios, and a supplementary evaluation is conducted with digit 5 excluded from the training set to gauge its generalization capability. The outcomes offer promising insights into the efficacy of the proposed methodologies, showcasing significant advancements in optical image reconstruction. Particularly noteworthy is the robust accuracy demonstrated by the higher orders of the Hermite-Gaussian function in reconstructing MMF images, even amidst the presence of noise introduced by the voltage source. However, a decline in accuracy is noted in the presence of voltage-induced noise, underscoring the imperative need for further research to bolster the model's resilience in real-world scenarios, especially in comparison to the utilization of the Rectified Linear Unit (ReLU) function.
|
9 |
Using Satellite Images and Deep Learning to Detect Water Hidden Under the Vegetation : A cross-modal knowledge distillation-based method to reduce manual annotation work / Användning Satellitbilder och Djupinlärning för att Upptäcka Vatten Gömt Under Vegetationen : En tvärmodal kunskapsdestillationsbaserad metod för att minska manuellt anteckningsarbete
Cristofoli, Ezio, January 2024
Detecting water under vegetation is critical to tracking the status of geological ecosystems like wetlands. Researchers use different methods to estimate water presence, avoiding costly on-site measurements. Optical satellite imagery allows the automatic delineation of water using the Normalised Difference Water Index (NDWI). Still, optical imagery is subject to visibility conditions and cannot detect water under vegetation, a typical situation for wetlands. Synthetic Aperture Radar (SAR) imagery works under all visibility conditions. It can detect water under vegetation but requires deep network algorithms to segment water presence, and manual annotation work is required to train the deep models. This project uses DEEPAQUA, a cross-modal knowledge distillation method, to eliminate the manual annotation needed to extract water presence from SAR imagery with deep neural networks. In this method, a deep student model (e.g., UNET) is trained to segment water in SAR imagery. The student model uses the NDWI algorithm as the non-parametric, cross-modal teacher. The key prerequisite is that NDWI works on optical imagery taken at the same location and at the same time as the SAR imagery. Three different deep architectures are tested in this project: UNET, SegNet, and UNET++, and the Otsu method is used as the baseline. Experiments on imagery from Swedish wetlands in 2020-2022 show that cross-modal distillation consistently achieved better segmentation performance across architectures than the baseline. Additionally, the UNET family of algorithms performed better than SegNet with a confidence of 95%. The UNET++ model achieved the highest Intersection Over Union (IOU) performance. However, no statistical evidence emerged that UNET++ performs better than UNET, with a confidence of 95%.
In conclusion, this project shows that cross-modal knowledge distillation works well across architectures and removes tedious and expensive manual work hours when detecting water from SAR imagery. Further research could evaluate performance on other datasets and student architectures.
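The teacher side of the distillation described above can be illustrated with the standard NDWI formula, (green - NIR) / (green + NIR), thresholded into a water mask that a student model could then be trained against. This is a hedged sketch and not the DEEPAQUA implementation: the function names, the threshold of 0.0, and the toy reflectance values are assumptions.

```python
def ndwi(green, nir):
    """Normalised Difference Water Index for one pixel: (green - NIR) / (green + NIR)."""
    total = green + nir
    return (green - nir) / total if total else 0.0


def teacher_mask(green_band, nir_band, threshold=0.0):
    """Binary water mask from optical bands; the threshold value is an assumption."""
    return [
        [1 if ndwi(g, n) > threshold else 0 for g, n in zip(green_row, nir_row)]
        for green_row, nir_row in zip(green_band, nir_band)
    ]


# Hypothetical reflectances: water reflects more green than near-infrared.
green_band = [[0.30, 0.10],
              [0.25, 0.05]]
nir_band = [[0.10, 0.40],
            [0.05, 0.30]]
print(teacher_mask(green_band, nir_band))  # [[1, 0], [1, 0]]
```

The resulting mask takes the place of manual annotations: the student sees only the co-located SAR image as input and the NDWI mask as its training target.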
|
10 |
Exploring Deep Learning Frameworks for Multiclass Segmentation of 4D Cardiac Computed Tomography / Utforskning av djupinlärningsmetoder för 4D segmentering av hjärtat från datortomografi
Janurberg, Norman; Luksitch, Christian, January 2021
By combining computed tomography data with computational fluid dynamics, the cardiac hemodynamics of a patient can be assessed for diagnosis and treatment of cardiac disease. The advantage of computed tomography over other medical imaging modalities is its capability of producing detailed high-resolution images containing geometric measurements relevant to the simulation of cardiac blood flow. To extract these geometries from computed tomography data, segmentation of 4D cardiac computed tomography (CT) data has been performed using two deep learning frameworks that combine methods which have previously shown success in other research. The aim of this thesis work was to develop and evaluate a deep-learning-based technique to segment the left ventricle, ascending aorta, left atrium, left atrial appendage and the proximal pulmonary vein inlets. Two frameworks have been studied; both utilise a 2D multi-axis implementation to segment a single CT volume by examining it in three perpendicular planes, while one of them also employs a 3D binary model to extract and crop the foreground from the surrounding background. Both frameworks determine a segmentation prediction by reconstructing three volumes after 2D segmentation in each plane and combining their probabilities in an ensemble for a 3D output. The results of both frameworks show similarities in their performance and ability to properly segment 3D CT data. While the framework that examines 2D slices of full-size volumes produces an overall higher Dice score, it is less successful than the cropping framework at segmenting the smaller left atrial appendage. Since the full-size 2D slices also contain background information in each slice, this is believed to be the main reason for the better overall segmentation performance, whereas the cropping framework provides a higher proportion of each foreground label, making it easier for the model to identify smaller structures.
Both frameworks show success for use in 3D cardiac CT segmentation, and with further research and tuning of each network, even better results can be achieved.
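The multi-axis ensemble step described above, in which the probabilities from the three perpendicular planes are combined into one 3D output, can be sketched for a single voxel as follows. The abstract does not say how the probabilities are combined; plain averaging followed by an argmax is an assumption here, and the function name and toy probabilities are hypothetical.

```python
def ensemble_label(axial, coronal, sagittal):
    """Average per-class probabilities from three planes and pick the most likely class.

    Each argument is a list of class probabilities for the same voxel.
    Averaging is an assumed combination rule, not one confirmed by the thesis.
    """
    mean = [(a + c + s) / 3 for a, c, s in zip(axial, coronal, sagittal)]
    label = max(range(len(mean)), key=mean.__getitem__)
    return label, mean


# Hypothetical voxel with two classes: the axial model disagrees with the other two.
label, mean = ensemble_label([0.6, 0.4], [0.2, 0.8], [0.3, 0.7])
print(label)  # 1
```

Applying the same rule to every voxel of the three reconstructed volumes yields the final 3D label map.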
|