Global ETD Search

21	Self-Supervised Transformer Networks for Error Classification of Tightening Traces Bogatov Wilkman, Dennis January 2022 (has links) Transformers have shown remarkable results in the domains of Natural Language Processing and Computer Vision. This naturally raises the question whether the success could be replicated in other domains. However, due to Transformers being inherently data hungry and sensitive to weight initialization, applying the Transformer to new domains is quite a challenging task. Previously, the data demands have been met using large scale supervised or self-supervised pre-training on a similar task before supervised fine-tuning on a target down stream task. We show that Transformers are applicable for the task of multi-label error classification of trace data, and that masked data modelling based self-supervised learning methods can be used to leverage unlabelled data to increase performance compared to a baseline supervised learning approach. / Transformers har visat upp anmärkningsvärda resultat inom områdena Natural Language Processing och Computer Vision. Detta väcker naturligtvis frågan om dessa framgångar kan upprepas inom andra områden. På grund av att transformatorer i sig är datahungriga och känsliga för initialisering av vikt är det dock en utmaning att tillämpa transformatorn på nya områden. Tidigare har datakraven tillgodosetts med hjälp av storskalig övervakad eller självövervakad förträning på en liknande uppgift före övervakad finjustering på en måluppgift i efterföljande led. Vi visar att transformatorer kan användas för klassificering av spårdata med flera etiketter och att metoder för självövervakad inlärning som bygger på modellering av maskerade data kan användas för att utnyttja omärkta data för att öka prestandan jämfört med en grundläggande övervakad inlärningsmetod. Read more Transformers Self-supervised Learning Multi-Label Error Classification Tightening Traces Transformatorer Självövervakad Inlärning Skärpnings spår Computer and Information Sciences Data- och informationsvetenskap
22	Emergence of language-like latents in deep neural networks Lu, Yuchen 05 1900 (has links) L'émergence du langage est considérée comme l'une des marques de l'intelligence humaine. Par conséquent, nous émettons l'hypothèse que l'émergence de latences ou de représentations similaires au langage dans un système d'apprentissage profond pourrait aider les modèles à obtenir une meilleure généralisation compositionnelle et hors distribution. Dans cette thèse, nous présentons une série d'articles qui explorent cette hypothèse dans différents domaines, notamment l'apprentissage interactif du langage, l'apprentissage par imitation et la vision par ordinateur. / The emergence of language is regarded as one of the hallmarks of human intelligence. Therefore, we hypothesize that the emergence of language-like latents or representations in a deep learning system could help models achieve better compositional and out-of-distribution generalization. In this thesis, we present a series of papers that explores this hypothesis in different fields including interactive language learning, imitation learning and computer vision. Deep Learning Language Emergence Compositionality Imitation Learning Self-supervised Learning Apprentissage Profond Émergence du Langage Compositionnalité Apprentissage par Imitation Apprentissage Auto-supervisé
23	Leveraging self-supervision for visual embodied navigation with neuralized potential fields Saavedra Ruiz, Miguel Angel 05 1900 (has links) Une tâche fondamentale en robotique consiste à naviguer entre deux endroits. En particulier, la navigation dans le monde réel nécessite une planification à long terme à l'aide d'images RVB (RGB) en haute dimension, ce qui constitue un défi considérable pour les approches d'apprentissage de bout-en-bout. Les méthodes semi-paramétriques actuelles parviennent plutôt à atteindre des objectifs éloignés en combinant des modèles paramétriques avec une mémoire topologique de l'environnement, souvent représentée sous forme d'un graphe ayant pour nœuds des images précédemment vues. Cependant, l'utilisation de ces graphes implique généralement l'ajustement d'heuristiques d'élagage afin d'éviter les arêtes superflues, limiter la mémoire requise et permettre des recherches raisonnablement rapides dans le graphe. Dans cet ouvrage, nous montrons comment les approches de bout-en-bout basées sur l'apprentissage auto-supervisé peuvent exceller dans des tâches de navigation à long terme. Nous présentons initialement Duckie-Former (DF), une approche de bout-en-bout pour la navigation visuelle dans des environnements routiers. En utilisant un Vision Transformer (ViT) pré-entraîné avec une méthode auto-supervisée, nous nous inspirons des champs de potentiels afin de dériver une stratégie de navigation utilisant en entrée un masque de segmentation d'image de faible résolution. DF est évalué dans des tâches de navigation de suivi de voie et d'évitement d'obstacles. Nous présentons ensuite notre deuxième approche intitulée One-4-All (O4A). O4A utilise l'apprentissage auto-supervisé et l'apprentissage de variétés afin de créer un pipeline de navigation de bout-en-bout sans graphe permettant de spécifier l'objectif à l'aide d'une image. La navigation est réalisée en minimisant de manière vorace une fonction de potentiel définie de manière continue dans l'espace latent O4A. Les deux systèmes sont entraînés sans interagir avec le simulateur ou le robot sur des séquences d'exploration de données RVB et de contrôles non experts. Ils ne nécessitent aucune mesure de profondeur ou de pose. L'évaluation est effectuée dans des environnements simulés et réels en utilisant un robot à entraînement différentiel. / A fundamental task in robotics is to navigate between two locations. Particularly, real-world navigation can require long-horizon planning using high-dimensional RGB images, which poses a substantial challenge for end-to-end learning-based approaches. Current semi-parametric methods instead achieve long-horizon navigation by combining learned modules with a topological memory of the environment, often represented as a graph over previously collected images. However, using these graphs in practice typically involves tuning various pruning heuristics to prevent spurious edges, limit runtime memory usage, and allow reasonably fast graph queries. In this work, we show how end-to-end approaches trained through Self-Supervised Learning (SSL) can excel in long-horizon navigation tasks. We initially present Duckie-Former (DF), an end-to-end approach for visual servoing in road-like environments. Using a Vision Transformer (ViT) pretrained with a self-supervised method, we derive a potential-fields-like navigation strategy based on a coarse image segmentation model. DF is assessed in the navigation tasks of lane-following and obstacle avoidance. Subsequently, we introduce our second approach called One-4-All (O4A). O4A leverages SSL and manifold learning to create a graph-free, end-to-end navigation pipeline whose goal is specified as an image. Navigation is achieved by greedily minimizing a potential function defined continuously over the O4A latent space. O4A is evaluated in complex indoor environments. Both systems are trained offline on non-expert exploration sequences of RGB data and controls, and do not require any depth or pose measurements. Assessment is performed in simulated and real-world environments using a differential-drive robot. Read more Navigation visuelle Apprentissage auto-supervisé Champs de potentiel Apprentissage de variétés Robotique Visual navigation Self-supervised learning Potential fields Manifold learning Robotics
24	Self-supervised pre-training of an attention-based model for 3D medical image segmentation / Självövervakad förberedande träning av en attention-baserad model för 3D medicinsk bildsegmentering Sund Aillet, Albert January 2023 (has links) Accurate segmentation of anatomical structures is crucial for radiation therapy in cancer treatment. Deep learning methods have been demonstrated effective for segmentation of 3D medical images, establishing the current standard. However, they require large amounts of labelled data and suffer from reduced performance on domain shift. A possible solution to these challenges is self-supervised learning, that uses unlabelled data to learn representations, which could possibly reduce the need for labelled data and produce more robust segmentation models. This thesis investigates the impact of self-supervised pre-training on an attention-based model for 3D medical image segmentation, specifically focusing on single-organ semantic segmentation, exploring whether self-supervised pre-training enhances the segmentation performance on CT scans with and without domain shift. The Swin UNETR is chosen as the deep learning model since it has been shown to be a successful attention-based architecture for semantic segmentation. During the pre-training stage, the contracting path is trained for three self-supervised pretext tasks using a large dataset of 5 465 unlabelled CT scans. The model is then fine-tuned using labelled datasets with 97, 142 and 288 segmentations of the stomach, the sternum and the pancreas. The results indicate that a substantial performance gain from self-supervised pre-training is not evident. Parameter freezing of the contracting path suggest that the representational power of the contracting path is not as critical for model performance as expected. Decreasing the amount of supervised training data shows that while the pre-training improves model performance when the amount of training data is restricted, the improvements are strongly decreased when more supervised training data is used. / Noggrann segmentering av anatomiska strukturer är avgörande för strålbehandling inom cancervården. Djupinlärningmetoder har visat sig vara effektiva och utgör standard för segmentering av 3D medicinska bilder. Dessa metoder kräver däremot stora mängder märkt data och kännetecknas av lägre prestanda vid domänskift. Eftersom självövervakade inlärningsmetoder använder icke-märkt data för inlärning, kan de möjligen minska behovet av märkt data och producera mer robusta segmenteringsmodeller. Denna uppsats undersöker effekten av självövervakad förberedande träning av en attention-baserad modell för 3D medicinsk bildsegmentering, med särskilt fokus på semantisk segmentering av enskilda organ. Syftet är att studera om självövervakad förberedande träning förbättrar segmenteringsprestandan utan respektive med domänskift. Swin UNETR har valts som djupinlärningsmodell eftersom den har visat sig vara en framgångsrik attention-baserad arkitektur för semantisk segmentering. Under den förberedande träningsfasen optimeras modellens kontraherande del med 5 465 icke-märkta CT-scanningar. Modellen tränas sedan på märkta dataset med 97, 142 och 288 segmenterade skanningar av magen, bröstbenet och bukspottkörteln. Resultaten visar att prestandaökningen från självövervakad förberedande träning inte är tydlig. Parameterfrysning av den kontraherande delen visar att dess representationer inte lika avgörande för segmenteringsprestandan som förväntat. Minskning av mängden träningsdata tyder på att även om den förberedande träningen förbättrar modellens prestanda när mängden träningsdata är begränsad, minskas förbättringarna betydligt när mer träningsdata används. Read more Computer vision Deep learning 3D Medical image segmentation Self-supervised learning Datorseende Djupinlärning 3D Medicinsk bildsegmentering Självövervakad träning Computer and Information Sciences Data- och informationsvetenskap
25	SELF-SUPERVISED ONE-SHOT LEARNING FOR AUTOMATIC SEGMENTATION OF GAN-GENERATED IMAGES Ankit V Manerikar (16523988) 11 July 2023 (has links) <p>Generative Adversarial Networks (GANs) have consistently defined the state-of-the-art in the generative modelling of high-quality images in several applications. The images generated using GANs, however, do not lend themselves to being directly used in supervised learning tasks without first being curated through annotations. This dissertation investigates how to carry out automatic on-the-fly segmentation of GAN-generated images and how this can be applied to the problem of producing high-quality simulated data for X-ray based security screening. The research exploits the hidden layer properties of GAN models in a self-supervised learning framework for the automatic one-shot segmentation of images created by a style-based GAN. The framework consists of a novel contrastive learner that is based on a Sinkhorn distance-based clustering algorithm and that learns a compact feature space for per-pixel classification of the GAN-generated images. This facilitates faster learning of the feature vectors for one-shot segmentation and allows on-the-fly automatic annotation of the GAN images. We have tested our framework on a number of standard benchmarks (CelebA, PASCAL, LSUN) to yield a segmentation performance that not only exceeds the semi-supervised baselines by an average wIoU margin of 1.02 % but also improves the inference speeds by a factor of 4.5. This dissertation also presents BagGAN, an extension of our framework to the problem domain of X-ray based baggage screening. BagGAN produces annotated synthetic baggage X-ray scans to train machine-learning algorithms for the detection of prohibited items during security screening. We have compared the images generated by BagGAN with those created by deterministic ray-tracing models for X-ray simulation and have observed that our GAN-based baggage simulator yields a significantly improved performance in terms of image fidelity and diversity. The BagGAN framework is also tested on the PIDRay and other baggage screening benchmarks to produce segmentation results comparable to their respective baseline segmenters based on manual annotations.</p> Read more Computer vision Adversarial machine learning Deep learning Generative Adversarial Networks (GANs) Self-Supervised Learning Image Segmentation One-Shot Learning X-ray imaging and computed tomography
26	Analysis of Brain Signals from Patients with Parkinson’s Disease using Self-Supervised Learning / Analys av hjärnsignaler från patienter med parkinsons sjukdom med hjälp av självövervakad inlärning Lind, Emma January 2022 (has links) Parkinson’s disease (PD) is one of the most common neurodegenerative brain disorders, commonly diagnosed and monitored via clinical examinations, which can be imprecise and lead to a delayed or inaccurate diagnosis. Therefore, recent research has focused on finding biomarkers by analyzing brain networks’ neural activity to find abnormalities associated with PD pathology. Brain signals can be measured using Magnetoencephalography (MEG) or Electroencephalogram (EEG), which have demonstrated their practical use in decoding neural activity. Nevertheless, interpreting and labeling human neural activity measured using MEG/EEG is yet a challenging task requiring vast of time and expertise. In addition, there is a risk of introducing bias or omitting important information not recognizable by humans. This thesis investigates whether it is possible to find meaningful features relevant to PD by uncovering the brain signals’ underlying structure using self-supervised learning (SSL), requiring no labels or hand-crafted features. Four experiments on one EEG and one MEG dataset were conducted to evaluate if the features found during the SSL were meaningful, including t-SNE, silhouette coefficient, Kolmogorov-Smirnov test, and classification performance. Additionally, transfer learning between the two datasets was tested. The SSL model, TS-TCC, was employed in this thesis due to its outstanding performance on two other EEGdatasets and its training efficiency. The evaluation of the EEG dataset inferred it was feasible to find meaningful features to distinguish PD from healthy controls to some extent using SSL. However, more investigations of reusing the features in a downstream task are needed. The evaluation of the MEG dataset did not reach the same satisfying result, the proposed reason, among others, was the amount of data. Lastly, transfer learning was unsuccessful in the setting of transforming knowledge from the EEG to the MEG dataset. / Parkinsons sjukdom är en av de mest förekommande neurodegenerativa hjärnsjukdomarna. Vanligtvis diagnostiseras och övervakas sjukdomen via kliniska undersökningar, dessa kan vara diffusa och leda till en fördröjd eller en felaktig diagnos. Den senaste forskning har därför fokuserat på att hitta nya biomarkörer, bland annat genom att analysera hjärnnätverkens neurala aktivitet för att hitta abnormiteter associerade med parkinsons patologi. Magnetoencefalografi (MEG) och elektroencefalogram (EEG) har visat sig vara bra tekniker för att avkoda neural aktivitet och kan därmed användas för att mäta hjärnsignaler. Dessvärre är det en utmanande uppgift att tolka och märka hjärnsignaler, det kräver mycket tid och expertis. Det finns också en risk att märkningen inte blir helt objektiv eller att viktig information som inte är upptäckbar av människor utelämnas. Denna avhandling undersöker om det är möjligt att hitta meningsfulla särdrag relevanta för parkinsons sjukdom medhjälp av självövervakad inlärning (SSL), som varken kräver etiketter eller handgjorda särdrag. För att utvärdera om särdragen funna av SSL är meningsfulla utfördes fyra experiment på ett EEG och ett MEG-dataset. Experimenten inkluderade tSNE, siluettkoefficienten, Kolmogorov-Smirnov-testet och klassificeringsprestanda. Dessutom utvärderades möjligheten att överföra särdrag mellan de två dataseten för att nå bättre resultat. TS-TCC användes som SSL modell i denna avhandling på grund av dess prestanda på två andra EEG-dataset och dess effektivitet när det kommer till träning. Utvärderingen av EEG-datat visade på att det var möjligt att hitta meningsfulla särdrag för att till viss del skilja patienter från friska kontroller. Däremot så behövs vidare undersökning av användandet av särdragen i en klassificerare. Utvärderingen av MEG-datat nådde inte samma tillfredsställande resultat; anledningen kan bland annat vara mängden data. Slutligen, det var inte möjligt att överföra särdrag mellan EEG och MEG-datat för att nå ett bättre resultat. Read more Machine Learning Self-supervised learning Feature extraction Parkinson’s Disease Magnetoencephalography Electroencephalogram Maskininlärning Självlärande inlärning Särdragsextraktion Parkinsons sjukdom Magnetoencefalografi Elektroencefalografi Computer Sciences Datavetenskap (datalogi)
27	Label-Efficient Visual Understanding with Consistency Constraints Zou, Yuliang 24 May 2022 (has links) Modern deep neural networks are proficient at solving various visual recognition and understanding tasks, as long as a sufficiently large labeled dataset is available during the training time. However, the progress of these visual tasks is limited by the number of manual annotations. On the other hand, it is usually time-consuming and error-prone to annotate visual data, rendering the challenge of scaling up human labeling for many visual tasks. Fortunately, it is easy to collect large-scale, diverse unlabeled visual data from the Internet. And we can acquire a large amount of synthetic visual data with annotations from game engines effortlessly. In this dissertation, we explore how to utilize the unlabeled data and synthetic labeled data for various visual tasks, aiming to replace or reduce the direct supervision from the manual annotations. The key idea is to encourage deep neural networks to produce consistent predictions across different transformations (\eg geometry, temporal, photometric, etc.). We organize the dissertation as follows. In Part I, we propose to use the consistency over different geometric formulations and a cycle consistency over time to tackle the low-level scene geometry perception tasks in a self-supervised learning setting. In Part II, we tackle the high-level semantic understanding tasks in a semi-supervised learning setting, with the constraint that different augmented views of the same visual input maintain consistent semantic information. In Part III, we tackle the cross-domain image segmentation problem. By encouraging an adaptive segmentation model to output consistent results for a diverse set of strongly-augmented synthetic data, the model learns to perform test-time adaptation on unseen target domains with one single forward pass, without model training or optimization at the inference time. / Doctor of Philosophy / Recently, deep learning has emerged as one of the most powerful tools to solve various visual understanding tasks. However, the development of deep learning methods is significantly limited by the amount of manually labeled data. On the other hand, it is usually time-consuming and error-prone to annotate visual data, making the human labeling process not easily scalable. Fortunately, it is easy to collect large-scale, diverse raw visual data from the Internet (\eg search engines, YouTube, Instagram, etc.). And we can acquire a large amount of synthetic visual data with annotations from game engines effortlessly. In this dissertation, we explore how we can utilize the raw visual data and synthetic data for various visual tasks, aiming to replace or reduce the direct supervision from the manual annotations. The key idea behind this is to encourage deep neural networks to produce consistent predictions of the same visual input across different transformations (\eg geometry, temporal, photometric, etc.). We organize the dissertation as follows. In Part I, we propose using the consistency over different geometric formulations and a forward-backward cycle consistency over time to tackle the low-level scene geometry perception tasks, using unlabeled visual data only. In Part II, we tackle the high-level semantic understanding tasks using both a small amount of labeled data and a large amount of unlabeled data jointly, with the constraint that different augmented views of the same visual input maintain consistent semantic information. In Part III, we tackle the cross-domain image segmentation problem. By encouraging an adaptive segmentation model to output consistent results for a diverse set of strongly-augmented synthetic data, the model learns to perform test-time adaptation on unseen target domains. Read more Label-Efficient Consistency Regularization Visual Understanding Self-Supervised Learning Semi-Supervised Learning Pseudo Labeling Test-Time Adaptation BatchNorm Calibration Cross-Domain Generalization
28	Deep Convolutional Denoising for MicroCT : A Self-Supervised Approach / Brusreducering för mikroCT med djupa faltningsnätverk : En självövervakad metod Karlström, Daniel January 2024 (has links) Microtomography, or microCT, is an x-ray imaging modality that provides volumetric data of an object's internal structure with microscale resolution, making it suitable for scanning small, highly detailed objects. The microCT image quality is limited by quantum noise, which can be reduced by increasing the scan time. This complicates the scanning both of dynamic processes and, due to the increased radiation dose, dose-sensitive samples. A recently proposed method for improved dose- or time-limited scanning is Noise2Inverse, a framework for denoising data in tomography and linear inverse problems by training a self-supervised convolutional neural network. This work implements Noise2Inverse for denoising lab-based cone-beam microCT data and compares it to both supervised neural networks and more traditional filtering methods. While some trade-off in spatial resolution is observed, the method outperforms traditional filtering methods and matches supervised denoising in quantitative and qualitative evaluations of image quality. Additionally, a segmentation task is performed to show that denoising the data can aid in practical tasks. / Mikrotomografi, eller mikroCT, är en röntgenmetod som avbildar små objekt i tre dimensioner med upplösning på mikrometernivå, vilket möjligör avbildning av små och högdetaljerade objekt. Bildkvaliteten vid mikroCT begränsas av kvantbrus, vilket kan minskas genom att öka skanningstiden. Detta försvårar avbildning av dynamiska processer och, på grund av den ökade stråldosen, doskänsliga objekt. En metod som tros kunna förbättra dos- eller tidsbegränsad avbildning är Noise2Inverse, ett ramverk för brusreducering av tomografisk data genom träning av ett självövervakat faltningsnätverk, och jämförs med både övervakade neuronnät och mer traditionella filtermetoder. Noise2Inverse implementaras i detta arbete för brusreducering av data från ett labb-baserat mikroCT-system med cone beam-geometri. En viss reducering i spatiell upplösning observeras, men metoden överträffar traditionella filtermetoder och matchar övervakade neuronnät i kvantitativa och kvalitativa utvärderingar av bildkvalitet. Dessutom visas att metoden går att använda för att förbätta resultat från bildsegmentering. Read more X-ray tomography Deep learning Image denoising Self-supervised learning Linear inverse problems Röntgentomografi Djupinlärning Bildbrusreducering Självövervakad inlärning Linjära inversa problem Physical Sciences Fysik
29	[pt] AJUSTE FINO DE MODELO AUTO-SUPERVISIONADO USANDO REDES NEURAIS SIAMESAS PARA CLASSIFICAÇÃO DE IMAGENS DE COVID-19 / [en] FINE-TUNING SELF-SUPERVISED MODEL WITH SIAMESE NEURAL NETWORKS FOR COVID-19 IMAGE CLASSIFICATION ANTONIO MOREIRA PINTO 03 December 2024 (has links) [pt] Nos últimos anos, o aprendizado auto-supervisionado demonstrou desempenho estado da arte em áreas como visão computacional e processamento de linguagem natural. No entanto, ajustar esses modelos para tarefas específicas de classificação, especialmente com dados rotulados, permanece sendo um desafio. Esta dissertação apresenta uma abordagem para ajuste fino de modelos auto-supervisionados usando Redes Neurais Siamesas, aproveitando a função de perda semi-hard triplet loss. Nosso método visa refinar as representações do espaço latente dos modelos auto-supervisionados para melhorar seu desempenho em tarefas posteriores de classificação. O framework proposto emprega Masked Autoencoders para pré-treinamento em um conjunto abrangente de dados de radiografias, seguido de ajuste fino com redes siamesas para separação eficaz de características e melhor classificação. A abordagem é avaliada no conjunto de dados COVIDx 9 para detecção de COVID-19 a partir de radiografias frontais de peito, alcançando uma nova precisão recorde de 98,5 por cento, superando as técnicas tradicionais de ajuste fino e o modelo COVID-Net CRX 3. Os resultados demonstram a eficácia de nosso método em aumentar a utilidade de modelos auto-supervisionados para tarefas complexas de imagem médica. Trabalhos futuros explorarão a escalabilidade dessa abordagem para outros domínios e a integração de funções de perda de espaço de embedding mais sofisticadas. / [en] In recent years, self-supervised learning has demonstrated state-of-theart performance in domains such as computer vision and natural language processing. However, fine-tuning these models for specific classification tasks, particularly with labeled data, remains challenging. This thesis introduces a novel approach to fine-tuning self-supervised models using Siamese Neural Networks, specifically leveraging a semi-hard triplet loss function. Our method aims to refine the latent space representations of self-supervised models to improve their performance on downstream classification tasks. The proposed framework employs Masked Autoencoders for pre-training on a comprehensive radiograph dataset, followed by fine-tuning with Siamese networks for effective feature separation and improved classification. The approach is evaluated on the COVIDx dataset for COVID-19 detection from frontal chest radiographs, achieving a new record accuracy of 98.5 percent, surpassing traditional fine-tuning techniques and COVID-Net CRX 3. The results demonstrate the effectiveness of our method in enhancing the utility of self-supervised models for complex medical imaging tasks. Future work will explore the scalability of this approach to other domains and the integration of more sophisticated embedding-space loss functions. Read more [pt] RADIOGRAFIA [pt] REDES NEURAIS SIAMESAS [pt] MASKED AUTOENCODER [pt] APRENDIZADO AUTO-SUPERVISIONADO [en] RADIOGRAPHY [en] SIAMESE NEURAL NETWORKS [en] MASKED AUTOENCODER [en] SELF-SUPERVISED LEARNING
30	Semantic Segmentation of Remote Sensing Data using Self-Supervised Learning Wallin, Emma, Åhlander, Rebecka January 2024 (has links) Semantic segmentation is the process of assigning a specific class label to each pixel in an image. There are multiple areas of use for semantic segmentation of remote sensing images, including climate change studies and urban planning and development. When training a network to perform semantic segmentation in a supervised manner, annotated data is crucial, and annotating satellite images is an expensive and time-consuming task. A resolution to this issue might be self-supervised learning. Training a pretext task on a large unlabeled dataset, and a downstream task on a smaller labeled dataset, could mitigate the need for large amounts of labeled data. In this thesis, the use of self-supervised learning for semantic segmentation of remote sensing data is investigated and compared to the traditional use of supervised pre-training using ImageNet. Two different methods of self-supervised learning are evaluated, a reconstructive method and a contrastive method. Furthermore, whether including modalities unique to remote sensing data yields greater performance for semantic segmentation is investigated. The findings indicate that self-supervised learning with in-domain data shows significant potential. While the performance of models pre-trained using self-supervised learning on remote sensing data, does not surpass that of pre-trained models using supervised learning on ImageNet, it achieves a comparable level. This is notable given the substantially smaller training data used. However, in cases where the in-domain dataset is small — as in this thesis with approximately 20,000 images — leveraging ImageNet for pre-training is preferable. Furthermore, self-supervised learning demonstrates promise as a more effective pre-training approach compared to supervised learning, when both methods are trained on ImageNet. The reconstructive method proves more suitable for semantic segmentation of remote sensing data compared to the contrastive method, and incorporating modalities unique to remote sensing further enhances performance. Read more Machine Learning Deep Learning Satellite Imagery Remote Sensing Data Self-supervised Learning Semantic Segmentation

Search results