
Messing With The Gap: On The Modality Gap Phenomenon In Multimodal Contrastive Representation Learning

Al-Jaff, Mohammad January 2023
In machine learning, a two-tower architecture is a type of neural network model that encodes paired data from different modalities (such as text and images, sound and video, or proteomics and gene expression profiles) into a shared latent representation space. However, training these models with a particular contrastive loss function, the multimodal InfoNCE loss, often leads to a distinctive geometric phenomenon known as the modality gap: a clear geometric separation between the embeddings of the two modalities in the joint contrastive latent space. This thesis investigates the modality gap in multimodal machine learning, specifically in two-tower neural networks trained with the multimodal InfoNCE loss. We examine the adequacy of the current definition of the modality gap, the conditions under which the phenomenon manifests, and its impact on representation quality and downstream task performance.

The approach consists of a two-phase experimental strategy. Phase I comprises a series of experiments, ranging from toy synthetic simulations to true multimodal machine learning on complex datasets, to explore and characterise the modality gap under varying conditions. Phase II focuses on modifying the modality gap and analysing representation quality, evaluating different loss functions and their effect on the gap. This methodical exploration allows us to systematically dissect the emergence and implications of the phenomenon, with downstream impact measured by proxy metrics based on semantic clustering in the shared latent space and modality-specific linear-probe evaluation.

Our findings reveal that the modality gap definition proposed by W. Liang et al. (2022) is insufficient: embeddings with similar gap magnitudes can exhibit varying linear separability between modalities and varying topologies, indicating the need for additional metrics to capture the true essence of the gap. Furthermore, our experiments show that the temperature hyperparameter of the multimodal InfoNCE loss plays a crucial role in the emergence of the gap, and that this effect varies across datasets, suggesting that individual dataset characteristics significantly influence the gap's manifestation. A key finding is that modality gaps consistently emerge at small temperatures in the fixed-temperature mode of the loss, and almost invariably under the learned-temperature mode regardless of the initial temperature value. We also observe that the gap's magnitude is influenced by distribution shift, increasing progressively from the training set to the validation set, to the test set, and finally to more distributionally shifted datasets.

The choice of contrastive learning method, temperature mode, and temperature value is thus crucial in shaping the modality gap. However, reducing the gap does not consistently improve downstream task performance, suggesting that its role is more nuanced than previously understood and that the gap may be a geometric by-product of the learning method rather than a critical determinant of representation quality. Our results underscore the need to re-evaluate the modality gap's significance in multimodal contrastive learning, emphasising the importance of dataset characteristics and contrastive learning methodology.
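To make the central objects concrete, below is a minimal PyTorch sketch of a symmetric two-tower multimodal InfoNCE loss with an explicit temperature, together with a centroid-distance gap measure in the spirit of Liang et al. (2022). Tensor names, dimensions, and the initial temperature are illustrative assumptions, not the thesis's code.

```python
import torch
import torch.nn.functional as F

def multimodal_infonce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired two-tower embeddings."""
    img = F.normalize(img_emb, dim=-1)      # project onto unit hypersphere
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature    # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # matched pairs sit on the diagonal; both directions are penalised
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def modality_gap(img_emb, txt_emb):
    """Centroid distance between the two modalities' normalised embeddings."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    return (img.mean(0) - txt.mean(0)).norm().item()

# the learned-temperature mode can be emulated with a log-parameter,
# e.g. log_inv_temp = log(1/0.07), passing temperature = 1 / exp(log_inv_temp)
log_inv_temp = torch.nn.Parameter(torch.tensor(1 / 0.07).log())
```

Note that a scalar centroid distance like this cannot distinguish geometries with equal gap magnitude but different linear separability, which is precisely the insufficiency the thesis identifies.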

[pt] APRENDIZADO SEMI E AUTO-SUPERVISIONADO APLICADO À CLASSIFICAÇÃO MULTI-LABEL DE IMAGENS DE INSPEÇÕES SUBMARINAS / [en] SEMI AND SELF-SUPERVISED LEARNING APPLIED TO THE MULTI-LABEL CLASSIFICATION OF UNDERWATER INSPECTION IMAGE

AMANDA LUCAS PEREIRA 11 July 2023
The offshore segment of oil production is Brazil's main national producer of this commodity. In this context, underwater inspections are crucial for the preventive maintenance of equipment, which remains in the ocean environment for its entire useful life. From the image and sensor data collected in these inspections, experts are able to prevent and repair damage. This process is deeply complex, time-consuming and costly, as specialized professionals have to watch hours of video while staying attentive to details. In this scenario, the present work explores the use of image classification models designed to help experts find the event(s) of interest in underwater inspection videos. These models can be embedded in the ROV or on the platform to perform real-time inference, which can speed up the ROV, reducing inspection time and greatly reducing inspection costs. However, the underwater inspection image classification problem has some inherent challenges: balanced labeled data are expensive and scarce; the data contain noise; intraclass variance is high; and the physical characteristics of water give the captured images certain specificities. Traditional supervised models may therefore not be able to fulfill the task. Motivated by these challenges, we seek to solve the underwater image classification problem using models that require less supervision during training. In this work, we explore DINO (Self-DIstillation with NO labels, self-supervised) and a new multi-label version we propose for PAWS (Predicting View Assignments With Support Samples, semi-supervised), which we call mPAWS (multi-label PAWS). The models are evaluated on their performance as feature extractors for training a simple classifier formed by a single dense layer. In the experiments carried out, for the same architecture, the resulting performance exceeds the f1-score of the supervised equivalent by 2.7 percent.
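The evaluation protocol described above, a single dense layer trained on frozen features, is a standard linear probe. A hedged PyTorch sketch for the multi-label case follows; the feature dimension, label count, and random stand-in data are assumptions for illustration.

```python
import torch
import torch.nn as nn
from sklearn.metrics import f1_score

# frozen-backbone features and multi-hot labels; dims are assumptions
feats = torch.randn(512, 384)            # e.g. DINO/mPAWS features
labels = (torch.rand(512, 10) > 0.8).float()

probe = nn.Linear(384, 10)               # the "simple classifier": one dense layer
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()         # one sigmoid per label -> multi-label

for _ in range(100):                     # features are fixed, only the head trains
    opt.zero_grad()
    loss_fn(probe(feats), labels).backward()
    opt.step()

with torch.no_grad():                    # threshold sigmoids, report micro-F1
    preds = (probe(feats).sigmoid() > 0.5).int()
print(f1_score(labels.int().numpy(), preds.numpy(), average="micro"))
```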

Multilingual Speech Emotion Recognition using pretrained models powered by Self-Supervised Learning / Flerspråkig känsloigenkänning från tal med hjälp av förtränade tal-modeller baserat på själv-övervakad Inlärning

Luthman, Felix January 2022
Society is built on communication, and speech is its most prevalent medium. In day-to-day interactions we talk to each other, but it is not only the words spoken that matter, but the emotional delivery as well; the same sentence can give a completely different impression depending on whether it is said in an angry or a happy tone. Extracting emotion from speech has therefore become a research topic within the area of speech tasks. In recent years this area as a whole has adopted a self-supervised learning approach for learning speech representations from raw speech audio, without the need for any supplementary labelling. These speech representations can be leveraged to solve tasks limited by the availability of annotated data, whether for low-resource languages or a general lack of data for the task itself. This thesis evaluates a set of pre-trained speech models by fine-tuning them in different multilingual environments and measuring their performance thereafter. The model presented here is based on wav2vec 2.0 and correctly classifies 86.58% of samples across eight languages and four emotional classes when trained on those same languages. Experiments were conducted to gauge how well a model trained on seven languages performs on the one left out, which showed a large margin of similarity in how different cultures express vocal emotion. Further investigation showed that as little as a few minutes of in-domain data can increase performance substantially. This is promising even for niche languages, as the amount of available data may not be as large a hurdle as one might think. That said, increasing the amount of data from minutes to hours still yields substantial improvements, albeit to a lesser degree.
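As a sketch of this fine-tuning setup, the Hugging Face transformers library exposes wav2vec 2.0 with a sequence-classification head. The checkpoint, the four-class label set, and the dummy clip below are illustrative assumptions, not the thesis's exact configuration.

```python
import numpy as np
import torch
from transformers import (Wav2Vec2FeatureExtractor,
                          Wav2Vec2ForSequenceClassification)

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "facebook/wav2vec2-base", num_labels=4)   # e.g. angry/happy/sad/neutral

# dummy one-second 16 kHz clip standing in for a real labelled utterance
waveform = np.random.randn(16000).astype(np.float32)
inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")

out = model(input_values=inputs.input_values, labels=torch.tensor([2]))
out.loss.backward()                 # fine-tuning backprops through the encoder
pred = out.logits.argmax(-1)        # predicted emotion class
```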

Feature extraction from MEG data using self-supervised learning : Investigating contrastive representation learning methods to find informative representations / Särdragsextrahering från MEG data med självövervakad inlärning : Undersökning av kontrastiv representationsinlärning för att hitta informativa representationer

Ågren, Wilhelm January 2022
Modern society is vastly complex, with information and data constantly being posted, shared, and collected everywhere. This often leaves an abundance of massive amounts of unlabeled data that cannot be leveraged in a supervised machine learning context, creating an incentive to research and develop machine learning methods that can learn without labels. Self-supervised learning (SSL) is a recently emerged machine learning paradigm that aims to learn representations which can later be used in domain-specific downstream tasks. In this degree project, three SSL models based on the Simple Framework for Contrastive Learning of Visual Representations (SimCLR) are evaluated. Each model aims to learn sleep-deprivation-related representations from magnetoencephalography (MEG) measurements. MEG is a non-invasive neuroimaging technique used on humans to investigate neuronal activity. The data was acquired through a collaboration with Karolinska Institutet and Stockholm University, where the SLEMEG project was conducted to study the neurophysiological response to partial sleep deprivation. The features extracted by the SSL models are analyzed both qualitatively and quantitatively, and are also used to perform classification and regression tasks on subject labels. The results show that the evaluated Signal- and Recording-SimCLR models can learn sleep-deprivation-related features while simultaneously learning other co-occurring information. Furthermore, the results indicate that the learned representations are informative and can be utilized for multiple downstream tasks. However, most of what is learned relates to subject-specific individual variance, which leads to poor generalization performance on downstream classification and regression. We therefore believe that the models would perform better with access to more MEG data, and that source-localized MEG data could remove part of the individual variance that is learned.
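SimCLR's training objective is the NT-Xent (normalised temperature-scaled cross-entropy) loss over two augmented views of each sample, which in this setting would be two augmented crops of the same MEG window. A minimal PyTorch sketch follows; the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """SimCLR's NT-Xent loss for two augmented views of a batch.

    z1, z2: (B, D) projection-head outputs for the two views
    (here: two augmented views of the same MEG window).
    """
    B = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=-1)             # (2B, D)
    sim = z @ z.t() / temperature                            # (2B, 2B)
    mask = torch.eye(2 * B, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))               # drop self-pairs
    # row i's positive is the other view of the same sample
    targets = torch.cat([torch.arange(B) + B, torch.arange(B)]).to(z.device)
    return F.cross_entropy(sim, targets)
```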

Using Satellite Images And Self-supervised Deep Learning To Detect Water Hidden Under Vegetation / Använda satellitbilder och Självövervakad Deep Learning Till Upptäck vatten gömt under Vegetation

Iakovidis, Ioannis January 2024
In recent years, the wide availability of high-resolution satellite images has made remote monitoring of water resources all over the world possible. While detecting open water in satellite images is relatively easy, a significant percentage of a wetland's water extent is covered by vegetation. Fortunately, radar signals can penetrate vegetation, which makes it possible to detect water hidden under vegetation in satellite radar images. Convolutional neural networks have shown great success at this task, but they require large amounts of manually annotated satellite images, which are slow and expensive to produce. Self-supervised learning is a field of machine learning that aims to train models without annotated data. In this thesis we use self-supervised training methods to train a convolutional neural network to detect water in satellite images without the use of annotated data. We use a combination of deep clustering and negative sampling based on the paper ”Unsupervised Single-Scene Semantic Segmentation for Earth Observation”, and we extend that work by changing the clustering loss and the model architecture. After observing high variance in our models' performance, we also implemented an ensemble variant of the model to obtain more consistent results. Our final ensemble of self-supervised models outperforms a single supervised model, showing the power of self-supervision.
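The abstract does not spell out how the ensemble combines its members; a common choice, assumed here rather than taken from the thesis, is to average the per-pixel class probabilities of the individually trained self-supervised models:

```python
import torch

@torch.no_grad()
def ensemble_predict(models, image):
    """Average per-pixel class probabilities over an ensemble.

    models: list of trained segmentation nets, each mapping
            (1, C, H, W) pixels -> (1, K, H, W) class logits.
    """
    probs = torch.stack([m(image).softmax(dim=1) for m in models])
    mean = probs.mean(dim=0)          # (1, K, H, W) averaged probabilities
    return mean.argmax(dim=1)         # per-pixel class map, (1, H, W)
```

Averaging probabilities rather than hard labels lets confident members outvote uncertain ones, which is one standard way to damp the run-to-run variance the thesis reports.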

Multi-brain decoding for precision psychiatry

Ranjbaran, Ghazaleh 04 1900
Autism spectrum condition (ASC) is a neurodevelopmental condition characterized by atypical social interactions. Traditional research on ASC has primarily focused on individual brain signals, but the emerging technique of hyperscanning enables simultaneous recording of multiple individuals' brain activity during social interactions. In this study, we leverage hyperscanning EEG data and employ deep learning (DL) techniques, augmented by self-supervised learning (SSL), to analyze and discern patterns indicative of ASC. DL is used to extract patterns from raw EEG data, reducing reliance on manual feature engineering; SSL further enhances DL's efficacy by training on unlabeled EEG data, which is particularly useful when labeled datasets are limited. Despite the potential of DL techniques, their application to ASC diagnosis and treatment, particularly in hyperscanning, remains largely unexplored. This project aims to bridge that gap by analyzing hyperscanning EEG data from autistic and neurotypical participants. Specifically, we adapted and customized the SSL techniques proposed by Banville et al. (2020), incorporating two distinct DL embedders trained to extract meaningful features from single-brain EEG data and fine-tuned within a binary-classifier DL model using hyperscanning EEG from autistic and control dyads. Baseline comparisons were conducted against randomly initialized embedders and against hand-engineered features extracted from the hyperscanning EEG and used as inputs to a logistic regression model. Notably, the binary classifier trained on SSL-learned features consistently outperforms the logistic regression classifier and the randomly initialized embedders, achieving an accuracy of 78%, comparable to the highest performance of 79.4% reported by Banville et al. (2020). Our results underscore the significance of representations acquired from individual EEG signals within a multi-brain architecture tailored for hyperscanning EEG classification. They also hold promise for broader use of DL models in hyperscanning EEG analyses, especially for developing more accurate and efficient diagnostic tools and interventions for autistic individuals, even with limited data samples available.
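Banville et al. (2020) pre-train EEG embedders with pretext tasks such as relative positioning, in which pairs of EEG windows are labelled by temporal proximity. Below is a minimal sketch of that pair-sampling step; the window indexing and thresholds are illustrative assumptions, not the study's exact parameters.

```python
import numpy as np

def relative_positioning_pairs(n_windows, tau_pos, tau_neg, n_pairs, rng=None):
    """Sample (anchor, other, label) index pairs over consecutive EEG windows.

    Label 1 if the two windows lie within `tau_pos` windows of each other,
    0 if they are at least `tau_neg` apart (relative positioning pretext task).
    """
    rng = rng or np.random.default_rng()
    pairs = []
    while len(pairs) < n_pairs:
        i, j = rng.integers(n_windows, size=2)
        gap = abs(int(i) - int(j))
        if gap <= tau_pos:
            pairs.append((i, j, 1))      # temporally close -> positive
        elif gap >= tau_neg:
            pairs.append((i, j, 0))      # far apart -> negative
        # gaps between tau_pos and tau_neg are ambiguous and skipped
    return pairs
```

The embedder is then trained so that a simple contrast of the two windows' embeddings predicts this binary label, yielding features without any clinical annotation.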

Segmentace lézí roztroušené sklerózy pomocí hlubokých neuronových sítí / Segmentation of multiple sclerosis lesions using deep neural networks

Sasko, Dominik January 2021
The main aim of this master's thesis was the automatic segmentation of multiple sclerosis lesions in MRI scans. We tested state-of-the-art segmentation methods based on deep neural networks and compared approaches to network weight initialization using transfer learning and self-supervised learning. Automatic segmentation of multiple sclerosis lesions is a very challenging problem, primarily because of the highly imbalanced dataset (brain scans usually contain only a small amount of damaged tissue). A further challenge is the manual annotation of these lesions: two different doctors may mark different parts of the brain as damaged, and the Dice coefficient between their annotations is approximately 0.86. Simplifying the annotation process through automation could improve the computation of lesion load, which could in turn improve diagnosis for individual patients. Our goal was to propose two techniques that use transfer learning to pre-train weights, which could later improve the results of current segmentation models.

The theoretical part covers the taxonomy of artificial intelligence, machine learning and deep neural networks and their use in image segmentation, followed by an overview of multiple sclerosis, its types, symptoms, diagnosis and treatment. The practical part begins with data preprocessing. First, the brain scans were resampled to the same resolution with an identical voxel size, because we used three different datasets whose scans were acquired with different devices from different manufacturers. One dataset also included the skull, which was removed with the FSL tool so that only the patient's brain remained. We used 3D scans (FLAIR, T1 and T2 modalities), which were split into individual 2D slices and fed into a neural network with an encoder-decoder architecture. After removing slices whose masks contained no positive values, the training set contained 6720 slices at a resolution of 192 x 192 pixels. The loss function was Combo loss (a combination of Dice loss and a modified cross-entropy).

The first method used weights pre-trained on the ImageNet dataset for the encoder of a U-Net architecture, with the encoder weights either frozen or unfrozen, compared against random weight initialization. In this case only the FLAIR modality was used. Transfer learning raised the monitored metric from approximately 0.4 to 0.6; the difference between frozen and unfrozen encoder weights was around 0.02. The second proposed technique used a self-supervised context encoder with a generative adversarial network (GAN) to pre-train the weights. This network used all three modalities, including slices with empty masks (23040 images in total). The GAN's task was to inpaint brain scans occluded by a black checkerboard-shaped mask. The weights learned this way were then loaded into the encoder and applied to our segmentation problem. This experiment did not yield better results, reaching a DSC of 0.29 and 0.09 (unfrozen and frozen encoder weights, respectively). The sharp drop in the metric may have been caused by using weights pre-trained on distant tasks (segmentation versus the self-supervised context encoder), as well as by the difficulty of the task given the imbalanced dataset.
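For reference, here is a minimal PyTorch rendering of a Combo-style loss (soft Dice combined with a class-weighted binary cross-entropy) for sparse binary lesion masks. The weighting parameters are assumptions; the thesis's exact modification of the cross-entropy term is not reproduced here.

```python
import torch

def combo_loss(logits, target, alpha=0.5, beta=0.7, eps=1e-7):
    """Combo-style loss: alpha * weighted BCE + (1 - alpha) * Dice loss.

    logits, target: (B, 1, H, W); beta > 0.5 penalises false negatives
    more heavily, which helps when lesions occupy few pixels.
    """
    prob = torch.sigmoid(logits)
    # soft Dice over the whole batch
    inter = (prob * target).sum()
    dice = (2 * inter + eps) / (prob.sum() + target.sum() + eps)
    # class-weighted binary cross-entropy
    prob = prob.clamp(eps, 1 - eps)
    wce = -(beta * target * prob.log()
            + (1 - beta) * (1 - target) * (1 - prob).log()).mean()
    return alpha * wce + (1 - alpha) * (1 - dice)
```

The Dice term directly optimises overlap on the rare positive class, while the weighted cross-entropy term keeps gradients well-behaved early in training; this combination is the usual motivation for Combo loss on imbalanced segmentation data.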

Unsupervised representation learning in interactive environments

Racah, Evan 08 1900
Extracting a representation of all the high-level factors of an agent's state from low-level sensory information is an important, but challenging, task in machine learning. In this thesis, we explore several unsupervised approaches for learning these state representations: we apply and analyze existing unsupervised representation learning methods in reinforcement learning environments, and we contribute our own evaluation benchmark and a novel state representation learning method. In the first chapter, we overview and motivate unsupervised representation learning, both for machine learning in general and for reinforcement learning, and introduce a relatively new subfield of representation learning: self-supervised learning. We then cover two core representation learning approaches, generative methods and discriminative methods, focusing on a collection of discriminative methods called contrastive unsupervised representation learning (CURL) methods, and close the chapter by detailing various approaches for evaluating the usefulness of representations. In the second chapter, we present a workshop paper in which we evaluate a handful of off-the-shelf self-supervised methods on reinforcement learning problems. We discover that the performance of these representations depends heavily on the dynamics and visual structure of the environment; as such, a more systematic study of environments and methods is required. The third chapter covers our second article, Unsupervised State Representation Learning in Atari, where we carry out the more thorough study of representation learning methods in RL motivated by the second chapter. To facilitate a more thorough evaluation of representations in RL, we introduce a benchmark of 22 fully labelled Atari games. In addition, we choose the representation learning methods for comparison more systematically, focusing on comparing generative methods with contrastive methods rather than the less systematically chosen off-the-shelf methods of the second chapter. Finally, we introduce a new contrastive method, ST-DIM, which excels on these 22 Atari games.
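The contrastive idea behind ST-DIM can be sketched as an InfoNCE objective between embeddings of consecutive frames: the next frame of the same trajectory is the positive, and other batch elements serve as negatives. The sketch below shows only this global-global term; ST-DIM's full objective also contrasts global embeddings against local (per-patch) features, which is omitted here.

```python
import torch
import torch.nn.functional as F

def temporal_infonce(z_t, z_tp1, temperature=0.1):
    """InfoNCE between frame embeddings at time t and t+1.

    z_t, z_tp1: (B, D) encoder outputs for consecutive observations;
    the positive for row i is the same trajectory's next step, and
    every other row in the batch is a negative.
    """
    z_t = F.normalize(z_t, dim=-1)
    z_tp1 = F.normalize(z_tp1, dim=-1)
    logits = z_t @ z_tp1.t() / temperature          # (B, B) score matrix
    targets = torch.arange(z_t.size(0), device=z_t.device)
    return F.cross_entropy(logits, targets)
```

Representations trained this way are then scored with linear probes that predict the benchmark's ground-truth state labels (agent position, score, object locations) from the frozen embeddings.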

Teaching an AI to recycle by looking at scrap metal : Semantic segmentation through self-supervised learning with transformers / Lär en AI att källsortera genom att kolla på metallskrot

Forsberg, Edwin, Harris, Carl January 2022
Stena Recycling is one of the leading recycling companies in Sweden; at their facility in Halmstad, 300 tonnes of refuse are handled every day, and aluminium is one of the most valuable materials they sort. Today most of the sorting process is automatic, but parts of the refuse are still not correctly sorted: approximately 4% of the aluminium is currently missorted and goes to waste. Earlier works have investigated using machine vision to help the sorting process at Stena Recycling, but a consistent problem across all of them is gathering enough annotated data to train the machine learning models. This thesis investigates how machine vision could be used in the recycling process and whether pre-training models using self-supervised learning can alleviate the annotation problem and yield an improvement. The results show that machine vision models could viably be used in an information system to assist operators, and that pre-training with self-supervised learning may yield a small increase in performance. Furthermore, we show that models pre-trained using self-supervised learning also appear to transfer knowledge learned from images created in a lab environment to images taken at the recycling plant.
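One way to realise this kind of transfer, shown here purely as an assumed sketch rather than the thesis's architecture, is to load a DINO-pretrained vision transformer through torch.hub and attach a small segmentation decoder to its frozen patch tokens:

```python
import torch
import torch.nn as nn

# DINO-pretrained ViT-S/16 backbone from the official repository
backbone = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
backbone.requires_grad_(False)   # keep the SSL features frozen

class SegHead(nn.Module):
    """Minimal linear decoder on frozen DINO patch tokens (assumed setup)."""
    def __init__(self, dim=384, n_classes=2, patch=16):
        super().__init__()
        self.classify = nn.Conv2d(dim, n_classes, kernel_size=1)
        self.patch = patch

    def forward(self, x):
        B, _, H, W = x.shape
        # last-layer tokens; drop the CLS token, fold patches back to a grid
        tokens = backbone.get_intermediate_layers(x, n=1)[0][:, 1:]
        h, w = H // self.patch, W // self.patch
        feat = tokens.transpose(1, 2).reshape(B, -1, h, w)
        logits = self.classify(feat)                      # (B, K, h, w)
        return nn.functional.interpolate(logits, (H, W), mode="bilinear")
```

Only the tiny decoder is trained on the annotated plant images, so the scarce labels go a long way; this is the usual payoff of SSL pre-training in label-starved settings.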

Domain adaptation in reinforcement learning via causal representation learning

Côté-Turcotte, Léa 07 1900
Recent advancements in reinforcement learning have been substantial, but they often depend on access to the state: a set of information that provides a concise and complete description of the environment, encompassing all relevant details the agent needs to make informed decisions. Such detailed data is rarely available in real-world settings. Images present a more realistic and accessible form of data, but their complexity introduces considerable challenges in developing robust and efficient policies. Representation learning methods have shown promise in enhancing the efficiency of policies based on pixel data. Nonetheless, policies continue to struggle to generalize to new domains, making pixel-based reinforcement learning impractical for real-world scenarios and highlighting the urgent need to address domain adaptation in pixel-based reinforcement learning. This thesis investigates the potential of causal representation learning for improving domain adaptation in reinforcement learning. The underlying premise is that for reinforcement learning agents to adapt to new domains effectively, they must be able to extract high-level information from raw data and comprehend the causal dynamics that regulate the environment. We evaluate four distinct causal representation learning algorithms, each aimed at uncovering a more intricate level of structure within the latent space, and assess their impact on domain adaptation performance. The procedure first learns a causal representation, then trains the reinforcement learning agent on that representation. The domain adaptation performance of these agents is evaluated in two autonomous driving environments: CarRacing and CARLA. Our results support that learning a latent representation enhances efficiency and robustness in pixel-based RL, and indicate that capturing causal structure in the latent space leads to improved domain adaptation performance. However, the promise of causal representations for augmenting domain adaptation is tempered by their substantial computational demands; moreover, when observations from multiple domains are available, this approach does not exceed the effectiveness of simpler methods. We also found that agents trained on representations that retain all information tend to outperform others, suggesting that disentangled representations are preferable to invariant representations.
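The two-stage procedure (learn a representation, then train the agent on it) can be sketched with a gymnasium observation wrapper that replaces pixel observations with latents from the frozen encoder. The encoder interface, its stand-in architecture, and the environment choice below are assumptions for illustration, not the thesis's implementation.

```python
import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

class LatentObservation(gym.ObservationWrapper):
    """Stage 2: the RL agent observes frozen-encoder latents, not pixels."""
    def __init__(self, env, encoder, latent_dim):
        super().__init__(env)
        self.encoder = encoder.eval()
        self.observation_space = gym.spaces.Box(
            -np.inf, np.inf, shape=(latent_dim,), dtype=np.float32)

    @torch.no_grad()
    def observation(self, obs):
        # (H, W, C) uint8 frame -> (1, C, H, W) float in [0, 1] -> latent
        x = torch.from_numpy(obs.copy()).float().permute(2, 0, 1)[None] / 255.0
        return self.encoder(x).squeeze(0).numpy()

# stand-in for a stage-1 (causal) representation model, assumed interface
encoder = nn.Sequential(nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                        nn.Linear(16, 64))
# usage: any standard agent can then be trained on the latent observations
# env = LatentObservation(gym.make("CarRacing-v2"), encoder, latent_dim=64)
```

Keeping the encoder fixed while the policy trains is what lets the representation, rather than the RL objective, determine which factors survive into the agent's input.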
