Global ETD Search

41	Emergence of language-like latents in deep neural networks Lu, Yuchen 05 1900 The emergence of language is regarded as one of the hallmarks of human intelligence. We therefore hypothesize that the emergence of language-like latents, or representations, in a deep learning system could help models achieve better compositional and out-of-distribution generalization. In this thesis, we present a series of papers that explore this hypothesis in different fields, including interactive language learning, imitation learning, and computer vision. Deep Learning Language Emergence Compositionality Imitation Learning Self-supervised Learning
42	Leveraging self-supervision for visual embodied navigation with neuralized potential fields Saavedra Ruiz, Miguel Angel 05 1900 A fundamental task in robotics is navigating between two locations. In particular, real-world navigation can require long-horizon planning from high-dimensional RGB images, which poses a substantial challenge for end-to-end learning-based approaches. Current semi-parametric methods instead achieve long-horizon navigation by combining learned modules with a topological memory of the environment, often represented as a graph over previously collected images. In practice, however, using these graphs typically involves tuning various pruning heuristics to prevent spurious edges, limit runtime memory usage, and keep graph queries reasonably fast. In this work, we show how end-to-end approaches trained through self-supervised learning (SSL) can excel at long-horizon navigation tasks. We first present Duckie-Former (DF), an end-to-end approach for visual servoing in road-like environments. Using a Vision Transformer (ViT) pretrained with a self-supervised method, we derive a navigation strategy inspired by potential fields from a coarse, low-resolution image segmentation. DF is assessed on lane-following and obstacle-avoidance tasks. We then introduce our second approach, One-4-All (O4A). O4A leverages SSL and manifold learning to build a graph-free, end-to-end navigation pipeline in which the goal is specified as an image. Navigation is achieved by greedily minimizing a potential function defined continuously over the O4A latent space. O4A is evaluated in complex indoor environments. Both systems are trained offline, without interacting with the simulator or the robot, on non-expert exploration sequences of RGB data and controls, and require no depth or pose measurements. Assessment is performed in simulated and real-world environments using a differential-drive robot. Visual navigation Self-supervised learning Potential fields Manifold learning Robotics
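O4A is described as navigating by greedily minimizing a potential function defined over its latent space. The sketch below shows that greedy step in its simplest form. It rests on assumptions that do not come from the thesis:

- latents are plain coordinate tuples;
- a hypothetical `forward_model` predicts the next latent for each discrete action;
- the potential is the Euclidean distance to the goal latent, not O4A's learned potential.

```python
import math
from typing import Callable, List, Sequence, Tuple

Latent = Tuple[float, ...]


def potential(z: Sequence[float], goal: Sequence[float]) -> float:
    """Hypothetical potential: Euclidean distance to the goal latent (not O4A's learned one)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z, goal)))


def greedy_navigate(
    start: Latent,
    goal: Latent,
    forward_model: Callable[[Latent, str], Latent],
    actions: List[str],
    max_steps: int = 100,
) -> List[Latent]:
    """Repeatedly take the action whose predicted next latent has the lowest potential."""
    path = [start]
    current = start
    for _ in range(max_steps):
        # Score every action by the potential of its predicted successor.
        best = min(actions, key=lambda a: potential(forward_model(current, a), goal))
        nxt = forward_model(current, best)
        # Stop at a local minimum: no action lowers the potential any further.
        if potential(nxt, goal) >= potential(current, goal):
            break
        current = nxt
        path.append(current)
    return path


# Toy forward model on a 2D grid (hypothetical stand-in for a learned dynamics model).
MOVES = {"right": (1, 0), "left": (-1, 0), "up": (0, 1), "down": (0, -1)}


def grid_model(z: Latent, action: str) -> Latent:
    dx, dy = MOVES[action]
    return (z[0] + dx, z[1] + dy)


path = greedy_navigate((0, 0), (2, 1), grid_model, list(MOVES))
print(path)  # → [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Stopping at a local minimum of the potential is the simplest termination rule. It also shows why the shape of the potential matters: a purely greedy policy stalls wherever the potential has a spurious minimum.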
43	Self-supervised pre-training of an attention-based model for 3D medical image segmentation Sund Aillet, Albert January 2023 Accurate segmentation of anatomical structures is crucial for radiation therapy in cancer treatment. Deep learning methods have proven effective for segmenting 3D medical images and represent the current standard. However, they require large amounts of labelled data, and their performance drops under domain shift. Self-supervised learning, which uses unlabelled data to learn representations, is a possible solution to these challenges: it could reduce the need for labelled data and produce more robust segmentation models. This thesis investigates the impact of self-supervised pre-training on an attention-based model for 3D medical image segmentation, focusing on single-organ semantic segmentation. It explores whether self-supervised pre-training improves segmentation performance on CT scans with and without domain shift. Swin UNETR is chosen as the deep learning model because it has been shown to be a successful attention-based architecture for semantic segmentation. During pre-training, the contracting path is trained on three self-supervised pretext tasks using a large dataset of 5,465 unlabelled CT scans. The model is then fine-tuned on labelled datasets with 97, 142 and 288 segmentations of the stomach, the sternum and the pancreas. The results do not show a substantial performance gain from self-supervised pre-training. Freezing the parameters of the contracting path suggests that its representational power is less critical for model performance than expected. Experiments with reduced amounts of supervised training data show that pre-training improves model performance when training data is limited, but the improvement diminishes considerably as more supervised training data is used. Computer vision Deep learning 3D Medical image segmentation Self-supervised learning Computer and Information Sciences
44	SELF-SUPERVISED ONE-SHOT LEARNING FOR AUTOMATIC SEGMENTATION OF GAN-GENERATED IMAGES Ankit V Manerikar (16523988) 11 July 2023 Generative Adversarial Networks (GANs) have consistently defined the state of the art in generative modelling of high-quality images across several applications. However, images generated by GANs cannot be used directly in supervised learning tasks without first being curated through annotation. This dissertation investigates how to perform automatic, on-the-fly segmentation of GAN-generated images and how this can be applied to producing high-quality simulated data for X-ray-based security screening. The research exploits the hidden-layer properties of GAN models in a self-supervised learning framework for automatic one-shot segmentation of images created by a style-based GAN. The framework consists of a novel contrastive learner, based on a Sinkhorn-distance clustering algorithm, that learns a compact feature space for per-pixel classification of the GAN-generated images. This enables faster learning of the feature vectors for one-shot segmentation and allows on-the-fly automatic annotation of the GAN images. On several standard benchmarks (CelebA, PASCAL, LSUN), our framework exceeds the semi-supervised baselines in segmentation performance by an average wIoU margin of 1.02% and improves inference speed by a factor of 4.5. This dissertation also presents BagGAN, an extension of our framework to X-ray-based baggage screening. BagGAN produces annotated synthetic baggage X-ray scans to train machine learning algorithms for detecting prohibited items during security screening. We compared the images generated by BagGAN with those created by deterministic ray-tracing models for X-ray simulation and observed that our GAN-based baggage simulator yields significantly better image fidelity and diversity. BagGAN was also tested on PIDRay and other baggage screening benchmarks, producing segmentation results comparable to those of baseline segmenters trained on manual annotations. Computer vision Adversarial machine learning Deep learning Generative Adversarial Networks (GANs) Self-Supervised Learning Image Segmentation One-Shot Learning X-ray imaging and computed tomography
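The framework is described as a contrastive learner built on a Sinkhorn-distance-based clustering algorithm. The abstract does not give the exact formulation, so what follows is only a generic sketch of Sinkhorn-Knopp balancing. Some self-supervised methods (for example SwAV) use this balancing to produce soft cluster assignments: it turns a matrix of sample-versus-prototype scores into assignments where each sample's row sums to 1 and each prototype receives roughly equal total mass. The `epsilon` and `n_iters` defaults are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(scores: Matrix, epsilon: float = 0.5, n_iters: int = 10) -> Matrix:
    """Balanced soft assignment of n samples to k prototypes via Sinkhorn-Knopp."""
    n, k = len(scores), len(scores[0])
    # Exponentiate scaled scores; subtracting the max keeps exp() from overflowing.
    top = max(max(row) for row in scores)
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: rescale so every prototype receives total mass n / k.
        for j in range(k):
            col_sum = sum(q[i][j] for i in range(n))
            for i in range(n):
                q[i][j] *= (n / k) / col_sum
        # Row step: rescale so every sample's assignment sums to 1.
        for i in range(n):
            row_sum = sum(q[i])
            q[i] = [v / row_sum for v in q[i]]
    return q


scores = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
assignments = sinkhorn(scores)
labels = [max(range(len(row)), key=row.__getitem__) for row in assignments]
print(labels)  # → [0, 0, 1, 1]
```

The equal-mass constraint on the columns prevents the trivial solution in which every pixel feature collapses into a single cluster.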
45	Self-supervised Learning for Efficient Object Detection Berta, Benjamin István January 2021 Self-supervised learning has become a prominent approach for pre-training convolutional neural networks for computer vision. These methods achieve state-of-the-art representation learning on unlabeled datasets. In this thesis, we apply self-supervised learning to the object detection problem. Previous methods have used large networks that are unsuitable for embedded applications, so our goal was to train lightweight networks that reach the accuracy of supervised learning. We used MoCo as a baseline for pre-training a ResNet-18 encoder and fine-tuned it on the COCO object detection task using a RetinaNet object detector. We evaluated our method, together with several additions to the baseline, using the COCO evaluation metric. Our results show that lightweight networks can be trained by self-supervised learning and reach the accuracy of supervised pre-training. Self-supervised Learning Object Detection Computer Vision Contrastive Learning Deep Learning Computer and Information Sciences
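MoCo, which the thesis uses as its pre-training baseline, has two parts:

- a query encoder trained with a contrastive InfoNCE loss against a positive key and a queue of negative keys;
- a key encoder updated as a momentum (exponential moving average) copy of the query encoder.

The plain-Python sketch below shows both pieces on toy vectors. The temperature of 0.2 and momentum of 0.999 are assumed defaults. Real training would operate on encoder outputs and parameter tensors rather than lists.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(query: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.2) -> float:
    """Cross-entropy of picking the positive key among [positive] + negatives."""
    q = l2_normalize(query)
    keys = [l2_normalize(positive)] + [l2_normalize(k) for k in negatives]
    # Cosine similarities scaled by the temperature; index 0 is the positive.
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[0]


def momentum_update(key_params: List[float], query_params: List[float],
                    m: float = 0.999) -> List[float]:
    """Key encoder parameters follow an exponential moving average of the query encoder."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


negatives = [[0.0, 1.0], [-1.0, 0.0]]
easy = info_nce([1.0, 0.0], [1.0, 0.0], negatives)   # positive aligned with query
hard = info_nce([1.0, 0.0], [0.0, -1.0], negatives)  # positive orthogonal to query
print(easy < hard)  # → True
```

The slowly moving key encoder keeps the keys in the queue consistent with each other across many batches. This is what lets MoCo use a large set of negatives without a large batch.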
46	Analysis of Brain Signals from Patients with Parkinson’s Disease using Self-Supervised Learning Lind, Emma January 2022 Parkinson’s disease (PD) is one of the most common neurodegenerative brain disorders. It is commonly diagnosed and monitored via clinical examinations, which can be imprecise and lead to a delayed or inaccurate diagnosis. Recent research has therefore focused on finding biomarkers by analyzing the neural activity of brain networks for abnormalities associated with PD pathology. Brain signals can be measured using magnetoencephalography (MEG) or electroencephalography (EEG), both of which have demonstrated practical use in decoding neural activity. Nevertheless, interpreting and labeling human neural activity measured with MEG/EEG remains a challenging task that requires considerable time and expertise. In addition, there is a risk of introducing bias or omitting important information that humans cannot recognize. This thesis investigates whether meaningful features relevant to PD can be found by uncovering the underlying structure of brain signals with self-supervised learning (SSL), which requires neither labels nor hand-crafted features. Four experiments were conducted on one EEG and one MEG dataset to evaluate whether the features found by SSL were meaningful: t-SNE, the silhouette coefficient, the Kolmogorov-Smirnov test, and classification performance. Transfer learning between the two datasets was also tested. The SSL model TS-TCC was used because of its outstanding performance on two other EEG datasets and its training efficiency. The evaluation on the EEG dataset indicated that SSL could, to some extent, find meaningful features that distinguish PD patients from healthy controls. However, further investigation of reusing these features in a downstream task is needed. The evaluation on the MEG dataset did not yield equally satisfactory results; one proposed reason, among others, is the amount of data. Lastly, transfer learning was unsuccessful at transferring knowledge from the EEG dataset to the MEG dataset. Machine Learning Self-supervised learning Feature extraction Parkinson’s Disease Magnetoencephalography Electroencephalogram Computer Sciences
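One of the four evaluations is the Kolmogorov-Smirnov test. The abstract does not say exactly how it was applied. As an illustration only, assume a two-sample test that compares the distribution of a single learned feature in the PD group with that in the control group. The two-sample statistic is then the largest vertical gap between the two empirical CDFs, D = sup_x |F_a(x) - F_b(x)|. A minimal sketch:

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The ECDFs only jump at observed values, so checking those points finds the sup.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


print(ks_statistic([1, 2, 3], [4, 5, 6]))  # → 1.0 (no overlap at all)
```

In practice, `scipy.stats.ks_2samp` computes the same statistic together with a p-value, which is what a significance claim would rely on.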
47	Label-Efficient Visual Understanding with Consistency Constraints Zou, Yuliang 24 May 2022 Modern deep neural networks are proficient at solving various visual recognition and understanding tasks, as long as a sufficiently large labeled dataset is available at training time. However, progress on these visual tasks is limited by the number of manual annotations. Annotating visual data is usually time-consuming and error-prone, which makes it challenging to scale up human labeling for many visual tasks. Fortunately, large-scale, diverse unlabeled visual data is easy to collect from the Internet, and large amounts of synthetic visual data with annotations can be obtained from game engines with little effort. In this dissertation, we explore how to use unlabeled data and synthetic labeled data for various visual tasks, aiming to replace or reduce direct supervision from manual annotations. The key idea is to encourage deep neural networks to produce consistent predictions across different transformations (e.g., geometric, temporal, and photometric). The dissertation is organized as follows. In Part I, we use consistency across different geometric formulations and cycle consistency over time to tackle low-level scene geometry perception tasks in a self-supervised learning setting. In Part II, we tackle high-level semantic understanding tasks in a semi-supervised learning setting, with the constraint that different augmented views of the same visual input maintain consistent semantic information. In Part III, we tackle the cross-domain image segmentation problem. By encouraging an adaptive segmentation model to output consistent results for a diverse set of strongly augmented synthetic data, the model learns to perform test-time adaptation on unseen target domains with a single forward pass, without model training or optimization at inference time.
/ Doctor of Philosophy / Recently, deep learning has emerged as one of the most powerful tools for solving various visual understanding tasks. However, the development of deep learning methods is significantly limited by the amount of manually labeled data. Annotating visual data is usually time-consuming and error-prone, so the human labeling process does not scale easily. Fortunately, large-scale, diverse raw visual data is easy to collect from the Internet (e.g., search engines, YouTube, Instagram), and large amounts of synthetic visual data with annotations can be obtained from game engines with little effort. In this dissertation, we explore how to use raw visual data and synthetic data for various visual tasks, aiming to replace or reduce direct supervision from manual annotations. The key idea is to encourage deep neural networks to produce consistent predictions for the same visual input across different transformations (e.g., geometric, temporal, and photometric). The dissertation is organized as follows. In Part I, we use consistency across different geometric formulations and forward-backward cycle consistency over time to tackle low-level scene geometry perception tasks using unlabeled visual data only. In Part II, we tackle high-level semantic understanding tasks using a small amount of labeled data and a large amount of unlabeled data jointly, with the constraint that different augmented views of the same visual input maintain consistent semantic information. In Part III, we tackle the cross-domain image segmentation problem. By encouraging an adaptive segmentation model to output consistent results for a diverse set of strongly augmented synthetic data, the model learns to perform test-time adaptation on unseen target domains.
Label-Efficient Consistency Regularization Visual Understanding Self-Supervised Learning Semi-Supervised Learning Pseudo Labeling Test-Time Adaptation BatchNorm Calibration Cross-Domain Generalization
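Part II enforces consistent semantic predictions across different augmented views of the same input in a semi-supervised setting. The abstract does not give the loss. One common way to express such a constraint, sketched here as an assumption in the style of FixMatch, works in two steps:

1. Take the model's confident prediction on a weakly augmented view as a pseudo-label.
2. Apply cross-entropy between that pseudo-label and the model's prediction on a strongly augmented view.

The 0.95 confidence threshold is an assumed value.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    top = max(logits)
    lse = top + math.log(sum(math.exp(z - top) for z in logits))
    return [z - lse for z in logits]


def consistency_loss(weak_logits: List[Sequence[float]],
                     strong_logits: List[Sequence[float]],
                     threshold: float = 0.95) -> float:
    """Mean pseudo-label cross-entropy; low-confidence samples contribute zero."""
    total = 0.0
    for weak, strong in zip(weak_logits, strong_logits):
        weak_probs = [math.exp(v) for v in log_softmax(weak)]
        confidence = max(weak_probs)
        pseudo_label = weak_probs.index(confidence)
        # Only confident weak-view predictions become targets for the strong view.
        if confidence >= threshold:
            total += -log_softmax(strong)[pseudo_label]
    return total / len(weak_logits)


weak = [[5.0, 0.0], [0.1, 0.0]]    # first sample is confident, second is not
strong = [[0.0, 5.0], [0.0, 0.0]]  # strong view disagrees on the first sample
print(round(consistency_loss(weak, strong), 4))  # → 2.5034
```

Averaging over the whole batch, including masked samples, keeps the loss scale stable as the fraction of confident predictions grows during training.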
48	Deep Convolutional Denoising for MicroCT: A Self-Supervised Approach Karlström, Daniel January 2024 Microtomography, or microCT, is an X-ray imaging modality that provides volumetric data of an object's internal structure at microscale resolution, making it suitable for scanning small, highly detailed objects. MicroCT image quality is limited by quantum noise, which can be reduced by increasing the scan time. This complicates scanning both dynamic processes and, because of the increased radiation dose, dose-sensitive samples. A recently proposed method for improving dose- or time-limited scanning is Noise2Inverse, a framework for denoising data in tomography and linear inverse problems by training a self-supervised convolutional neural network. This work implements Noise2Inverse for denoising lab-based cone-beam microCT data and compares it with both supervised neural networks and more traditional filtering methods. Although some loss of spatial resolution is observed, the method outperforms traditional filtering methods and matches supervised denoising in quantitative and qualitative evaluations of image quality. Additionally, a segmentation task shows that denoising the data can aid practical tasks. X-ray tomography Deep learning Image denoising Self-supervised learning Linear inverse problems Physical Sciences
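As we understand it, Noise2Inverse relies on the noise in disjoint subsets of projections being (approximately) independent. The projection angles are split into K subsets, each subset is reconstructed separately, and the network is trained to map the average of some sub-reconstructions to another, held-out one.

The sketch below covers only this data-preparation step, for one assumed splitting strategy: the input is the mean of the other K-1 reconstructions and the target is the held-out one. The reconstruction itself (for example FDK for cone-beam data) and the CNN are outside its scope. Volumes are flattened to plain lists.

```python
from typing import List, Tuple

Volume = List[float]  # a flattened sub-reconstruction


def split_angles(n_angles: int, k: int) -> List[List[int]]:
    """Interleaved split of projection indices into k disjoint subsets."""
    return [list(range(j, n_angles, k)) for j in range(k)]


def training_pairs(sub_recons: List[Volume]) -> List[Tuple[Volume, Volume]]:
    """For each held-out split j: input = mean of the other splits, target = split j."""
    k = len(sub_recons)
    pairs = []
    for j in range(k):
        others = [r for i, r in enumerate(sub_recons) if i != j]
        # Voxel-wise mean of the remaining k - 1 reconstructions.
        mean_input = [sum(vals) / (k - 1) for vals in zip(*others)]
        pairs.append((mean_input, sub_recons[j]))
    return pairs


print(split_angles(6, 3))  # → [[0, 3], [1, 4], [2, 5]]
print(training_pairs([[0.0], [3.0], [6.0]]))
# → [([4.5], [0.0]), ([3.0], [3.0]), ([1.5], [6.0])]
```

Interleaving the angles keeps each subset spread over the full angular range, so every sub-reconstruction images the same object with independent noise.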
49	Semantic Segmentation of Remote Sensing Data using Self-Supervised Learning Wallin, Emma, Åhlander, Rebecka January 2024 Semantic segmentation is the process of assigning a class label to each pixel in an image. Semantic segmentation of remote sensing images has multiple areas of use, including climate change studies and urban planning and development. Training a network to perform semantic segmentation in a supervised manner requires annotated data, and annotating satellite images is expensive and time-consuming. Self-supervised learning may offer a solution: training on a pretext task with a large unlabeled dataset, and then on a downstream task with a smaller labeled dataset, could reduce the need for large amounts of labeled data. In this thesis, self-supervised learning for semantic segmentation of remote sensing data is investigated and compared with the traditional approach of supervised pre-training on ImageNet. Two self-supervised learning methods are evaluated: a reconstructive method and a contrastive method. The thesis also investigates whether including modalities unique to remote sensing data improves semantic segmentation performance. The findings indicate that self-supervised learning with in-domain data shows significant potential. Models pre-trained with self-supervised learning on remote sensing data do not surpass models pre-trained with supervised learning on ImageNet, but they reach a comparable level, which is notable given the substantially smaller training dataset. However, when the in-domain dataset is small (as in this thesis, with approximately 20,000 images), pre-training on ImageNet is preferable. Furthermore, when both methods are trained on ImageNet, self-supervised learning shows promise as a more effective pre-training approach than supervised learning. The reconstructive method proves more suitable than the contrastive method for semantic segmentation of remote sensing data, and incorporating modalities unique to remote sensing further improves performance. Machine Learning Deep Learning Satellite Imagery Remote Sensing Data Self-supervised Learning Semantic Segmentation
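The abstract does not name the reconstructive method. Purely as an assumed example of what a reconstructive pretext task can look like, the sketch below follows the masked-autoencoder pattern: a random subset of image patches is hidden, and the reconstruction error is computed only on the hidden patches. Patches are flattened lists of pixel (or band) values, and the 75% mask ratio is an assumption.

```python
import random
from typing import List, Sequence


def random_patch_mask(num_patches: int, mask_ratio: float, seed: int = 0) -> List[bool]:
    """Return a boolean mask where True marks a patch hidden from the encoder."""
    rng = random.Random(seed)
    num_masked = int(round(mask_ratio * num_patches))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def masked_mse(pred: List[Sequence[float]], target: List[Sequence[float]],
               mask: List[bool]) -> float:
    """Mean squared error averaged over masked patches only."""
    errors = [
        sum((p - t) ** 2 for p, t in zip(p_patch, t_patch)) / len(p_patch)
        for p_patch, t_patch, hidden in zip(pred, target, mask)
        if hidden  # visible patches are not scored
    ]
    return sum(errors) / len(errors) if errors else 0.0


mask = random_patch_mask(16, 0.75)
print(sum(mask))  # → 12 patches hidden out of 16
```

Scoring only the hidden patches forces the model to infer missing content from context rather than copy visible pixels. Extra remote-sensing modalities would simply appear as additional values per patch in this representation.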
50	Self-supervision for reinforcement learning Anand, Ankesh 03 1900 This thesis aims to build better reinforcement learning (RL) agents by leveraging self-supervised learning. It is presented as a thesis by articles and contains three works. In the first article, we construct a benchmark based on Atari games to systematically evaluate self-supervised learning methods in RL environments. We compare an array of such methods across a suite of probing tasks to identify their strengths and weaknesses. We further show that a novel contrastive method, ST-DIM, excels at capturing most generative factors in the studied environments without relying on labels or rewards. In the second article, we propose Self-Predictive Representations (SPR), which learns a self-supervised latent model of the environment dynamics while solving the RL task at hand. We show that SPR dramatically improves on the state of the art on the challenging Atari 100k benchmark, where agents are allowed only 2 hours of real-time experience. The third article studies the role of model-based RL and self-supervised learning in the context of generalization in RL. Through careful controls, we show that planning and model-based representation learning both contribute to better generalization for the MuZero agent. We further improve MuZero with auxiliary self-supervised learning objectives and show that this MuZero++ agent achieves state-of-the-art results on the Procgen and Metaworld benchmarks. Deep Learning Reinforcement Learning Self-Supervised Learning
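SPR learns its latent dynamics model by predicting latent states several steps ahead and comparing them with latents computed from the actual future observations by a target encoder. As we understand the published method, the comparison is a negative cosine similarity summed over the prediction horizon. The sketch below covers only that loss, on plain lists. It omits the encoders, projection heads, and transition model.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def spr_loss(predicted: List[Sequence[float]], targets: List[Sequence[float]]) -> float:
    """Negative cosine similarity summed over the K predicted future steps."""
    return -sum(cosine_similarity(p, t) for p, t in zip(predicted, targets))


preds = [[1.0, 0.0], [0.0, 2.0]]
print(spr_loss(preds, [[2.0, 0.0], [0.0, 1.0]]))  # perfectly aligned: → -2.0
```

Cosine similarity ignores vector magnitude, so the loss rewards predicting the direction of the future latent rather than its scale; the minimum is -K when every step is perfectly aligned.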
