Global ETD Search

1	Understanding the Robustness of Self-Supervised Representations Rodahl Holmgren, Johan January 2023 This work investigates the robustness of representations learned by self-supervised learning approaches, focusing on distribution shifts in computer vision. Joint-embedding architectures and other self-supervised learning methods have advanced label-free representation learning and efficient knowledge transfer, reducing the need for human annotation. However, empirical analysis has largely been limited to downstream-task performance on in-distribution natural scenes. This constrained evaluation does not reveal how the learning methods compare in detail, so it cannot expose their limitations in a way that guides systematic improvement. This work evaluates, quantitatively and qualitatively, the robustness of self-supervised learning methods under distribution shift, using the corrupted dataset ImageNet-C. For comprehensiveness, several self-supervised learning approaches are considered, including contrastive learning, knowledge distillation, mutual information maximization, and clustering. A detailed comparative analysis examines how well robustness is retained as the severity of the induced corruptions and noise in the data increases. This work provides insights into appropriate method selection under different conditions and highlights limitations to address in future method development. Self-Supervision AI Robustness Computer-Vision Computer Systems
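The abstract does not state which summary metric is used. As an illustration only, the sketch below computes two common summaries of ImageNet-C results: the fraction of clean accuracy retained at each severity level (1 to 5), and the mean corruption error (mCE) introduced with the benchmark, in which a model's error for each corruption, summed over severities, is divided by a reference model's summed error and the ratios are averaged over corruptions. The function names are hypothetical.

```python
from statistics import mean


def retention_by_severity(clean_acc, corrupted_acc):
    """Fraction of clean accuracy kept at each severity (index 0 = severity 1)."""
    return [acc / clean_acc for acc in corrupted_acc]


def corruption_error(model_err, reference_err):
    """CE for one corruption: model error summed over severities,
    normalized by the reference model's summed error."""
    return sum(model_err) / sum(reference_err)


def mean_corruption_error(model_errs, reference_errs):
    """mCE: average CE over corruptions.

    Both arguments map a corruption name to a list of per-severity error rates.
    """
    return mean(
        corruption_error(model_errs[name], reference_errs[name])
        for name in model_errs
    )
```

An mCE below 1 means the model is more robust than the reference, and a retention curve that falls slowly with severity indicates robustness that survives stronger corruption.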
2	Learning without Expert Labels for Multimodal Data Maruf, Md Abdullah Al 09 January 2025 While advances in deep learning have largely been made possible by the availability of large-scale labeled datasets, obtaining labeled datasets at the required granularity is challenging in many real-world applications, especially in scientific domains, because generating annotations is costly and labor-intensive. Hence, there is a need to develop new learning paradigms that do not rely on expert-labeled data and can work even with indirect supervision. Approaches for learning with indirect supervision include unsupervised learning, self-supervised learning, weakly supervised learning, few-shot learning, and knowledge distillation. This thesis pursues these opportunities in the context of multimodal data through three main contributions. First, it proposes a novel Distance-aware Negative Sampling method for self-supervised Graph Representation Learning (GRL) that learns node representations directly from the graph structure by maximizing separation between distant nodes and maximizing cohesion among nearby nodes. Second, it introduces effective modifications to weakly supervised semantic segmentation (WS3) models, such as stochastic aggregation of saliency maps, that improve the learning of pseudo-ground truths from coarse-grained class-level labels and address the limitations of class activation maps. Finally, it evaluates whether pre-trained Vision-Language Models (VLMs) contain the scientific knowledge needed to identify and reason about biological traits in scientific images. The zero-shot performance of 12 large VLMs is evaluated on a novel dataset, VLM4Bio, and the effects of prompting and of reasoning hallucinations are explored.
/ Doctor of Philosophy / While advances in machine learning (ML), such as deep learning, have largely been made possible by the availability of large-scale labeled datasets, obtaining high-quality and high-resolution labels is challenging in many real-world applications because generating annotations is costly and labor-intensive. This thesis explores new ways of training ML models through indirect supervision, without relying heavily on expert-labeled data. First, it introduces a novel way of using graph structure to learn representations of graph-based data. Second, it analyzes the effect of weak supervision with coarse labels on image-based data. Third, it evaluates whether current ML models can recognize and reason about scientific images on their own. Together, these contributions aim to make learning more efficient and less dependent on exhaustive labeling. Deep Learning Knowledge-Guided Machine Learning Weak Supervision Self-Supervision Vision-Language Models
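The abstract describes distance-aware negative sampling only at a high level. The sketch below is one hypothetical way to realize the idea: compute shortest-path (hop) distances with breadth-first search, exclude the anchor and its close neighbours, and draw negatives with probability growing with their distance from the anchor, so distant nodes are pushed apart most often. The weighting rule (distance raised to a power `alpha`), the `min_hops` cutoff and the handling of unreachable nodes are assumptions, not the thesis's method.

```python
import random
from collections import deque


def hop_distances(adj, source):
    """BFS hop distances from `source`; `adj` maps each node to its neighbour list."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sample_negatives(adj, anchor, k, alpha=1.0, min_hops=2, rng=random):
    """Draw k negative nodes for `anchor`, favouring nodes that are farther away.

    Nodes closer than `min_hops` (by default the anchor and its direct
    neighbours) are never used as negatives. Unreachable nodes are treated
    as one hop beyond the farthest reachable node (a hypothetical choice).
    """
    dist = hop_distances(adj, anchor)
    unreachable = max(dist.values()) + 1
    candidates, weights = [], []
    for node in adj:
        d = dist.get(node, unreachable)
        if d >= min_hops:
            candidates.append(node)
            weights.append(d ** alpha)  # larger distance -> sampled more often
    if not candidates:
        return []
    return rng.choices(candidates, weights=weights, k=k)
```

In a contrastive objective, these negatives would be pushed away from the anchor's embedding while nearby nodes serve as positives.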
3	Unsupervised 3D Human Pose Estimation / Oövervakad mänsklig poseuppskattning i 3D Budaraju, Sri Datta January 2021 The thesis proposes an unsupervised representation learning method that predicts a 3D human pose from a 2D skeleton via a hybrid VAE-GAN (Variational Autoencoder Generative Adversarial Network). The method learns to lift poses from 2D to 3D using self-supervision and adversarial learning techniques. It does not use images, heatmaps, 3D pose annotations, paired or unpaired 2D-to-3D skeletons, 3D priors, synthetic 2D skeletons, multi-view information, or temporal information in any form. A VAE encodes the input 2D skeleton into a latent space and then decodes that latent representation into a 3D pose. The 3D pose is then reprojected to 2D for constrained, self-supervised optimization against the input 2D pose. In parallel, the 3D pose is randomly rotated and reprojected to 2D to generate a 'novel' 2D view for unconstrained adversarial optimization with a discriminator network. Combining the optimizations on the original and the novel 2D views of the predicted 3D pose yields 'realistic' 3D poses. The thesis shows that the VAE's encoding and decoding process addresses the major challenge of erroneous and incomplete skeletons produced as inputs by 2D detection networks, and that the variance of the VAE can be altered to obtain various plausible 3D poses for a given 2D input. Additionally, the latent representation could be used for cross-modal training and many downstream applications. Results on the Human3.6M dataset outperform previous unsupervised approaches with lower model complexity, while addressing more of the hurdles to scaling the task to the real world. Computer Vision Projective Geometry Deep Learning Unsupervised Learning 3D Human Pose Estimation GAN AutoEncoder Hybrid Generative Model Self-Supervision Computer and Information Sciences
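The abstract does not specify the camera model or the rotation range. Assuming root-centred joints, an orthographic camera (drop the depth coordinate) and a uniformly random rotation about the vertical axis, the two 2D views described above can be sketched as follows: `reprojection_loss` is the self-supervised term on the original view, and `novel_view` produces the rotated view that would be scored by the discriminator. All names are hypothetical, and a perspective camera would replace `project_orthographic`.

```python
import math
import random


def rotate_y(pose, angle):
    """Rotate every (x, y, z) joint about the vertical y-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in pose]


def project_orthographic(pose):
    """Orthographic reprojection to 2D: keep (x, y) and drop depth z."""
    return [(x, y) for x, y, _ in pose]


def reprojection_loss(pose_3d, pose_2d):
    """Mean squared 2D distance between the reprojected 3D pose and the input 2D pose."""
    projected = project_orthographic(pose_3d)
    return sum(
        (px - qx) ** 2 + (py - qy) ** 2
        for (px, py), (qx, qy) in zip(projected, pose_2d)
    ) / len(pose_2d)


def novel_view(pose_3d, rng=random):
    """Randomly rotated, reprojected 2D view that a discriminator would judge."""
    angle = rng.uniform(-math.pi, math.pi)
    return project_orthographic(rotate_y(pose_3d, angle))
```

Because a rigid rotation preserves bone lengths, a plausible 3D pose should still look plausible from the novel viewpoint, which is what the adversarial term exploits.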
4	Resource-efficient image segmentation using self-supervision and active learning Max, Muriel January 2021 Neural networks have been shown to perform well in computer vision tasks, especially semantic segmentation, where classification is performed at the pixel level. Deep learning can reduce time and effort compared with manual segmentation; however, the performance of neural networks depends heavily on the quality and quantity of the data, which are costly and time-consuming to obtain, especially for image segmentation tasks. This work addresses the problem by investigating a combination of self-supervised pre-training and active learning, the latter aimed at selecting the most informative training samples. Experiments were performed on the Gland Segmentation and BraTS 2020 datasets. The results indicate that active learning can increase performance on both datasets when only a small percentage of the data is labeled. Furthermore, self-supervised pre-training improves model robustness and, in some cases, also boosts model performance. Image Segmentation Deep Learning Active Learning Self-supervision Pretraining Engineering and Technology
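The abstract does not name the acquisition function used to pick the most informative samples. A common choice for segmentation, assumed here for illustration, scores each unlabeled image by the mean per-pixel entropy of the model's softmax output and sends the highest-scoring images to the annotator. The names `pool` and `budget` are hypothetical.

```python
import math


def pixel_entropy(probs):
    """Shannon entropy (in nats) of one pixel's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def image_uncertainty(prob_map):
    """Mean per-pixel entropy; `prob_map` is a list of per-pixel probability vectors."""
    return sum(pixel_entropy(p) for p in prob_map) / len(prob_map)


def select_for_labeling(pool, budget):
    """Return the ids of the `budget` most uncertain images.

    `pool` maps an image id to its predicted probability map.
    """
    ranked = sorted(pool, key=lambda i: image_uncertainty(pool[i]), reverse=True)
    return ranked[:budget]
```

Averaging rather than summing the entropies keeps images of different sizes comparable; other acquisition functions could be swapped in for `image_uncertainty`.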
5	Selfverwysing as supervisieproses: ontwikkeling van die interne supervisor [Self-reference as a supervision process: development of the internal supervisor] Meyer, Gert Frederick 09 1900 Text in Afrikaans / The object of this study is somewhat unusual. The study is autobiographical and is based on the assumption that there exists a unique patterned connection between the psychotherapist, his history, science (ethnography and second-order cybernetics), and his clients. Because of distance or financial constraints, a rural psychotherapist does not always have the privilege of supervision by an external supervisor. In such a situation, self-supervision could play an important role in the psychotherapist's self-development, because it involves a search of the self that leads to more effective psychotherapy. Self-supervision focuses on the psychotherapist as a person and as a therapist: who he is, where he comes from, and in what direction he is developing within the psychotherapeutic process. It is important for any psychotherapist to strive towards a higher level of psychotherapy. This implies a process of self-investigation, dissection and self-evaluation. This process was conducted by means of diary entries in which the psychotherapist interpreted his daily experiences and events in terms of his past. This places the psychotherapist in the foreground, with his family history and current interpersonal situation as intrinsic parts of himself. In this process the psychotherapist takes centre stage, with emphasis on his own responsibility for the process of self-supervision. The problems addressed by this personally coloured scientific study are problems that the psychotherapist identified through self-supervision and introspection. Such introspection leads to deeper self-knowledge, which the psychotherapist can use to his own benefit and to the benefit of his family and client systems. This study is an attempt to identify a new way of doing research. It is comprehensive, leads to personal fulfilment and deeper self-knowledge, and is also a method by which other psychotherapists could discover themselves and their worlds. It is a slow and painful process. Chapters 1 to 4 comprise the theoretical rationale of the study, and chapters 5 to 12 depict the history of the psychotherapist. Chapter 13 situates self-supervision as a method of self-evaluation in the profession of psychotherapy. / Psychology / D. Litt. et Phil. Autobiography Ethnography Family history Family system Interactional patterns Metaphors Psychotherapist Professional growth Second-order cybernetics Self-discovery Self-evaluation Stories Self-referential Self-supervision Transformation 616.8914 Self-knowledge, Theory of Psychotherapists -- Supervision of Self-evaluation Self-actualization (Psychology)
7	Better representation learning for TPMS Raza, Amir 10 1900 With the growing popularity of AI and machine learning, participation in AI/ML conferences has exploded. The large number of submitted papers and the evolving nature of topics pose additional challenges for the peer-review systems that are crucial to our scientific communities. Some conferences have moved towards automating reviewer assignment for submissions, TPMS [1] being one such existing system. Currently, TPMS builds content-based profiles of researchers and submitted papers to model the suitability of reviewer-submission pairs. In this work, we explore different approaches to self-supervised fine-tuning of BERT transformers on conference-paper data. We demonstrate some new approaches to constructing augmentation views for self-supervision in natural language processing; until now, self-supervision has focused more on computer vision problems. We then use these individual paper representations to build an expertise model that learns to combine the representations of a reviewer's published works and predict the reviewer's relevance for reviewing a submitted paper. Finally, we show that better individual paper representations and better expertise modeling lead to better performance on the reviewer suitability prediction task. Machine learning Natural language processing Text representations BERT transformer Fine tuning Self-supervision Contrastive learning Automating peer-review Expertise Modelling Interest prediction
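The abstract does not give the expertise model's architecture; in the thesis the combination of a reviewer's papers is learned. A minimal, hypothetical fixed-formula baseline is sketched below: each of the reviewer's paper embeddings is compared with the submission embedding by cosine similarity, and the similarities are combined with softmax weights, where a temperature `tau` controls how strongly the closest papers dominate. The embeddings would come from the fine-tuned BERT encoder; the function names and `tau` are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def suitability(reviewer_papers, submission, tau=0.1):
    """Softmax-weighted average of per-paper similarities to the submission.

    `reviewer_papers` must be non-empty. A small tau lets the closest papers
    dominate; a large tau approaches a plain mean over all papers.
    """
    sims = [cosine(paper, submission) for paper in reviewer_papers]
    top = max(sims)  # subtract the max before exp() for numerical stability
    weights = [math.exp((s - top) / tau) for s in sims]
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)
```

Weighting by similarity reflects the intuition that a reviewer with one closely matching paper can be well suited even if most of their work is on other topics; a learned model could replace the fixed softmax.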
8	Large state spaces and self-supervision in reinforcement learning Touati, Ahmed 08 1900 Reinforcement Learning (RL) is an agent-oriented learning paradigm concerned with learning by interacting with an uncertain environment. Combined with deep neural networks as function approximators, deep reinforcement learning (Deep RL) has recently made it possible to tackle highly complex tasks, enabling artificial agents to master classic games like Go, play video games from pixels, and solve robotic control tasks. However, a closer look at these remarkable empirical successes reveals some fundamental limitations. First, it has been challenging to combine desirable features of RL algorithms, such as off-policy and multi-step learning, with function approximation in a way that yields algorithms that are both stable and efficient in large state spaces. Moreover, Deep RL algorithms tend to be very sample-inefficient because of the rudimentary exploration-exploitation strategies they employ. Finally, they require an enormous amount of supervised data and end up producing a narrow agent able to solve only the task it was trained on. In this thesis, we propose novel solutions to the problems of off-policy learning and the exploration-exploitation dilemma in large state spaces, as well as to self-supervision in RL. On the topic of off-policy learning, we provide two contributions. First, for the problem of policy evaluation, we show that combining popular off-policy and multi-step learning methods with a linear value function parameterization could lead to undesirable instability, and we derive a provably convergent variant of these methods. Second, for policy optimization, we propose to stabilize the policy improvement step through an off-policy divergence regularization that constrains the discounted state-action visitation distributions induced by consecutive policies to be close to one another. Next, we study online learning in large state spaces, focusing on two structural assumptions that make the problem tractable: smooth and linear environments. For smooth environments, we propose an efficient online algorithm that actively learns an adaptive partitioning of the joint state-action space by zooming in on more promising and frequently visited regions. For linear environments, we study a more realistic setting in which the environment may evolve dynamically, and even adversarially, over time, provided the total change remains bounded. To address this setting, we propose an efficient online algorithm based on weighted least squares value iteration. It uses exponential weights to smoothly forget data from the distant past, which drives the agent to keep exploring to discover changes. Finally, beyond the classical RL setting, we consider an agent interacting with its environment without a reward signal. We propose to learn a pair of representations that map state-action pairs to a latent space. During the unsupervised phase, these representations are trained using reward-free interactions to encode long-range relationships between states and actions via a predictive occupancy map. At test time, once a reward function is revealed, we show that the optimal policy for that reward is obtained directly from these representations, with no planning. This is a step towards building fully controllable agents. A common theme in the thesis is the design of provable RL algorithms that generalize. In the first and second parts, we deal with generalization in large state spaces through either linear function approximation or state aggregation. In the last part, we focus on generalization over reward functions and propose a task-agnostic representation learning framework that is provably able to optimize any reward function. reinforcement learning Markov decision process artificial agent off-policy learning function approximation exploration-exploitation trade-off self-supervision generalization
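The abstract names the core mechanism of the algorithm for changing linear environments: least-squares value iteration in which past data are down-weighted exponentially. The sketch below shows only the regression step, a discounted ridge regression w = (λI + Σ_τ γ^(t-τ) φ_τ φ_τᵀ)^(-1) Σ_τ γ^(t-τ) φ_τ y_τ, where φ_τ are state-action features, y_τ are regression targets (in LSVI, the reward plus the estimated value of the next state), γ in (0, 1] is the forgetting factor and λ > 0 the regularizer. The symbol and function names are assumptions; any exploration bonus and the backward recursion over the horizon are omitted.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is square)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_ridge(features, targets, gamma=0.9, lam=1.0):
    """Ridge fit where sample tau gets weight gamma ** (t - tau), t = latest index.

    gamma = 1 recovers ordinary ridge regression; smaller gamma forgets faster.
    """
    d = len(features[0])
    t = len(features) - 1
    gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    rhs = [0.0] * d
    for tau, (phi, y) in enumerate(zip(features, targets)):
        weight = gamma ** (t - tau)
        for i in range(d):
            rhs[i] += weight * phi[i] * y
            for j in range(d):
                gram[i][j] += weight * phi[i] * phi[j]
    return solve(gram, rhs)  # gram is positive definite because lam > 0
```

If the targets shift halfway through the data, a forgetting factor below 1 lets the estimate track the recent values, whereas γ = 1 averages old and new data; in a full LSVI loop one such fit would be run per step of the horizon.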
