Global ETD Search

71	On Non-Convex Splitting Methods For Markovian Information Theoretic Representation Learning Teng Hui Huang (12463926) 27 April 2022 (has links) In this work, we study a class of Markovian information theoretic optimization problems motivated by the recent interest in using mutual information as a performance metric, which has shown evident success in representation learning, feature extraction and clustering problems. In particular, we focus on the information bottleneck (IB) and privacy funnel (PF) methods and their recent multi-view, multi-source generalizations, which have gained attention because performance improves significantly with multi-view, multi-source data. Nonetheless, the generalized problems challenge existing IB and PF solvers in terms of complexity and their ability to handle large-scale data. To address this, we study both the IB and PF under a unified framework and propose solving it through splitting methods, including renowned algorithms such as the alternating direction method of multipliers (ADMM), Peaceman-Rachford splitting (PRS) and Douglas-Rachford splitting (DRS) as special cases. Our convergence analysis and locally linear rate of convergence results give rise to new splitting-method-based IB and PF solvers that can be easily generalized to multi-view IB and multi-source PF. We implement the proposed methods with gradient descent and empirically evaluate the new solvers on both synthetic and real-world datasets. Our numerical results demonstrate improved performance over the state-of-the-art approach with a significant reduction in complexity. Furthermore, we consider the practical scenario where there is a distribution mismatch between the training and testing data generating processes under a known bounded divergence constraint. 
In analyzing the generalization error, we develop new techniques inspired by the input-output mutual information approach and tighten the existing generalization error bounds. Optimisation Coding and Information Theory Information Engineering and Theory Rate distortion optimization non-convex optimization information bottleneck theory multi-view data representation learning generalization errors splitting methods ADMM algorithm
72	Analysis of user popularity pattern and engagement prediction in online social networks / Analyse du modèle de popularité de l'utilisateur et de la prédiction d'engagement en les réseaux sociaux en ligne Mohammadi, Samin 04 December 2018 (has links) De nos jours, les médias sociaux ont largement affecté tous les aspects de la vie humaine. Le changement le plus significatif dans le comportement des gens après l'émergence des réseaux sociaux en ligne (OSNs) est leur méthode de communication et sa portée. Avoir plus de connexions sur les OSNs apporte plus d'attention et de visibilité aux gens, où cela s'appelle la popularité sur les médias sociaux. Selon le type de réseau social, la popularité se mesure par le nombre d'adeptes, d'amis, de retweets, de goûts et toutes les autres mesures qui servaient à calculer l'engagement. L'étude du comportement de popularité des utilisateurs et des contenus publiés sur les médias sociaux et la prédiction de leur statut futur sont des axes de recherche importants qui bénéficient à différentes applications telles que les systèmes de recommandation, les réseaux de diffusion de contenu, les campagnes publicitaires, la prévision des résultats des élections, etc. Cette thèse porte sur l'analyse du comportement de popularité des utilisateurs d'OSN et de leurs messages publiés afin, d'une part, d'identifier les tendances de popularité des utilisateurs et des messages et, d'autre part, de prévoir leur popularité future et leur niveau d'engagement pour les messages publiés par les utilisateurs. A cette fin, i) l'évolution de la popularité des utilisateurs de l'ONS est étudiée à l'aide d'un ensemble de données d'utilisateurs professionnels 8K Facebook collectées par un crawler avancé. L'ensemble de données collectées comprend environ 38 millions d'instantanés des valeurs de popularité des utilisateurs et 64 millions de messages publiés sur une période de 4 ans. 
Le regroupement des séquences temporelles des valeurs de popularité des utilisateurs a permis d'identifier des modèles d'évolution de popularité différents et intéressants. Les grappes identifiées sont caractérisées par l'analyse du secteur d'activité des utilisateurs, appelé catégorie, leur niveau d'activité, ainsi que l'effet des événements externes. Ensuite ii) la thèse porte sur la prédiction de l'engagement des utilisateurs sur les messages publiés par les utilisateurs sur les OSNs. Un nouveau modèle de prédiction est proposé qui tire parti de l'information mutuelle par points (PMI) et prédit la réaction future des utilisateurs aux messages nouvellement publiés. Enfin, iii) le modèle proposé est élargi pour tirer profit de l'apprentissage de la représentation et prévoir l'engagement futur des utilisateurs sur leurs postes respectifs. L'approche de prédiction proposée extrait l'intégration de l'utilisateur de son historique de réaction au lieu d'utiliser les méthodes conventionnelles d'extraction de caractéristiques. La performance du modèle proposé prouve qu'il surpasse les méthodes d'apprentissage conventionnelles disponibles dans la littérature. Les modèles proposés dans cette thèse, non seulement déplacent les modèles de prédiction de réaction vers le haut pour exploiter les fonctions d'apprentissage de la représentation au lieu de celles qui sont faites à la main, mais pourraient également aider les nouvelles agences, les campagnes publicitaires, les fournisseurs de contenu dans les CDN et les systèmes de recommandation à tirer parti de résultats de prédiction plus précis afin d'améliorer leurs services aux utilisateurs / Nowadays, social media has widely affected every aspect of human life. The most significant change in people's behavior after emerging Online Social Networks (OSNs) is their communication method and its range. Having more connections on OSNs brings more attention and visibility to people, where it is called popularity on social media. 
Depending on the type of social network, popularity is measured by the number of followers, friends, retweets, likes, and all the other metrics that are used to calculate engagement. Studying the popularity behavior of users and published content on social media and predicting their future status are important research directions that benefit applications such as recommender systems, content delivery networks, advertising campaigns, election result prediction and so on. This thesis addresses the analysis of the popularity behavior of OSN users and their published posts in order to, first, identify the popularity trends of users and posts and, second, predict their future popularity and the engagement level for posts published by users. To this end, i) the popularity evolution of OSN users is studied using a dataset of 8K Facebook professional users collected by an advanced crawler. The collected dataset includes around 38 million snapshots of users' popularity values and 64 million published posts over a period of 4 years. Clustering temporal sequences of users' popularity values led to identifying different and interesting popularity evolution patterns. The identified clusters are characterized by analyzing the users' business sector, called category, their activity level, and the effect of external events. Then ii) the thesis focuses on the prediction of user engagement on the posts published by users on OSNs. A novel prediction model is proposed which takes advantage of Point-wise Mutual Information (PMI) and predicts users' future reaction to newly published posts. Finally, iii) the proposed model is extended to benefit from representation learning and predict users' future engagement on each other's posts. The proposed prediction approach extracts user embeddings from their reaction history instead of using conventional feature extraction methods. 
The performance of the proposed model shows that it outperforms conventional learning methods available in the literature. The models proposed in this thesis not only improve reaction prediction models by exploiting representation learning features instead of hand-crafted features, but could also help news agencies, advertising campaigns, content providers in CDNs, and recommender systems take advantage of more accurate prediction results in order to improve their user services. Réseaux sociaux en ligne Apprentissage machine Prédiction Popularité Apprentissage de la représentation Exploration de données Online social networks Machine learning Prediction Popularity Representation learning Data Mining
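Entry 72 mentions point-wise mutual information (PMI) as the basis of its engagement prediction model. The abstract does not say how PMI is computed or used there, so the following is only a minimal sketch of the standard PMI definition, log2(p(x, y) / (p(x) p(y))), estimated from co-occurrence counts. The function name `pmi_table` and the (user, post category) pairing are assumptions made for illustration, not the thesis's method.

```python
import math
from collections import Counter


def pmi_table(pairs):
    """Estimate PMI for every observed (x, y) pair from raw co-occurrences.

    `pairs` is a list of (x, y) tuples, e.g. hypothetical
    (user, post_category) reaction events. Probabilities are plain
    relative frequencies with no smoothing.
    """
    pair_counts = Counter(pairs)
    total = sum(pair_counts.values())
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    table = {}
    for (x, y), n in pair_counts.items():
        p_xy = n / total
        p_x = x_counts[x] / total
        p_y = y_counts[y] / total
        # PMI > 0: the pair co-occurs more often than independence predicts.
        table[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return table
```

A positive score means the pair co-occurs more often than chance would suggest, a score of zero means the counts are consistent with independence, and unobserved pairs are simply absent from the table.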
73	Pattern Recognition in the Usage Sequences of Medical Apps / Analyse des Séquences d'Usage d'Applications Médicales Adam, Chloé 01 April 2019 (has links) Les radiologues utilisent au quotidien des solutions d'imagerie médicale pour le diagnostic. L'amélioration de l'expérience utilisateur est toujours un axe majeur de l'effort continu visant à améliorer la qualité globale et l'ergonomie des produits logiciels. Les applications de monitoring permettent en particulier d'enregistrer les actions successives effectuées par les utilisateurs dans l'interface du logiciel. Ces interactions peuvent être représentées sous forme de séquences d'actions. Sur la base de ces données, ce travail traite de deux sujets industriels : les pannes logicielles et l'ergonomie des logiciels. Ces deux thèmes impliquent d'une part la compréhension des modes d'utilisation, et d'autre part le développement d'outils de prédiction permettant soit d'anticiper les pannes, soit d'adapter dynamiquement l'interface logicielle en fonction des besoins des utilisateurs. Tout d'abord, nous visons à identifier les origines des crashes du logiciel qui sont essentielles afin de pouvoir les corriger. Pour ce faire, nous proposons d'utiliser un test binomial afin de déterminer quel type de pattern est le plus approprié pour représenter les signatures de crash. L'amélioration de l'expérience utilisateur par la personnalisation et l'adaptation des systèmes aux besoins spécifiques de l'utilisateur exige une très bonne connaissance de la façon dont les utilisateurs utilisent le logiciel. Afin de mettre en évidence les tendances d'utilisation, nous proposons de regrouper les sessions similaires. Nous comparons trois types de représentation de session dans différents algorithmes de clustering. La deuxième contribution de cette thèse concerne le suivi dynamique de l'utilisation du logiciel. 
Nous proposons deux méthodes -- basées sur des représentations différentes des actions d'entrée -- pour répondre à deux problématiques industrielles distinctes : la prédiction de la prochaine action et la détection du risque de crash logiciel. Les deux méthodologies tirent parti de la structure récurrente des réseaux LSTM pour capturer les dépendances entre nos données séquentielles ainsi que leur capacité à traiter potentiellement différents types de représentations d'entrée pour les mêmes données. / Radiologists use medical imaging solutions on a daily basis for diagnosis. Improving user experience is a major focus of the continuous effort to enhance the overall quality and usability of software products. Monitoring applications make it possible to record the evolution of various software and system parameters during use, and in particular the successive actions performed by users in the software interface. These interactions may be represented as sequences of actions. Based on this data, this work deals with two industrial topics: software crashes and software usability. Both topics involve, on the one hand, understanding the patterns of use and, on the other, developing prediction tools either to anticipate crashes or to dynamically adapt the software interface to users' needs. First, we aim at identifying crash root causes, which is essential in order to fix the original defects. For this purpose, we propose to use a binomial test to determine which type of pattern is the most appropriate to represent crash signatures. Improving software usability through customization and adaptation of systems to each user's specific needs requires a very good knowledge of how users use the software. In order to highlight trends of use, we propose to group similar sessions into clusters. We compare 3 session representations as inputs to different clustering algorithms. The second contribution of our thesis concerns the dynamic monitoring of software use. 
We propose two methods -- based on different representations of input actions -- to address two distinct industrial issues: next action prediction and software crash risk detection. Both methodologies take advantage of the recurrent structure of LSTM neural networks to capture dependencies among our sequential data as well as their capacity to potentially handle different types of input representations for the same data. Exploration de motifs fréquents Représentations pour l’apprentissage Représentations d’action Clustering Réseaux de Neurones Récurrents LSTM Frequent pattern mining Representation learning Action embeddings Clustering LSTM Recurrent Neural Networks
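Entry 73 mentions using a binomial test to decide which patterns best represent crash signatures. The abstract gives no details of the test setup, so the sketch below only illustrates a generic one-sided exact binomial test under an assumed framing: a pattern appears in `k` of `n` crash sessions, and `p0` is its assumed base rate in ordinary sessions. The function name and this framing are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p0):
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, p0).

    A small value suggests the pattern occurs in crash sessions more
    often than the assumed base rate p0 would explain.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))
```

For instance, a pattern seen in all 10 of 10 crash sessions against a base rate of 0.5 has a tail probability of 0.5 to the power 10, roughly 0.001, which would count as strong evidence under a conventional threshold.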
74	From specialists to generalists : inductive biases of deep learning for higher level cognition Goyal, Anirudh 10 1900 (has links) Les réseaux de neurones actuels obtiennent des résultats de pointe dans une gamme de domaines problématiques difficiles. Avec suffisamment de données et de calculs, les réseaux de neurones actuels peuvent obtenir des résultats de niveau humain sur presque toutes les tâches. En ce sens, nous avons pu former des spécialistes capables d'effectuer très bien une tâche particulière, que ce soit le jeu de Go, jouer à des jeux Atari, manipuler le cube Rubik, mettre des légendes sur des images ou dessiner des images avec des légendes. Le prochain défi pour l'IA est de concevoir des méthodes pour former des généralistes qui, lorsqu'ils sont exposés à plusieurs tâches pendant l'entraînement, peuvent s'adapter rapidement à de nouvelles tâches inconnues. Sans aucune hypothèse sur la distribution génératrice de données, il peut ne pas être possible d'obtenir une meilleure généralisation et une meilleure adaptation à de nouvelles tâches (inconnues). Les réseaux de neurones actuels obtiennent des résultats de pointe dans une gamme de domaines problématiques difficiles. Une possibilité fascinante est que l'intelligence humaine et animale puisse être expliquée par quelques principes, plutôt qu'une encyclopédie de faits. Si tel était le cas, nous pourrions plus facilement à la fois comprendre notre propre intelligence et construire des machines intelligentes. Tout comme en physique, les principes eux-mêmes ne suffiraient pas à prédire le comportement de systèmes complexes comme le cerveau, et des calculs importants pourraient être nécessaires pour simuler l'intelligence humaine. De plus, nous savons que les vrais cerveaux intègrent des connaissances a priori détaillées spécifiques à une tâche qui ne pourraient pas tenir dans une courte liste de principes simples. 
Nous pensons donc que cette courte liste explique plutôt la capacité des cerveaux à apprendre et à s'adapter efficacement à de nouveaux environnements, ce qui est une grande partie de ce dont nous avons besoin pour l'IA. Si cette hypothèse de simplicité des principes était correcte, cela suggérerait que l'étude du type de biais inductifs (une autre façon de penser aux principes de conception et aux a priori, dans le cas des systèmes d'apprentissage) que les humains et les animaux exploitent pourrait aider à la fois à clarifier ces principes et à fournir une source d'inspiration pour la recherche en IA. L'apprentissage en profondeur exploite déjà plusieurs biais inductifs clés, et mon travail envisage une liste plus large, en se concentrant sur ceux qui concernent principalement le traitement cognitif de niveau supérieur. Mon travail se concentre sur la conception de tels modèles en y incorporant des hypothèses fortes mais générales (biais inductifs) qui permettent un raisonnement de haut niveau sur la structure du monde. Ce programme de recherche est à la fois ambitieux et pratique, produisant des algorithmes concrets ainsi qu'une vision cohérente pour une recherche à long terme vers la généralisation dans un monde complexe et changeant. / Current neural networks achieve state-of-the-art results across a range of challenging problem domains. Given enough data and computation, current neural networks can achieve human-level results on almost any task. In that sense, we have been able to train specialists that can perform a particular task really well, whether it is the game of Go, playing Atari games, Rubik's cube manipulation, image captioning or drawing images given captions. The next challenge for AI is to devise methods to train generalists that, when exposed to multiple tasks during training, can quickly adapt to new unknown tasks. 
Without any assumptions about the data generating distribution it may not be possible to achieve better generalization and adaption to new (unknown) tasks. A fascinating possibility is that human and animal intelligence could be explained by a few principles (rather than an encyclopedia). If that was the case, we could more easily both understand our own intelligence and build intelligent machines. Just like in physics, the principles themselves would not be sufficient to predict the behavior of complex systems like brains, and substantial computation might be needed to simulate human intelligence. In addition, we know that real brains incorporate some detailed task-specific a priori knowledge which could not fit in a short list of simple principles. So we think of that short list rather as explaining the ability of brains to learn and adapt efficiently to new environments, which is a great part of what we need for AI. If that simplicity of principles hypothesis was correct it would suggest that studying the kind of inductive biases (another way to think about principles of design and priors, in the case of learning systems) that humans and animals exploit could help both clarify these principles and provide inspiration for AI research. Deep learning already exploits several key inductive biases, and my work considers a larger list, focusing on those which concern mostly higher-level cognitive processing. My work focuses on designing such models by incorporating in them strong but general assumptions (inductive biases) that enable high-level reasoning about the structure of the world. This research program is both ambitious and practical, yielding concrete algorithms as well as a cohesive vision for long-term research towards generalization in a complex and changing world. 
Deep Learning Apprentissage en profondeur Traitement du langage naturel Apprentissage des représentations Modèles génératifs Modélisation du langage Natural Language Processing Representation Learning Generative Models Language Modeling
75	Multimodal Classification of Second-Hand E-Commerce Ads / Multimodal klassiciering av annonser på Second-Hand-Marknadsplatser Åberg, Ludvig January 2018 (has links) In second-hand e-commerce, categorization of new products is typically done by the seller. Automating this process makes it easier to upload ads and could lower the number of incorrectly categorized ads. Automatic ad categorization also makes it possible for a second-hand e-commerce platform to use a more detailed category system, which could make the shopping experience better for potential buyers. Product ad categorization is typically addressed as a text classification problem as most metadata associated with products are textual. By including image information, i.e. using a multimodal approach, better performance can however be expected. The work done in this thesis evaluates different multimodal deep learning models for the task of ad categorization on data from Blocket.se. We examine late fusion models, where the modalities are combined at decision level, and early fusion models, where the modalities are combined at feature level. We also introduce our own approach Text Based Visual Attention (TBVA), which extends the image CNN Inception v3 with an attention mechanism to incorporate textual information. For all models evaluated, the text classifier fastText is used to process text data and the Inception v3 network to process image data. Our results show that the late fusion models perform best in our setting. We conclude that these models generally learn which of the baseline models to ’trust’, while early fusion and the TBVA models learn more abstract concepts. As future work, we would like to examine how the TBVA models perform on other tasks, such as ad similarity. / Produkter som läggs ut på marknadsplatser, såsom Blocket.se, kategoriseras oftast av säljaren själv. 
Att automatisera processen för kategorisering gör det därför både enklare och snabbare att lägga upp annonser och kan minska antalet produkter med felaktig kategori. Automatisk kategorisering gör det också möjligt för marknadsplatsen att använda ett mer detaljerat kategorisystem, vilket skulle kunna effektivisera sökandet efter produkter för potentiella köpare. Produktkategorisering adresseras ofta som ett klassificeringsproblem för text, eftersom den största delen av produktinformationen finns i skriftlig form. Genom att också inkludera produktbilder kan vi dock förvänta oss bättre resultat. I den här uppsatsen evalueras olika metoder för att använda både bild och text för annonsklassificering av data från blocket.se. I synnerhet undersöks late fusion modeller, där informationen från modaliteterna kombineras i samband med klassificeringen, samt early fusion modeller, där modaliteterna istället kombineras på en abstrakt nivå innan klassificeringen. Vi introducerar också vår egen modell Text Based Visual Attention (TBVA), en utvidgning av bildklassificeraren Inception v3 [1], som använder en attention mekanism för att inkorporera textinformation. För alla modeller som beskrivs i denna uppsats används textklassificeraren fastText [2] för att processa text och bildklassificeraren Inception v3 för att processa bild. Våra resultat visar att late fusion modeller presterar bäst med vår data. I slutsatsen konstateras att late fusion modellerna lär sig vilka fall den ska 'lita' på text eller bild informationen, där early fusion och TBVA modellerna istället lär sig mer abstrakta koncept. Som framtida arbete tror vi det skulle vara av värde att undersöka hur TBVA modellerna presterar på andra uppgifter, såsom att bedöma likheter mellan annonser. Machine Learning Classification Multimodal Classification Multimodal Learning Representation Learning Late Fusion Early Fusion Visual Attention Marketplace Second Hand E-commerce Blocket.se Computer Sciences Datavetenskap (datalogi)
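Entry 75 contrasts late fusion (modalities combined at decision level) with early fusion (modalities combined at feature level). The sketch below illustrates only that distinction. It is not the thesis's models, which learn how to combine the fastText and Inception v3 outputs. Here late fusion is assumed to be a fixed weighted average of two class-probability vectors, and early fusion a plain concatenation of feature vectors; all names and the `text_weight` parameter are hypothetical.

```python
def late_fusion(text_probs, image_probs, text_weight=0.5):
    """Decision-level fusion: blend per-class probabilities, then pick a class.

    Returns (predicted_class_index, fused_probabilities).
    """
    if len(text_probs) != len(image_probs):
        raise ValueError("both classifiers must score the same classes")
    fused = [
        text_weight * t + (1.0 - text_weight) * i
        for t, i in zip(text_probs, image_probs)
    ]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return predicted, fused


def early_fusion(text_features, image_features):
    """Feature-level fusion: concatenate the two feature vectors.

    The joint vector would then be fed to a single downstream classifier
    (not shown here).
    """
    return list(text_features) + list(image_features)
```

With equal weights, a text classifier that leans mildly towards class 0 can be overruled by an image classifier that is confident in class 1. This is one simple way to read the abstract's remark that late fusion models learn which baseline to trust.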
76	Image-Text context relation using Machine Learning : Research on performance of different datasets Sun, Yuqi January 2022 (has links) Based on the progress in Computer Vision and Natural Language Processing fields, Vision-Language (VL) models are designed to process information from images and texts. The thesis focused on the performance of a model, Oscar, on different datasets. Oscar is a State-of-The-Art VL representation learning model based on a pre-trained model for Object Detection and a pre-trained Bert model. By comparing the performance of datasets, we could understand the relationship between the properties of datasets and the performance of models. The conclusions could provide the direction for future work on VL datasets and models. In this thesis, I collected five VL datasets that have at least one main difference from each other and generated 8 subsets from these datasets. I trained the same model with different subsets to classify whether an image is related to a text. In common sense, clear datasets have better performance because their images are of everyday scenes and annotated by human annotators. Thus, the size of clear datasets is always limited. However, an interesting phenomenon in the thesis is that the dataset generated by models trained on different datasets has achieved as good performance as clear datasets. This would encourage the research on models for data collection. The experiment results also indicated that future work on the VL model could focus on improving feature extraction from images, as the images have a great influence on the performance of VL models. / Baserat på prestationerna inom Computer Vision och Natural Language Processing-fält, är Vision-Language (VL)-modeller utformade för att bearbeta information från bilder och texter. Projektet fokuserade på prestanda av en modell, Oscar, på olika datamängder. 
Oscar är en State-of-The-Art VL-representationsinlärningsmodell baserad på en förutbildad modell för Objektdetektion och en förutbildad Bert-modell. Genom att jämföra datauppsättningarnas prestanda kunde vi förstå sambandet mellan datauppsättningarnas egenskaper och modellernas prestanda. Slutsatserna skulle kunna ge riktning för framtida arbete med VL-datauppsättningar och modeller. I detta projekt samlade jag fem VL-datauppsättningar som har minst en huvudskillnad från varandra och genererade 8 delmängder från dessa datauppsättningar. Jag tränade samma modell med olika delmängder för att klassificera om en bild är relaterad till en text. I sunt förnuft har tydliga datauppsättningar bättre prestanda eftersom deras bilder är av vardagliga scener och kommenterade av människor. Storleken på tydliga datamängder är därför alltid begränsad. Ett intressant fenomen i projektet är dock att den datauppsättning som genereras av modeller har uppnått lika bra prestanda som tydliga datauppsättningar. Detta skulle uppmuntra forskning om modeller för datainsamling. Experimentresultaten indikerade också att framtida arbete med VL-modellen kan fokusera på att förbättra funktionsextraktion från bilder, eftersom bilderna har ett stort inflytande på prestandan hos VL-modeller. Vision-Language Representation learning Bert Faster R-CNN Oscar Datasets Visual-Language Representation inlärning Bert Faster R-CNN Oscar Datamängder Computer and Information Sciences Data- och informationsvetenskap
77	Feature extraction with self-supervised learning on eye-tracking data from Parkinson’s patients and healthy individuals / Extrahering av särdrag med hjälp av självövervakande maskininlärning applicerad på ögonrörelsedata från parkinsonpatienter och friska försökspersoner. Bergman, Leo January 2022 (has links) Eye-tracking is a method for monitoring and measuring eye movements. The technology has had a significant impact so far and new application areas are emerging. Today, the technology is used in the gaming industry, the health industry, self-driving cars, and not least in medicine. In the latter, large research resources are invested to investigate the extent to which eye-tracking can help with disease diagnostics. One disease of interest is Parkinson’s disease (PD), a neurodegenerative disease in which the dopamine production in nerve cells is destroyed. This leads to deteriorating nerve signal transmission, which in turn affects motor skills. One of the affected motor functions associated with PD is the oculomotor function, which affects eye function. This decline can be observed clinically by physicians. Eye-tracking technology has high potential here, but it remains to be investigated which methodology and which test protocols are relevant to study and to what extent the technology can be used as a diagnostic tool. A novel class of algorithms for finding representations of data is called self-supervised learning (SSL). This class of algorithms seems to have high potential in terms of categorizing biomarkers. This thesis examines to what extent an SSL network can learn representations of eye-tracking data from Parkinson’s patients, in order to distinguish between healthy and sick individuals, and between patients on and off medication. The result suggests that the network does not succeed in learning distinct differences between groups. 
Furthermore, no difference is observed in the result when we in the model take into account the task-specific target information that the subjects are following. Today in the UK approximately 26 percent of Parkinson’s patients are misdiagnosed. In the initial state of the disease, the misdiagnosis is even higher. Potentially, the method can be used as a complement to regular diagnosis in different stages of the disease. This would provide better conditions for the patient as well as for medical and pharmaceutical research. The method also has the potential to reduce physicians’ workload. / Eye-tracking eller ögonrörelsemätning som är den svenska termen, är en metod för att följa och mäta ögats rörelser. Tekniken har fått en betydande genomslagskraft hittills och nya applikationsområden dyker upp titt som tätt. Idag används tekniken inom spelindustrin, hälsa, i självkörande bilar och inte minst inom medicin. Inom det senare läggs idag stora forskningsresurser för att undersöka i vilken utsträckning eye-tracking kan hjälpa till att diagnosticera sjukdomar. En sjukdom av intresse är Parkinson’s sjukdom, vilket är en neurodegenerativ sjukdom där dopaminproduktionen i nervceller förstörs. Det leder till att transmissionen av nervsignaler försämras som i sin tur gör att motoriken påverkas vilket bland annat leder till en nedsättning i ögats motorik. Det är något som man idag kan observera kliniskt, eye-tracking teknik har här en hög potential men det återstår att undersöka vilken metodik och vilka testprotokoll som är relevanta att undersöka och i vilken grad tekniken kan användas som ett diagnostiskt verktyg. En ny typ av algoritmer för att hitta representationer av data kallas för self-supervised learning (SSL), dessa algoritmer verkar ha en hög potential vad gäller kategorisering av biomarkörer. 
I denna uppsats undersöks i vilken grad ett SSL-nätverk kan lära sig representationer av eye-tracking data på Parkinson’s patienter för att kunna särskilja mellan friska och sjuka, medicinerade och omedicinerade. Resultatet är att nätverket inte lyckas lära sig skiljaktigheter mellan dessa klasser. Vidare noteras ingen skillnad i resultatet då vi i modellen tar hänsyn till de specifika uppgifterna som försökspersonerna fått. Idag får 30 procent av parkinsonpatienterna fel diagnos. I ett initialt tillstånd av sjukdomen är feldiagnosticeringen ännu högre. Potentiellt kan metoden användas som komplement till diagnosticering i olika skeden av sjukdomen. Detta skulle ge bättre förutsättningar för såväl patienten som för den medicinska och farmaceutiska forskningen. Metoden har dessutom potential att minska läkares arbetsbörda. Eye-tracking Representation learning Self-supervised learning Parkinson’s disease Feature extraction Clustering analysis Ögonspårning Särdragsextraktion Parkinsonssjukdom Representationsinlärning Maskininlärning Klustring Computer Sciences Datavetenskap (datalogi)
78	Learning and planning with noise in optimization and reinforcement learning Thomas, Valentin 06 1900 (has links) La plupart des algorithmes modernes d'apprentissage automatique intègrent un certain degré d'aléatoire dans leurs processus, que nous appellerons le bruit, qui peut finalement avoir un impact sur les prédictions du modèle. Dans cette thèse, nous examinons de plus près l'apprentissage et la planification en présence de bruit pour les algorithmes d'apprentissage par renforcement et d'optimisation. Les deux premiers articles présentés dans ce document se concentrent sur l'apprentissage par renforcement dans un environnement inconnu, et plus précisément sur la façon dont nous pouvons concevoir des algorithmes qui utilisent la stochasticité de leur politique et de l'environnement à leur avantage. Notre première contribution présentée dans ce document se concentre sur le cadre de l'apprentissage par renforcement non supervisé. Nous montrons comment un agent laissé seul dans un monde inconnu sans but précis peut apprendre quels aspects de l'environnement il peut contrôler indépendamment les uns des autres, ainsi qu'apprendre conjointement une représentation latente démêlée de ces aspects que nous appellerons facteurs de variation. La deuxième contribution se concentre sur la planification dans les tâches de contrôle continu. En présentant l'apprentissage par renforcement comme un problème d'inférence, nous empruntons des outils provenant de la littérature sur les méthodes de Monte Carlo séquentiel pour concevoir un algorithme efficace et théoriquement motivé pour la planification probabiliste en utilisant un modèle appris du monde. Nous montrons comment l'agent peut tirer parti de notre objectif probabiliste pour imaginer divers ensembles de solutions. Les deux contributions suivantes analysent l'impact du bruit de gradient dû à l'échantillonnage dans les algorithmes d'optimisation. 
La troisième contribution examine le rôle du bruit de l'estimateur du gradient dans l'estimation par maximum de vraisemblance avec descente de gradient stochastique, en explorant la relation entre la structure du bruit du gradient et la courbure locale sur la généralisation et la vitesse de convergence du modèle. Notre quatrième contribution revient sur le sujet de l'apprentissage par renforcement pour analyser l'impact du bruit d'échantillonnage sur l'algorithme d'optimisation de la politique par ascension du gradient. Nous constatons que le bruit d'échantillonnage peut avoir un impact significatif sur la dynamique d'optimisation et les politiques découvertes en apprentissage par renforcement. / Most modern machine learning algorithms incorporate a degree of randomness in their processes, which we will refer to as noise, which can ultimately impact the model's predictions. In this thesis, we take a closer look at learning and planning in the presence of noise for reinforcement learning and optimization algorithms. The first two articles presented in this document focus on reinforcement learning in an unknown environment, specifically how we can design algorithms that use the stochasticity of their policy and of the environment to their advantage. Our first contribution presented in this document focuses on the unsupervised reinforcement learning setting. We show how an agent left alone in an unknown world without any specified goal can learn which aspects of the environment it can control independently from each other as well as jointly learning a disentangled latent representation of these aspects, or factors of variation. The second contribution focuses on planning in continuous control tasks. By framing reinforcement learning as an inference problem, we borrow tools from Sequential Monte Carlo literature to design a theoretically grounded and efficient algorithm for probabilistic planning using a learned model of the world. 
We show how the agent can leverage the uncertainty of the model to imagine a diverse set of solutions. The following two contributions analyze the impact of gradient noise due to sampling in optimization algorithms. The third contribution examines the role of gradient noise in maximum likelihood estimation with stochastic gradient descent, exploring the relationship between the structure of the gradient noise and local curvature on the generalization and convergence speed of the model. Our fourth contribution returns to the topic of reinforcement learning to analyze the impact of sampling noise on the policy gradient algorithm. We find that sampling noise can significantly impact the optimization dynamics and policies discovered in on-policy reinforcement learning. Deep reinforcement learning Planning Stochastic optimization Generalization Control as inference Representation learning Optimisation stochastique Apprentissage par renforcement Apprentissage de representations Planification
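The gradient noise due to sampling mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's method: the one-parameter least-squares model, the synthetic data, and the names `grad`, `minibatch_grads` and `variance` are all assumptions chosen for illustration. It only shows that gradients computed on random mini-batches scatter around the full-data gradient, and that the scatter shrinks as the batch grows.

```python
import random

def grad(w, batch):
    """Gradient of the mean of 0.5 * (w * x - y)**2 over a batch of (x, y) pairs."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def minibatch_grads(w, data, batch_size, n_draws, rng):
    """Gradient estimates from mini-batches sampled without replacement."""
    return [grad(w, rng.sample(data, batch_size)) for _ in range(n_draws)]

def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Hypothetical synthetic data: y = 3x plus Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + rng.gauss(0.0, 0.5)))

full_gradient = grad(0.0, data)                      # noise-free reference
small = minibatch_grads(0.0, data, 4, 500, rng)      # noisy estimates
large = minibatch_grads(0.0, data, 64, 500, rng)     # less noisy estimates
print(full_gradient, variance(small), variance(large))
```

Comparing the two printed variances gives a rough sense of how batch size controls the noise level that the abstract studies.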
79	Latent data augmentation and modular structure for improved generalization Lamb, Alexander 08 1900 (has links) This thesis explores the nature of generalization in deep learning and several settings in which it fails. In particular, deep neural networks can struggle to generalize in settings with limited data, insufficient supervision, challenging long-range dependencies, or complex structure and subsystems. This thesis explores the nature of these challenges for generalization in deep learning and presents several algorithms which seek to address these challenges. In the first article, we show how training with interpolated hidden states can improve generalization and calibration in deep learning. We also introduce a theory showing how our algorithm, which we call Manifold Mixup, leads to a flattening of the per-class hidden representations, which can be seen as a compression of the information in the hidden states. The second article is related to the first and shows how interpolated examples can be used for semi-supervised learning. In addition to interpolating the input examples, the model’s interpolated predictions are used as targets for these examples. This improves results on standard benchmarks as well as classic 2D toy problems for semi-supervised learning. The third article studies how a recurrent neural network can be divided into multiple modules with different parameters and well separated hidden states, as well as a competition mechanism restricting updating of the hidden states to a subset of the most relevant modules on a specific time-step. This improves systematic generalization when the pattern distribution is changed between the training and evaluation phases. It also improves generalization in reinforcement learning. In the fourth article, we show that attention can be used to control the flow of information between successive layers in deep networks. 
This allows each layer to only process the subset of the previously computed layers’ outputs which are most relevant. This improves generalization on relational reasoning tasks as well as standard benchmark classification tasks. Deep Learning Natural Language Processing Representation Learning Generative Models Language Modeling Apprentissage en profondeur Traitement du langage naturel Apprentissage des représentations Modèles génératifs Modélisation du langage
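The idea of training with interpolated hidden states, described in the first article of this abstract, can be sketched as follows. This is a hedged illustration, not the thesis's implementation: the function name `mix_hidden_states`, the `alpha` default, the Beta-distributed mixing coefficient, and the use of plain Python lists in place of network activations are assumptions. Mixing the labels with the same coefficient follows common mixup-style practice.

```python
import random

def mix_hidden_states(h_a, h_b, y_a, y_b, alpha=2.0, rng=None):
    """Interpolate two hidden-state vectors and their one-hot labels.

    A single coefficient lam ~ Beta(alpha, alpha) mixes both the hidden
    states and the labels, so the target stays consistent with the input.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    h_mix = [lam * a + (1.0 - lam) * b for a, b in zip(h_a, h_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return h_mix, y_mix, lam

# Hypothetical hidden states of two examples from different classes.
h_mix, y_mix, lam = mix_hidden_states(
    [1.0, 0.0], [0.0, 1.0],   # hidden states
    [1.0, 0.0], [0.0, 1.0],   # one-hot labels
    rng=random.Random(0),
)
print(lam, h_mix, y_mix)
```

In a real network the mixed hidden state would be passed through the remaining layers and the loss computed against the mixed label.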
80	Cognitively Guided Modeling of Visual Perception in Intelligent Vehicles Plebe, Alice 20 April 2021 (has links) This work proposes a strategy for visual perception in the context of autonomous driving. Despite the growing research aiming to implement self-driving cars, no artificial system can yet claim to have reached the driving performance of a human. Humans, when not distracted or drunk, are still the best drivers you can currently find. Hence, theories about the human mind and its neural organization could reveal precious insights on how to design a better autonomous driving agent. This dissertation focuses specifically on the perceptual aspect of driving, and it takes inspiration from four key theories on how the human brain achieves the cognitive capabilities required by the activity of driving. The first idea lies at the foundation of current cognitive science, and it argues that thinking nearly always involves some sort of mental simulation, which takes the form of imagery when dealing with visual perception. The second theory explains how the perceptual simulation takes place in neural circuits called convergence-divergence zones, which expand and compress information to extract abstract concepts from visual experience and code them into compact representations. The third theory highlights that perception, when specialized for a complex task such as driving, is refined by experience in a process called perceptual learning. The fourth theory, namely the free-energy principle of predictive brains, corroborates the role of visual imagination as a fundamental mechanism of inference. In order to implement these theoretical principles, it is necessary to identify the most appropriate computational tools currently available. Within the consolidated and successful field of deep learning, I select the artificial architectures and strategies that show a sound resemblance to their cognitive counterparts.
Specifically, convolutional autoencoders have a strong correspondence with the architecture of convergence-divergence zones and the process of perceptual abstraction. The free-energy principle of predictive brains is related to variational Bayesian inference and the use of recurrent neural networks. In fact, this principle can be translated into a training procedure that learns abstract representations predisposed to predicting how the current road scenario will change in the future. The main contribution of this dissertation is a method to learn conceptual representations of the driving scenario from visual information. This approach forces a semantic internal organization, in the sense that distinct parts of the representation are explicitly associated with specific concepts useful in the context of driving. Specifically, the model uses as few as 16 neurons for each of the two basic concepts considered here: vehicles and lanes. At the same time, the approach biases the internal representations towards the ability to predict the dynamics of objects in the scene. This property of temporal coherence allows the representations to be exploited to predict plausible future scenarios and to perform a simplified form of mental imagery. In addition, this work includes a proposal to tackle the problem of opaqueness affecting deep neural networks. I present a method that aims to mitigate this issue, in the context of longitudinal control for automated vehicles. A further contribution of this dissertation experiments with higher-level spaces of prediction, such as occupancy grids, which could reconcile direct application to motor controls with biological plausibility. Settore INF/01 - Informatica
