Global ETD Search

71	Deep generative neural networks for novelty generation : a foundational framework, metrics and experiments / Réseaux profonds génératifs pour la génération de nouveauté : fondations, métriques et expériences Cherti, Mehdi 26 January 2018 In recent years, significant advances in deep neural networks have enabled the creation of groundbreaking technologies such as self-driving cars and voice-enabled personal assistants. Almost all successes of deep neural networks concern prediction, whereas the initial breakthroughs came from generative models.
Today, although we have very powerful deep generative modeling techniques, these techniques are essentially used for prediction or for generating known objects (i.e., good-quality images of known classes): any generated object that is a priori unknown is considered a failure mode (Salimans et al., 2016) or spurious (Bengio et al., 2013b). In other words, when prediction seems to be the only possible objective, novelty is seen as an error that researchers have been trying hard to eliminate. This thesis defends the point of view that, instead of trying to eliminate these novelties, we should study them and the generative potential of deep nets to create useful novelty, especially given the economic and societal importance of creating new objects in contemporary societies. The thesis sets out to study novelty generation in relation to the data-driven knowledge models produced by deep generative neural networks. Our first key contribution is the clarification of the importance of representations and their impact on the kind of novelties that can be generated: a key consequence is that a creative agent might need to re-represent known objects to access various kinds of novelty. We then demonstrate that traditional objective functions of statistical learning theory, such as maximum likelihood, are not necessarily the best theoretical framework for studying novelty generation. We propose several alternatives at the conceptual level. A second key result is the confirmation that current models, with traditional objective functions, can indeed generate unknown objects. This shows that even though objectives like maximum likelihood are designed to eliminate novelty, practical implementations do generate novelty. Through a series of experiments, we study the behavior of these models and the novelty they generate. In particular, we propose a new task setup and metrics for selecting good generative models for novelty generation.
Finally, the thesis concludes with a series of experiments clarifying the characteristics of models that can exhibit novelty. Experiments show that sparsity, noise level, and restricting the capacity of the net eliminate novelty, and that models that are better at recognizing novelty are also good at generating it. Machine learning Deep learning Creativity Unsupervised representation learning Generative models Design
72	Relational Representation Learning Incorporating Textual Communication for Social Networks Yi-Yu Lai (10157291) 01 March 2021 Representation learning (RL) for social networks facilitates real-world tasks such as visualization, link prediction and friend recommendation. Many methods have been proposed in this area to learn continuous low-dimensional embeddings of nodes, edges or relations in social and information networks. However, two gaps remain. First, most previous network RL methods neglect social signals, such as textual communication between users (nodes). Unlike more typical binary features on edges, such as post likes and retweet actions, social signals are more varied and contain ambiguous information. This makes them more challenging to incorporate into RL methods, but the ability to quantify social signals should allow RL methods to better capture the implicit relationships among real people in social networks. Second, most previous work in network RL has focused on learning from homogeneous networks (i.e., a single type of node, edge, role, and direction); thus, most existing RL methods cannot capture the heterogeneous nature of relationships in social networks. Based on these identified gaps, this thesis studies the feasibility of incorporating heterogeneous information, e.g., texts, attributes, multiple relations and edge types (directions), to learn more accurate, fine-grained network representations.
In this dissertation, we discuss a preliminary study and outline three major works that aim to incorporate textual interactions to improve relational representation learning. The preliminary study learns a joint representation that captures the textual similarity in content between interacting nodes. The promising results motivate us to pursue broader research on using social signals for representation learning. The first major component aims to learn explicit node and relation embeddings in social networks. Traditional knowledge graph (KG) completion models learn latent representations of entities and relations by interpreting relations as translations operating on the embeddings of the entities. However, existing approaches do not consider textual communication between users, which contains valuable information that gives meaning and context to social relationships. We propose a novel approach that incorporates the textual interactions between each pair of users to improve representation learning of both users and relationships. The second major component focuses on analyzing how users interact with each other via natural language content. Although the data is interconnected and dependent, previous research has primarily modeled social network behavior separately from textual content. In this work, we model the data holistically, taking into account the connections between the social behavior of users and the content generated when they interact, by learning a joint embedding over user characteristics and user language. In the third major component, we consider the task of learning edge representations in social networks. Edge representations are especially beneficial when we need to describe or explain the relationships, activities, and interactions among users. However, previous work in this area lacks well-defined edge representations and ignores the relational signals over multiple views of social networks, which typically contain multi-view contexts (due to multiple edge types) that need to be considered when learning the representation.
We propose a new methodology that captures asymmetry in multiple views by learning well-defined edge representations and incorporates textual communication to identify multiple sources of social signals that moderate the impact of different views between users. Network Embedding Representation Learning Social Network Data Social Network Mining Textual Communication Natural Language Processing Relationship Embedding Edge Representation Multi-View Learning
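Entry 72 refers to translation-based KG completion models, which treat a relation as a translation between entity embeddings. The sketch below illustrates that scoring idea in the style of TransE; the toy 2-D embeddings, the function names and the margin value are illustrative assumptions, and it does not include the textual interactions the thesis adds.

```python
import math

def transe_distance(head, relation, tail):
    """L2 distance ||h + r - t||; a smaller value means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def margin_loss(pos, neg, margin=1.0):
    """Hinge loss that pushes a true triple to score below a corrupted one by `margin`."""
    return max(0.0, margin + transe_distance(*pos) - transe_distance(*neg))

# Hypothetical embeddings for three users and a "friend_of" relation.
alice = [0.0, 0.0]
bob = [1.0, 1.0]
carol = [3.0, -2.0]
friend_of = [1.0, 1.0]

# alice + friend_of lands exactly on bob, so the true triple has distance 0.
print(transe_distance(alice, friend_of, bob))
# The corrupted triple (alice, friend_of, carol) is already farther than the margin.
print(margin_loss((alice, friend_of, bob), (alice, friend_of, carol)))
```

Training would adjust the embeddings to reduce this loss over many true and corrupted triples; only the scoring and loss are shown here.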
73	On Non-Convex Splitting Methods For Markovian Information Theoretic Representation Learning Teng Hui Huang (12463926) 27 April 2022 In this work, we study a class of Markovian information-theoretic optimization problems motivated by the recent interest in using mutual information as a performance metric, which has shown clear success in representation learning, feature extraction and clustering problems. In particular, we focus on the information bottleneck (IB) and privacy funnel (PF) methods and their recent multi-view, multi-source generalizations, which have gained attention because they significantly improve performance on multi-view, multi-source data. Nonetheless, the generalized problems challenge existing IB and PF solvers in terms of complexity and their ability to tackle large-scale data.
To address this, we study both the IB and PF under a unified framework and propose solving it through splitting methods, which include renowned algorithms such as the alternating direction method of multipliers (ADMM), Peaceman-Rachford splitting (PRS) and Douglas-Rachford splitting (DRS) as special cases. Our convergence analysis and locally linear convergence-rate results give rise to new splitting-method-based IB and PF solvers that can be easily generalized to multi-view IB and multi-source PF. We implement the proposed methods with gradient descent and empirically evaluate the new solvers on both synthetic and real-world datasets. Our numerical results demonstrate improved performance over the state-of-the-art approach with a significant reduction in complexity. Furthermore, we consider the practical scenario where there is a distribution mismatch between the training and testing data-generating processes under a known bounded divergence constraint.
In analyzing the generalization error, we develop new techniques inspired by the input-output mutual information approach and tighten the existing generalization error bounds. Optimisation Coding and Information Theory Information Engineering and Theory Rate distortion optimization non-convex optimization information bottleneck theory multi-view data representation learning generalization errors splitting methods ADMM algorithm
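Entry 73 casts its IB and PF solvers as splitting methods, with ADMM as one special case. As a generic illustration of what an ADMM iteration looks like (this is not the thesis's IB/PF solver; the toy objective and the parameter names `lam`, `rho` and `iters` are assumptions), the sketch below applies scaled-form ADMM to a one-dimensional problem with a squared loss plus an absolute-value penalty, whose exact solution is the soft-thresholded value of `a`.

```python
def soft_threshold(v, k):
    """Proximal operator of k*|z|: shrink v toward zero by k."""
    if v > k:
        return v - k
    if v < -k:
        return v + k
    return 0.0

def admm_scalar(a, lam, rho=1.0, iters=200):
    """Scaled-form ADMM for: minimize 0.5*(x - a)**2 + lam*|z| subject to x = z."""
    x = z = u = 0.0
    for _ in range(iters):
        # x-update: closed-form minimizer of 0.5*(x - a)^2 + (rho/2)*(x - z + u)^2
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: proximal step on the absolute-value term
        z = soft_threshold(x + u, lam / rho)
        # dual update on the scaled multiplier u
        u = u + x - z
    return x, z

x, z = admm_scalar(a=3.0, lam=1.0)
print(round(z, 6))  # should match soft_threshold(3.0, 1.0), i.e. 2.0
```

Splitting the objective this way lets each sub-step be solved cheaply in closed form, which is the general appeal of ADMM-type methods; PRS and DRS differ mainly in how the dual/reflection step is combined.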
74	Analysis of user popularity pattern and engagement prediction in online social networks / Analyse du modèle de popularité de l'utilisateur et de la prédiction d'engagement en les réseaux sociaux en ligne Mohammadi, Samin 04 December 2018 Nowadays, social media has broadly affected every aspect of human life. The most significant change in people's behavior since the emergence of Online Social Networks (OSNs) concerns their method and range of communication. Having more connections on OSNs brings people more attention and visibility, which is called popularity on social media.
Depending on the type of social network, popularity is measured by the number of followers, friends, retweets, likes, and all the other metrics used to calculate engagement. Studying the popularity behavior of users and published content on social media and predicting its future status are important research directions that benefit applications such as recommender systems, content delivery networks, advertising campaigns, election result prediction and so on. This thesis analyzes the popularity behavior of OSN users and their published posts in order, first, to identify the popularity trends of users and posts and, second, to predict their future popularity and the engagement level for posts published by users. To this end, i) the popularity evolution of OSN users is studied using a dataset of 8K professional Facebook users collected by an advanced crawler. The collected dataset includes around 38 million snapshots of users' popularity values and 64 million published posts over a period of 4 years. Clustering the temporal sequences of users' popularity values led to the identification of different and interesting popularity evolution patterns. The identified clusters are characterized by analyzing the users' business sector (called category), their activity level, and the effect of external events. Then ii) the thesis focuses on predicting user engagement on the posts published by users on OSNs. A novel prediction model is proposed that takes advantage of Point-wise Mutual Information (PMI) and predicts users' future reactions to newly published posts. Finally, iii) the proposed model is extended to benefit from representation learning and predict users' future engagement on each other's posts. The proposed prediction approach extracts user embeddings from their reaction history instead of using conventional feature extraction methods.
The evaluation shows that the proposed model outperforms the conventional learning methods available in the literature. The models proposed in this thesis not only move reaction prediction models toward representation-learning features instead of hand-crafted ones, but could also help news agencies, advertising campaigns, content providers in CDNs, and recommender systems take advantage of more accurate prediction results to improve their user services. Online social networks Machine learning Prediction Popularity Representation learning Data Mining
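Entry 74's first prediction model relies on Point-wise Mutual Information, PMI(x, y) = log p(x, y) / (p(x) p(y)). The abstract does not say exactly which variables the thesis pairs, so the sketch below assumes, purely for illustration, that PMI is estimated between a reacting user and a post's author from a log of reaction events; the function name and the data are hypothetical.

```python
import math
from collections import Counter

def pmi_scores(reactions):
    """Estimate PMI(r, a) = log p(r, a) / (p(r) * p(a)) from (reactor, author) events."""
    n = len(reactions)
    pair_counts = Counter(reactions)
    reactor_counts = Counter(r for r, _ in reactions)
    author_counts = Counter(a for _, a in reactions)
    # With counts: p(r, a) = c/n, p(r) = c_r/n, p(a) = c_a/n, so the ratio is c*n / (c_r*c_a).
    return {
        (r, a): math.log(c * n / (reactor_counts[r] * author_counts[a]))
        for (r, a), c in pair_counts.items()
    }

# Hypothetical reaction log: (user who reacted, author of the post).
log = [("u1", "alice"), ("u1", "alice"), ("u2", "bob"), ("u1", "bob")]
for pair, score in sorted(pmi_scores(log).items()):
    print(pair, round(score, 3))
```

A positive score marks a pair that interacts more often than chance would predict, which is the kind of affinity signal a reaction predictor could use; a negative score marks the opposite.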
75	Pattern Recognition in the Usage Sequences of Medical Apps / Analyse des Séquences d'Usage d'Applications Médicales Adam, Chloé 01 April 2019 Radiologists use medical imaging solutions on a daily basis for diagnosis. Improving user experience is a major part of the continuous effort to enhance the overall quality and usability of software products. Monitoring applications make it possible to record the evolution of various software and system parameters during use, and in particular the successive actions performed by users in the software interface. These interactions can be represented as sequences of actions. Based on this data, this work addresses two industrial topics: software crashes and software usability. Both topics involve, on the one hand, understanding usage patterns and, on the other, developing prediction tools either to anticipate crashes or to dynamically adapt the software interface to users' needs. First, we aim to identify the root causes of crashes, which is essential for fixing the original defects. For this purpose, we propose using a binomial test to determine which type of pattern is most appropriate for representing crash signatures. Improving software usability through customization and adaptation of systems to each user's specific needs requires a very good knowledge of how users use the software. To highlight usage trends, we propose grouping similar sessions into clusters. We compare 3 session representations as inputs to different clustering algorithms. The second contribution of our thesis concerns the dynamic monitoring of software use.
We propose two methods, based on different representations of input actions, to address two distinct industrial issues: next-action prediction and software crash risk detection. Both methodologies take advantage of the recurrent structure of LSTM neural networks to capture dependencies in our sequential data, as well as their capacity to potentially handle different types of input representations for the same data. Frequent pattern mining Representation learning Action embeddings Clustering LSTM Recurrent Neural Networks
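Entry 75 uses a binomial test to decide which kind of pattern best represents crash signatures. One common way to frame such a test is sketched below under assumptions that are not taken from the thesis: a one-sided alternative, a baseline occurrence rate `p0` for the pattern across all sessions, and hypothetical counts. It asks whether the pattern appears in crash sessions more often than that baseline would predict.

```python
from math import comb

def binomial_upper_pvalue(k, n, p0):
    """One-sided binomial test: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

# Hypothetical numbers: a pattern seen in 9 of 12 crash sessions,
# versus a 20% occurrence rate across all sessions.
p = binomial_upper_pvalue(k=9, n=12, p0=0.2)
# A small p-value suggests the pattern is over-represented in crash sessions.
print(f"p-value = {p:.2e}")
```

Comparing such p-values across candidate pattern types (for instance, contiguous versus non-contiguous action sequences) is one plausible way to rank which representation best separates crash sessions from normal ones.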
76	From specialists to generalists : inductive biases of deep learning for higher level cognition Goyal, Anirudh 10 1900 Current neural networks achieve state-of-the-art results across a range of challenging problem domains. Given enough data and computation, current neural networks can achieve human-level results on almost any task. In that sense, we have been able to train specialists that can perform a particular task really well, whether it is the game of Go, playing Atari games, manipulating a Rubik's cube, captioning images or drawing images given captions. The next challenge for AI is to devise methods to train generalists that, when exposed to multiple tasks during training, can quickly adapt to new, unknown tasks.
Without any assumptions about the data-generating distribution, it may not be possible to achieve better generalization and adaptation to new (unknown) tasks. A fascinating possibility is that human and animal intelligence could be explained by a few principles (rather than an encyclopedia of facts). If that were the case, we could more easily both understand our own intelligence and build intelligent machines. Just as in physics, the principles themselves would not be sufficient to predict the behavior of complex systems like brains, and substantial computation might be needed to simulate human intelligence. In addition, we know that real brains incorporate some detailed task-specific a priori knowledge which could not fit in a short list of simple principles. So we think of that short list rather as explaining the ability of brains to learn and adapt efficiently to new environments, which is a great part of what we need for AI. If that simplicity-of-principles hypothesis were correct, it would suggest that studying the kind of inductive biases (another way to think about principles of design and priors, in the case of learning systems) that humans and animals exploit could help both clarify these principles and provide inspiration for AI research. Deep learning already exploits several key inductive biases, and my work considers a larger list, focusing on those that mostly concern higher-level cognitive processing. My work focuses on designing such models by incorporating in them strong but general assumptions (inductive biases) that enable high-level reasoning about the structure of the world. This research program is both ambitious and practical, yielding concrete algorithms as well as a cohesive vision for long-term research towards generalization in a complex and changing world.
Deep Learning Natural Language Processing Representation Learning Generative Models Language Modeling
77	Multimodal Classification of Second-Hand E-Commerce Ads / Multimodal klassiciering av annonser på Second-Hand-Marknadsplatser Åberg, Ludvig January 2018 In second-hand e-commerce, new products are typically categorized by the seller. Automating this process makes it easier to upload ads and could lower the number of incorrectly categorized ads. Automatic ad categorization also makes it possible for a second-hand e-commerce platform to use a more detailed category system, which could improve the shopping experience for potential buyers. Product ad categorization is typically addressed as a text classification problem, as most metadata associated with products are textual. However, better performance can be expected by also including image information, i.e., by using a multimodal approach. This thesis evaluates different multimodal deep learning models for the task of ad categorization on data from Blocket.se. We examine late fusion models, where the modalities are combined at the decision level, and early fusion models, where the modalities are combined at the feature level. We also introduce our own approach, Text Based Visual Attention (TBVA), which extends the image CNN Inception v3 with an attention mechanism to incorporate textual information. For all models evaluated, the text classifier fastText is used to process text data and the Inception v3 network to process image data. Our results show that the late fusion models perform best in our setting. We conclude that these models generally learn which of the baseline models to ’trust’, while the early fusion and TBVA models learn more abstract concepts. As future work, we would like to examine how the TBVA models perform on other tasks, such as ad similarity. Machine Learning Classification Multimodal Classification Multimodal Learning Representation Learning Late Fusion Early Fusion Visual Attention Marketplace Second Hand E-commerce Blocket.se Computer Sciences
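Entry 77 contrasts late fusion (combining modalities at the decision level) with early fusion (combining them at the feature level). The sketch below shows the two ideas in their simplest form. It is an assumption-laden simplification: the thesis's late fusion models learn how much to trust each baseline, whereas here the weight `w_text` is fixed by hand, and the plain lists stand in for hypothetical fastText and Inception v3 outputs.

```python
def late_fusion(p_text, p_image, w_text=0.5):
    """Decision-level fusion: weighted average of per-class probabilities."""
    return [w_text * pt + (1.0 - w_text) * pi for pt, pi in zip(p_text, p_image)]

def early_fusion(text_features, image_features):
    """Feature-level fusion: concatenate features before a shared classifier."""
    return list(text_features) + list(image_features)

# Hypothetical class probabilities from a text model and an image model (2 categories).
p_text = [0.8, 0.2]
p_image = [0.4, 0.6]
fused = late_fusion(p_text, p_image, w_text=0.5)
print(fused.index(max(fused)))  # index of the predicted category
# The concatenated vector would be fed to a downstream classifier (not shown).
print(early_fusion([0.1, 0.3], [0.7]))
```

Because late fusion only sees each model's final probabilities, a learned version of `w_text` naturally ends up encoding which modality to trust, which matches the behavior the abstract reports.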
78	Image-Text context relation using Machine Learning : Research on performance of different datasets Sun, Yuqi January 2022 Building on progress in the Computer Vision and Natural Language Processing fields, Vision-Language (VL) models are designed to process information from both images and text. This thesis focuses on the performance of one model, Oscar, on different datasets. Oscar is a state-of-the-art VL representation learning model built on a pre-trained object detection model and a pre-trained BERT model. By comparing the model's performance across datasets, we can better understand the relationship between dataset properties and model performance, and the conclusions may guide future work on VL datasets and models. In this thesis, I collected five VL datasets, each differing from the others in at least one main respect, and generated eight subsets from them. I trained the same model on each subset to classify whether an image is related to a given text. Conventional wisdom holds that clear datasets yield better performance, because their images show everyday scenes and are annotated by humans; because of this human annotation, clear datasets are always limited in size. Interestingly, however, a dataset generated by models trained on other datasets achieved performance as good as that of the clear datasets. This finding could encourage research on models for data collection. The experimental results also indicate that future work on VL models could focus on improving image feature extraction, as the images strongly influence the performance of VL models. / Based on the achievements in the Computer Vision and Natural Language Processing fields, Vision-Language (VL) models are designed to process information from images and texts. The project focused on the performance of one model, Oscar, on different datasets.
Oscar is a state-of-the-art VL representation learning model based on a pre-trained object detection model and a pre-trained BERT model. By comparing performance across the datasets, we could understand the relationship between the properties of the datasets and the performance of the models. The conclusions could give direction for future work on VL datasets and models. In this project, I collected five VL datasets that differ from each other in at least one main respect and generated eight subsets from them. I trained the same model on the different subsets to classify whether an image is related to a text. Common sense suggests that clear datasets perform better because their images show everyday scenes and are annotated by humans. The size of clear datasets is therefore always limited. An interesting phenomenon in the project, however, is that the dataset generated by models achieved performance as good as that of the clear datasets. This would encourage research on models for data collection. The experimental results also indicated that future work on the VL model can focus on improving feature extraction from images, since the images have a large influence on the performance of VL models. Vision-Language Representation learning BERT Faster R-CNN Oscar Datasets Computer and Information Sciences
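Entry 78 frames image-text relation as binary classification: is this text related to this image? One common way to build training data for such a task, assumed here rather than taken from the thesis, is to pair each image with its own caption (label 1) and with a caption sampled from a different image (label 0). The function name `make_relation_pairs` and the toy data are hypothetical.

```python
import random


def make_relation_pairs(captions_by_image, seed=0):
    """Build (image_id, caption, label) triples for image-text relation
    classification.

    label 1: the caption was written for this image (positive pair).
    label 0: the caption was taken from a different, randomly chosen image
             (negative pair), giving a balanced dataset.
    """
    image_ids = sorted(captions_by_image)
    if len(image_ids) < 2:
        raise ValueError("need at least two images to sample negatives")
    rng = random.Random(seed)  # fixed seed for reproducible pairs
    pairs = []
    for img in image_ids:
        others = [i for i in image_ids if i != img]
        for caption in captions_by_image[img]:
            pairs.append((img, caption, 1))
            neg_img = rng.choice(others)
            pairs.append((img, rng.choice(captions_by_image[neg_img]), 0))
    return pairs


if __name__ == "__main__":
    toy = {
        "img_1": ["a dog runs on the beach"],
        "img_2": ["two people ride bicycles", "cyclists on a road"],
        "img_3": ["a bowl of fruit on a table"],
    }
    for triple in make_relation_pairs(toy):
        print(triple)
```

Each image contributes one negative per positive, so the classes stay balanced; a VL model such as Oscar would then be trained to predict the label from the image-text pair.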
79	Feature extraction with self-supervised learning on eye-tracking data from Parkinson’s patients and healthy individuals / Feature extraction using self-supervised machine learning applied to eye-movement data from Parkinson’s patients and healthy subjects Bergman, Leo January 2022 Eye-tracking is a method for monitoring and measuring eye movements. The technology has had a significant impact so far, and new application areas are emerging. Today, it is used in the gaming industry, the health industry, self-driving cars, and not least in medicine. In medicine, substantial research resources are being invested in investigating the extent to which eye-tracking can help with disease diagnosis. One disease of interest is Parkinson’s disease (PD), a neurodegenerative disease in which dopamine production in nerve cells is destroyed. This leads to deteriorating nerve signal transmission, which in turn affects motor skills. One of the motor functions affected in PD is oculomotor function, which controls eye movement. Physicians can observe this decline clinically, and eye-tracking technology has high potential here, but it remains to be investigated which methods and test protocols are relevant to study and to what extent the technology can be used as a diagnostic tool. Self-supervised learning (SSL) is a novel class of algorithms for finding representations of data, and it appears to have high potential for categorizing biomarkers. This thesis examines to what extent an SSL network can learn representations of eye-tracking data from Parkinson’s patients that distinguish healthy from sick individuals and patients on medication from those off medication. The results suggest that the network does not succeed in learning distinct differences between these groups.
Furthermore, no difference in the results is observed when the model takes into account the task-specific target information that the subjects follow. In the UK today, approximately 26 percent of Parkinson’s patients are misdiagnosed, and the misdiagnosis rate is even higher in the initial stage of the disease. The method could potentially be used as a complement to regular diagnosis at different stages of the disease. This would provide better conditions for the patient as well as for medical and pharmaceutical research. The method also has the potential to reduce physicians’ workload. / Eye-tracking, or ögonrörelsemätning as it is called in Swedish, is a method for following and measuring the movements of the eye. The technology has had a significant impact so far, and new application areas keep emerging. Today it is used in the gaming industry, in health, in self-driving cars, and not least in medicine. In medicine, large research resources are currently devoted to investigating to what extent eye-tracking can help diagnose diseases. One disease of interest is Parkinson’s disease, a neurodegenerative disease in which dopamine production in nerve cells is destroyed. This impairs the transmission of nerve signals, which in turn affects motor function and, among other things, reduces the motor function of the eye. This can be observed clinically today; eye-tracking technology has high potential here, but it remains to be investigated which methods and test protocols are relevant to study and to what degree the technology can be used as a diagnostic tool. A new type of algorithm for finding representations of data is called self-supervised learning (SSL); these algorithms appear to have high potential for categorizing biomarkers.
This thesis examines to what degree an SSL network can learn representations of eye-tracking data from Parkinson’s patients in order to distinguish between healthy and sick subjects and between medicated and unmedicated patients. The result is that the network does not manage to learn differences between these classes. Furthermore, no difference in the result is noted when the model takes into account the specific tasks given to the subjects. Today, 30 percent of Parkinson’s patients receive an incorrect diagnosis. In the initial stage of the disease, the misdiagnosis rate is even higher. Potentially, the method can be used as a complement to diagnosis at different stages of the disease. This would provide better conditions for the patient as well as for medical and pharmaceutical research. The method also has the potential to reduce physicians’ workload. Eye-tracking Representation learning Self-supervised learning Parkinson’s disease Feature extraction Clustering analysis Machine learning Computer Sciences
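Entry 79 does not say which self-supervised objective was used. Purely as an illustration of how SSL can learn representations without diagnostic labels, the sketch below implements a SimCLR-style contrastive (NT-Xent) loss: two randomly perturbed views of the same gaze trace should map to nearby embeddings, while views of different traces should not. The augmentation choices (amplitude scaling and Gaussian jitter), the temperature, the helper names, and the omission of the encoder network are all assumptions, not details of the thesis.

```python
import math
import random


def augment(trace, rng, noise_std=0.01, scale_range=(0.9, 1.1)):
    """Return a perturbed 'view' of a 1-D gaze trace: random amplitude
    scaling plus Gaussian jitter (hypothetical augmentation choices)."""
    s = rng.uniform(*scale_range)
    return [s * x + rng.gauss(0.0, noise_std) for x in trace]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss over a batch of embedding pairs.

    z1[k] and z2[k] embed two views of sample k (a positive pair); every other
    embedding in the combined batch of 2N acts as a negative.
    """
    z = list(z1) + list(z2)
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of z[i]
        logits = {k: cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i}
        m = max(logits.values())  # log-sum-exp for numerical stability
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += log_denom - logits[j]  # -log softmax probability of the positive
    return total / (2 * n)


if __name__ == "__main__":
    rng = random.Random(0)
    trace = [math.sin(t / 5.0) for t in range(50)]  # toy gaze signal
    view_a, view_b = augment(trace, rng), augment(trace, rng)
    print(len(view_a), len(view_b))
    # In practice an encoder network would map each view to an embedding first.
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(nt_xent(aligned, aligned), nt_xent(aligned, swapped))
```

The loss is low when matching views agree and high when they are mixed up; whether the resulting embeddings then separate patient groups is exactly what a downstream clustering or classification step would have to test.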
80	Learning and planning with noise in optimization and reinforcement learning Thomas, Valentin 06 1900 Most modern machine learning algorithms incorporate some degree of randomness in their processes, which we will call noise, and which can ultimately affect the model's predictions. In this thesis, we take a closer look at learning and planning in the presence of noise for reinforcement learning and optimization algorithms. The first two articles presented in this document focus on reinforcement learning in an unknown environment, and more precisely on how we can design algorithms that use the stochasticity of their policy and of the environment to their advantage. Our first contribution focuses on the unsupervised reinforcement learning setting. We show how an agent left alone in an unknown world with no specific goal can learn which aspects of the environment it can control independently of one another, while jointly learning a disentangled latent representation of these aspects, which we call factors of variation. The second contribution focuses on planning in continuous control tasks. By framing reinforcement learning as an inference problem, we borrow tools from the literature on Sequential Monte Carlo methods to design an efficient and theoretically motivated algorithm for probabilistic planning using a learned model of the world. We show how the agent can take advantage of our probabilistic objective to imagine diverse sets of solutions. The next two contributions analyze the impact of gradient noise due to sampling in optimization algorithms.
The third contribution examines the role of gradient estimator noise in maximum likelihood estimation with stochastic gradient descent, exploring the relationship between the structure of the gradient noise and the local curvature, and their effect on the model's generalization and convergence speed. Our fourth contribution returns to reinforcement learning to analyze the impact of sampling noise on the policy gradient ascent optimization algorithm. We find that sampling noise can have a significant impact on the optimization dynamics and on the policies discovered in reinforcement learning. / Most modern machine learning algorithms incorporate a degree of randomness in their processes, which we will refer to as noise and which can ultimately affect the model's predictions. In this thesis, we take a closer look at learning and planning in the presence of noise for reinforcement learning and optimization algorithms. The first two articles presented in this document focus on reinforcement learning in an unknown environment, specifically on how we can design algorithms that use the stochasticity of their policy and of the environment to their advantage. Our first contribution focuses on the unsupervised reinforcement learning setting. We show how an agent left alone in an unknown world without any specified goal can learn which aspects of the environment it can control independently of each other, while jointly learning a disentangled latent representation of these aspects, or factors of variation. The second contribution focuses on planning in continuous control tasks. By framing reinforcement learning as an inference problem, we borrow tools from the Sequential Monte Carlo literature to design a theoretically grounded and efficient algorithm for probabilistic planning using a learned model of the world.
We show how the agent can leverage the uncertainty of the model to imagine a diverse set of solutions. The following two contributions analyze the impact of gradient noise due to sampling in optimization algorithms. The third contribution examines the role of gradient noise in maximum likelihood estimation with stochastic gradient descent, exploring how the structure of the gradient noise and the local curvature relate to the generalization and convergence speed of the model. Our fourth contribution returns to reinforcement learning to analyze the impact of sampling noise on the policy gradient algorithm. We find that sampling noise can significantly impact the optimization dynamics and the policies discovered in on-policy reinforcement learning. Deep reinforcement learning Planning Stochastic optimization Generalization Control as inference Representation learning Reinforcement learning
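The third and fourth contributions of entry 80 concern gradient noise caused by sampling. As a toy illustration only (not one of the thesis's experiments), the sketch estimates how far minibatch gradients of a one-parameter least-squares problem scatter around the full-batch gradient. Because the minibatch gradient averages B examples drawn with replacement, it is an unbiased estimate whose variance shrinks roughly as 1/B. The data, the parameter value `w`, and the function names are hypothetical.

```python
import random


def per_example_grads(w, xs, ys):
    """Per-example gradients of the squared error (w*x - y)**2 with respect to w."""
    return [2.0 * (w * x - y) * x for x, y in zip(xs, ys)]


def minibatch_noise(w, xs, ys, batch_size, trials=2000, seed=0):
    """Mean squared deviation of minibatch gradients from the full-batch
    gradient, sampling examples uniformly with replacement."""
    rng = random.Random(seed)
    grads = per_example_grads(w, xs, ys)
    full = sum(grads) / len(grads)
    sq_dev = 0.0
    for _ in range(trials):
        batch = rng.choices(grads, k=batch_size)
        sq_dev += (sum(batch) / batch_size - full) ** 2
    return sq_dev / trials


if __name__ == "__main__":
    data_rng = random.Random(42)
    xs = [i / 10.0 for i in range(50)]
    ys = [3.0 * x + data_rng.gauss(0.0, 0.5) for x in xs]  # true slope 3
    for b in (1, 4, 16, 64):
        print(b, minibatch_noise(w=0.0, xs=xs, ys=ys, batch_size=b))
```

Printing the noise for increasing batch sizes shows it falling by roughly the factor by which B grows; at a parameter where every per-example gradient is zero, the noise vanishes entirely.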
