Global ETD Search

31	Imitation Learning on Branching Strategies for Branch and Bound Problems / Imitationsinlärning av Grenstrategier för Branch and Bound-Problem Axén, Magnus January 2023 (has links) A new branch of machine and deep learning models has evolved in constrained optimization, specifically for mixed integer programming (MIP) problems. These models draw inspiration from earlier solver methods, primarily branch and bound. While utilizing the branch and bound framework, machine and deep learning models enhance either the computational efficiency or the performance of the model. This thesis examines how a state-of-the-art deep learning model behaves when it imitates different variable selection strategies of classical MIP solvers. A recently developed deep learning algorithm is used in this thesis, which represents the branch and bound state as a bipartite graph. This graph serves as the input to a graph network model, which determines the variable in the MIP on which branching occurs. This thesis compares how imitating different classical branching strategies affects different algorithm outputs and, most importantly, time span. More specifically, this thesis conducts an empirical study on a MIP known as the facility location problem (FLP) and compares the different methods for imitation. This thesis shows that the deep learning algorithm can outperform the classical methods in terms of time span. More specifically, imitating branching strategies that result in small branch and bound trees leads to finding the global optimum more quickly. Lastly, it is shown that a smaller embedding size in the network model is preferred for these instances when looking at the trade-off between variable selection and time cost. / En ny typ av maskin och djupinlärningsmodeller har utvecklats inom villkors optimering, specifikt för så kallade blandade heltalsproblem (MIP). 
Dessa modeller hämtar inspiration från tidigare lösningsmetoder, främst en heuristisk som kallas “branch and bound”. Genom att använda “branch and bound” ramverket förbättrar maskin och djupinlärningsmodeller antingen beräkningshastigheten eller prestandan hos modellen. Denna uppsats undersöker hur imitation av olika strategier för val av variabler från klassiska MIP-algoritmer beter sig på en modern djupinlärningsmodell. I denna uppsats används en nyligen utvecklad djupinlärningsalgoritm som representerar “branch and bound” tillståndet som en bipartit graf. Denna graf används som indata till en “graph network” modell som avgör vilken variabel i MIP-problemet som tas hänsyn till. Uppsatsen jämför hur imitation av olika klassiska “branching” strategier påverkar olika algoritmutgångar, framför allt, tidslängd. Mer specifikt utför denna uppsats en empirisk studie på ett MIP-problem som kallas för “facility location problem” (FLP) och jämför imitationen av de olika metoderna. I denna uppsats visas det att denna djupinlärningsalgoritm kan överträffa de klassiska metoderna när det gäller tidslängd. Mer specifikt ger imitation av “branching” strategier som resulterar i små “branch and bound” träd upphov till en snabbare prestation vid sökning av den globala optimala lösningen. Slutligen visas det att en mindre inbäddningsstorlek i nätverksmodellen föredras i dessa fall när man ser på avvägningen mellan val av variabler och tidskostnad. Graph Networks Convolutions MIP Branch and Bound Facility Location Problem MDP Imitation Learning Graf nätverk Faltning Blandade heltaltsproblem Branch and Bound Facility Location Problem Markov Imitationsinlärning Other Mathematics Annan matematik
32	Emergence of language-like latents in deep neural networks Lu, Yuchen 05 1900 (has links) L'émergence du langage est considérée comme l'une des marques de l'intelligence humaine. Par conséquent, nous émettons l'hypothèse que l'émergence de latences ou de représentations similaires au langage dans un système d'apprentissage profond pourrait aider les modèles à obtenir une meilleure généralisation compositionnelle et hors distribution. Dans cette thèse, nous présentons une série d'articles qui explorent cette hypothèse dans différents domaines, notamment l'apprentissage interactif du langage, l'apprentissage par imitation et la vision par ordinateur. / The emergence of language is regarded as one of the hallmarks of human intelligence. Therefore, we hypothesize that the emergence of language-like latents or representations in a deep learning system could help models achieve better compositional and out-of-distribution generalization. In this thesis, we present a series of papers that explores this hypothesis in different fields including interactive language learning, imitation learning and computer vision. Deep Learning Language Emergence Compositionality Imitation Learning Self-supervised Learning Apprentissage Profond Émergence du Langage Compositionnalité Apprentissage par Imitation Apprentissage Auto-supervisé
33	Offline Reinforcement Learning for Downlink Link Adaption : A study on dataset and algorithm requirements for offline reinforcement learning. / Offline Reinforcement Learning för nedlänksanpassning : En studie om krav på en datauppsättning och algoritm för offline reinforcement learning Dalman, Gabriella January 2024 (has links) This thesis studies offline reinforcement learning as an optimization technique for downlink link adaptation, which is one of many control loops in radio access networks. The work studies the impact of the quality of pre-collected datasets, in terms of how much of the state-action space the data covers and whether it was collected by an expert policy. The data quality is evaluated by training three different algorithms: Deep Q-networks, Critic regularized regression, and Monotonic advantage re-weighted imitation learning. The performance is measured for each combination of algorithm and dataset, and each algorithm's need for hyperparameter tuning and its sample efficiency are studied. The results showed Critic regularized regression to be the most robust because it could learn well from any of the datasets used in the study and did not require extensive hyperparameter tuning. Deep Q-networks required careful hyperparameter tuning, but paired with the expert data they reached rewards as high as the agents trained with Critic regularized regression. Monotonic advantage re-weighted imitation learning needed data from an expert policy to reach a high reward. In summary, offline reinforcement learning can succeed in a telecommunication use case such as downlink link adaptation. Critic regularized regression was the preferred algorithm because it performed well with all three datasets presented in the thesis. / Denna avhandling studerar offline reinforcement learning som en optimeringsteknik för nedlänks länkanpassning, vilket är en av många kontrollcyklar i radio access networks. 
Arbetet undersöker inverkan av kvaliteten på förinsamlade dataset, i form av hur mycket datan täcker state-action rymden och om den samlats in av en expertpolicy eller inte. Datakvaliteten utvärderas genom att träna tre olika algoritmer: Deep Q-nätverk, Critic regularized regression och Monotonic advantage re-weighted imitation learning. Prestanda mäts för varje kombination av algoritm och dataset, och deras behov av hyperparameterinställning och effektiv användning av data studeras. Resultaten visade att Critic regularized regression var mest robust, eftersom att den lyckades lära sig mycket från alla dataseten som användes i studien och inte krävde omfattande hyperparameterinställning. Deep Q-nätverk krävde noggrann hyperparameterinställning och tillsammans med expertdata lyckades den nå högst prestanda av alla agenter i studien. Monotonic advantage re-weighted imitation learning behövde data från en expertpolicy för att lyckas lära sig problemet. Det datasetet som var mest framgångsrikt var expertdatan. Sammanfattningsvis kan offline reinforcement learning vara framgångsrik inom telekommunikation, specifikt nedlänks länkanpassning. Critic regularized regression var den föredragna algoritmen för att den var stabil och kunde prestera bra med alla tre olika dataseten som presenterades i avhandlingen. Offline Reinforcement Learning Downlink Link adaptation data analysis Deep Q-networks Critic Regularized Regression Offline Reinforcement Learning nedlänksanpassning data analys Deep Qnetworks Critic Regularized Regression Computer and Information Sciences Data- och informationsvetenskap
34	Un robot curieux pour l’apprentissage actif par babillage d’objectifs : choisir de manière stratégique quoi, comment, quand et de qui apprendre / A Curious Robot Learner for Interactive Goal-Babbling : Strategically Choosing What, How, When and from Whom to Learn Nguyen, Sao Mai 27 November 2013 (has links) Les défis pour voir des robots opérant dans l’environnement de tous les jours des humains et sur une longue durée soulignent l’importance de leur adaptation aux changements qui peuvent être imprévisibles au moment de leur construction. Ils doivent être capable de savoir quelles parties échantillonner, et quels types de compétences il a intérêt à acquérir. Une manière de collecter des données est de décider par soi-même où explorer. Une autre manière est de se référer à un mentor. Nous appelons ces deux manières de collecter des données des modes d’échantillonnage. Le premier mode d’échantillonnage correspond à des algorithmes développés dans la littérature pour automatiquement pousser l’agent vers des parties intéressantes de l’environnement ou vers des types de compétences utiles. De tels algorithmes sont appelés des algorithmes de curiosité artificielle ou motivation intrinsèque. Le deuxième mode correspond au guidage social ou l’imitation, où un partenaire humain indique où explorer et où ne pas explorer. Nous avons construit une architecture algorithmique intrinsèquement motivée pour apprendre comment produire par ses actions des effets et conséquences variées. Il apprend de manière active et en ligne en collectant des données qu’il choisit en utilisant plusieurs modes d’échantillonnage. Au niveau du meta-apprentissage, il apprend de manière active quelle stratégie d’échantillonnage est plus efficace pour améliorer sa compétence et généraliser à partir de son expérience à un grand éventail d’effets. Par apprentissage par interaction, il acquiert de multiples compétences de manière structurée, en découvrant par lui-même les séquences développementale. 
/ The challenges posed by robots operating in human environments on a daily basis and in the long term point out the importance of adaptivity to changes which can be unforeseen at design time. The robot must learn continuously in an open-ended, non-stationary and high-dimensional space. It must be able to know which parts to sample and what kind of skills are interesting to learn. One way is to decide what to explore by oneself. Another way is to refer to a mentor. We name these two ways of collecting data sampling modes. The first sampling mode corresponds to algorithms developed in the literature in order to autonomously drive the robot towards interesting parts of the environment or useful kinds of skills. Such algorithms are called artificial curiosity or intrinsic motivation algorithms. The second sampling mode corresponds to social guidance or imitation, where the teacher indicates where to explore as well as where not to explore. Starting from the study of the relationships between these two concurrent methods, we ended up building an algorithmic architecture with a hierarchical learning structure, called Socially Guided Intrinsic Motivation (SGIM). We have built an intrinsically motivated active learner which learns how its actions can produce varied consequences or outcomes. It actively learns online by sampling data which it chooses by using several sampling modes. On the meta-level, it actively learns which data collection strategy is most efficient for improving its competence and generalising from its experience to a wide variety of outcomes. The interactive learner thus learns multiple tasks in a structured manner, discovering by itself developmental sequences. Apprentissage actif Apprentissage interactif Apprentissage par imitation Exploration orientée par objectifs Collecte de données Apprentissage par démonstration Active learning Interactive learning Imitation learning Goal-oriented exploration Data-collection, exploration Programming by demonstration
35	Einen Roboter das Fahren Lehren - ein auf Fähigkeitslernen basierter Ansatz / Teaching a Robot to Drive - A Skill Learning Inspired Approach Markelic, Irene 06 August 2010 (has links) No description available. Fähigkeitslernen Maschinenlernen autonomes Fahren Roboter Kontrolle Imitationslernen Fahrerassistenz visuelle interne Modelle skill learning machine learning autonomous driving robot control programming by demonstration imitation learning driver assistance visual internal models
36	Estratégias baseadas em aprendizado para coordenação de uma frota de robôs em tarefas cooperativas Aranibar, Dennis Barrios 14 October 2005 (has links) Conselho Nacional de Desenvolvimento Científico e Tecnológico / In multi-robot systems, both control architecture and work strategy represent a challenge for researchers. It is important to have a robust architecture that can be easily adapted to requirement changes. It is also important that the work strategy allows robots to complete tasks efficiently, considering that robots interact directly in environments with humans. In this context, this work explores two approaches to coordinating a robot soccer team carrying out cooperative tasks. Both approaches are based on a combination of imitation learning and reinforcement learning. The first approach comprises a control architecture, a fuzzy inference engine for recognizing situations in robot soccer games, software for narrating robot soccer games based on the inference engine, and an implementation of learning by imitation from observation and analysis of other robotic teams. Moreover, state abstraction was efficiently implemented in reinforcement learning applied to the robot soccer standard problem. Finally, reinforcement learning was implemented so that actions are explored only in some states (for example, states where a specialist robot system used them), unlike the traditional form, where actions have to be tested in all states. In the second approach, reinforcement learning was implemented with function approximation, for which an algorithm called RBF-Sarsa($\lambda$) was created. In both approaches, batch reinforcement learning algorithms were implemented and imitation learning was used as a seed for reinforcement learning. 
Moreover, learning from robotic teams controlled by humans was explored. The proposals in this work proved efficient in the robot soccer standard problem and, when implemented in other robotic systems, will allow those systems to carry out assigned tasks efficiently and effectively. These approaches provide a high capacity to adapt to changes in requirements and environment. / Em sistemas multi-robôs a arquitetura de controle e a estratégia de trabalho representam um desafio para os pesquisadores. É importante que a arquitetura de controle seja robusta, de forma que se adapte naturalmente às mudanças nas características do problema e também que a estratégia de trabalho permita aos robôs desenvolver as tarefas atribuídas eficaz e eficientemente, levando em consideração a restrição de que os robôs vão interagir diretamente em ambientes povoados de seres humanos. Neste contexto, este trabalho explora duas abordagens para a coordenação de uma frota de robôs desenvolvendo tarefas cooperativas. Ambas as abordagens são baseadas em uma mistura de aprendizado por imitação e por experiência. Assim, na primeira abordagem desenvolveu-se uma arquitetura de controle, uma máquina de inferência difusa para reconhecimento de fatos em jogos de futebol, um software narrador de jogos baseado na máquina de inferência difusa, e a implementação de aprendizado por imitação a partir de observação e análise de outros times robóticos. Além disso, aplicou-se eficientemente abstração de estados em aprendizado por reforço no problema padrão de futebol de robôs. Finalmente, o aprendizado por reforço foi implementado de forma que as ações somente são executadas em certos estados (por exemplo os estados onde algum sistema robótico especialista já as utilizou) diferentemente da forma tradicional onde as ações no banco de conhecimento têm que ser testadas em todos os estados. 
No caso da segunda abordagem, implementou-se aprendizado por reforço com aproximação de funções, para o que foi criado um algoritmo chamado RBF-Sarsa($\lambda$). Em ambas as abordagens implementou-se o aprendizado por reforço em lotes e o aprendizado por imitação como semente para aprendizado por reforço. Além disso, explorou-se o aprendizado com times de robôs controlados por seres humanos. As propostas deste trabalho mostraram-se eficientes no problema padrão de futebol de robôs, e ao serem implementadas em outros sistemas robóticos permitirão que os mesmos sejam eficazes e eficientes no desenvolvimento das tarefas atribuídas com um alto grau de adaptação às mudanças dos requerimentos e do ambiente. Sistemas robóticos autônomos Sistemas multi-robôs Aprendizado por reforço Aprendizado por imitação Autonomous robots systems Multi-robot systems Reinforcement learning Imitation learning CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
37	Learning Continuous Human-Robot Interactions from Human-Human Demonstrations Vogt, David 02 March 2018 (has links) In this dissertation, a data-driven method was developed for machine learning of human-robot interactions based on human-human demonstrations. During a training phase, the movements of two interaction partners are recorded using motion capture and learned in a two-person interaction model. At runtime, the model is used both to recognize the movements of the human interaction partner and to generate adapted robot movements. The performance of the approach is evaluated in three complex applications, each of which requires continuous motion coordination between human and robot. The result of the dissertation is a learning method that enables intuitive, goal-directed and safe collaboration with robots. Mensch-Roboter Interaktion Imitationslernen Maschinelles Lernen Interaktionslernen MRI Mensch-Roboter Kollaboration Virtuelle Realität Human-Robot Interaction Imitation Learning Machine Learning Interaction Learning HRI Human-Robot Collaboration Virtual Reality ddc:004 Mensch-Maschine-Kommunikation Roboter Maschinelles Lernen Modelllernen Programmierung durch Vormachen
38	Imitation from observation using behavioral learning Djeafea Sonwa, Medric B. 11 1900 (has links) L'Imitation par observation (IPO) est un paradigme d'apprentissage qui consiste à entraîner des agents autonomes dans un processus de décision markovien (PDM) en observant les démonstrations d'un expert et sans avoir accès à ses actions. Ces démonstrations peuvent être des séquences d'états de l'environnement ou des observations visuelles brutes de l'environnement. Bien que le cadre utilisant des états à dimensions réduites ait permis d'obtenir des résultats convaincants avec des approches récentes, l'utilisation d'observations visuelles reste un défi important en IPO. Une des procédures très adoptée pour résoudre le problème d’IPO consiste à apprendre une fonction de récompense à partir des démonstrations, toutefois la nécessité d’analyser l'environnement et l'expert à partir de vidéos pour récompenser l'agent augmente la complexité du problème. Nous abordons ce problème avec une méthode basée sur la représentation des comportements de l'agent dans un espace vectoriel en utilisant des vidéos démonstratives. Notre approche exploite les techniques récentes d'apprentissage contrastif d'images et vidéos et utilise un algorithme de bootstrapping pour entraîner progressivement une fonction d'encodage de trajectoires à partir de la variation du comportement de l'agent. Simultanément, cette fonction récompense l'agent imitateur lors de l'exécution d'un algorithme d'apprentissage par renforcement. Notre méthode utilise un nombre limité de vidéos démonstratives et nous n'avons pas accès à comportement expert. Nos agents imitateurs montrent des performances convaincantes sur un ensemble de tâches de contrôle et démontrent que l'apprentissage d'une fonction de codage du comportement à partir de vidéos permet de construire une fonction de récompense efficace dans un PDM. 
/ Imitation from observation (IfO) is a learning paradigm that consists of training autonomous agents in a Markov Decision Process (MDP) by observing an expert's demonstrations and without access to its actions. These demonstrations could be sequences of environment states or raw visual observations of the environment. Although the setting using low-dimensional states has allowed obtaining convincing results with recent approaches, the use of visual observations remains an important challenge in IfO. One of the most common procedures adopted to solve the IfO problem is to learn a reward function from the demonstrations, but the need to understand the environment and the expert's moves through videos to appropriately reward the learning agent increases the complexity of the problem. We approach this problem with a method that focuses on the representation of the agent’s behaviors in a latent space using demonstrative videos. Our approach exploits recent techniques of contrastive learning of image and video and uses a bootstrapping algorithm to progressively train a trajectory encoding function from the variation of the agent’s policy. Simultaneously, this function rewards the imitating agent through a Reinforcement Learning (RL) algorithm. Our method uses a limited number of demonstrative videos and we do not have access to any expert policy. Our imitating agents in experiments show convincing performances on a set of control tasks and demonstrate that learning a behavior encoding function from videos allows for building an efficient reward function in MDP. Apprentissage par renforcement Apprentissage par imitation Imitation par observation Apprentissage contrastif Reconnaissance d'actions Reinforcement learning Imitation learning Imitation from observation Contrastive learning Action recognition
39	Learning to compare nodes in branch and bound with graph neural networks Labassi, Abdel Ghani 08 1900 (has links) En informatique, la résolution de problèmes NP-difficiles en un temps raisonnable est d’une grande importance : optimisation de la chaîne d’approvisionnement, planification, routage, alignement de séquences biologiques multiples, inférence dans les modèles graphiques probabilistes, et même certains problèmes de cryptographie sont tous des exemples de la classe NP-complet. En pratique, nous modélisons beaucoup d’entre eux comme un problème d’optimisation en nombre entier, que nous résolvons à l’aide de la méthodologie séparation et évaluation. Un algorithme de ce style divise un espace de recherche pour l’explorer récursivement (séparation), et obtient des bornes d’optimalité en résolvant des relaxations linéaires sur les sous-espaces (évaluation). Pour spécifier un algorithme, il faut définir plusieurs paramètres, tel que la manière d’explorer les espaces de recherche, de diviser un espace de recherche une fois exploré, ou de renforcer les relaxations linéaires. Ces politiques peuvent influencer considérablement la performance de résolution. Ce travail se concentre sur une nouvelle manière de dériver une politique de recherche, c’est-à-dire le choix du prochain sous-espace à séparer étant donné une partition en cours, en nous servant de l’apprentissage automatique profond. Premièrement, nous collectons des données résumant, sur une collection de problèmes donnés, quels sous-espaces contiennent l’optimum et quels ne le contiennent pas. En représentant ces sous-espaces sous forme de graphes bipartis qui capturent leurs caractéristiques, nous entraînons un réseau de neurones graphiques à déterminer la probabilité qu’un sous-espace contienne la solution optimale par apprentissage supervisé. Le choix d’un tel modèle est particulièrement utile car il peut s’adapter à des problèmes de différente taille sans modifications. 
Nous montrons que notre approche bat celle de nos concurrents, consistant à des modèles d’apprentissage automatique plus simples entraînés à partir des statistiques du solveur, ainsi que la politique par défaut de SCIP, un solveur open-source compétitif, sur trois familles NP-dures: des problèmes de recherche de stables de taille maximum, de flots de réseau multicommodité à charge fixe, et de satisfiabilité maximum. / In computer science, solving NP-hard problems in a reasonable time is of great importance, such as in supply chain optimization, scheduling, routing, multiple biological sequence alignment, inference in probabilistic graphical models, and even some problems in cryptography. In practice, we model many of them as a mixed integer linear optimization problem, which we solve using the branch and bound framework. An algorithm of this style divides a search space to explore it recursively (branch) and obtains optimality bounds by solving linear relaxations in such sub-spaces (bound). To specify an algorithm, one must set several parameters, such as how to explore search spaces, how to divide a search space once it has been explored, or how to tighten these linear relaxations. These policies can significantly influence resolution performance. This work focuses on a novel method for deriving a search policy, that is, a rule for selecting the next sub-space to explore given a current partitioning, using deep machine learning. First, we collect data summarizing which subspaces contain the optimum, and which do not. By representing these sub-spaces as bipartite graphs encoding their characteristics, we train a graph neural network to determine the probability that a subspace contains the optimal solution by supervised learning. The choice of such design is particularly useful as the machine learning model can automatically adapt to problems of different sizes without modifications. 
We show that our approach outperforms competing approaches, which consist of simpler machine learning models trained from solver statistics, as well as the default policy of SCIP, a state-of-the-art open-source solver, on three NP-hard benchmarks: generalized independent set, fixed-charge multicommodity network flow, and maximum satisfiability problems. Optimisation combinatoire Séparation et évaluation Recherche de solutions Plongement-à-l’optimum Apprentissage par imitation Réseaux de neurones graphiques Combinatorial Optimization Branch and Bound Solution Search Diving-to-Optimum Imitation Learning Graph Neural Networks
40	Learning Continuous Human-Robot Interactions from Human-Human Demonstrations Vogt, David 02 March 2018 (has links) In this dissertation, a data-driven method was developed for machine learning of human-robot interactions based on human-human demonstrations. During a training phase, the movements of two interaction partners are recorded using motion capture and learned in a two-person interaction model. At runtime, the model is used both to recognize the movements of the human interaction partner and to generate adapted robot movements. The performance of the approach is evaluated in three complex applications, each of which requires continuous motion coordination between human and robot. The result of the dissertation is a learning method that enables intuitive, goal-directed and safe collaboration with robots. info:eu-repo/classification/ddc/004 ddc:004
