1 |
Inverse Reinforcement Learning and Routing Metric Discovery
Shiraev, Dmitry Eric 01 September 2003
Uncovering the metrics and procedures employed by an autonomous networking system is an important problem with applications in instrumentation, traffic engineering, and game-theoretic studies of multi-agent environments. This thesis presents a method for utilizing inverse reinforcement learning (IRL) techniques for the purpose of discovering a composite metric used by a dynamic routing algorithm on an Internet Protocol (IP) network. The network and routing algorithm are modeled as a reinforcement learning (RL) agent and a Markov decision process (MDP). The problem of routing metric discovery is then posed as a problem of recovering the reward function, given observed optimal behavior. We show that this approach is empirically suited for determining the relative contributions of factors that constitute a composite metric. Experimental results for many classes of randomly generated networks are presented. / Master of Science
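The idea of recovering a composite metric from observed optimal routes can be illustrated with a small sketch. It is not the thesis's method: it assumes a hypothetical four-node network, two made-up link features (delay and inverse bandwidth), and a plain grid search over the weight on delay, keeping every weight under which shortest-path routing reproduces the observed route. All names (`LINKS`, `shortest_path`, `consistent_weights`) are illustrative.

```python
import heapq

# Hypothetical toy network: directed link -> (delay, inverse_bandwidth) features.
LINKS = {
    ("A", "B"): (1.0, 3.0),
    ("B", "D"): (1.0, 3.0),
    ("A", "C"): (2.5, 1.0),
    ("C", "D"): (2.5, 1.0),
}


def shortest_path(weights, src, dst):
    """Dijkstra under the composite metric sum_k weights[k] * feature_k."""
    queue = [(0.0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for (u, v), feats in LINKS.items():
            if u == node and v not in visited:
                step = sum(w * f for w, f in zip(weights, feats))
                heapq.heappush(queue, (cost + step, v, path + [v]))
    return None


def consistent_weights(observed, steps=10):
    """Return delay weights w (bandwidth weight 1 - w) that reproduce every observed route."""
    keep = []
    for i in range(steps + 1):
        w = i / steps
        if all(shortest_path((w, 1.0 - w), p[0], p[-1]) == p for p in observed):
            keep.append(w)
    return keep


# Assumed observation: the router sends A -> D traffic through B.
observed_routes = [["A", "B", "D"]]
print(consistent_weights(observed_routes))
```

In this toy case the observed route is only optimal when delay carries most of the weight, so the observation narrows the relative contribution of the two factors without fixing it exactly. More observed routes would narrow the range further.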
|
2 |
Nonparametric Inverse Reinforcement Learning and Approximate Optimal Control with Temporal Logic Tasks
Perundurai Rajasekaran, Siddharthan 30 August 2017
"This thesis focuses on two key problems in reinforcement learning. First, how can reward functions be designed to obtain intended behaviors in autonomous systems using learning-based control? Second, given a complex mission specification, how can the reward function be shaped to achieve fast convergence and reduce sample complexity while learning the optimal policy? To answer these questions, the first part of this thesis investigates inverse reinforcement learning (IRL) methods for learning a reward function from expert demonstrations. However, existing algorithms often assume that the expert demonstrations are generated by the same reward function. Such an assumption may be invalid, as one may need to aggregate data from multiple experts to obtain a sufficient set of demonstrations. In the first and major part of the thesis, we develop a novel method called Non-parametric Behavior Clustering IRL. This algorithm allows one to simultaneously cluster behaviors while learning their reward functions from demonstrations that are generated by more than one expert or behavior. Our approach is built upon the expectation-maximization formulation and non-parametric clustering in the IRL setting. We apply the algorithm to learn multiple driver behaviors (e.g., aggressive vs. evasive driving) from driving demonstrations. In the second part, we study whether reinforcement learning can be used to generate complex behaviors specified in a formal logic, Linear Temporal Logic (LTL). Such LTL tasks may specify temporally extended goals, safety, surveillance, and reactive behaviors in a dynamic environment. We introduce reward shaping under LTL constraints to improve the rate of convergence in learning the optimal and probably correct policies. Our approach exploits the relation between reward shaping and actor-critic methods to speed up convergence and, as a consequence, reduce the number of training samples. We integrate compositional reasoning in formal methods with actor-critic reinforcement learning algorithms to initialize a heuristic value function for reward shaping. This initialization can direct the agent towards efficient planning subject to more complex behavior specifications in LTL. The investigation takes an initial step toward integrating machine learning with formal methods and contributes to building highly autonomous and self-adaptive robots under complex missions."
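As an illustration of how a heuristic value function can enter reward shaping, one widely used form is potential-based shaping. The abstract does not state the exact form used in the thesis, so this is a sketch under the assumption that the initialized heuristic plays the role of the potential; the symbols R (original reward), \gamma (discount factor), and \Phi (potential over states) are notation introduced here.

```latex
\tilde{R}(s, a, s') = R(s, a, s') + \gamma\,\Phi(s') - \Phi(s)
```

Because the added term telescopes along any trajectory, shaping of this form leaves the set of optimal policies unchanged while giving the learner denser feedback.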
|
3 |
Efficient supervision for robot learning via imitation, simulation, and adaptation
Wulfmeier, Markus January 2018
In order to enable more widespread application of robots, we need to reduce the human effort required to introduce existing robotic platforms to new environments and tasks. In this thesis, we identify three complementary strategies to address this challenge, via the use of imitation learning, domain adaptation, and transfer learning based on simulations. The overall work strives to reduce the effort of generating training data by employing inexpensively obtainable labels and by transferring information between different domains with deviating underlying properties. Imitation learning enables a straightforward way for untrained personnel to teach robots to perform tasks by providing demonstrations, which represent a comparably inexpensive source of supervision. We develop a scalable approach to identify the preferences underlying demonstration data via the framework of inverse reinforcement learning. The method enables integration of the extracted preferences as cost maps into existing motion planning systems. We further incorporate prior domain knowledge and demonstrate that the approach outperforms the baselines, including manually crafted cost functions. In addition to employing low-cost labels from demonstration, we investigate the adaptation of models to domains without available supervisory information. Specifically, the challenge of appearance changes in outdoor robotics, such as illumination and weather shifts, is addressed using an adversarial domain adaptation approach. A principal advantage of the method over prior work is the straightforwardness of adapting arbitrary, state-of-the-art neural network architectures. We demonstrate performance benefits of the method for semantic segmentation of drivable terrain. Our last contribution focuses on simulation-to-real-world transfer learning, where the characteristic differences concern not only the visual appearance but also the underlying system dynamics. Our work aims at parallel training in both systems and mutual guidance via auxiliary alignment rewards to accelerate training for real-world systems. The approach is shown to outperform various baselines as well as a unilateral alignment variant.
|
4 |
Revisiting user simulation in dialogue systems : do we still need them ? : will imitation play the role of simulation ?
Chandramohan, Senthilkumar 25 September 2012
Recent advancements in the area of spoken language processing and the wide acceptance of portable devices have attracted significant interest in spoken dialogue systems. These conversational systems are man-machine interfaces which use natural language (speech) as the medium of interaction. In order to conduct dialogues, computers must have the ability to decide when and what information has to be exchanged with the users. The dialogue management module is responsible for making these decisions so that the intended task (such as ticket booking or appointment scheduling) can be achieved. Thus learning a good strategy for dialogue management is a critical task. In recent years reinforcement learning-based dialogue management optimization has evolved to be the state of the art. A majority of the algorithms used for this purpose need vast amounts of training data. However, data generation in the dialogue domain is an expensive and time-consuming process. In order to cope with this and also to evaluate the learnt dialogue strategies, user modelling in dialogue systems was introduced. These models simulate real users in order to generate synthetic data. Being computational models, they introduce some degree of modelling error. In spite of this, system designers are forced to employ user models due to the data requirements of conventional reinforcement learning algorithms. With a judicious choice of algorithms, however, optimal dialogue strategies can be learnt from a limited amount of training data compared with the conventional algorithms. As a consequence, user models are no longer required for the purpose of optimization, yet they continue to provide a fast and easy means of quantifying the quality of dialogue strategies. Since existing methods for user modelling are less realistic than real user behavior, the focus is shifted towards user modelling by means of inverse reinforcement learning. Using experimental results, the proposed method's ability to learn a computational model with real-user-like qualities is showcased as part of this work.
|
5 |
Abusive and Hate Speech Tweets Detection with Text Generation
Nalamothu, Abhishek 06 September 2019
No description available.
|
6 |
Specification-guided imitation learning
Zhou, Weichao 13 September 2024
Imitation learning is a powerful data-driven paradigm that enables machines to acquire advanced skills at a human-level proficiency by learning from demonstrations provided by humans or other agents. This approach has found applications in various domains such as robotics, autonomous driving, and text generation. However, the effectiveness of imitation learning depends heavily on the quality of the demonstrations it receives. Human demonstrations can often be inadequate, partial, environment-specific, and sub-optimal. For example, experts may only demonstrate successful task completion in ideal conditions, neglecting potential failure scenarios and important aspects of system safety considerations. The lack of diversity in the demonstrations can introduce bias in the learning process and compromise the safety and robustness of the learning systems. Additionally, current imitation learning algorithms primarily focus on replicating expert behaviors and are thus limited to learning from successful demonstrations alone. This inherent inability to learn to avoid failure is a significant limitation of existing methodologies. As a result, when faced with real-world uncertainties, imitation learning systems encounter challenges in ensuring safety, particularly in critical domains such as autonomous vehicles, healthcare, and finance, where system failures can have serious consequences. Therefore, it is crucial to develop mechanisms that ensure safety, reliability, and transparency in the decision-making process within imitation learning systems.
To address these challenges, this thesis proposes innovative approaches that go beyond traditional imitation learning methodologies by enabling imitation learning systems to incorporate explicit task specifications provided by human designers. Inspired by the idea that humans acquire skills not only by learning from demonstrations but also by following explicit rules, our approach aims to complement expert demonstrations with rule-based specifications. We show that in machine learning tasks, experts can use specifications to convey information that can be difficult to express through demonstrations alone. For instance, in safety-critical scenarios where demonstrations are infeasible, explicitly specifying safety requirements for the learner can be highly effective. We also show that experts can introduce well-structured biases into the learning model, ensuring that the learning process adheres to correct-by-construction principles from its inception. Our approach, called ‘specification-guided imitation learning’, seamlessly integrates formal specifications into the data-driven learning process, laying the theoretical foundations for this framework and developing algorithms to incorporate formal specifications at various stages of imitation learning. We explore the use of different types of specifications in various types of imitation learning tasks and envision that this framework will significantly advance the applicability of imitation learning and create new connections between formal methods and machine learning. Additionally, we anticipate significant impacts across a range of domains, including robotics, autonomous driving, and gaming, by enhancing core machine learning components in future autonomous systems and improving their performance, safety, and reliability.
|
7 |
Towards Provable Guarantees for Learning-based Control Paradigms
Shanelle Gertrude Clarke (14247233) 12 December 2022
Within recent years, there has been a renewed interest in developing data-driven learning-based algorithms for solving longstanding challenging control problems. This interest is primarily motivated by the availability of ubiquitous data and an increase in the computational resources of modern machines. However, there is a prevailing concern about the lack of provable performance guarantees for data-driven/model-free learning-based control algorithms. This dissertation focuses on the following key aspects: i) with what facility can state-of-the-art learning-based control methods eke out successful performance for challenging flight control applications such as aerobatic maneuvering?; and ii) can we leverage well-established tools and techniques in control theory to provide some provable guarantees for different types of learning-based algorithms?
To these ends, a deep RL-based controller is implemented, via high-fidelity simulations, for fixed-wing aerobatic maneuvering, which shows the facility with which learning-based control methods can eke out successful performance and further encourages the development of learning-based control algorithms with an eye towards providing provable guarantees.
Two learning-based algorithms are also developed: i) a model-free algorithm which learns a stabilizing optimal control policy for the bilinear biquadratic regulator (BBR), which solves the regulator problem with a biquadratic performance index given an unknown bilinear system; and ii) a model-free inverse reinforcement learning algorithm, called the Model-Free Stochastic inverse LQR (iLQR) algorithm, which solves a well-posed semidefinite programming optimization problem to obtain unique solutions for the linear control gain and the parameters of the quadratic performance index given zero-mean noisy optimal trajectories generated by a linear time-invariant dynamical system. Theoretical analysis and numerical results are provided to validate the effectiveness of all proposed algorithms.
|
8 |
Revisiting user simulation in dialogue systems : do we still need them ? : will imitation play the role of simulation ? / Revisiter la simulation d'utilisateurs dans les systèmes de dialogue parlé : est-elle encore nécessaire ? : est-ce que l'imitation peut jouer le rôle de la simulation ?
Chandramohan, Senthilkumar 25 September 2012
Recent progress in language processing has generated significant interest in the implementation of spoken dialogue systems. These systems are interfaces that use natural language as the medium of interaction between the system and the user. The dialogue management module chooses when the information it selects should be exchanged with the user. In recent years, optimizing spoken dialogue with reinforcement learning has become the state of the art. However, a large share of the algorithms used require a large amount of data to be effective. To handle this problem, user simulations were introduced. These models, however, introduce errors. With a judicious choice of algorithms, the amount of training data can be reduced and user modelling thereby avoided. This work forms one part of the contributions presented. The other part consists of proposing a model built from real user data by means of inverse reinforcement learning.
|
9 |
On inverse reinforcement learning and dynamic discrete choice for predicting path choices
Kristensen, Drew 11 1900
Route choice modeling is a well-studied topic of research with implications, for example, for city planning and traffic equilibrium flow analysis. Due to the scale of the effects these problems can have on communities, it is no surprise that diverse fields have attempted solutions to the same problem. The challenges, however, come with the size of the networks themselves, as large cities may have tens of thousands of road segments connected by tens of thousands of intersections. Thus, the approaches discussed in this thesis focus on comparing the performance of models from two different fields, econometrics and inverse reinforcement learning (IRL).
First, we provide background on the topic so that researchers from one field can become acquainted with the other. Secondly, we describe the algorithms used with a common notation to facilitate understanding between the fields. Lastly, we compare the performance of the models on real-world datasets, namely a dataset covering bike route choices collected in a network of 42,000 links.
We report our results for the two models from econometrics that we discuss, but were unable to generate the same results for the two IRL models. This was primarily due to numerical instabilities we encountered with the code we had modified to work with our data. We provide a discussion of these difficulties alongside the reporting of our results.
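The abstract does not name the specific models compared. As a hedged illustration of the common ground between the two fields, the sketch below uses a log-sum-exp value recursion of the kind that appears both in logit-style dynamic discrete choice models and in maximum-entropy IRL, evaluated on a hypothetical four-node network with made-up link utilities. Subtracting the maximum inside `logsumexp` is one standard guard against overflow in such recursions; whether it relates to the instabilities the author reports is not stated. All names are illustrative.

```python
import math

# Hypothetical toy road network: node -> list of (next_node, link_utility).
# Utilities are negative costs; "D" is the destination.
GRAPH = {
    "A": [("B", -1.0), ("C", -2.0)],
    "B": [("D", -1.0)],
    "C": [("D", -1.0)],
    "D": [],
}


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) obtained by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def soft_values(graph, dest, sweeps=50):
    """Iterate V(s) = log sum_a exp(u(s, a) + V(s')) with V(dest) = 0."""
    values = {n: (0.0 if n == dest else -math.inf) for n in graph}
    for _ in range(sweeps):
        for node, links in graph.items():
            if node == dest:
                continue
            scores = [u + values[nxt] for nxt, u in links if values[nxt] > -math.inf]
            if scores:
                values[node] = logsumexp(scores)
    return values


def link_probs(graph, values, node):
    """Logit choice probabilities over the outgoing links of a node."""
    return {nxt: math.exp(u + values[nxt] - values[node]) for nxt, u in graph[node]}


values = soft_values(GRAPH, "D")
print(link_probs(GRAPH, values, "A"))
```

Read from the econometric side, the link utilities are parameters to estimate from observed paths; read from the IRL side, they are the reward to be recovered. The recursion itself is the same in both readings.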
|
10 |
Emergence de concepts multimodaux : de la perception de mouvements primitifs à l'ancrage de mots acoustiques / The Emergence of Multimodal Concepts : From Perceptual Motion Primitives to Grounded Acoustic Words
Mangin, Olivier 19 March 2014
This thesis focuses on learning recurring patterns in multimodal perception. For that purpose it develops cognitive systems that model the mechanisms providing such capabilities to infants, a methodology that fits into the field of developmental robotics. More precisely, this thesis revolves around two main topics: on the one hand, the ability of infants or robots to imitate and understand human behaviors, and on the other, the acquisition of language. At the intersection of these topics, we study the question of how a developmental cognitive agent can discover a dictionary of primitive patterns from its multimodal perceptual flow. We specify this problem and formulate its links with Quine's indeterminacy of translation and with blind source separation, as studied in acoustics. We sequentially study four sub-problems and provide an experimental formulation of each of them. We then describe and test computational models of agents solving these problems. They are based in particular on bag-of-words techniques, matrix factorization algorithms, and inverse reinforcement learning approaches. We first go in depth into the three separate problems of learning primitive sounds, such as phonemes or words, learning primitive dance motions, and learning primitive objectives that compose complex tasks. Finally, we study the problem of learning multimodal primitive patterns, which corresponds to solving several of the aforementioned problems simultaneously. We also detail how this last problem provides a model of acoustic word grounding.
|