Specification-guided imitation learning / Zhou, Weichao, 13 September 2024
Imitation learning is a powerful data-driven paradigm that enables machines to acquire advanced skills with human-level proficiency by learning from demonstrations provided by humans or other agents. This approach has found applications in various domains such as robotics, autonomous driving, and text generation. However, the effectiveness of imitation learning depends heavily on the quality of the demonstrations it receives. Human demonstrations can often be inadequate, partial, environment-specific, and sub-optimal. For example, experts may only demonstrate successful task completion in ideal conditions, neglecting potential failure scenarios and important safety considerations. The lack of diversity in the demonstrations can introduce bias into the learning process and compromise the safety and robustness of the learning systems. Additionally, current imitation learning algorithms primarily focus on replicating expert behaviors and are thus limited to learning from successful demonstrations alone. This inherent inability to learn to avoid failure is a significant limitation of existing methodologies. As a result, when faced with real-world uncertainties, imitation learning systems encounter challenges in ensuring safety, particularly in critical domains such as autonomous vehicles, healthcare, and finance, where system failures can have serious consequences. It is therefore crucial to develop mechanisms that ensure safety, reliability, and transparency in the decision-making process of imitation learning systems.
To address these challenges, this thesis proposes approaches that go beyond traditional imitation learning methodologies by enabling imitation learning systems to incorporate explicit task specifications provided by human designers. Inspired by the idea that humans acquire skills not only by learning from demonstrations but also by following explicit rules, our approach complements expert demonstrations with rule-based specifications. We show that, in machine learning tasks, experts can use specifications to convey information that is difficult to express through demonstrations alone. For instance, in safety-critical scenarios where demonstrations are infeasible, explicitly specifying safety requirements for the learner can be highly effective. We also show that experts can introduce well-structured biases into the learning model, ensuring that the learning process adheres to correct-by-construction principles from its inception. Our approach, called ‘specification-guided imitation learning’, integrates formal specifications into the data-driven learning process; we lay the theoretical foundations for this framework and develop algorithms that incorporate formal specifications at various stages of imitation learning. We explore the use of different types of specifications in various imitation learning tasks and envision that this framework will significantly advance the applicability of imitation learning and create new connections between formal methods and machine learning. We anticipate significant impacts across a range of domains, including robotics, autonomous driving, and gaming, by enhancing core machine learning components in future autonomous systems and improving their performance, safety, and reliability.
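The core idea above, complementing demonstrations with rule-based specifications, can be made concrete with a deliberately small sketch. The state space, demonstrations, and `unsafe` rule below are invented for illustration and stand in for the formal specifications the thesis develops; this is not the thesis's algorithm.

```python
import numpy as np

# Toy sketch: count-based behavioral cloning on a discrete state space,
# guided by a rule-based safety specification that rules out actions the
# expert never had to demonstrate.

N_STATES, N_ACTIONS = 5, 3

# Expert demonstrations: (state, action) pairs covering only states 0-2.
demos = [(0, 1), (1, 1), (2, 2), (0, 1), (1, 1)]

# Specification: unsafe(state, action) is True if the pair violates a rule.
# Here, action 0 is forbidden in states 3 and 4 (never seen in the demos).
def unsafe(s, a):
    return s >= 3 and a == 0

# Behavioral cloning: estimate pi(a|s) from demonstration frequencies.
counts = np.ones((N_STATES, N_ACTIONS))  # Laplace smoothing
for s, a in demos:
    counts[s, a] += 1.0
policy = counts / counts.sum(axis=1, keepdims=True)

# Specification guidance: zero out unsafe actions and renormalize, so the
# learned policy respects the rule even in undemonstrated states.
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        if unsafe(s, a):
            policy[s, a] = 0.0
policy /= policy.sum(axis=1, keepdims=True)
```

The point of the sketch is that the specification carries information the demonstrations cannot: states 3 and 4 never appear in the data, yet the resulting policy is still safe there.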
Communication-Driven Robot Learning for Human-Robot Collaboration / Habibian, Soheil, 25 July 2024
The growing presence of modern learning robots necessitates a fundamental shift in design, as these robots must learn skills from human inputs. Two main components close the loop in a human-robot interaction: learning and communication. Learning derives robot behaviors from human inputs, and communication conveys information about the robot's learning to the human. This dissertation focuses on methods that enable robots to communicate their internal state clearly while learning precisely from human inputs.
We first consider the information implicitly communicated by robot behavior during human interactions and whether it can be utilized to form human-robot teams. We investigate behavioral economics to identify biases and expectations in human team dynamics and incorporate them into human-robot teams. We develop and demonstrate an optimization approach that relates high-level subtask allocations to low-level robot actions, which implicitly communicates learning to encourage human participation in robot teams. We then study how communication helps humans teach tasks to robots using active learning and interactive imitation learning algorithms. Within the active learning approach, we develop a model that forms a belief over the human's mental model about the robot's learning. We validate that our algorithm enables the robot to balance between learning human preferences and implicitly communicating its learning through questions. Within the imitation learning approach, we integrate a wrapped haptic display that explicitly communicates representations from the robot's learned behavior to the user. We show that our framework helps the human teacher improve different aspects of the robot's learning during kinesthetic teaching. We then extend this system to a more comprehensive interactive learning architecture that provides multi-modal feedback through augmented reality and haptic interfaces. We present a case study with this closed-loop system and illustrate improved teaching, trust, and co-adaptation as the measured benefits of communicating robot learning. Overall, this dissertation demonstrates that bi-directional communication helps robots learn faster and adapt better, while humans experience a more intuitive and trust-based interaction. / Doctor of Philosophy / The growing presence of modern learning robots necessitates a fundamental shift in design, as these robots must learn skills from human inputs. 
This dissertation focuses on methods that enable robots to communicate their internal state clearly while learning precisely from human inputs. We first consider how robot behaviors during human interactions can be used to form human-robot teams. We investigate human-human teams in behavioral economics to better understand human expectations in human-robot teams. We develop a model that enables robots to distribute subtasks in a way that encourages their human partner to keep collaborating with them. We then study how communication helps human-in-the-loop robot teaching. Within active learning, we develop a model that infers what the human thinks about the robot's learning. We validate that, with our algorithm, the robot efficiently learns human preferences and keeps the human updated about what it has learned. Within imitation learning, we integrate a haptic device that explicitly communicates features from the robot's learned behavior to the user. We show that our framework helps users effectively improve their kinesthetic teaching. We then extend this system to a more comprehensive interactive robot learning architecture that provides feedback through augmented reality and haptic interfaces. We conduct a case study and illustrate that our framework improves robot teaching, human trust, and human-robot co-adaptation. Overall, this dissertation demonstrates that bi-directional communication helps robots learn faster and adapt better, while humans experience a more intuitive and trust-based interaction.
Imitation Of Human Body Poses And Hand Gestures Using A Particle Based Fluidics Method / Tilki, Umut, 01 October 2012
In this thesis, a new approach is developed that avoids the correspondence problem caused by the difference in embodiment between imitator and demonstrator in imitation learning. In our work, the imitator is a fluidic system whose dynamics are entirely different from those of the demonstrator, a human performing hand gestures and body postures. The fluidic system is composed of fluid particles, which are used for the discretization of the problem domain. We demonstrate fluidics formation control that imitates, by observation, given human body poses and hand gestures. Our fluidic formation control is based on setting suitable parameters of Smoothed Particle Hydrodynamics (SPH), a particle-based Lagrangian method, according to the imitation learning task. In the controller part, we developed three approaches. In the first, we used Artificial Neural Networks (ANN) to train on the input-output pairs of the fluidic imitation system: shape-based feature vectors extracted from human hand gestures serve as inputs, and fluid dynamics parameters as outputs. In the second approach, we employed the Principal Component Analysis (PCA) method for human hand gesture and body pose classification and imitation. Lastly, we developed a region-based controller that assigns the fluid parameters according to the human body poses and hand gestures: our algorithm determines the best-fitting ellipses on human body regions and hand finger positions and maps the ellipse parameters to the fluid parameters. The fluid parameters adjusted by the fluidics imitation controller are the body force (f), density, stiffness coefficient, and particle velocity (V), so as to lead the fluidic swarm formations to human body poses and hand gestures.
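The first step of the region-based controller, fitting ellipses to point sets, admits a compact sketch via PCA on the point covariance. The thesis's actual body segmentation and its mapping from ellipse parameters to SPH parameters are not reproduced here; the point cloud below is synthetic.

```python
import numpy as np

# Fit an ellipse to a 2D point cloud (e.g., a segmented body region) by
# taking the mean as the center and the principal axes of the covariance
# as the ellipse axes.

def fit_ellipse(points):
    """Return (center, axis_lengths, axes) for a 2D point cloud."""
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    cov = np.cov((pts - center).T)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    # Semi-axis lengths ~ 2 standard deviations along each principal axis.
    lengths = 2.0 * np.sqrt(np.maximum(eigvals, 0.0))
    return center, lengths, eigvecs

# Example: points spread mostly along the x-axis give a wide, flat ellipse.
rng = np.random.default_rng(42)
pts = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
center, lengths, axes = fit_ellipse(pts)
```

A controller in the spirit of the abstract would then translate `center` and `lengths` into target density and stiffness values for the particle swarm.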
Cognitive Interactive Robot Learning / Fonooni, Benjamin, January 2014
Building general-purpose autonomous robots that suit a wide range of user-specified applications requires a leap from today's task-specific machines to more flexible and general ones. To achieve this goal, one should move from traditional preprogrammed robots to learning robots that can easily acquire new skills. Learning from Demonstration (LfD) and Imitation Learning (IL), in which the robot learns by observing a human or robot tutor, are among the most popular learning techniques. Showing the robot how to perform a task is often more natural and intuitive than figuring out how to modify a complex control program. However, teaching robots new skills such that they can reproduce the acquired skills under any circumstances, at the right time and in an appropriate way, requires a good understanding of all the challenges in the field. Studies of imitation learning in humans and animals show that several cognitive abilities are engaged in learning new skills correctly. The most remarkable are the ability to direct attention to the important aspects of a demonstration and to adapt observed actions to the agent's own body. Moreover, a clear understanding of the demonstrator's intentions and an ability to generalize to new situations are essential. Once learning is accomplished, various stimuli may trigger the cognitive system to execute new skills that have become part of the robot's repertoire. The goal of this thesis is to develop methods for learning from demonstration that mainly focus on understanding the tutor's intentions and recognizing which elements of a demonstration need the robot's attention. An architecture containing the cognitive functions required for learning and reproducing high-level aspects of demonstrations is proposed. Several learning methods for directing the robot's attention and identifying relevant information are introduced.
The architecture integrates motor actions with concepts, objects, and environmental states to ensure correct reproduction of skills. Another major contribution of this thesis is methods to resolve ambiguities in demonstrations where the tutor's intentions are not clearly expressed and several demonstrations are required to infer intentions correctly. The provided solution is inspired by human memory models and priming mechanisms that give the robot clues that increase the probability of inferring intentions correctly. In addition to robot learning, the developed techniques are applied to a shared-control system based on visual-servoing-guided behaviors and priming mechanisms. The architecture and learning methods are applied and evaluated in several real-world scenarios that require a clear understanding of the intentions in the demonstrations. Finally, the developed learning methods are compared, and the conditions under which each of them is most applicable are discussed.
Représentations relationnelles et apprentissage interactif pour l'apprentissage efficace du comportement coopératif / Relational representations and interactive learning for efficient cooperative behavior learning / Munzer, Thibaut, 21 April 2017
This thesis presents new approaches toward efficient and intuitive high-level plan learning for cooperative robots. More specifically, this work studies Learning from Demonstration algorithms for relational domains. Using relational representations to model the world simplifies the representation of concurrent and cooperative behavior. We first developed and studied the first algorithm for Inverse Reinforcement Learning in relational domains. We then presented how one can use the RAP formalism to represent cooperative tasks involving a robot and a human operator. RAP is an extension of the Relational MDP framework that allows modeling concurrent activities. Using RAP allows us to represent both the human and the robot in the same process, and also to model concurrent robot activities. Under this formalism, we demonstrated that it is possible to learn the behavior of a cooperative team, both as a policy and as a reward. If prior knowledge about the task is available, the same algorithm can be used to learn only the preferences of the operator, allowing the system to adapt to the user. We have shown that, using relational representations, it is possible to learn cooperative behaviors from a small number of demonstrations. These behaviors are robust to noise, generalize to new states, and transfer to different domains (for example, when objects are added). We have also introduced an interactive training architecture that allows the system to make fewer mistakes while requiring less effort from the human operator. By estimating the confidence in its decisions, the robot is able to ask for instructions when it is unsure of the correct activity to perform. Lastly, we implemented these approaches on a real robot and showed their potential impact in a realistic scenario.
An Imitation-Learning based Agent playing Super Mario / Lindberg, Magnus, January 2014
Context. Developing an Artificial Intelligence (AI) agent that can predict and act in all possible situations in the dynamic environments of modern video games is nearly impossible to do by hand and would cost a great deal of money and time. Creating a learning AI agent that studies its environment using Reinforcement Learning (RL) can simplify this task. Another commonly desired feature is an AI agent with natural behavior; one way to approach this is to imitate a human using Imitation Learning (IL). Objectives. The purpose of this investigation is to study whether it is possible to create a learning AI agent able to play and complete levels in a platform game by combining the two learning techniques RL and IL. Methods. To investigate the research question, an implementation was built that combines one RL technique and one IL technique. A set of human players played the game, their behavior was recorded and applied to the agents, and RL was then used to train and tune the agents' playing performance. A number of experiments were executed to compare the trained agents against their respective human teachers. Results. The experiments showed promising indications that, during different phases, the agents behaved similarly to their human trainers. The agents also performed well when compared to existing agents. Conclusions. In conclusion, the combination of RL and IL shows promise for creating dynamic agents with natural behavior, and with additional adjustments the approach could perform even better as a learning AI with more natural behavior.
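The RL-plus-IL combination described above can be made concrete with a deliberately tiny sketch: seed a tabular Q-function from recorded human actions (imitation), then refine it with Q-learning (reinforcement). The 1-D "level" and all environment details below are invented stand-ins for the actual game.

```python
import numpy as np

# States 0..5 on a line; reaching state 5 completes the "level".
N, GOAL = 6, 5
ACTIONS = [-1, +1]  # move left / move right

# Imitation: human demonstrations bias the initial Q-values toward the
# demonstrated action in each visited state.
human_demo = [(s, 1) for s in range(GOAL)]  # the human always moved right
Q = np.zeros((N, 2))
for s, a_idx in human_demo:
    Q[s, a_idx] += 1.0  # small bias toward the human's choice

# Reinforcement: standard epsilon-greedy Q-learning refines the policy.
rng = np.random.default_rng(0)
alpha, gamma, eps = 0.5, 0.9, 0.1
for _ in range(300):
    s = 0
    while s != GOAL:
        a_idx = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
        s2 = min(max(s + ACTIONS[a_idx], 0), N - 1)
        r = 1.0 if s2 == GOAL else -0.01  # small step cost, goal reward
        Q[s, a_idx] += alpha * (r + gamma * Q[s2].max() - Q[s, a_idx])
        s = s2

policy = Q.argmax(axis=1)  # greedy policy after training
```

Here the imitation step only initializes the value function; in the thesis's setting the recorded human behavior shapes the agent throughout training.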
Programming by demonstration of robot manipulators / Skoglund, Alexander, January 2009
If a non-expert wants to program a robot manipulator, he needs a natural interface that does not require rigorous robot programming skills. Programming-by-demonstration (PbD) is an approach that enables the user to program a robot by simply showing it how to perform a desired task. In this approach, the robot recognizes what task it should perform and learns how to perform it by imitating the teacher. One fundamental problem in imitation learning arises from the fact that embodied agents often have different morphologies. Thus, a direct skill transfer from human to robot is not possible in the general case. Therefore, we need a systematic approach to PbD that takes the capabilities of the robot into account, regarding both perception and body structure. In addition, the robot should be able to learn from experience and improve over time. This raises the question of how to determine the demonstrator's goal or intentions. We show that it is possible, to some degree, to infer these from multiple demonstrations. We address the problem of generating a reach-to-grasp motion that produces the same results as a human demonstration. It is also of interest to learn which parts of a demonstration provide important information about the task. The major contribution is the investigation of a next-state planner using a fuzzy time-modeling approach to reproduce a human demonstration on a robot. We show that the proposed planner can generate executable robot trajectories based on a generalization of multiple human demonstrations. We use the notion of hand-states as a common motion language between the human and the robot. It allows the robot to interpret human motions as its own, and it also synchronizes reaching with grasping. Other contributions include the model-free learning of the human-to-robot mapping, and how an imitation metric can be used for reinforcement learning of new robot skills.
The experimental part of this thesis presents the implementation of PbD of pick-and-place tasks on different robotic hands/grippers. The different platforms consist of manipulators and motion-capturing devices.
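One ingredient mentioned above, generalizing over multiple human demonstrations, can be sketched in a few lines: resample each demonstration to a common time base and average pointwise. This plain average is only a stand-in for the thesis's fuzzy time-modeling next-state planner, used here to make the idea concrete; the demos are synthetic.

```python
import numpy as np

def resample(traj, n):
    """Linearly resample a (T, D) trajectory to n points."""
    traj = np.asarray(traj, dtype=float)
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n)
    return np.column_stack(
        [np.interp(t_new, t_old, traj[:, d]) for d in range(traj.shape[1])]
    )

def generalize(demos, n=50):
    """Pointwise mean of the demos after resampling to a common length."""
    return np.mean([resample(d, n) for d in demos], axis=0)

# Three noisy 1-D reach demonstrations from 0 to 1, recorded at
# different rates (30, 45, and 60 samples).
rng = np.random.default_rng(0)
demos = [
    np.linspace(0.0, 1.0, T)[:, None] + rng.normal(0.0, 0.01, (T, 1))
    for T in (30, 45, 60)
]
mean_traj = generalize(demos)
```

In a hand-state formulation, each trajectory dimension would be a component of the hand-state rather than a raw Cartesian coordinate.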
Novel Learning-Based Task Schedulers for Domain-Specific SoCs / January 2020
This Master's thesis includes the design, on-chip integration, and evaluation of a set of imitation learning (IL)-based scheduling policies: deep neural network (DNN) and decision tree (DT). We first developed IL-based scheduling policies for heterogeneous systems-on-chip (SoCs). Then, we tested these policies using a system-level domain-specific system-on-chip simulation framework [11]. Finally, we transformed them into efficient code using a cloud engine [1] and implemented them on a user-space emulation framework [61] on a Unix-based SoC. IL is one area of machine learning (ML) and a useful method to train artificial intelligence (AI) models by imitating the decisions of an expert, or Oracle, that knows the optimal solution. The primary focus of this thesis is to adapt an ML model to work on-chip and to optimize resource allocation for a set of domain-specific wireless and radar applications. Evaluation results with four streaming applications from the wireless communications and radar domains show that the proposed IL-based scheduler approximates an offline Oracle expert with more than 97% accuracy and 1.20× faster execution time. The models have been implemented as an add-on, making them easy to port to other SoCs. / Dissertation/Thesis / Master's Thesis, Computer Engineering, 2020
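The scheduling-by-imitation idea can be sketched minimally: log an Oracle scheduler's decisions offline, then fit a tiny model that reproduces them cheaply at run time. A one-feature decision stump stands in for the thesis's DT policy, and the oracle rule and task feature below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Feature per ready task: estimated compute intensity (arbitrary units).
x = rng.uniform(0.0, 10.0, size=1000)

# Oracle: heavy tasks (> 5.0) go to the hardware accelerator (label 1),
# light tasks stay on the general-purpose core (label 0).
y = (x > 5.0).astype(int)

def fit_stump(x, y):
    """Pick the threshold whose predictions best imitate the oracle."""
    best_t, best_acc = None, -1.0
    for t in np.unique(x):
        acc = ((x > t).astype(int) == y).mean()
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

threshold, train_acc = fit_stump(x, y)

def predict(x_new):
    """Run-time scheduling decision: one comparison, cheap on-chip."""
    return int(x_new > threshold)
```

A real DT policy branches on many task and platform features, but the deployment property is the same: inference reduces to a handful of comparisons, which is what makes imitating the expensive Oracle attractive on-chip.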
Apprendre par imitation : applications à quelques problèmes d'apprentissage structuré en traitement des langues / Imitation learning: application to several structured learning tasks in natural language processing / Knyazeva, Elena, 25 May 2018
Structured learning has become ubiquitous in Natural Language Processing; a multitude of applications, such as personal assistants, machine translation, and speech recognition, to name just a few, rely on such techniques. The structured learning problems that must now be solved are becoming increasingly complex and require an increasing amount of information at different linguistic levels (morphological, syntactic, etc.). It is therefore crucial to find the best trade-off between the degree of modelling detail and the exactitude of the inference algorithm. Imitation learning aims to perform approximate learning and inference in order to better exploit richer dependency structures. In this thesis, we explore this learning setting, in particular the SEARN algorithm, both from a theoretical perspective and in terms of its practical applications to Natural Language Processing tasks, especially complex tasks such as machine translation. Concerning the theoretical aspects, we introduce a unified framework for different imitation learning algorithm families, allowing us to rederive, in a simple way, the convergence properties of these algorithms. With regard to the more practical side of our work, we use imitation learning first to experiment with free-order sequence labelling and second to explore two-step decoding strategies for machine translation.
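The learning-to-search idea behind algorithms like SEARN can be illustrated with a deliberately small, DAgger-flavoured toy (SEARN's cost-sensitive details are omitted): a greedy left-to-right labeller whose features include its own previous prediction, trained on the states it actually visits, with an oracle supplying the correct label at each visited state. The task below (running-parity labelling, where each label depends on the previous one, so errors propagate) is invented for illustration.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
counts = Counter()  # (bit, prev_pred, label) -> count

def predict(bit, prev):
    """Count-based classifier over the feature (current bit, prev label)."""
    return 1 if counts[(bit, prev, 1)] > counts[(bit, prev, 0)] else 0

# Training: roll in with the learner's own predictions so it sees the
# states its mistakes produce, while the oracle labels each visited state.
for _ in range(100):
    bits = rng.integers(0, 2, size=8)
    prev = 0
    for b in bits:
        b = int(b)
        counts[(b, prev, b ^ prev)] += 1  # oracle label in the visited state
        prev = predict(b, prev)           # roll in with the learned policy

def label_sequence(bits):
    """Greedy left-to-right decoding with the learned policy."""
    out, prev = [], 0
    for b in bits:
        prev = predict(int(b), prev)
        out.append(prev)
    return out
```

Training on self-induced states is exactly what distinguishes this family from plain supervised sequence labelling, which only ever sees gold-standard histories.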
Creating Human-like AI Movement in Games Using Imitation Learning / Imitation Learning som verktyg för att skapa människolik rörelse för AI-karaktärer i spel / Renman, Casper, January 2017
The way characters move and behave in computer and video games is an important factor in their believability, which has an impact on the player's experience. This project explores Imitation Learning using limited amounts of data as an approach to creating human-like AI behaviour in games, and through a user study investigates what factors determine whether a character is human-like when observed through the character's first-person perspective. The idea is to create or shape AI behaviour by recording one's own actions. The implemented framework uses a Nearest Neighbour algorithm with a KD-tree as the policy that maps a state to an action. Results showed that the chosen approach was able to create human-like AI behaviour while respecting the performance constraints of a modern 3D game.
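The policy described here, a Nearest Neighbour lookup over recorded state-action pairs accelerated with a KD-tree, can be sketched with SciPy's `cKDTree`. The state layout (position and speed) and the integer action ids below are invented; the project's actual state representation is game-specific.

```python
import numpy as np
from scipy.spatial import cKDTree

# Recorded gameplay: each row is a state (x, y, speed); each entry of
# `actions` is the command id the player issued in that state.
states = np.array([
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.5],
    [5.0, 5.0, 2.0],
])
actions = np.array([0, 1, 0, 2])

tree = cKDTree(states)  # O(log n) nearest-neighbour queries at run time

def policy(state):
    """Return the action recorded for the most similar observed state."""
    _, idx = tree.query(np.asarray(state, dtype=float))
    return int(actions[idx])
```

The KD-tree is what makes the approach viable under a game's per-frame budget: the lookup cost grows logarithmically with the amount of recorded data rather than linearly.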