21. Cognition Rehearsed: Recognition and Reproduction of Demonstrated Behavior. Billing, Erik, January 2012.
The work presented in this dissertation investigates techniques for robot Learning from Demonstration (LFD), a well-established approach in which the robot learns from a set of demonstrations. The dissertation focuses on LFD where a human teacher demonstrates a behavior by controlling the robot via teleoperation. After demonstration, the robot should be able to reproduce the demonstrated behavior under varying conditions. In particular, the dissertation investigates techniques where previous behavioral knowledge is used as a bias for generalizing demonstrations.

The primary contribution of this work is the development and evaluation of a semi-reactive approach to LFD called Predictive Sequence Learning (PSL). PSL has several properties that make it interesting as a learning algorithm for robots: few assumptions are introduced and little task-specific configuration is needed. PSL can be seen as a variable-order Markov model that progressively builds up the ability to predict or simulate future sensory-motor events, given a history of past events. The knowledge base generated during learning can be used to control the robot such that the demonstrated behavior is reproduced. The same knowledge base can also be used to recognize an ongoing behavior by comparing predicted sensor states with actual observations. Behavior recognition is an important part of LFD, both as a way to communicate with the human user and as a technique that allows the robot to use previous knowledge as parts of new, more complex controllers.

In addition to the work on PSL, this dissertation provides a broad discussion on representation, recognition, and learning of robot behavior. LFD-related concepts such as demonstration, repetition, goal, and behavior are defined and analyzed, with a focus on how bias is introduced by the use of behavior primitives. This analysis results in a formalism where LFD is described as transitions between information spaces. Assuming that the behavior recognition problem is partly solved, ways to deal with remaining ambiguities in the interpretation of a demonstration are proposed.

The evaluation of PSL shows that the algorithm can efficiently learn and reproduce simple behaviors, generalizing to previously unseen situations while maintaining the reactive properties of the system. As the complexity of the demonstrated behavior increases, however, knowledge of one part of the behavior sometimes interferes with knowledge of other parts: different situations with similar sensory-motor interactions are confused and the robot fails to reproduce the behavior. One way to handle these issues is to introduce a context layer that supports PSL by providing a bias for predictions: parts of the knowledge base that appear to fit the present context are highlighted, while other parts are inhibited. Which context should be active is continually re-evaluated using behavior recognition. This technique takes inspiration from several neurocomputational models that describe parts of the human brain as a hierarchical prediction system. With behavior recognition active, continually selecting the most suitable context for the present situation, the problem of knowledge interference is significantly reduced and the robot can successfully reproduce more complex behaviors as well.
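
As a rough sketch of the prediction mechanism described above (an illustration of the variable-order Markov idea, not Billing's actual implementation; the event symbols and longest-context-wins rule are simplifying assumptions), a predictor over discrete sensory-motor events might look like this:

```python
from collections import defaultdict

class SequencePredictor:
    """Minimal variable-order Markov predictor over discrete events."""

    def __init__(self, max_order=4):
        self.max_order = max_order
        # context (tuple of recent events) -> {next_event: count}
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, history, next_event):
        # Strengthen hypotheses at every context length up to max_order.
        for k in range(1, min(self.max_order, len(history)) + 1):
            self.counts[tuple(history[-k:])][next_event] += 1

    def predict(self, history):
        # The longest matching context wins: the most specific
        # hypothesis about what comes next.
        for k in range(min(self.max_order, len(history)), 0, -1):
            ctx = tuple(history[-k:])
            if ctx in self.counts:
                dist = self.counts[ctx]
                return max(dist, key=dist.get)
        return None  # no matching hypothesis yet

# Toy run: symbols stand for sensory-motor events from teleoperation.
events = ["wall", "turn", "open", "forward"] * 3
psl = SequencePredictor()
for i in range(1, len(events)):
    psl.update(events[:i], events[i])
# Control: emit the predicted next event. Recognition: compare this
# prediction against the actually observed event.
print(psl.predict(["turn", "open"]))  # -> 'forward'
```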
22. Learning Robotic Reactive Behaviour from Demonstration via Dynamic Tree. Yadav, Mayank, January 2020.
Programming a complex robot is difficult, time-consuming and expensive. Learning from Demonstration (LfD) is a methodology where a teacher demonstrates a task and the robot learns to execute it. This thesis presents a method that generates reactive robot behaviour learnt from demonstration, where sequences of actions are implicitly coded in a rule-based manner. It also presents a novel approach to finding the behaviour hierarchy among the behaviours of a demonstration.

In the thesis, the system learns both the activation rules of primitives and the associations that should be formed between sensor and motor primitives. To do so, we use the Playful programming language, which is based on the reactive programming paradigm. The underlying rule for the activation of associations is learnt from demonstrated data using a neural network. The behaviour hierarchy among different sensor-motor associations is learnt using a heuristic logic-minimization technique, the Espresso algorithm. Once the relationships among the associations are learnt, all the logical relationships are used to generate a hierarchical tree of behaviours using a novel approach proposed in the thesis. This allows us to represent the behaviour hierarchically, as a set of associations between sensor and motor primitives in a readable script that is deployed on Playful.

The method is tested in simulation by varying the number of targets, showing that the system learns the underlying rules for sensor-motor associations with a high F1-score for each association. It is also shown, by changing the complexity of the simulation, that the system generalises the solution: the knowledge learnt for a sensor-motor association is transferable to all instances of that association.
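
A minimal sketch of one such sensor-motor association, with its activation rule learnt from demonstration data (the class, feature meanings, and classifier choice are hypothetical illustrations; the thesis uses the Playful language and an Espresso-based hierarchy rather than this toy):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

class Association:
    """A motor primitive guarded by a learnt activation rule."""

    def __init__(self, motor_primitive):
        self.motor_primitive = motor_primitive
        self.rule = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000)

    def fit(self, sensor_features, active_labels):
        # active_labels[i] = 1 if the teacher was running this
        # primitive when sensor_features[i] was observed.
        self.rule.fit(sensor_features, active_labels)

    def step(self, sensor_features):
        # Fire the motor primitive only when the rule activates.
        if self.rule.predict(sensor_features.reshape(1, -1))[0]:
            self.motor_primitive()

# Toy demonstration data: "approach" is active when a target is
# visible (feature 0) and near (feature 1 small).
X = np.array([[1, 0.2], [1, 0.8], [0, 0.3], [1, 0.1], [0, 0.9]])
y = np.array([1, 0, 0, 1, 0])
assoc = Association(lambda: print("approach target"))
assoc.fit(X, y)
assoc.step(np.array([1, 0.15]))  # prints "approach target"
```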
23. Learning a Reactive Task Plan from Human Demonstrations: Building Behavior Trees using Learning from Demonstration and Planning Constraints. Gustavsson, Oscar, January 2021.
Robot programming can be an expensive and tedious task, and companies may have to employ dedicated staff. A promising framework that can alleviate some of the most repetitive tasks and potentially make robots more accessible to non-experts is Learning from Demonstration (LfD), in which the robot learns how to solve a task by observing a human demonstrating it. A representation of the learned policy is needed, and Behavior Trees (BTs) are promising: they represent a controller that organizes the switching between tasks and naturally provide the modularity required for learning and the reactivity required for operating in an uncertain environment. Furthermore, BTs are transparent, allowing the user to inspect the policy and verify its safety before executing it. Learning BTs from demonstration has not been studied much in the past. The aim of this thesis is therefore to investigate the feasibility of using BTs in the context of LfD and how such a structure could be learned.

To evaluate the feasibility of BTs and answer how they can be learned, a new algorithm for learning BTs from demonstration is presented and evaluated. The algorithm detects similarities between multiple demonstrations to infer in which reference frames different parts of a task occur. The similarities are also used to detect hidden task constraints and goal conditions, which are given to a planner that outputs a reactive task plan in the form of a BT. The algorithm is evaluated on manipulation tasks both in simulation and on a real robot. The results show that the resulting BT can successfully solve the task while being robust to initial conditions and reactive to disturbances. These results suggest that BTs are a suitable policy representation for LfD. Furthermore, they suggest that the presented algorithm is capable of learning a reactive and fault-tolerant task plan and can serve as a basis for future algorithms.
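
A minimal sketch of the BT semantics this abstract relies on: Fallback tries children until one succeeds, Sequence runs children until one fails, and re-ticking the root every cycle is what makes the plan reactive. The toy "pick and place" task and node names are illustrative, not the thesis' learned trees:

```python
SUCCESS, FAILURE, RUNNING = "SUCCESS", "FAILURE", "RUNNING"

class Fallback:
    def __init__(self, children): self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != FAILURE:
                return status
        return FAILURE

class Sequence:
    def __init__(self, children): self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != SUCCESS:
                return status
        return SUCCESS

class Condition:
    def __init__(self, fn): self.fn = fn
    def tick(self): return SUCCESS if self.fn() else FAILURE

class Action:
    def __init__(self, fn): self.fn = fn
    def tick(self): return self.fn()

# Goal-first structure: skip the work if the goal already holds,
# which is what gives the plan its robustness to initial conditions.
world = {"holding": False, "at_goal": False}

def pick():
    world["holding"] = True; return SUCCESS

def place():
    world["at_goal"] = True; return SUCCESS

root = Fallback([
    Condition(lambda: world["at_goal"]),
    Sequence([Fallback([Condition(lambda: world["holding"]), Action(pick)]),
              Action(place)]),
])
while root.tick() != SUCCESS:  # tick every control cycle
    pass
```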
24. Intelligent Mobile Robot Learning in Autonomous Navigation. Xia, Chen, 24 November 2015.
Modern robots are designed to assist or replace human beings in complicated planning and control operations, and the capability of autonomous navigation in a dynamic environment is an essential requirement for mobile robots. To alleviate the tedious task of manually programming a robot, this dissertation contributes to the design of intelligent robot control that endows mobile robots with a learning ability in autonomous navigation tasks.

First, we consider robot learning from expert demonstrations. A neural network framework is proposed as the inference mechanism to learn a policy offline from a dataset extracted from experts. We are then interested in the robot's ability to learn without expert demonstrations. We apply reinforcement learning techniques to acquire and optimize a control strategy during the interaction between the learning robot and the unknown environment. A neural network is again incorporated to allow fast generalization, helping the learning converge in far fewer episodes than traditional methods. Finally, we study learning the potential rewards underlying the states from optimal or suboptimal expert demonstrations. We propose an algorithm based on inverse reinforcement learning: a nonlinear policy representation is designed, and the max-margin method is applied to refine the rewards and generate an optimal control policy.

The three proposed methods have been successfully implemented on autonomous navigation tasks for mobile robots in unknown and dynamic environments.
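
To make the reinforcement-learning part concrete, here is a toy sketch: tabular Q-learning on a one-dimensional corridor. The dissertation instead uses a neural network in place of the table so that nearby states share value estimates, which is what accelerates convergence; the environment and constants here are invented for illustration:

```python
import numpy as np

# States 0..4, goal at 4; actions 0 = left, 1 = right.
n_states, n_actions, goal = 5, 2, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(200):
    s = 0
    while s != goal:
        # Epsilon-greedy action selection.
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
        r = 1.0 if s2 == goal else -0.01  # small step cost, goal reward
        # Q-learning temporal-difference update.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q.argmax(axis=1)[:goal])  # -> [1 1 1 1]: move right everywhere
```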
25. Bi-Directional Coaching through Sparse Human-Robot Interactions. Balakuntala Srinivasa Mur, Mythra Varun, 15 June 2023.
Robots have become increasingly common in various sectors, such as manufacturing, healthcare, and service industries. With the growing demand for automation and the expectation of interactive and assistive capabilities, robots must learn to adapt to unpredictable environments the way humans can. This necessitates learning methods that effectively enable robots to collaborate with humans, learn from them, and provide guidance. Human experts commonly teach their collaborators to perform tasks via a few demonstrations, often followed by episodes of coaching that refine the trainee's performance during practice. Adopting a similarly interactive approach to teaching robots is highly intuitive and enables task experts to teach robots directly. Learning from Demonstration (LfD) is a popular method for robots to learn tasks by observing human demonstrations. However, for contact-rich tasks such as cleaning, cutting, or writing, LfD alone is insufficient to achieve good performance. Further, LfD methods are designed to achieve observed goals while ignoring how actions could maximize efficiency. By contrast, we recognize that leveraging the human social-learning strategies of practice and coaching in conjunction enables learning tasks with improved performance and efficacy. To address the deficiencies of learning from demonstration, we propose a Coaching by Demonstration (CbD) framework that integrates LfD-based practice with sparse coaching interactions from a human expert.
The LfD-based practice in CbD was implemented as an end-to-end off-policy reinforcement learning (RL) agent, with the action space and rewards inferred from the demonstration. By modeling the reward as a similarity network trained on expert demonstrations, we eliminate the need for task-specific engineered rewards. Representation learning was leveraged to create a novel state feature that captures the interaction markers necessary for performing contact-rich skills. This LfD-based practice was combined with coaching, where the human expert can improve or correct the objectives through a series of interactions. The dynamics of interaction in coaching are formalized using a partially observable Markov decision process, in which the robot aims to learn the true objectives by observing the corrective feedback from the human expert. We provide an approximate solution by reducing this to a policy-parameter update using the KL divergence between the RL policy and a Gaussian approximation based on coaching. The proposed framework was evaluated on a dataset of 10 contact-rich tasks from the assembly (peg insertion), service (cleaning, writing, peeling), and medical (cricothyroidotomy, sonography) domains. Compared to behavioral cloning and reinforcement learning baselines, CbD demonstrates improved performance and efficiency.
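
A minimal sketch of the kind of KL-based policy-parameter update described above, for a one-dimensional Gaussian policy. The closed-form gradient and the mean-only update are illustrative assumptions, not the thesis' exact solution:

```python
import numpy as np

def kl_gauss(mu1, var1, mu2, var2):
    # KL( N(mu1, var1) || N(mu2, var2) ) for 1-D Gaussians.
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1)

# Current RL policy (Gaussian over one action dimension) and a Gaussian
# fitted to the expert's sparse coaching corrections.
mu, var = 0.0, 1.0          # policy parameters before coaching
mu_c, var_c = 0.8, 0.25     # Gaussian approximation of the corrections

# Gradient steps on KL(policy || coaching) in the mean only; the
# coaching Gaussian pulls the policy toward the corrected behavior.
lr = 0.1
for _ in range(50):
    grad_mu = (mu - mu_c) / var_c   # d/dmu of the KL above
    mu -= lr * grad_mu
print(round(mu, 3))  # -> 0.8: the policy mean has moved to the corrections
```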
During the learning process, the demonstrations and coaching feedback imbue the robot with expert knowledge of the task. To leverage this expertise, we develop a reverse coaching model in which the robot uses the knowledge from demonstrations and coaching corrections to provide guided feedback that improves human trainees' performance. Providing feedback adapted to an individual trainee's "style" is vital to coaching. To this end, we propose representing style as objectives in the task null space. Unsupervised clustering of the null-space trajectories using Gaussian mixture models allows the robot to learn different styles of executing the same skill. Given the database of coaching corrections and style clusters, a style-conditioned RL agent was developed to provide feedback to human trainees by coaching their execution using virtual fixtures. The reverse coaching model was evaluated on two tasks, a simulated incision and obstacle avoidance, through a haptic teleoperation interface. The model improves human trainees' accuracy and completion time compared to a baseline without corrective feedback. Thus, by taking advantage of different human social-learning strategies, human-robot collaboration can be realized in human-centric environments.
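
A sketch of the style-discovery step: fitting a Gaussian mixture to features of null-space trajectories. The two-dimensional features and their meanings are invented for illustration; the thesis clusters actual null-space trajectories:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fake per-trajectory features, e.g. mean elbow height and path
# curvature, for two execution styles of the same skill.
rng = np.random.default_rng(1)
style_a = rng.normal([0.9, 0.1], 0.05, size=(30, 2))  # elbow-up, straight
style_b = rng.normal([0.3, 0.6], 0.05, size=(30, 2))  # elbow-down, curved
features = np.vstack([style_a, style_b])

# Unsupervised clustering recovers the style groups without labels.
gmm = GaussianMixture(n_components=2, random_state=0).fit(features)
styles = gmm.predict(features)
print(gmm.means_.round(2))       # two recovered style centers
print(styles[:5], styles[-5:])   # trajectories assigned to style clusters
```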
26. Control-Induced Learning for Autonomous Robots. Jin, Wanxin, 23 July 2021.
The recent progress of machine learning, driven by pervasive data and increasing computational power, has shown its potential to achieve higher robot autonomy. Yet, with too much focus on generic models and data-driven paradigms while ignoring the inherent structures of control systems and tasks, existing machine learning methods typically suffer from data and computation inefficiency, hindering their deployment on general real-world robots. In this thesis, we claim that the efficiency of autonomous robot learning can be boosted by two strategies. One is to incorporate the structures of optimal control theory into control-objective learning; this leads to a series of control-induced learning methods that enjoy the complementary benefits of machine learning, for higher algorithm autonomy, and control theory, for higher algorithm efficiency. The other is to integrate necessary human guidance into task and control-objective learning, leading to a series of paradigms for robot learning with minimal human guidance in the loop.

The first part of this thesis focuses on control-induced learning, where we have made two contributions. One is a set of new methods for inverse optimal control, which address three existing challenges in control-objective learning: learning from minimal data, learning time-varying objective functions, and learning under distributed settings. The second is the Pontryagin Differentiable Programming methodology, which bridges the concepts of optimal control theory, deep learning, and backpropagation, and provides a unified end-to-end framework to solve a broad range of learning and control tasks, including inverse reinforcement learning, neural ODEs, system identification, model-based reinforcement learning, and motion planning, with data- and computation-efficient performance.

The second part of this thesis focuses on paradigms for robot learning with necessary human guidance in the loop, where we have made two contributions. The first is an approach for learning from sparse demonstrations, which allows a robot to learn its control objective function only from human-specified sparse waypoints given in the observation (task) space. The second is an approach for learning from a human's directional corrections, which enables a robot to incrementally learn its control objective, with guaranteed learning convergence, from the human's directional correction feedback while it is acting.
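
As a rough illustration of the inverse-optimal-control problem the thesis studies, here is a toy feature-matching sketch: find cost weights under which the demonstrated trajectory beats sampled alternatives. This perceptron-style update is a simple stand-in for the thesis' Pontryagin-based machinery; the features, trajectories, and update rule are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(traj):
    # Trajectory features: path length and terminal distance to goal.
    goal = np.array([1.0, 1.0])
    steps = np.diff(traj, axis=0)
    return np.array([np.linalg.norm(steps, axis=1).sum(),
                     np.linalg.norm(traj[-1] - goal)])

demo = np.linspace([0, 0], [1, 1], 6)  # efficient expert demonstration
samples = [demo + rng.normal(0, 0.15, demo.shape) for _ in range(50)]

w = np.ones(2)  # cost = w . phi(trajectory)
for _ in range(100):
    for s in samples:
        # If a sampled trajectory looks no worse than the demo under the
        # current cost, adjust w so the demo becomes cheaper.
        if w @ phi(demo) >= w @ phi(s):
            w += 0.1 * (phi(s) - phi(demo))
w /= np.linalg.norm(w)
print(w.round(2))  # both features penalized -> the demo is optimal
```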
27. Towards Socially Intelligent Robots in Human Centered Environment. Pandey, Amit Kumar, 20 June 2012.
Robots will no longer work in isolation from us. They are entering our day-to-day lives to cooperate, assist, help, serve, learn, teach, and play with us. In this context, it is important that humans should not be the ones forced to compromise because of the presence of robots. To achieve this, beyond basic safety requirements, robots should take into account factors ranging from human effort, comfort, preferences, and desires to social norms in their planning and decision-making strategies. They should behave, navigate, manipulate, interact, and learn in a way that is expected, accepted, and understandable by us humans. This thesis begins by exploring and identifying the basic yet key ingredients of such socio-cognitive intelligence. We then develop generic frameworks and concepts from an HRI perspective to address these additional challenges and to elevate the robot's capabilities towards social intelligence.
28. Sensorimotor Learning and Simulation of Experience as a Basis for the Development of Cognition in Robotics. Schillaci, Guido, 11 March 2014.
State-of-the-art robots are still not properly able to learn from, adapt to, and react to unexpected circumstances, or to operate autonomously and safely in uncertain environments. Researchers in developmental robotics address these issues by building artificial systems capable of acquiring motor and cognitive capabilities by interacting with their environment, inspired by human development. This thesis adopts a similar approach in identifying some of the basic behavioural components that may allow for the autonomous development of sensorimotor and social skills in robots.

Here, sensorimotor interactions are investigated as a means for the acquisition of experience. Experiments on exploration behaviours for the acquisition of arm movements, tool use, and interactive capabilities are presented. The development of social skills is also addressed, in particular joint attention, the capability to share the focus of attention between individuals. Two prerequisites of joint attention are investigated: imperative pointing gestures and visual saliency detection.

The established framework of internal models is adopted for encoding sensorimotor experience in robots. In particular, inverse and forward models are trained on different configurations of low-level sensory and motor data generated by the robot through exploration behaviours, observed from a human demonstrator, or acquired through kinaesthetic teaching. This framework was chosen because it allows the generation of simulations of sensorimotor cycles. The thesis also investigates how basic cognitive skills can be implemented in a humanoid robot by allowing it to recreate the perceptual and motor experience gathered in past interactions with the external world. In particular, internal simulation processes are used as a basis for implementing cognitive skills such as action selection, tool use, behaviour recognition, and self-other distinction.
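
A minimal sketch of a forward model in this internal-models sense, trained on motor babbling of a toy one-dimensional plant and then used to simulate experience without acting. The plant, names, and network size are illustrative assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Motor babbling: random states and motor commands, observed outcomes.
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, 500)
motors = rng.uniform(-0.2, 0.2, 500)
next_states = states + motors  # the (unknown to the robot) plant

# Forward model: (state, motor command) -> predicted next sensory state.
X = np.column_stack([states, motors])
forward = MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000,
                       random_state=0).fit(X, next_states)

# Simulation of experience: roll the model forward without moving the
# robot; comparing such predictions with observations is the basis for
# behaviour recognition and self-other distinction.
s = 0.0
for m in [0.1, 0.1, -0.05]:
    s = forward.predict([[s, m]])[0]
print(round(s, 3))  # ~0.15, the internally simulated outcome
```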