11

Cognition Rehearsed : Recognition and Reproduction of Demonstrated Behavior / Robotövningar : Igenkänning och återgivande av demonstrerat beteende

Billing, Erik January 2012 (has links)
The work presented in this dissertation investigates techniques for robot Learning from Demonstration (LFD). LFD is a well-established approach in which the robot learns from a set of demonstrations. The dissertation focuses on LFD where a human teacher demonstrates a behavior by controlling the robot via teleoperation. After demonstration, the robot should be able to reproduce the demonstrated behavior under varying conditions. In particular, the dissertation investigates techniques where previous behavioral knowledge is used as a bias for generalization of demonstrations.

The primary contribution of this work is the development and evaluation of a semi-reactive approach to LFD called Predictive Sequence Learning (PSL). PSL has many interesting properties when applied as a learning algorithm for robots: few assumptions are introduced and little task-specific configuration is needed. PSL can be seen as a variable-order Markov model that progressively builds up the ability to predict or simulate future sensory-motor events, given a history of past events. The knowledge base generated during learning can be used to control the robot such that the demonstrated behavior is reproduced. The same knowledge base can also be used to recognize an ongoing behavior by comparing predicted sensor states with actual observations. Behavior recognition is an important part of LFD, both as a way to communicate with the human user and as a technique that allows the robot to use previous knowledge as parts of new, more complex controllers.

In addition to the work on PSL, this dissertation provides a broad discussion on representation, recognition, and learning of robot behavior. LFD-related concepts such as demonstration, repetition, goal, and behavior are defined and analyzed, with a focus on how bias is introduced by the use of behavior primitives. This analysis results in a formalism where LFD is described as transitions between information spaces. Assuming that the behavior recognition problem is partly solved, ways to deal with remaining ambiguities in the interpretation of a demonstration are proposed.

The evaluation of PSL shows that the algorithm can efficiently learn and reproduce simple behaviors. The algorithm is able to generalize to previously unseen situations while maintaining the reactive properties of the system. As the complexity of the demonstrated behavior increases, knowledge of one part of the behavior sometimes interferes with knowledge of other parts. As a result, different situations with similar sensory-motor interactions are sometimes confused and the robot fails to reproduce the behavior. One way to handle these issues is to introduce a context layer that can support PSL by providing a bias for predictions. Parts of the knowledge base that appear to fit the present context are highlighted, while other parts are inhibited. Which context should be active is continually re-evaluated using behavior recognition. This technique takes inspiration from several neurocomputational models that describe parts of the human brain as a hierarchical prediction system. With behavior recognition active, continually selecting the most suitable context for the present situation, the problem of knowledge interference is significantly reduced and the robot can successfully reproduce more complex behaviors as well.
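The abstract's description of PSL as a variable-order Markov model over sensory-motor events suggests a rough sketch like the one below. This is not the PSL algorithm itself, only a minimal illustration of the core idea of predicting the next event from the longest matching suffix of the event history; all class and method names are invented for this example.

```python
from collections import defaultdict, Counter

class VariableOrderPredictor:
    """Toy variable-order Markov model: predicts the next symbol from
    the longest suffix of the history that was seen during training."""

    def __init__(self, max_order=4):
        self.max_order = max_order
        # Maps a context tuple to a counter of observed next symbols.
        self.counts = defaultdict(Counter)

    def train(self, sequence):
        for t in range(1, len(sequence)):
            # Record the next symbol under every context length up to max_order.
            for k in range(1, min(self.max_order, t) + 1):
                context = tuple(sequence[t - k:t])
                self.counts[context][sequence[t]] += 1

    def predict(self, history):
        # Try the longest matching context first, then back off to shorter ones.
        for k in range(min(self.max_order, len(history)), 0, -1):
            context = tuple(history[-k:])
            if context in self.counts:
                return self.counts[context].most_common(1)[0][0]
        return None  # No matching context: no prediction.

# Usage: events could be discretized sensory-motor states from teleoperation.
model = VariableOrderPredictor(max_order=3)
model.train(["fwd", "fwd", "left", "fwd", "fwd", "left"])
print(model.predict(["fwd", "fwd"]))  # -> "left"
```

In the dissertation's terms, the same trained structure serves two roles: driving the robot by emitting the predicted next motor event, and recognizing a behavior by checking how well its predictions match incoming observations.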
12

Biologically inspired action representation on humanoids with a perspective for soft wearable robots

Nassour, John 10 September 2021 (has links)
While many tasks in robotics mainly call for accuracy, precision, flexibility, adaptivity, and the like, wearable robotics involves additional aspects that distinguish a reliable and promising approach. The three key elements addressed here are control, actuation, and sensing; for each, the goal is to find a solution and design compatible with humans. One possible way to understand human motor behaviors is to generate them on human-like robots. Biologically inspired action generation is promising for the control of wearable robots, as it produces more natural movements. Furthermore, wearable robotics shows exciting progress in its design: soft exosuits use soft materials to build both sensors and actuators.

This work investigates an adaptive representation model for actions in robotics. The concrete action model is composed of four modules: pattern selection, spatial coordination, temporal coordination, and sensory-motor adaptation. Modularity in motor control might provide more insight into action learning and generalization, not only for humanoid robots but also for their biological counterparts. We successfully tested the model on a humanoid robot by having it learn to perform a variety of tasks (push recovery, walking, drawing, grasping, etc.).

In the next part, we suggest several soft actuation mechanisms that overcome the problem of holding heavy loads as well as the issue of on-line programming of robot motion. The soft actuators use textile materials hosting thermoplastic polyurethane formed into inflatable tubes. The tubes were folded inside housing channels with one strain-limited side to create a flexor actuator. We propose a new design that controls the strained side of the actuator by adding four textile cords along its longitudinal axis. As a result, the actuator's behavior can be programmed on-line to bend and twist in several directions.

In the last part of this thesis, we organize piezoresistive elements in a superimposed structure. This sensory structure is used on a gripper to sense and distinguish between pressure and curvature stimuli. We then extend the gripper with proximity sensing through conductive textile parts that act as capacitive sensors. Finally, we develop a versatile soft strain sensor that uses silicone tubes filled with an embedded solution whose electrical resistance is proportional to the strain applied to the tubes; an entirely soft sensing glove built from these sensors enables hand-gesture recognition. The proposed combinations of soft actuators, soft sensors, and biologically inspired action representation might open a new perspective toward smart wearable robots.
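As a purely illustrative sketch of how the four-module decomposition (pattern selection, spatial coordination, temporal coordination, sensory-motor adaptation) might be composed in code, consider the toy pipeline below. The module split mirrors the abstract, but every function, pattern, and parameter here is a hypothetical stand-in, not the thesis implementation.

```python
import numpy as np

def select_pattern(task):
    # Hypothetical pattern selection: pick a base motor pattern per task.
    patterns = {"walk": np.sin, "draw": np.cos}
    return patterns[task]

def coordinate(pattern, phase, amplitude, offset):
    # Spatial coordination scales and offsets the pattern; temporal
    # coordination shifts its phase so joints stay synchronized.
    def trajectory(t):
        return amplitude * pattern(t + phase) + offset
    return trajectory

def adapt(command, sensed_error, gain=0.1):
    # Sensory-motor adaptation: correct the command with sensed feedback.
    return command - gain * sensed_error

traj = coordinate(select_pattern("walk"), phase=0.5, amplitude=0.8, offset=0.0)
for t in np.linspace(0.0, 1.0, 5):
    cmd = adapt(traj(t), sensed_error=0.02)
    print(round(float(cmd), 3))
```

The appeal of such modularity is that each stage can be learned or tuned independently: swapping the selected pattern changes the task, while the coordination and adaptation stages are reused across tasks.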
13

Learning a Reactive Task Plan from Human Demonstrations : Building Behavior Trees using Learning from Demonstration and Planning Constraints / Automatisk inlärning av en reaktiv uppgiftsplan från mänskliga demonstrationer : Byggande av beteendeträd via inlärning från demonstrationer och planeringsbivillkor

Gustavsson, Oscar January 2021 (has links)
Robot programming can be an expensive and tedious task, and companies may have to employ dedicated staff. A promising framework that can alleviate some of the most repetitive tasks and potentially make robots more accessible to non-experts is Learning from Demonstration (LfD). LfD is a framework in which the robot learns how to solve a task by observing a human demonstrating it. A representation of the learned policy is needed, and Behavior Trees (BTs) are a promising choice: they represent a controller that organizes the switching between tasks and naturally provide both the modularity required for learning and the reactivity required for operating in an uncertain environment. Furthermore, BTs are transparent, allowing the user to inspect the policy and verify its safety before executing it. Learning BTs from demonstration has not been studied much in the past. The aim of this thesis is therefore to investigate the feasibility of using BTs in the context of LfD and how such a structure could be learned. To evaluate the feasibility of BTs and to answer how they can be learned, a new algorithm for learning BTs from demonstration is presented and evaluated. The algorithm detects similarities between multiple demonstrations to infer in which reference frames different parts of a task occur. The similarities are also used to detect hidden task constraints and goal conditions, which are given to a planner that outputs a reactive task plan in the form of a BT. The algorithm is evaluated on manipulation tasks both in simulation and on a real robot. The results show that the resulting BT can successfully solve the task while being robust to initial conditions and reactive to disturbances. These results suggest that BTs are a suitable policy representation for LfD. Furthermore, the results suggest that the presented algorithm is capable of learning a reactive and fault-tolerant task plan and can serve as a basis for future algorithms.
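For readers unfamiliar with BTs, the reactivity this thesis relies on comes from the tick semantics of Sequence and Fallback nodes: the tree is re-evaluated from the root every cycle, so a goal condition that already holds short-circuits the corresponding action. A minimal sketch of those standard semantics follows; it is generic BT machinery, not the thesis's implementation, and the example node names are invented.

```python
SUCCESS, FAILURE, RUNNING = "SUCCESS", "FAILURE", "RUNNING"

class Sequence:
    """Ticks children in order; fails or keeps running as soon as one does."""
    def __init__(self, children): self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != SUCCESS:
                return status
        return SUCCESS

class Fallback:
    """Ticks children in order; succeeds or keeps running as soon as one does."""
    def __init__(self, children): self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != FAILURE:
                return status
        return FAILURE

class Condition:
    def __init__(self, check): self.check = check
    def tick(self): return SUCCESS if self.check() else FAILURE

class Action:
    def __init__(self, act): self.act = act
    def tick(self): return self.act()

# A reactive "pick object" fragment: skip the action if the goal already holds.
holding = {"object": False}
def grasp():
    holding["object"] = True
    return SUCCESS

pick = Fallback([Condition(lambda: holding["object"]), Action(grasp)])
print(pick.tick())  # runs the grasp action -> SUCCESS
print(pick.tick())  # goal condition already holds -> SUCCESS without re-grasping
```

A learned plan built from such goal-condition/action pairs stays reactive: if a disturbance undoes a goal mid-execution, the next tick re-triggers the corresponding action automatically.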
14

Micro-Data Reinforcement Learning for Adaptive Robots / Apprentissage micro-data pour l'adaptation en robotique

Chatzilygeroudis, Konstantinos 14 December 2018 (has links)
Robots have to face the real world, in which trying something might take seconds, hours, or even days. Unfortunately, current state-of-the-art reinforcement learning algorithms (e.g., deep reinforcement learning) require long interaction times to find effective policies. In this thesis, we explored approaches that tackle the challenge of learning by trial-and-error in a few minutes on physical robots. We call this challenge "micro-data reinforcement learning".

In our first contribution, we introduced a novel learning algorithm called "Reset-free Trial-and-Error" that allows complex robots to quickly recover from unknown circumstances (e.g., damage or different terrain) while completing their tasks and taking the environment into account; in particular, a physically damaged hexapod robot recovered most of its locomotion abilities in an environment with obstacles, and without any human intervention.

In our second contribution, we introduced a novel model-based reinforcement learning algorithm, called Black-DROPS, that: (1) does not impose any constraint on the reward function or the policy (they are treated as black boxes), (2) is as data-efficient as the state-of-the-art algorithm for data-efficient RL in robotics, and (3) is as fast as (or faster than) analytical approaches when several cores are available. We additionally proposed Multi-DEX, a model-based policy search approach that takes inspiration from novelty-based ideas and effectively solved several sparse-reward scenarios.

In our third contribution, we introduced a new model-learning procedure in Black-DROPS (we call it GP-MI) that leverages parameterized black-box priors to scale up to high-dimensional systems; for instance, it found high-performing walking policies for a physically damaged hexapod robot (48D state and 18D action space) in less than one minute of interaction time.

Finally, in the last part of the thesis, we explored a few ideas on how to incorporate safety constraints, improve robustness, and leverage multiple priors in Bayesian optimization in order to tackle the micro-data reinforcement learning challenge. Throughout this thesis, our goal was to design algorithms that work on physical robots, and not only in simulation. Consequently, all the proposed approaches have been evaluated on at least one physical robot. Overall, this thesis aims to provide methods and algorithms that allow physical robots to be more autonomous and able to learn in a handful of trials.
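A very rough sketch of the model-based policy-search loop that Black-DROPS instantiates: learn a dynamics model from a small amount of real interaction, then optimize a black-box policy against a black-box reward via Monte Carlo rollouts on the model. Here a linear least-squares model and random search stand in for the Gaussian-process models and CMA-ES-style optimizer used in the actual work; the toy system and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def real_system(state, action):
    # Stand-in for the physical robot (unknown to the learner).
    return state + 0.1 * action + rng.normal(0, 0.01)

def rollout_on_model(model, policy_params, horizon=20):
    # Monte Carlo rollout on the learned model; reward is a black box.
    state, ret = 0.0, 0.0
    for _ in range(horizon):
        action = np.tanh(policy_params[0] * state + policy_params[1])
        state = model(state, action)
        ret += -(state - 1.0) ** 2  # black-box reward: drive the state to 1
    return ret

# 1) Collect a little real interaction data.
data, state = [], 0.0
for _ in range(30):
    action = rng.uniform(-1, 1)
    nxt = real_system(state, action)
    data.append((state, action, nxt))
    state = nxt

# 2) Fit a simple linear model (a Gaussian process in the actual algorithm).
X = np.array([[s, a] for s, a, _ in data])
y = np.array([n for _, _, n in data])
w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
model = lambda s, a: w[0] * s + w[1] * a + w[2]

# 3) Black-box policy search on the model (random search instead of CMA-ES).
best_params, best_ret = None, -np.inf
for _ in range(500):
    params = rng.uniform(-3, 3, size=2)
    ret = rollout_on_model(model, params)
    if ret > best_ret:
        best_params, best_ret = params, ret
print("best policy parameters:", best_params)
```

The data efficiency comes from step 3 never touching the real robot: only the short data-collection phase costs real interaction time, and the rollouts in step 3 parallelize trivially across cores.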
15

BI-DIRECTIONAL COACHING THROUGH SPARSE HUMAN-ROBOT INTERACTIONS

Mythra Varun Balakuntala Srinivasa Mur (16377864) 15 June 2023 (has links)
Robots have become increasingly common in various sectors, such as manufacturing, healthcare, and service industries. With the growing demand for automation and the expectation of interactive and assistive capabilities, robots must learn to adapt to unpredictable environments the way humans can. This necessitates the development of learning methods that can effectively enable robots to collaborate with humans, learn from them, and provide guidance. Human experts commonly teach their collaborators to perform tasks via a few demonstrations, often followed by episodes of coaching that refine the trainee's performance during practice. Adopting a similar interaction-based approach to teaching robots is highly intuitive and enables task experts to teach robots directly. Learning from Demonstration (LfD) is a popular method for robots to learn tasks by observing human demonstrations. However, for contact-rich tasks such as cleaning, cutting, or writing, LfD alone is insufficient to achieve good performance. Further, LfD methods are designed to achieve the observed goals while ignoring actions that would maximize efficiency. By contrast, we recognize that leveraging the human social learning strategies of practice and coaching in conjunction enables learning tasks with improved performance and efficacy. To address the deficiencies of learning from demonstration, we propose a Coaching by Demonstration (CbD) framework that integrates LfD-based practice with sparse coaching interactions from a human expert.

The LfD-based practice in CbD was implemented as an end-to-end off-policy reinforcement learning (RL) agent with the action space and rewards inferred from the demonstration. By modeling the reward as a similarity network trained on expert demonstrations, we eliminate the need to design task-specific engineered rewards. Representation learning was leveraged to create a novel state feature that captures the interaction markers necessary for performing contact-rich skills. This LfD-based practice was combined with coaching, where the human expert can improve or correct the objectives through a series of interactions. The dynamics of interaction in coaching are formalized using a partially observable Markov decision process. The robot aims to learn the true objectives by observing the corrective feedback from the human expert. We provide an approximate solution by reducing this to a policy parameter update using the KL divergence between the RL policy and a Gaussian approximation based on coaching. The proposed framework was evaluated on a dataset of 10 contact-rich tasks from the assembly (peg insertion), service (cleaning, writing, peeling), and medical (cricothyroidotomy, sonography) domains. Compared to behavioral cloning and reinforcement learning baselines, CbD demonstrates improved performance and efficiency.

During the learning process, the demonstrations and coaching feedback imbue the robot with expert knowledge of the task. To exploit this expertise, we develop a reverse coaching model in which the robot uses the knowledge from demonstrations and coaching corrections to provide guided feedback to human trainees and improve their performance. Providing feedback adapted to an individual trainee's "style" is vital to coaching. To this end, we propose representing style as objectives in the task null space. Unsupervised clustering of the null-space trajectories using Gaussian mixture models allows the robot to learn different styles of executing the same skill. Given the coaching corrections and the database of style clusters, a style-conditioned RL agent was developed to provide feedback to human trainees by coaching their execution using virtual fixtures. The reverse coaching model was evaluated on two tasks, a simulated incision and obstacle avoidance, through a haptic teleoperation interface. The model improves human trainees' accuracy and completion time compared to a baseline without corrective feedback. Thus, by taking advantage of different human social learning strategies, human-robot collaboration can be realized in human-centric environments.
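The abstract's reduction of coaching to "a policy parameter update using KL divergence between the RL policy and a Gaussian approximation based on coaching" might look schematically like the one-dimensional sketch below. The damped update rule and all names are assumptions made for illustration, not the thesis's actual formulation.

```python
import numpy as np

def kl_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two 1-D Gaussians."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1)

def coached_update(policy_mu, policy_var, corrections, step=0.5):
    # Fit a Gaussian approximation to the expert's corrective actions.
    coach_mu = np.mean(corrections)
    coach_var = np.var(corrections) + 1e-6
    # Minimizing KL(policy || coach) over the mean would set it to coach_mu
    # exactly; a damped step interpolates toward that minimizer instead.
    new_mu = (1 - step) * policy_mu + step * coach_mu
    new_var = (1 - step) * policy_var + step * coach_var
    return new_mu, new_var

mu, var = 0.0, 0.2                     # initial policy parameters
corrections = [0.8, 1.0, 0.9]          # expert nudges the action toward ~0.9
coach_mu, coach_var = np.mean(corrections), np.var(corrections) + 1e-6
for _ in range(5):
    mu, var = coached_update(mu, var, corrections)
    # KL to the coaching Gaussian shrinks as the policy absorbs the feedback.
    print(round(mu, 3), round(kl_gaussians(mu, var, coach_mu, coach_var), 3))
```

The point of the sketch is the shape of the mechanism: sparse corrections are summarized as a distribution, and the policy moves toward it only as far as the step size allows, so a few coaching interactions refine rather than overwrite what practice has learned.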
16

Control-Induced Learning for Autonomous Robots

Wanxin Jin (11013834) 23 July 2021 (has links)
The recent progress of machine learning, driven by pervasive data and increasing computational power, has shown its potential to achieve higher robot autonomy. Yet, with too much focus on generic models and data-driven paradigms while ignoring the inherent structures of control systems and tasks, existing machine learning methods typically suffer from data and computation inefficiency, hindering their deployment on general real-world robots. In this thesis, we claim that the efficiency of autonomous robot learning can be boosted by two strategies. One is to incorporate the structures of optimal control theory into control-objective learning; this leads to a series of control-induced learning methods that enjoy the complementary benefits of machine learning, for higher algorithm autonomy, and control theory, for higher algorithm efficiency. The other is to integrate necessary human guidance into task and control objective learning, leading to a series of paradigms for robot learning with minimal human guidance on the loop.

The first part of this thesis focuses on control-induced learning, where we have made two contributions. One is a set of new methods for inverse optimal control, which address three existing challenges in control-objective learning: learning from minimal data, learning time-varying objective functions, and learning under distributed settings. The second is the Pontryagin Differentiable Programming methodology, which bridges the concepts of optimal control theory, deep learning, and backpropagation, and provides a unified end-to-end learning framework for solving a broad range of learning and control tasks, including inverse reinforcement learning, neural ODEs, system identification, model-based reinforcement learning, and motion planning, with data- and computation-efficient performance.

The second part of this thesis focuses on paradigms for robot learning with necessary human guidance on the loop. Here we have made two contributions. The first is an approach for learning from sparse demonstrations, which allows a robot to learn its control objective function from only human-specified sparse waypoints given in the observation (task) space. The second is an approach for learning from a human's directional corrections, which enables a robot to incrementally learn its control objective, with guaranteed learning convergence, from the human's directional correction feedback while it is acting.
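As a toy illustration of the "learning from sparse demonstrations" idea, the sketch below searches for a cost-function weight whose optimal trajectory passes near a few human-specified waypoints. The one-dimensional system, the closed-form inner controller, and the grid search are all simplifying assumptions made for this example, not the thesis's method.

```python
import numpy as np

def optimal_trajectory(theta, horizon=10):
    # Toy "inner" optimal control: greedily minimize theta*(x-1)^2 + u^2
    # at each step for the system x_{t+1} = x_t + u_t. The per-step
    # minimizer pulls x toward 1 at a rate governed by the weight theta.
    x, traj = 0.0, []
    for _ in range(horizon):
        u = theta / (1 + theta) * (1.0 - x)  # argmin_u theta*(x+u-1)^2 + u^2
        x = x + u
        traj.append(x)
    return np.array(traj)

# Sparse waypoints: the demonstrator only marks where the state should be
# at a few time steps (keys are 0-indexed steps of the trajectory).
waypoints = {2: 0.6, 6: 0.95}

def waypoint_loss(theta):
    traj = optimal_trajectory(theta)
    return sum((traj[t] - x_d) ** 2 for t, x_d in waypoints.items())

# "Outer" learning loop: pick the cost weight whose optimal trajectory
# best matches the sparse waypoints (grid search stands in for gradients).
thetas = np.linspace(0.1, 5.0, 100)
best = min(thetas, key=waypoint_loss)
print("learned cost weight:", round(float(best), 2))
```

The structure mirrors the bilevel idea in the abstract: an inner optimal-control problem generates behavior from a candidate objective, and an outer loop adjusts that objective until the generated trajectory agrees with the handful of waypoints the human provided.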
