1 |
Human skill capturing and modelling using wearable devices. Zhao, Yuchen. January 2017
Industrial robots are delivering more and more manipulation services in manufacturing. However, when a task is complex, it is difficult to programme a robot to fulfil all the requirements, because even a relatively simple task such as peg-in-hole insertion contains many uncertainties, e.g. clearance, initial grasping position and insertion path. Humans, on the other hand, can deal with these variations using visual and haptic feedback. Although humans adapt to uncertainties easily, their skill-based performance usually relies on tacit knowledge that cannot be easily articulated. Although an automated solution need not imitate human motion fully, since some of that motion is unnecessary, it would be useful if a human's skill-based performance could first be interpreted and modelled so that it can then be transferred to the robot. This thesis aims to reduce robot programming effort significantly by developing a methodology to capture, model and transfer manual manufacturing skills from a human demonstrator to a robot. Learning from Demonstration (LfD) has recently been gaining interest as a framework for transferring skills from a human teacher to a robot, using probabilistic encoding approaches to model observation and state-transition uncertainties. In close- or actual-contact manipulation tasks, it is difficult to record state-action examples reliably without interfering with the human's senses and activities. Therefore, wearable sensors are investigated as a promising means of recording state-action examples without restricting human experts during the skilled execution of their tasks. Firstly, to track human motion accurately and reliably in a defined three-dimensional workspace, a hybrid system of Vicon and IMUs is proposed to compensate for the known limitations of each individual system. 
The data fusion method overcame the occlusion and frame-flipping problems of the two-camera Vicon setup as well as the drift problem associated with the IMUs. The results indicated that the occlusion and frame-flipping problems associated with Vicon can be mitigated using the IMU measurements. Furthermore, compared with the IMU-only method, the proposed method improves tracking accuracy, with Mean Square Error (MSE) improvements ranging from 0.8° to 6.4°. Secondly, to record haptic feedback from a teacher without physically obstructing their interaction with the workpiece, wearable surface electromyography (sEMG) armbands were used as an indirect indicator of contact feedback during manual manipulation. A muscle-force model based on a Time Delayed Neural Network (TDNN) was built to map the sEMG signals to the known contact force. The results indicated that the model could estimate the force from the sEMG armbands in the applications of interest, namely peg-in-hole and beater winding tasks, with MSEs of 2.75 N and 0.18 N respectively. Finally, given the force estimates and the motion trajectories, a Hidden Markov Model (HMM) based approach was used as a state recognition method to encode and generalise the spatial and temporal information of the skilled executions. This method allows a more representative control policy to be derived. A modified Gaussian Mixture Regression (GMR) method was then applied to reproduce motions using the learned state-action policy. To simplify validation, additional demonstrations from the teacher were used instead of a robot to verify the reproduction performance of the policy, assuming that the human teacher and the robot learner are physically identical systems. The results confirmed the generalisation capability of the HMM across a number of demonstrations from different subjects, and the motions reproduced by GMR were acceptable in these additional tests. 
The proposed methodology provides a framework for producing a state-action model from skilled demonstrations that can be translated into robot kinematics and joint states for the robot to execute. The implication for industry is reduced effort and time in programming robots for applications where skilled human performance is required to cope robustly with various uncertainties during task execution.
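The abstract does not state which fusion algorithm combines the Vicon and IMU data. As a minimal sketch of the general idea only, the code below swaps in a one-dimensional complementary filter (an assumption, not the thesis's method): the IMU gyro rate is integrated at every sample, and whenever a Vicon measurement is available the estimate is pulled toward it, which bounds gyro drift; while the markers are occluded (`None`), the filter falls back to IMU integration alone. The function name `fuse_angle`, the blending weight `alpha` and the single-angle setting are hypothetical.

```python
from typing import List, Optional


def fuse_angle(
    gyro_rates: List[float],
    vicon_angles: List[Optional[float]],
    dt: float,
    alpha: float = 0.98,
    initial_angle: float = 0.0,
) -> List[float]:
    """Hypothetical complementary filter for one joint angle (not the thesis's method).

    gyro_rates:   IMU angular-rate samples in deg/s (smooth, but drifts when integrated)
    vicon_angles: Vicon angle samples in deg, or None while the markers are occluded
    dt:           sample period in seconds
    alpha:        weight on the gyro prediction whenever a Vicon sample is available
    """
    if len(gyro_rates) != len(vicon_angles):
        raise ValueError("gyro_rates and vicon_angles must have the same length")
    angle = initial_angle
    fused = []
    for rate, vicon in zip(gyro_rates, vicon_angles):
        predicted = angle + rate * dt  # dead-reckon with the gyro
        if vicon is None:
            angle = predicted  # occlusion: bridge the gap with the IMU alone
        else:
            # Pull the estimate toward the drift-free optical measurement.
            angle = alpha * predicted + (1.0 - alpha) * vicon
        fused.append(angle)
    return fused


# Usage with made-up numbers: a stationary joint at 10 deg, a gyro with a
# 0.5 deg/s bias, and a Vicon dropout between samples 400 and 599.
n_samples, dt = 1000, 0.01
gyro = [0.5] * n_samples
vicon = [None if 400 <= i < 600 else 10.0 for i in range(n_samples)]
estimate = fuse_angle(gyro, vicon, dt, initial_angle=10.0)
print(f"final fused angle: {estimate[-1]:.3f} deg")
```

With these invented numbers the fused estimate settles roughly 0.25° above the true angle (the filter's steady state under a constant gyro bias) and recovers within a few hundred samples after the dropout, whereas integrating the biased gyro alone would drift 5° over the 10 s.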
2 |
BI-DIRECTIONAL COACHING THROUGH SPARSE HUMAN-ROBOT INTERACTIONS. Mythra Varun Balakuntala Srinivasa Mur (16377864). 15 June 2023
Robots have become increasingly common in sectors such as manufacturing, healthcare, and service industries. With the growing demand for automation and the expectation of interactive and assistive capabilities, robots must learn to adapt to unpredictable environments as humans do. This necessitates learning methods that enable robots to collaborate with humans, learn from them, and provide guidance. Human experts commonly teach their collaborators to perform tasks via a few demonstrations, often followed by episodes of coaching that refine the trainee's performance during practice. Adopting a similar interaction-based approach for teaching robots is highly intuitive and enables task experts to teach robots directly. Learning from Demonstration (LfD) is a popular method for robots to learn tasks by observing human demonstrations. However, for contact-rich tasks such as cleaning, cutting, or writing, LfD alone is insufficient to achieve good performance. Further, LfD methods are developed to reproduce observed goals without optimizing the actions for efficiency. By contrast, we recognize that combining the human social learning strategies of practice and coaching enables tasks to be learned with improved performance and efficacy. To address the deficiencies of learning from demonstration, we propose a Coaching by Demonstration (CbD) framework that integrates LfD-based practice with sparse coaching interactions from a human expert.
The LfD-based practice in CbD was implemented as an end-to-end off-policy reinforcement learning (RL) agent whose action space and rewards are inferred from the demonstrations. By modeling the reward as a similarity network trained on expert demonstrations, we eliminate the need for task-specific engineered rewards. Representation learning was used to create a novel state feature that captures the interaction markers necessary for performing contact-rich skills. This LfD-based practice was combined with coaching, in which the human expert can improve or correct the objectives through a series of interactions. The dynamics of interaction in coaching are formalized as a partially observable Markov decision process in which the robot aims to learn the true objectives by observing corrective feedback from the human expert. We provide an approximate solution by reducing this problem to a policy parameter update based on the KL divergence between the RL policy and a Gaussian approximation derived from coaching. The proposed framework was evaluated on a dataset of 10 contact-rich tasks from the assembly (peg insertion), service (cleaning, writing, peeling), and medical (cricothyroidotomy, sonography) domains. Compared with behavioral cloning and reinforcement learning baselines, CbD demonstrates improved performance and efficiency.
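The abstract reduces coaching to a policy parameter update driven by the KL divergence between the RL policy and a Gaussian approximation of the coaching feedback, but it does not give the update rule. The sketch below only illustrates that idea under assumed conditions: both distributions are diagonal Gaussians over actions, only the policy mean is updated, and the step is a preconditioned gradient step on KL(policy ‖ coaching). The names `kl_diag_gaussian` and `coaching_mean_step` and the step size `lr` are hypothetical.

```python
import math
from typing import List


def kl_diag_gaussian(mu_p: List[float], sigma_p: List[float],
                     mu_q: List[float], sigma_q: List[float]) -> float:
    """Closed-form KL(p || q) between two diagonal Gaussians (std devs given)."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        total += (math.log(sq / sp)
                  + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2)
                  - 0.5)
    return total


def coaching_mean_step(mu_policy: List[float], mu_coach: List[float],
                       sigma_coach: List[float], lr: float = 0.5) -> List[float]:
    """Hypothetical update of the policy mean that reduces KL(policy || coaching Gaussian).

    Per dimension dKL/dmu_p = (mu_p - mu_q) / sigma_q**2. Scaling that gradient by
    sigma_q**2 (a simple preconditioner) moves the mean a fraction `lr` of the way
    toward the coached mean, so it cannot overshoot for 0 < lr <= 1.
    """
    if not 0.0 < lr <= 1.0:
        raise ValueError("lr must be in (0, 1]")
    new_mu = []
    for mp, mq, sq in zip(mu_policy, mu_coach, sigma_coach):
        grad = (mp - mq) / sq ** 2                # dKL/dmu_p for this dimension
        new_mu.append(mp - lr * sq ** 2 * grad)   # preconditioned gradient step
    return new_mu


# Usage with made-up numbers for a 2-D action space.
mu_pi, sigma_pi = [0.0, 1.0], [0.5, 0.5]   # current RL policy
mu_c, sigma_c = [0.3, 0.8], [0.2, 0.2]     # Gaussian fitted to coaching corrections
for step in range(5):
    print(step, round(kl_diag_gaussian(mu_pi, sigma_pi, mu_c, sigma_c), 4))
    mu_pi = coaching_mean_step(mu_pi, mu_c, sigma_c)
```

Because only the mean is updated here, the printed KL decreases toward the floor set by the mismatched standard deviations rather than reaching zero; updating the policy variance as well would be a natural extension.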
During the learning process, the demonstrations and coaching feedback imbue the robot with expert knowledge of the task. To exploit this expertise, we develop a reverse coaching model in which the robot uses knowledge from demonstrations and coaching corrections to provide guided feedback that improves human trainees' performance. Adapting feedback to each trainee's individual "style" is vital to coaching. To this end, we propose representing style as objectives in the task null space. Unsupervised clustering of the null-space trajectories using Gaussian mixture models allows the robot to learn different styles of executing the same skill. Given the coaching corrections and a database of style clusters, a style-conditioned RL agent was developed to provide feedback to human trainees by coaching their execution through virtual fixtures. The reverse coaching model was evaluated on two tasks performed through a haptic teleoperation interface: a simulated incision and obstacle avoidance. The model improves human trainees' accuracy and completion time compared with a baseline without corrective feedback. Thus, by taking advantage of different human social learning strategies, human-robot collaboration can be realized in human-centric environments.
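As a rough illustration of clustering execution styles with a Gaussian mixture model, the sketch below fits a one-dimensional GMM by expectation-maximization. It assumes, hypothetically, that each null-space trajectory has already been summarized as a single scalar feature (for example, a mean offset), whereas the thesis clusters the null-space trajectories themselves; the function names, initial means and feature values are all invented for the example.

```python
import math
from typing import List, Tuple


def _log_weighted_density(x: float, w: float, m: float, v: float) -> float:
    """log(w * N(x | m, v)) for a 1-D Gaussian with variance v."""
    return math.log(w) - 0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)


def fit_gmm_1d(xs: List[float], init_means: List[float], n_iter: int = 100,
               min_var: float = 1e-4) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    k, n = len(init_means), len(xs)
    mean_all = sum(xs) / n
    var_all = max(min_var, sum((x - mean_all) ** 2 for x in xs) / n)
    weights = [1.0 / k] * k
    means = list(init_means)
    variances = [var_all] * k
    for _ in range(n_iter):
        # E-step: soft assignment of every feature to every component
        # (subtracting the max score keeps this stable when components are far apart).
        resp = []
        for x in xs:
            scores = [_log_weighted_density(x, w, m, v)
                      for w, m, v in zip(weights, means, variances)]
            top = max(scores)
            probs = [math.exp(s - top) for s in scores]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard against empty components
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(min_var, sum(r[j] * (x - means[j]) ** 2
                                            for r, x in zip(resp, xs)) / nj)
    return weights, means, variances


def assign_style(x: float, weights: List[float], means: List[float],
                 variances: List[float]) -> int:
    """Index of the component with the highest posterior probability for feature x."""
    scores = [_log_weighted_density(x, w, m, v)
              for w, m, v in zip(weights, means, variances)]
    return max(range(len(scores)), key=lambda j: scores[j])


# Usage with invented per-trajectory features forming two "styles".
features = [-0.2, 0.1, 0.0, 0.15, -0.1, 4.9, 5.1, 5.0, 4.8, 5.2]
weights, means, variances = fit_gmm_1d(features, init_means=[1.0, 4.0])
print("style means:", [round(m, 2) for m in means])
print("new trial 4.7 -> style", assign_style(4.7, weights, means, variances))
```

A new trainee's execution can then be matched to the nearest learned style with `assign_style`, which is the kind of lookup a style-conditioned feedback agent would need.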
3 |
Control-Induced Learning for Autonomous Robots. Wanxin Jin (11013834). 23 July 2021
The recent progress of machine learning, driven by pervasive data and increasing computational power, has shown its potential to achieve higher robot autonomy. Yet, by focusing heavily on generic models and data-driven paradigms while ignoring the inherent structures of control systems and tasks, existing machine learning methods typically suffer from data and computational inefficiency, which hinders their widespread deployment on general real-world robots. In this thesis, we claim that the efficiency of autonomous robot learning can be boosted by two strategies. One is to incorporate the structures of optimal control theory into control-objective learning, which leads to a series of control-induced learning methods that enjoy the complementary benefits of machine learning for higher algorithm autonomy and of control theory for higher algorithm efficiency. The other is to integrate necessary human guidance into task and control objective learning, which leads to a series of paradigms for robot learning with minimal human guidance in the loop.
The first part of this thesis focuses on control-induced learning, where we make two contributions. The first is a set of new methods for inverse optimal control that address three existing challenges in control objective learning: learning from minimal data, learning time-varying objective functions, and learning under distributed settings. The second is a Pontryagin Differentiable Programming methodology, which bridges optimal control theory, deep learning, and backpropagation, and provides a unified end-to-end learning framework for solving a broad range of learning and control tasks, including inverse reinforcement learning, neural ODEs, system identification, model-based reinforcement learning, and motion planning, with data- and computation-efficient performance.
The second part of this thesis focuses on paradigms for robot learning with necessary human guidance in the loop, where we again make two contributions. The first is an approach for learning from sparse demonstrations, which allows a robot to learn its control objective function only from sparse waypoints specified by a human in the observation (task) space. The second is an approach for learning from human directional corrections, which enables a robot to incrementally learn its control objective, with guaranteed learning convergence, from a human's directional correction feedback while it is acting.
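The abstract does not detail the inverse optimal control methods. To make the basic idea concrete, the toy sketch below (not the author's algorithm) recovers an unknown cost weight from demonstrations by requiring each demonstrated action to satisfy the first-order optimality condition of an assumed cost. The one-step system x_next = x + u, the cost J(u) = q(x + u)^2 + (1 - q)u^2, and the normalization of the two weights to sum to one (cost weights are only identifiable up to a common scale) are all assumptions made for illustration.

```python
from typing import List


def optimal_action(x: float, q: float) -> float:
    """Minimizer of the assumed cost J(u) = q*(x + u)**2 + (1 - q)*u**2 (toy system x_next = x + u)."""
    # dJ/du = 2q(x + u) + 2(1 - q)u = 2(q*x + u) = 0  =>  u* = -q*x
    return -q * x


def inverse_optimal_control(states: List[float], actions: List[float]) -> float:
    """Estimate the weight q from demonstrated (x, u) pairs.

    Each optimal demonstration satisfies the stationarity condition u + q*x = 0.
    Minimizing the squared residual sum_i (u_i + q*x_i)**2 over q gives the
    closed form q = -sum(x*u) / sum(x**2), clipped to the feasible range [0, 1].
    """
    sxx = sum(x * x for x in states)
    if sxx == 0.0:
        raise ValueError("need at least one demonstration with x != 0")
    q = -sum(x * u for x, u in zip(states, actions)) / sxx
    return min(1.0, max(0.0, q))


# Usage with made-up demonstrations generated from q = 0.3 plus small noise.
demo_states = [1.0, 2.0, -1.5, 0.5]
demo_noise = [0.01, -0.02, 0.015, 0.0]  # invented measurement noise
demo_actions = [optimal_action(x, 0.3) + e for x, e in zip(demo_states, demo_noise)]
estimated_q = inverse_optimal_control(demo_states, demo_actions)
print(f"estimated q = {estimated_q:.3f} (value used to generate the data: 0.3)")
```

Minimizing the violation of the optimality condition, rather than re-solving the forward control problem for every candidate weight, is what keeps this kind of estimate cheap; the challenges named in the abstract (minimal data, time-varying objectives, distributed settings) are beyond this toy.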