1

Explainable Reinforcement Learning for Risk Mitigation in Human-Robot Collaboration Scenarios / Förklarbar förstärkningsinlärning inom människa-robot sammarbete för riskreducering

Iucci, Alessandro. January 2021.
Reinforcement Learning (RL) algorithms are highly popular in the robotics field for solving complex problems, learning from dynamic environments, and generating optimal outcomes. However, one of the main limitations of RL is the lack of model transparency, including the inability to explain why a given output was generated. Explainability becomes even more crucial when RL outputs influence human decisions, as in Human-Robot Collaboration (HRC) scenarios where safety requirements must be met. This work focuses on the application of two explainability techniques, “Reward Decomposition” and “Autonomous Policy Explanation”, to an RL algorithm that is the core of a risk mitigation module for robot operation in a collaborative automated warehouse scenario. “Reward Decomposition” gives insight into the factors that influenced the robot’s choice by decomposing the reward function into sub-functions. It also allows creating Minimal Sufficient Explanations (MSX): sets of relevant reasons for each decision taken during the robot’s operation. The second technique, “Autonomous Policy Explanation”, provides a global overview of the robot’s behavior by answering queries posed by human users, and gives insight into the decision guidelines embedded in the robot’s policy. Since the synthesized policy descriptions and query answers are in natural language, this tool facilitates algorithm diagnosis even for non-expert users. The results showed an improvement in the RL algorithm, which now chooses more evenly distributed actions, and produced a full policy for the robot’s decisions that is largely aligned with expectations. The work analyzes the results of applying both techniques, each of which increased the transparency of the robot’s decision process.
These explainability methods not only built trust in the robot’s choices, which proved to be among the optimal ones in most cases, but also made it possible to find weaknesses in the robot’s policy, making them a helpful tool for debugging purposes.
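The reward-decomposition idea described above can be sketched briefly: if each action’s value is broken into per-component contributions, an MSX is the smallest set of components whose combined advantage outweighs all of the preferred action’s disadvantages. The sketch below is a minimal illustration with hypothetical component names and values, not the thesis’s actual implementation.

```python
# Minimal sketch of reward decomposition with a Minimal Sufficient
# Explanation (MSX). Component names and Q-values are hypothetical.

def msx(components_a, components_b):
    """Smallest set of reward components that, on their own, justify
    preferring action A (components_a) over action B (components_b)."""
    # Per-component advantage of A over B.
    deltas = {c: components_a[c] - components_b[c] for c in components_a}
    # Total disadvantage the explanation must outweigh.
    needed = -sum(d for d in deltas.values() if d < 0)
    explanation, total = [], 0.0
    # Add components in decreasing order of advantage until covered.
    for comp, adv in sorted(deltas.items(), key=lambda kv: -kv[1]):
        if total > needed:
            break
        if adv > 0:
            explanation.append(comp)
            total += adv
    return explanation

# Hypothetical decomposed values for two robot actions in a warehouse.
slow_down = {"collision_risk": 4.0, "throughput": -1.0, "energy": 0.5}
keep_speed = {"collision_risk": 0.0, "throughput": 1.5, "energy": 0.0}
print(msx(slow_down, keep_speed))  # → ['collision_risk']
```

Here the collision-risk component alone outweighs the throughput loss, so it suffices as the explanation for slowing down; the energy component never needs to be mentioned.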
2

Policy Explanation and Model Refinement in Decision-Theoretic Planning

Khan, Omar Zia. January 2013.
Decision-theoretic systems, such as Markov Decision Processes (MDPs), are used for sequential decision-making under uncertainty. MDPs provide a generic framework that can be applied in various domains to compute optimal policies. This thesis presents techniques that offer explanations of optimal policies for MDPs and then refine decision-theoretic models (Bayesian networks and MDPs) based on feedback from experts. Explaining policies for sequential decision-making problems is difficult due to the presence of stochastic effects, multiple (possibly competing) objectives, and long-range effects of actions. However, explanations are needed to assist experts in validating that the policy is correct and to help users develop trust in the choices it recommends. A set of domain-independent templates for justifying a policy recommendation is presented, along with a process to identify the minimum number of templates that must be populated to completely justify the policy. The rejection of an explanation by a domain expert indicates a deficiency in the model that led to the generation of the rejected policy. This thesis presents techniques to refine the model parameters such that the optimal policy computed from the refined parameters conforms with the expert feedback. The expert feedback is translated into constraints on the model parameters, which are used during refinement. These constraints are non-convex for both Bayesian networks and MDPs. For Bayesian networks, the refinement approach is based on Gibbs sampling and stochastic hill climbing, and it learns a model that obeys the expert constraints. For MDPs, the parameter space is partitioned so that alternating linear optimization can be applied to learn model parameters that lead to a policy in accordance with expert feedback. In practice, the state space of MDPs can be very large, which is an issue for real-world problems.
Factored MDPs are often used to deal with this issue. In factored MDPs, state variables represent the state space and dynamic Bayesian networks model the transition functions, which helps avoid the exponential growth in the state space associated with large and complex problems. The approaches for explanation and refinement presented in this thesis are also extended to the factored case to demonstrate their use in real-world applications. Empirical evaluations are presented in three domains: course advising for undergraduate students, assisted hand-washing for people with dementia, and diagnostics for manufacturing.
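The optimal MDP policies that the explanation templates above justify are typically computed by dynamic programming. The sketch below runs standard value iteration on a tiny hypothetical two-state MDP (not any model from the thesis) and extracts the greedy policy from the converged values.

```python
# Value iteration on a tiny hypothetical MDP. P[s][a] lists
# (next_state, probability) pairs; R[s][a] is the immediate reward.
P = {
    0: {"stay": [(0, 1.0)], "move": [(1, 0.9), (0, 0.1)]},
    1: {"stay": [(1, 1.0)], "move": [(0, 0.9), (1, 0.1)]},
}
R = {0: {"stay": 0.0, "move": 1.0}, 1: {"stay": 2.0, "move": 0.0}}
gamma = 0.9  # discount factor

# Repeatedly apply the Bellman optimality operator until (near) convergence.
V = {s: 0.0 for s in P}
for _ in range(200):
    V = {s: max(R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
                for a in P[s])
         for s in P}

# The optimal policy is greedy with respect to the converged values.
policy = {s: max(P[s], key=lambda a: R[s][a] +
                 gamma * sum(p * V[t] for t, p in P[s][a]))
          for s in P}
print(policy)  # → {0: 'move', 1: 'stay'}
```

In this toy model the agent moves to state 1 and stays there to collect the recurring reward of 2; an explanation template in the style of the thesis would justify "move" in state 0 by the discounted value of reaching that absorbing reward.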
