Global ETD Search

Safety-aware apprenticeship learning

It is widely acknowledged in the AI community that finding a good reward function for reinforcement learning is extremely challenging. Apprenticeship learning (AL) is a class of “learning from demonstration” techniques in which the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent uses inverse reinforcement learning (IRL) methods to recover the expert's policy from a set of expert demonstrations. However, because the agent learns exclusively from observations, there is neither verification nor guarantee that the learnt policy satisfies a given constraint on the probability of the agent running into unwanted situations. In this dissertation, we study how to guide AL to learn a policy that is inherently safe while still meeting its learning objective. By combining formal methods with imitation learning, we propose a Counterexample-Guided Apprenticeship Learning algorithm. We consider a setting in which the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, this novel counterexample-guided approach can ensure both the safety and the performance of the learnt policy. Given a formal safety specification expressed in probabilistic temporal logic, the algorithm guarantees that the learnt policy satisfies that specification. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.
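As a rough illustration of two ingredients named in the abstract, the sketch below shows a reward written as a linear combination of state features and a check of a PCTL-style bound on the probability of eventually reaching an unsafe state. It is not the dissertation's algorithm or code: the function names, the dictionary-based Markov chain (standing in for an MDP under a fixed policy), and the simple fixed-point iteration are all assumptions made for illustration. A real setup would typically hand the model and property to a probabilistic model checker.

```python
# Illustrative sketch only. All names and the toy model are hypothetical
# and are not taken from the dissertation.

def linear_reward(weights, features):
    """Reward as a linear combination of state features: R(s) = w . f(s)."""
    return sum(w * f for w, f in zip(weights, features))


def reach_probability(transitions, unsafe, start, iterations=1000):
    """Approximate the probability of eventually reaching an unsafe state.

    transitions[s] is a dict {next_state: probability} describing the
    Markov chain induced by a fixed policy. The value is approximated
    by fixed-point iteration starting from zero for all safe states.
    """
    prob = {s: (1.0 if s in unsafe else 0.0) for s in transitions}
    for _ in range(iterations):
        prob = {
            s: 1.0 if s in unsafe
            else sum(p * prob[t] for t, p in succ.items())
            for s, succ in transitions.items()
        }
    return prob[start]


def satisfies_safety(transitions, unsafe, start, threshold):
    """Check a property of the form P<=threshold [ F unsafe ]."""
    return reach_probability(transitions, unsafe, start) <= threshold + 1e-12


# Toy chain: from "s0" the agent stays, reaches the goal, or fails.
toy_chain = {
    "s0": {"s0": 0.5, "goal": 0.4, "bad": 0.1},
    "goal": {"goal": 1.0},
    "bad": {"bad": 1.0},
}
toy_unsafe = {"bad"}
```

For example, `satisfies_safety(toy_chain, toy_unsafe, "s0", 0.25)` asks whether the toy policy keeps the chance of reaching `"bad"` at or below 25%. In a counterexample-guided loop, a failed check of this kind would be what triggers a further learning step.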

https://hdl.handle.net/2144/30743

Computer engineering

Apprenticeship learning

Probabilistic model checking

Reinforcement learning

Safety-aware

Identifier	oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/30743
Date	03 July 2018
Creators	Zhou, Weichao
Contributors	Li, Wenchao
Source Sets	Boston University
Language	en_US
Detected Language	English
Type	Thesis/Dissertation
