Global ETD Search

Return to search

Utilizing negative policy information to accelerate reinforcement learning

A pilot study by Subramanian et al. on Markov decision problem task decomposition by humans revealed that participants break down tasks into both short-term subgoals with a defined end-condition (such as "go to food") and long-term considerations and invariants with no end-condition (such as "avoid predators"). In the context of Markov decision problems, behaviors having clear start and end conditions are well-modeled by an abstraction known as options, but no abstraction exists in the literature for continuous constraints imposed on the agent's behavior.

We propose two representations to fill this gap: the state constraint (a set or predicate identifying states that the agent should avoid) and the state-action constraint (identifying state-action pairs that should not be taken). State-action constraints can be directly utilized by an agent, which must choose an action in each state, while state constraints require an approximation of the MDP’s state transition function to be used; however, it is important to support both representations, as certain constraints may be more easily expressed in terms of one as compared to the other, and users may conceive of rules in either form.

Using domains inspired by classic video games, this dissertation demonstrates the thesis that explicitly modeling this negative policy information improves reinforcement learning performance by decreasing the amount of training needed to achieve a given level of performance. In particular, we will show that even the use of negative policy information captured from individuals with no background in artificial intelligence yields improved performance.

We also demonstrate that the use of options and constraints together form a powerful combination: an option and constraint can be taken together to construct a constrained option, which terminates in any situation where the original option would violate a constraint. In this way, a naive option defined to perform well in a best-case scenario may still accelerate learning in domains where the best-case scenario is not guaranteed.

http://hdl.handle.net/1853/53481

Interactive machine learning

State constraints

State-action constraints

Reinforcement learning

Hierarchical reinforcement learning

Identifer	oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/53481
Date	08 June 2015
Creators	Irani, Arya John
Contributors	Isbell, Charles L.
Publisher	Georgia Institute of Technology
Source Sets	Georgia Tech Electronic Thesis and Dissertation Archive
Language	en_US
Detected Language	English
Type	Dissertation
Format	application/pdf

Page generated in 0.0198 seconds

Utilizing negative policy information to accelerate reinforcement learning

Description

Links & Downloads

Tags

Additional Fields