How can an agent generalize its knowledge to new circumstances? To learn
effectively, an agent acting in a sequential decision problem must select actions intelligently based on its available knowledge. This dissertation focuses on Bayesian methods for representing learned knowledge and develops novel algorithms that exploit the represented
knowledge when selecting actions.
Our first contribution introduces the multi-task Reinforcement
Learning setting in which an agent solves a sequence of tasks. An
agent equipped with knowledge of the relationship between tasks can
transfer knowledge between them. We propose the transfer of two
distinct types of knowledge: knowledge of domain models and knowledge
of policies. To represent the transferable knowledge, we propose
hierarchical Bayesian priors over domain models and policies,
respectively. To transfer domain model knowledge, we introduce a new
algorithm for model-based Bayesian Reinforcement Learning in the
multi-task setting that exploits the learned hierarchical Bayesian
model to improve exploration in related tasks. To transfer policy
knowledge, we introduce a new policy search algorithm that accepts a
policy prior as input and uses the prior to bias policy search. A
specific implementation of this algorithm is developed that accepts a
hierarchical policy prior. The algorithm learns the hierarchical
structure and reuses components of the structure in related tasks.
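To make the transfer idea concrete, the following is a minimal sketch under assumptions much simpler than the setting treated in the dissertation: each task is a Bernoulli bandit, arms share Beta priors across tasks, and the shared prior is fit by a crude empirical-Bayes moment match. The class and function names are illustrative only and do not reproduce the dissertation's algorithm.

```python
# Minimal sketch (not the dissertation's algorithm): multi-task exploration with a
# shared hierarchical prior. Assumptions: tasks are Bernoulli bandits, arms share
# Beta(alpha, beta) priors, and the shared prior is fit by empirical-Bayes moment
# matching over previously solved tasks.
import numpy as np

class HierarchicalThompsonAgent:
    def __init__(self, n_arms, prior=None):
        # prior[arm] = (alpha, beta); defaults to an uninformative Beta(1, 1)
        self.prior = prior if prior is not None else [(1.0, 1.0)] * n_arms
        self.successes = np.zeros(n_arms)
        self.failures = np.zeros(n_arms)

    def select_arm(self):
        # Thompson sampling: draw a reward estimate per arm from its posterior
        samples = [np.random.beta(a + s, b + f)
                   for (a, b), s, f in zip(self.prior, self.successes, self.failures)]
        return int(np.argmax(samples))

    def update(self, arm, reward):
        self.successes[arm] += reward
        self.failures[arm] += 1 - reward

def fit_shared_prior(task_means, strength=10.0):
    # Transfer step: turn per-task arm means into Beta pseudo-counts that bias
    # exploration in the next, related task.
    means = np.clip(np.mean(task_means, axis=0), 0.01, 0.99)
    return [(m * strength, (1 - m) * strength) for m in means]
```

Under these assumptions, an agent constructed with the fitted prior for a new related task starts from informative pseudo-counts, so its exploration is biased toward arms that paid off in earlier tasks; the dissertation develops this idea for full model-based Bayesian Reinforcement Learning rather than bandits.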
Our second contribution addresses the basic problem of generalizing knowledge gained from previously executed policies. Bayesian
Optimization exploits a prior model of an objective function to quickly identify the point that maximizes the modeled objective.
Successful use of Bayesian Optimization in Reinforcement Learning
requires a model relating policies and their performance. Given such a
model, Bayesian Optimization can be applied to search for an optimal
policy. Early work using Bayesian Optimization in the Reinforcement
Learning setting ignored the sequential nature of the underlying
decision problem. The work presented in this dissertation explicitly
addresses this limitation. We construct new Bayesian models that take
advantage of sequence information to better generalize knowledge
across policies. We empirically evaluate this approach on a variety of
Reinforcement Learning benchmark problems. Experiments
show that our method significantly reduces the amount of exploration
required to identify the optimal policy.
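As a rough illustration of the Bayesian Optimization loop (a sketch under simplified assumptions, not the sequence-aware models developed in this dissertation), the snippet below fits a Gaussian Process surrogate from policy parameters to estimated return and selects the next policy to execute by Expected Improvement. Here `evaluate_policy` stands in for a hypothetical rollout routine that returns an average return.

```python
# Sketch of Bayesian Optimization over policy parameters: GP surrogate plus
# Expected Improvement acquisition. Kernel and search procedure are generic
# placeholders, not the dissertation's sequence-aware model.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expected_improvement(candidates, gp, best_return):
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best_return) / sigma
    return (mu - best_return) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt_policy_search(evaluate_policy, dim, n_iters=30, n_candidates=500):
    X = [np.random.uniform(-1, 1, dim)]          # initial random policy
    y = [evaluate_policy(X[0])]                  # observed return
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
    for _ in range(n_iters):
        gp.fit(np.array(X), np.array(y))
        candidates = np.random.uniform(-1, 1, (n_candidates, dim))
        ei = expected_improvement(candidates, gp, max(y))
        x_next = candidates[np.argmax(ei)]
        X.append(x_next)
        y.append(evaluate_policy(x_next))
    return X[int(np.argmax(y))]                  # best policy parameters found
```

A sequence-aware variant would replace the RBF kernel over raw parameters with a model informed by the trajectories each policy generates, which is the direction pursued in the dissertation.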
Our final contribution is a new framework for learning parametric
policies from queries presented to an expert. In many domains it is
difficult to provide expert demonstrations of desired policies.
However, it may still be a simple matter for an expert to identify
good and bad performance. To take advantage of this limited expert
knowledge, our agent presents experts with pairs of demonstrations and
asks which of the two better represents a latent target
behavior. The goal is to use a small number of queries to elicit the
latent behavior from the expert. We formulate a Bayesian model of the
querying process, develop an inference procedure that estimates the posterior
distribution over the latent policy space, and design an active procedure for
selecting new queries to present to the expert. We show, in
multiple domains, that the algorithm successfully learns the target
policy and that the active learning strategy generally improves the
speed of learning.
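A minimal sketch of the query-and-update loop, under assumptions made here purely for illustration (a linear utility on trajectory features, a Bradley-Terry response model, a sampling-based posterior approximation, and an uncertainty-driven query criterion), might look like this:

```python
# Sketch of pairwise preference elicitation (assumptions only, not the
# dissertation's exact model): expert answers follow a Bradley-Terry/logistic
# likelihood over a latent linear utility, and the next query is the candidate
# pair whose predicted answer is most uncertain.
import numpy as np

def answer_likelihood(w, f_a, f_b):
    # Probability the expert prefers demonstration A over B under weights w.
    return 1.0 / (1.0 + np.exp(-(f_a - f_b) @ w))

def posterior_samples(queries, answers, dim, n_samples=2000):
    # Importance-weighted prior samples as a crude posterior approximation.
    ws = np.random.randn(n_samples, dim)
    log_w = np.zeros(n_samples)
    for (f_a, f_b), ans in zip(queries, answers):
        p = np.array([answer_likelihood(w, f_a, f_b) for w in ws])
        log_w += np.log(np.where(ans == 1, p, 1 - p) + 1e-12)
    probs = np.exp(log_w - log_w.max())
    idx = np.random.choice(n_samples, size=n_samples, p=probs / probs.sum())
    return ws[idx]

def select_query(candidate_pairs, ws):
    # Active selection: pick the pair whose predicted preference is closest to 50/50.
    scores = []
    for f_a, f_b in candidate_pairs:
        p = np.mean([answer_likelihood(w, f_a, f_b) for w in ws])
        scores.append(-abs(p - 0.5))
    return candidate_pairs[int(np.argmax(scores))]
```

The dissertation's actual model, inference procedure, and active criterion differ; this only illustrates the overall structure of presenting pairwise queries and updating a posterior over the latent policy.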
Graduation date: 2013
Identifier: oai:union.ndltd.org:ORGSU/oai:ir.library.oregonstate.edu:1957/34550
Date: 28 July 2012
Creators: Wilson, Aaron (Aaron Creighton)
Contributors: Tadepalli, Prasad; Fern, Alan
Source Sets: Oregon State University
Language: en_US
Detected Language: English
Type: Thesis/Dissertation