
Hierarchical average reward reinforcement learning

Reinforcement Learning (RL) is the study of agents that learn optimal
behavior by interacting with and receiving rewards and punishments from an unknown
environment. RL agents typically do this by learning value functions that
assign a value to each state (situation) or to each state-action pair. Recently,
there has been a growing interest in using hierarchical methods to cope with the
complexity that arises due to the huge number of states found in most interesting
real-world problems. Hierarchical methods seek to reduce this complexity by the
use of temporal and state abstraction. Like most RL methods, most hierarchical
RL methods optimize the discounted total reward that the agent receives. However,
in many domains, the proper criterion to optimize is the average reward per
time step.
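
For reference, a standard way to write this criterion (the notation here is conventional, not taken from the thesis) is the gain of a policy π from a starting state s:

    \rho^{\pi}(s) = \lim_{N \to \infty} \frac{1}{N}\,
        \mathbb{E}\!\left[\, \sum_{t=0}^{N-1} r_t \;\middle|\; s_0 = s,\ \pi \right]

in contrast to the discounted criterion \sum_{t} \gamma^{t} r_t optimized by most RL methods.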
In this thesis, we adapt the concepts of hierarchical and recursive optimality,
which are used to describe the kind of optimality achieved by hierarchical methods,
to the average reward setting and show that they coincide under a condition called
Result Distribution Invariance. We present two new model-based hierarchical RL
methods, HH-learning and HAH-learning, that are intended to optimize the average
reward. HH-learning is a hierarchical extension of the model-based, average-reward RL method, H-learning. Like H-learning, HH-learning requires exploration
in order to learn correct domain models and an optimal value function. HH-learning
can be used with any exploration strategy, whereas HAH-learning uses the principle
of "optimism under uncertainty", which gives it a built-in "auto-exploratory"
feature. We also give the hierarchical and auto-exploratory hierarchical versions
of R-learning, a model-free average reward method, and a hierarchical version of
ARTDP, a model-based discounted total reward method.
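
As an illustration of the flat, model-free average-reward baseline mentioned above, the following is a minimal sketch of the standard R-learning update (Schwartz, 1993). The environment interface, step sizes, and ε-greedy exploration are assumptions made for the example, not details taken from the thesis.

    import random
    from collections import defaultdict

    def r_learning(env, steps=100_000, alpha=0.1, beta=0.01, epsilon=0.1):
        """Tabular R-learning sketch (Schwartz, 1993) for a continuing task.

        `env` is assumed to expose `env.actions` (a list of discrete actions),
        `env.reset() -> state`, and `env.step(action) -> (next_state, reward)`;
        this interface is illustrative, not the one used in the thesis.
        """
        Q = defaultdict(float)   # Q[(state, action)]: average-adjusted action value
        rho = 0.0                # running estimate of the average reward per step
        s = env.reset()
        for _ in range(steps):
            # epsilon-greedy exploration; HAH-learning instead builds exploration
            # in via "optimism under uncertainty"
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda b: Q[(s, b)])
            s_next, r = env.step(a)
            best_next = max(Q[(s_next, b)] for b in env.actions)
            was_greedy = Q[(s, a)] == max(Q[(s, b)] for b in env.actions)
            delta = r - rho + best_next - Q[(s, a)]   # average-adjusted TD error
            Q[(s, a)] += alpha * delta
            if was_greedy:
                rho += beta * delta   # update the gain estimate only on greedy steps
            s = s_next
        return Q, rho

The hierarchical and auto-exploratory versions described in the thesis decompose learning over a task hierarchy; the sketch above shows only the flat baseline they are compared against.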
We compare the performance of the "flat" and hierarchical methods in the
task of scheduling an Automated Guided Vehicle (AGV) in a variety of settings.
The results show that hierarchical methods can take advantage of temporal and
state abstraction and converge in fewer steps than the flat methods. The exception
is the hierarchical version of ARTDP. We give an explanation for this anomaly.
Auto-exploratory hierarchical methods are faster than the hierarchical methods
with ε-greedy exploration. Finally, hierarchical model-based methods are faster
than hierarchical model-free methods.

Graduation date: 2003

Identifier: oai:union.ndltd.org:ORGSU/oai:ir.library.oregonstate.edu:1957/31126
Date: 15 March 2002
Creators: Seri, Sandeep
Contributors: Tadepalli, Prasad
Source Sets: Oregon State University
Language: en_US
Detected Language: English
Type: Thesis/Dissertation
