1 |
Discovering hierarchy in reinforcement learning / Hengst, Bernhard, Computer Science & Engineering, Faculty of Engineering, UNSW, January 2003 (has links)
This thesis addresses the open problem of automatically discovering hierarchical structure in reinforcement learning. Current algorithms for reinforcement learning fail to scale as problems become more complex. Many complex environments empirically exhibit hierarchy and can be modeled as interrelated subsystems, each in turn with hierarchical structure. Subsystems are often repetitive in time and space, meaning that they recur as components of different tasks or occur multiple times in different circumstances in the environment. A learning agent may sometimes scale to larger problems if it successfully exploits this repetition. Evidence suggests that a bottom-up approach that repeatedly finds building blocks at one level of abstraction and uses them as background knowledge at the next level makes learning in many complex environments tractable. An algorithm, called HEXQ, is described that automatically decomposes and solves a multi-dimensional Markov decision problem (MDP) by constructing a multi-level hierarchy of interlinked subtasks without being given the model beforehand. The effectiveness and efficiency of the HEXQ decomposition depend largely on the choice of representation in terms of the variables, their temporal relationships, and whether the problem exhibits a type of constrained stochasticity. The algorithm is first developed for stochastic shortest-path problems and then extended to infinite-horizon problems. Its operation is demonstrated on a number of examples, including a taxi domain, various navigation tasks, the Towers of Hanoi, and a larger sporting problem. The main contributions of the thesis are the automation of (1) decomposition, (2) sub-goal identification, and (3) discovery of hierarchical structure for MDPs with states described by a number of variables or features. It points the way to further scaling opportunities that encompass approximation, partial observability, selective perception, relational representations, and planning. The longer-term research aim is to train rather than program intelligent agents.
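To make the decomposition idea concrete, below is a minimal Python sketch (not taken from the thesis) of one heuristic commonly associated with HEXQ-style hierarchy discovery: run a period of random exploration and rank the state variables by how often their values change, so that the fastest-changing variable can seed the lowest level of the hierarchy. The environment interface (reset, step, actions) is a hypothetical stand-in.

```python
import random
from collections import defaultdict

def rank_variables_by_change_frequency(env, n_steps=10_000, seed=0):
    """Rank state variables by how often they change under random exploration.

    Heuristic sketch for HEXQ-style decomposition: the most frequently changing
    variable is a candidate for the lowest level of the hierarchy. `env` is a
    hypothetical interface: reset() returns a tuple of discrete state variables,
    step(action) returns (next_state, reward, done), and env.actions is a list.
    """
    rng = random.Random(seed)
    state = env.reset()
    change_counts = defaultdict(int)
    for _ in range(n_steps):
        action = rng.choice(env.actions)
        next_state, _reward, done = env.step(action)
        for i, (old, new) in enumerate(zip(state, next_state)):
            if old != new:
                change_counts[i] += 1
        state = env.reset() if done else next_state
    # Most frequently changing variable first.
    return sorted(range(len(state)), key=lambda i: -change_counts[i])
```

In the taxi domain mentioned above, for example, the taxi's position changes on almost every step while the passenger's location changes rarely, so the position variable would be ordered first.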
2 |
A study of model-based average reward reinforcement learning / Ok, DoKyeong, 09 May 1996 (has links)
Reinforcement Learning (RL) is the study of learning agents that improve
their performance from rewards and punishments. Most reinforcement learning
methods optimize the discounted total reward received by an agent, while, in many
domains, the natural criterion is to optimize the average reward per time step. In this
thesis, we introduce a model-based average reward reinforcement learning method
called "H-learning" and show that it performs better than other average reward and
discounted RL methods in the domain of scheduling a simulated Automatic Guided
Vehicle (AGV).
We also introduce a version of H-learning which automatically explores the
unexplored parts of the state space, while always choosing an apparently best action
with respect to the current value function. We show that this "Auto-exploratory H-Learning"
performs much better than the original H-learning under many previously
studied exploration strategies.
To scale H-learning to large state spaces, we extend it to learn action models
and reward functions in the form of Bayesian networks, and approximate its value
function using local linear regression. We show that both of these extensions are very
effective in significantly reducing the space requirement of H-learning, and in making
it converge much faster in the AGV scheduling task. Further, Auto-exploratory H-learning
synergistically combines with Bayesian network model learning and value
function approximation by local linear regression, yielding a highly effective average
reward RL algorithm.
We believe that the algorithms presented here have the potential to scale to
large applications in the context of average reward optimization. / Graduation date: 1996
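For readers unfamiliar with the method, the following is a rough tabular sketch of the core H-learning updates (an illustrative reconstruction under simplifying assumptions, not the thesis's exact algorithm): transition probabilities and expected rewards are estimated from counts, the average-reward estimate rho is adjusted only when a greedy action is taken, and the state values h are backed up from the learned model.

```python
from collections import defaultdict

class HLearningAgent:
    """Sketch of a model-based average-reward (H-learning style) agent.

    Tabular and illustrative only; the thesis's extensions (Bayesian network
    models, local linear regression) replace the tables used here.
    """

    def __init__(self, actions, alpha=0.01):
        self.actions = actions
        self.alpha = alpha              # learning rate for rho
        self.rho = 0.0                  # estimate of average reward per step
        self.h = defaultdict(float)     # state values
        self.counts = defaultdict(int)  # (s, a) visit counts
        self.trans = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.rsum = defaultdict(float)  # summed immediate rewards per (s, a)

    def _q(self, s, a):
        n = self.counts[(s, a)]
        if n == 0:
            return 0.0
        r = self.rsum[(s, a)] / n
        expected_h = sum(c / n * self.h[s2] for s2, c in self.trans[(s, a)].items())
        return r - self.rho + expected_h

    def greedy_action(self, s):
        return max(self.actions, key=lambda a: self._q(s, a))

    def update(self, s, a, r, s2, was_greedy):
        # Update the learned transition and reward model.
        self.counts[(s, a)] += 1
        self.trans[(s, a)][s2] += 1
        self.rsum[(s, a)] += r
        # Adjust rho only on greedy steps, as in H-learning.
        if was_greedy:
            self.rho += self.alpha * (r - self.h[s] + self.h[s2] - self.rho)
        # Back up the value of s from the current model and rho.
        self.h[s] = max(self._q(s, b) for b in self.actions)
```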
3 |
An architecture for situated learning agents / Mitchell, Matthew Winston, 1968- January 2003 (has links)
Abstract not available
4 |
Hierarchical average reward reinforcement learning / Seri, Sandeep, 15 March 2002 (has links)
Reinforcement Learning (RL) is the study of agents that learn optimal
behavior by interacting with and receiving rewards and punishments from an unknown
environment. RL agents typically do this by learning value functions that
assign a value to each state (situation) or to each state-action pair. Recently,
there has been a growing interest in using hierarchical methods to cope with the
complexity that arises due to the huge number of states found in most interesting
real-world problems. Hierarchical methods seek to reduce this complexity by the
use of temporal and state abstraction. Like most RL methods, most hierarchical
RL methods optimize the discounted total reward that the agent receives. However,
in many domains, the proper criterion to optimize is the average reward per
time step.
In this thesis, we adapt the concepts of hierarchical and recursive optimality,
which are used to describe the kind of optimality achieved by hierarchical methods,
to the average reward setting and show that they coincide under a condition called
Result Distribution Invariance. We present two new model-based hierarchical RL
methods, HH-learning and HAH-learning, that are intended to optimize the average
reward. HH-learning is a hierarchical extension of the model-based, average-reward RL method, H-learning. Like H-learning, HH-learning requires exploration
in order to learn correct domain models and an optimal value function. HH-learning
can be used with any exploration strategy, whereas HAH-learning uses the principle
of "optimism under uncertainty", which gives it a built-in "auto-exploratory"
feature. We also give the hierarchical and auto-exploratory hierarchical versions
of R-learning, a model-free average reward method, and a hierarchical version of
ARTDP, a model-based discounted total reward method.
We compare the performance of the "flat" and hierarchical methods in the
task of scheduling an Automated Guided Vehicle (AGV) in a variety of settings.
The results show that hierarchical methods can take advantage of temporal and
state abstraction and converge in fewer steps than the flat methods. The exception
is the hierarchical version of ARTDP. We give an explanation for this anomaly.
Auto-exploratory hierarchical methods are faster than the hierarchical methods
with ε-greedy exploration. Finally, hierarchical model-based methods are faster
than hierarchical model-free methods. / Graduation date: 2003
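For context, flat R-learning (Schwartz, 1993), the model-free average-reward method whose hierarchical and auto-exploratory versions are developed in this thesis, can be sketched in a few lines. The sketch below is an illustrative reconstruction with a hypothetical environment interface; the hierarchical variants are not reproduced.

```python
import random
from collections import defaultdict

def r_learning(env, actions, episodes=100, beta=0.1, alpha=0.01, eps=0.1, seed=0):
    """Flat R-learning sketch: model-free average-reward reinforcement learning.

    Assumes a hypothetical env with reset() -> state and
    step(action) -> (next_state, reward, done).
    """
    rng = random.Random(seed)
    R = defaultdict(float)   # relative action values R(s, a)
    rho = 0.0                # estimate of the average reward per time step

    def best(s):
        return max(actions, key=lambda a: R[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy selection; remember whether the action was greedy.
            if rng.random() < eps:
                a, greedy = rng.choice(actions), False
            else:
                a, greedy = best(s), True
            s2, r, done = env.step(a)
            # Average-adjusted temporal-difference update of R(s, a).
            R[(s, a)] += beta * (r - rho + R[(s2, best(s2))] - R[(s, a)])
            # rho is updated only on greedy (non-exploratory) steps.
            if greedy:
                rho += alpha * (r + R[(s2, best(s2))] - R[(s, best(s))] - rho)
            s = s2
    return R, rho
```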
5 |
Perception-based generalization in model-based reinforcement learning / Leffler, Bethany R., January 2009 (has links)
Thesis (Ph. D.)--Rutgers University, 2009. / "Graduate Program in Computer Science." Includes bibliographical references (p. 100-104).
6 |
Adaptive representations for reinforcement learning / Whiteson, Shimon Azariah, 28 August 2008 (has links)
Not available / text
7 |
Adaptive representations for reinforcement learning / Whiteson, Shimon Azariah, 22 August 2011 (has links)
Not available / text
8 |
Reinforcement-learning-based autonomous vehicle navigation in a dynamically changing environment / Ngai, Chi-kit., 魏智傑. January 2007 (has links)
published_or_final_version / abstract / Electrical and Electronic Engineering / Doctoral / Doctor of Philosophy
9 |
A unifying framework for computational reinforcement learning theory / Li, Lihong, January 2009 (has links)
Thesis (Ph. D.)--Rutgers University, 2009. / "Graduate Program in Computer Science." Includes bibliographical references (p. 238-261).
10 |
Reinforcement learning for intelligent assembly automation / Lee, Siu-keung., 李少強. January 2002 (has links)
published_or_final_version / Industrial and Manufacturing Systems Engineering / Doctoral / Doctor of Philosophy