
Reinforcement Learning with History Lists

A very general framework for modeling uncertainty in learning environments is given by Partially Observable Markov Decision Processes (POMDPs). In the POMDP setting, the learning agent infers a policy for acting optimally in all possible states of the environment while receiving only observations of these states. The basic idea for coping with partial observability is to incorporate memory into the representation of the policy. Perfect memory is provided by the belief space, i.e. the space of probability distributions over environmental states. However, computing policies defined on the belief space requires a considerable amount of prior knowledge about the learning problem and is expensive in terms of computation time.

In this thesis, we present a reinforcement learning algorithm for solving deterministic POMDPs based on short-term memory. Short-term memory is implemented by sequences of past observations and actions, called history lists. In contrast to belief states, history lists are not capable of representing optimal policies, but they are far more practical and require no prior knowledge about the learning problem. The algorithm presented learns policies consisting of two separate phases. During the first phase, the learning agent collects information by actively establishing a history list that identifies the current state. This phase is called the efficient identification strategy. After the current state has been determined, the Q-Learning algorithm is used to learn a near-optimal policy.

We show that such a procedure can also be used to solve large Markov Decision Processes (MDPs). Solving MDPs with continuous, multi-dimensional state spaces requires some form of abstraction over states. One particular way of establishing such an abstraction is to ignore the original state information and consider only features of states. This form of state abstraction is closely related to POMDPs, since features of states can be interpreted as observations of states.
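To illustrate the short-term-memory idea described in the abstract, the following Python sketch keys a standard tabular Q-Learning update on a fixed-length history list of past observations and actions instead of on the (unobservable) underlying state. It is a minimal illustration only, not the algorithm developed in the thesis: the environment interface (reset/step), the history length, and all parameter values are assumptions made for this example.

import random
from collections import defaultdict, deque

def history_list_q_learning(env, n_actions, history_len=3,
                            episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    # Q maps (history list, action) -> estimated value. The history list plays
    # the role of short-term memory in place of the hidden state.
    Q = defaultdict(float)

    for _ in range(episodes):
        obs = env.reset()
        # Keep only the most recent observations and actions.
        history = deque([obs], maxlen=2 * history_len)
        done = False
        while not done:
            h = tuple(history)
            # Epsilon-greedy action selection based on the current history list.
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[(h, x)])

            next_obs, reward, done = env.step(a)

            # Extend the memory with the chosen action and the new observation.
            history.append(a)
            history.append(next_obs)
            h_next = tuple(history)

            # Ordinary Q-Learning update, keyed by history lists rather than states.
            best_next = max(Q[(h_next, x)] for x in range(n_actions))
            target = reward + (0.0 if done else gamma * best_next)
            Q[(h, a)] += alpha * (target - Q[(h, a)])
    return Q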

Identifier oai:union.ndltd.org:uni-osnabrueck.de/oai:repositorium.ub.uni-osnabrueck.de:urn:nbn:de:gbv:700-2009031619
Date 13 March 2009
Creators Timmer, Stephan
Contributors Prof. Dr. Martin Riedmiller, Prof. Dr. Kai-Uwe Kühnberger
Source Sets Universität Osnabrück
Language English
Detected Language English
Type doc-type:doctoralThesis
Format application/zip, application/pdf
Rights http://rightsstatements.org/vocab/InC/1.0/
