Return to search

Automatic State Construction using Decision Trees for Reinforcement Learning Agents

Reinforcement Learning (RL) is a learning framework in which an agent learns a policy from continual interaction with the environment. A policy is a mapping from states to actions. The agent receives rewards as feedback on the actions performed. The objective of RL is to design autonomous agents to search for the policy that maximizes the expectation of the cumulative reward. When the environment is partially observable, the agent cannot determine the states with certainty. These states are called hidden in the literature. An agent that relies exclusively on the current observations will not always find the optimal policy. For example, a mobile robot needs to remember the number of doors went by in order to reach a specific door, down a corridor of identical doors. To overcome the problem of partial observability, an agent uses both current and past (memory) observations to construct an internal state representation, which is treated as an abstraction of the environment. This research focuses on how features of past events are extracted with variable granularity regarding the internal state construction. The project introduces a new method that applies Information Theory and decision tree technique to derive a tree structure, which represents the state and the policy. The relevance, of a candidate feature, is assessed by the Information Gain Ratio ranking with respect to the cumulative expected reward. Experiments carried out on three different RL tasks have shown that our variant of the U-Tree (McCallum, 1995) produces a more robust state representation and faster learning. This better performance can be explained by the fact that the Information Gain Ratio exhibits a lower variance in return prediction than the Kolmogorov-Smirnov statistical test used in the original U-Tree algorithm.

Identiferoai:union.ndltd.org:ADTP/264960
Date January 2005
CreatorsAu, Manix
PublisherQueensland University of Technology
Source SetsAustraliasian Digital Theses Program
Detected LanguageEnglish
RightsCopyright Manix Au

Page generated in 0.0027 seconds