

The Essential Dynamics Algorithm: Essential Results

This paper presents a novel algorithm for learning in a class of stochastic Markov decision processes (MDPs) with continuous state and action spaces; the algorithm trades some accuracy for speed. A transform of the stochastic MDP into a deterministic one is presented that captures the essence of the original dynamics, in a sense made precise in the paper. In this transformed MDP, the calculation of values is greatly simplified. The online algorithm estimates the model of the transformed MDP and simultaneously performs policy search against it. Bounds on the error of this approximation are proven, and experimental results in a bicycle-riding domain are presented. The algorithm learns near-optimal policies in orders of magnitude fewer interactions with the stochastic MDP, using less domain knowledge. All code used in the experiments is available on the project's web site.
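The abstract does not spell out the transform. The sketch below is only a rough illustration of the loop it describes, and it builds the deterministic surrogate in one simple way that the paper may or may not use: each stochastic transition is replaced by its estimated mean next state, which is a certainty-equivalence-style approximation. The 1-D linear system, the quadratic reward, the linear policy `u = -k*s`, and every function name (`step_true`, `fit_mean_model`, `deterministic_return`, `policy_search`, `run`) are hypothetical. The loop collects data from the stochastic system, re-estimates the deterministic model, and searches for a policy against that model. Because the model is deterministic, a single rollout is enough to score each candidate policy.

```python
import random

def step_true(s, u, rng, a=0.9, b=0.5, noise=0.1):
    """Hypothetical stochastic 1-D system: s' = a*s + b*u + Gaussian noise."""
    return a * s + b * u + rng.gauss(0.0, noise)

def fit_mean_model(data):
    """Least-squares fit of the mean dynamics s' ~ a_hat*s + b_hat*u.

    Solves the 2x2 normal equations directly (no intercept).
    data is a list of (s, u, s_next) tuples; returns (a_hat, b_hat).
    """
    sss = sum(s * s for s, _, _ in data)
    suu = sum(u * u for _, u, _ in data)
    ssu = sum(s * u for s, u, _ in data)
    sy = sum(s * y for s, _, y in data)
    uy = sum(u * y for _, u, y in data)
    det = sss * suu - ssu * ssu
    a_hat = (sy * suu - uy * ssu) / det
    b_hat = (uy * sss - sy * ssu) / det
    return a_hat, b_hat

def deterministic_return(k, model, s0=1.0, horizon=50):
    """Return of the policy u = -k*s under the deterministic mean model.

    The surrogate has no noise, so one rollout gives the model's exact
    value for this policy; no Monte Carlo averaging is needed.
    Hypothetical reward: -(s^2 + 0.1*u^2) per step.
    """
    a_hat, b_hat = model
    s, total = s0, 0.0
    for _ in range(horizon):
        u = -k * s
        total += -(s * s + 0.1 * u * u)
        s = a_hat * s + b_hat * u
    return total

def policy_search(model, gains):
    """Pick the candidate gain with the highest deterministic return."""
    return max(gains, key=lambda k: deterministic_return(k, model))

def run(episodes=5, steps=40, seed=0):
    """Alternate between acting in the stochastic system, refitting the
    mean model on all data so far, and re-running policy search on it."""
    rng = random.Random(seed)
    gains = [i * 0.05 for i in range(41)]  # candidate k in [0, 2]
    k, data, model = 0.0, [], None
    for _ in range(episodes):
        s = rng.uniform(-1.0, 1.0)
        for _ in range(steps):
            u = -k * s + rng.gauss(0.0, 0.3)  # exploration noise on the action
            s_next = step_true(s, u, rng)
            data.append((s, u, s_next))
            s = s_next
        model = fit_mean_model(data)
        k = policy_search(model, gains)
    return model, k

if __name__ == "__main__":
    model, k = run()
    print("estimated (a, b):", model, "chosen gain k:", k)
```

Scoring each candidate with one deterministic rollout, instead of averaging many noisy rollouts of the true system, is meant to show why values are cheaper to compute in a deterministic model. How far the surrogate's values can drift from the true ones is what the paper's error bounds address, and this toy offers no such guarantee.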

Reinforcement learning

bicycle

policy search

Markov decision processes

Identifier	oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/6718
Date	01 May 2003
Creators	Martin, Martin C.
Source Sets	M.I.T. Theses and Dissertation
Language	en_US
Detected Language	English
Format	12 p., 1085830 bytes, 303781 bytes, application/postscript, application/pdf
Relation	AIM-2003-014

