On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
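As a rough illustration of the kind of algorithm the abstract refers to, the following is a minimal sketch of tabular Q-learning on a toy two-state problem. It is not taken from the paper: the function names (`q_learning`, `value_iteration`), the toy environment, the discount factor, and the step-size schedule are all assumptions chosen for illustration. The step size `n ** -0.7` is one schedule that satisfies the usual stochastic approximation conditions (the step sizes sum to infinity while their squares have a finite sum). The learned values are compared against a dynamic programming solution of the same problem.

```python
import random

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions,
# deterministic transitions. TRANSITIONS[s][a] is the next state,
# REWARDS[s][a] is the immediate reward.
TRANSITIONS = [[0, 1], [0, 1]]
REWARDS = [[0.0, 1.0], [0.0, 2.0]]


def value_iteration(transitions, rewards, gamma=0.5, sweeps=200):
    """Dynamic programming baseline: repeated synchronous Bellman backups."""
    n_states, n_actions = len(transitions), len(transitions[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        q = [
            [
                rewards[s][a] + gamma * max(q[transitions[s][a]])
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
    return q


def q_learning(transitions, rewards, gamma=0.5, steps=20000, seed=0):
    """Tabular Q-learning with uniformly random exploration.

    Uses a per-pair step size alpha = n ** -0.7, where n counts visits
    to the (state, action) pair. This schedule is an assumption made
    for this sketch, not a choice documented in the source.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(transitions), len(transitions[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)          # explore uniformly at random
        s_next = transitions[s][a]
        visits[s][a] += 1
        alpha = visits[s][a] ** -0.7          # decaying step size
        target = rewards[s][a] + gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])  # sampled Bellman backup
        s = s_next
    return q


if __name__ == "__main__":
    learned = q_learning(TRANSITIONS, REWARDS)
    exact = value_iteration(TRANSITIONS, REWARDS)
    for s in range(len(exact)):
        for a in range(len(exact[s])):
            print(s, a, round(learned[s][a], 3), round(exact[s][a], 3))
```

The sketch uses deterministic transitions only to keep the example short. The stochastic setting treated in the paper would replace the fixed next state with a sampled one.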

reinforcement learning

stochastic approximation

convergence

dynamic programming

Identifier	oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/7205
Date	01 August 1993
Creators	Jaakkola, Tommi; Jordan, Michael I.; Singh, Satinder P.
Source Sets	M.I.T. Theses and Dissertation
Language	en_US
Detected Language	English
Format	15 p., 77605 bytes, 356324 bytes, application/octet-stream, application/pdf
Relation	AIM-1441, CBCL-084
