Learning agents often find themselves in environments in which rare significant events occur independently of their current choice of action. Traditional reinforcement learning algorithms sample events according to their natural probability of occurring, and therefore tend to exhibit slow convergence and high variance in such environments. In this thesis, we assume that learning is done in a simulated environment in which the probability of these rare events can be artificially altered. We present novel algorithms for both policy evaluation and control, using both tabular and function approximation representations of the value function. These algorithms automatically tune the rare event probabilities to minimize the variance and use importance sampling to correct for changes in the dynamics. We prove that these algorithms converge, provide an analysis of their bias and variance, and demonstrate their utility in a number of domains, including a large network planning task.
Identifer | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:QMM.111576 |
Date | January 2009 |
Creators | Frank, Jordan William, 1980- |
Publisher | McGill University |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | English |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Format | application/pdf |
Coverage | Master of Science (School of Computer Science.) |
Rights | All items in eScholarship@McGill are protected by copyright with all rights reserved unless otherwise indicated. |
Relation | alephsysno: 003164349, proquestno: AAIMR66878, Theses scanned by UMI/ProQuest. |
Page generated in 0.0015 seconds