A Markov Decision Process (MDP) is a model employed to describe problems in which a decision must be made at each one of several stages, while receiving feedback from the environment. This type of model has been extensively studied in the operations research community and fundamental algorithms have been developed to solve associated problems. However, these algorithms are quite inefficient for very large problems, leading to a need for alternatives; since MDP problems are provably hard on compressed representations, one becomes content even with algorithms which may perform well at least on specific classes of problems. The class of problems we deal with in this thesis allows succinct representations for the MDP as a dynamic Bayes network, and for its solution as a weighted combination of basis functions. We develop novel algorithms for producing, improving, and calculating the error of approximate solutions for MDPs using a compressed representation. Specifically, we develop an efficient branch-and-bound algorithm for computing the Bellman error of the compact approximate solution regardless of its provenance. We introduce an efficient direct linear programming algorithm which, using incremental constraints generation, achieves run times significantly smaller than existing approximate algorithms without much loss of accuracy. We also show a novel direct linear programming algorithm which, instead of employing constraints generation, transforms the exponentially many constraints into a compact form more amenable for tractable solutions. In spite of its perceived importance, the efficient optimization of the Bellman error towards an approximate MDP solution has eluded current algorithms; to this end we propose a novel branch-and-bound approximate policy iteration algorithm which makes direct use of our branch-and-bound method for computing the Bellman error. We further investigate another procedure for obtaining an approximate solution based on the dual of the direct, approximate linear programming formulation for solving MDPs. To address both the loss of accuracy resulting from the direct, approximate linear program solution and the question of where basis functions come from we also develop a principled system able not only to produce the initial set of basis functions, but also able to augment it with new basis functions automatically generated such that the approximation error decreases according to the user's requirements and time limitations.
Identifer | oai:union.ndltd.org:WATERLOO/oai:uwspace.uwaterloo.ca:10012/1171 |
Date | January 2004 |
Creators | Patrascu, Relu-Eugen |
Publisher | University of Waterloo |
Source Sets | University of Waterloo Electronic Theses Repository |
Language | English |
Detected Language | English |
Type | Thesis or Dissertation |
Format | application/pdf, 1048298 bytes, application/pdf |
Rights | Copyright: 2004, Patrascu, Relu-Eugen. All rights reserved. |
Page generated in 0.0019 seconds