This research focuses on Markov Decision Processes (MDP). MDP is one of the most important and challenging areas of Operations Research. Every day people make many decisions: today's decisions impact
tomorrow's and tomorrow's will impact the ones made the day after. Problems in Engineering, Science, and Business often pose similar challenges: a large number of options and uncertainty about the
future. MDP is one of the most powerful tools for solving such problems.
There are several standard methods for finding optimal or approximately optimal policies for MDP. Approaches widely employed
to solve MDP problems include value iteration and policy iteration. Although simple to implement, these approaches are, nevertheless, limited in the size of problems that can be solved, due to excessive
computation required to find close-to-optimal solutions.
My thesis proposes a new value iteration and modified policy iteration methods for classes of the expected discounted MDPs and
average cost MDPs.
We establish a class of operators that can be integrated into value iteration and modified policy iteration algorithms for Markov Decision Processes, so as to speed up the convergence of the iterative search. Application of these operators requires a little additional computation per iteration but reduces the number of iterations significantly. The development of the acceleration operators relies on two key properties of Markov operator, namely
contraction mapping and monotonicity in a restricted region. Since Markov operators of the classical value iteration and modified
policy iteration methods for average cost MDPs do not possess the contraction mapping property, for these models we restrict our study to average cost problems that can be formulated as the stochastic shortest path problem.
The performance improvement is significant, while the implementation of the operator into the value iteration is trivial. Numerical studies show that the accelerated methods can be hundreds of times more efficient for solving MDP problems than the other known approaches. The computational savings can be significant especially
when the discount factor approaches 1 and the transition probability matrix becomes dense, in which case the standard iterative algorithms suffer from slow convergence.
Identifer | oai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/24381 |
Date | 21 April 2010 |
Creators | Shlakhter, Oleksandr |
Contributors | Lee, Chi-Guhn |
Source Sets | University of Toronto |
Language | en_ca |
Detected Language | English |
Type | Thesis |
Page generated in 0.006 seconds