This dissertation applies policy improvement and successive approximation (value iteration) to a general class of Markov decision processes with discounted costs. In particular, it studies a class of Markov decision processes called piecewise-linear. Piecewise-linear processes are characterized by the following property: if the terminal reward function is piecewise-linear, then the value function of a process observed for one period and then terminated is also piecewise-linear. Partially observable Markov decision processes have this property.
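The one-period property can be illustrated for a finite partially observable process. The sketch below is a minimal illustration and not the dissertation's own algorithm: it uses the standard enumeration of linear pieces for a one-step backup, and it assumes the terminal function is a cost given as the lower envelope of finitely many linear functions of the belief. All names (`alphas`, `P`, `O`, `c`, `beta`) are hypothetical and chosen only for this example.

```python
import itertools


def one_step_backup(alphas, P, O, c, beta):
    """Return vectors whose lower envelope is the one-period value function.

    Assumed (hypothetical) inputs:
      alphas      : list of vectors; terminal cost at belief b is min_k alphas[k] . b
      P[a][s][t]  : probability of moving from state s to state t under action a
      O[a][t][o]  : probability of observing o after arriving in state t under a
      c[a][s]     : one-period cost of action a in state s
      beta        : discount factor
    """
    n_states = len(c[0])
    n_obs = len(O[0][0])
    new_vectors = []
    for a in range(len(P)):
        # g[o][k][s] = sum over t of P[a][s][t] * O[a][t][o] * alphas[k][t]
        g = [[[sum(P[a][s][t] * O[a][t][o] * alpha[t] for t in range(n_states))
               for s in range(n_states)]
              for alpha in alphas]
             for o in range(n_obs)]
        # Each assignment of one terminal piece per observation gives one linear piece.
        for choice in itertools.product(range(len(alphas)), repeat=n_obs):
            vec = [c[a][s] + beta * sum(g[o][k][s] for o, k in enumerate(choice))
                   for s in range(n_states)]
            new_vectors.append(vec)
    return new_vectors


def value(vectors, belief):
    """Evaluate the piecewise-linear function min over vectors of (vector . belief)."""
    return min(sum(v * b for v, b in zip(vec, belief)) for vec in vectors)
```

Under these assumptions the backup returns at most (number of actions) x (number of terminal pieces)^(number of observations) vectors, so the one-period value function again has finitely many linear pieces. The enumeration grows quickly and does not discard dominated pieces, so it is meant only to show the structure.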
It is shown that there are ε-optimal piecewise-linear value functions and piecewise-constant policies which are simple. Simple means that there are only finitely many pieces, each of which is defined on a convex polyhedral set. Algorithms based on policy improvement and successive approximation are developed to compute simple approximations to an optimal policy and the optimal value function. / Business, Sauder School of / Graduate
Identifier | oai:union.ndltd.org:UBC/oai:circle.library.ubc.ca:2429/20920 |
Date | January 1977 |
Creators | Sawaki, Katsushige |
Source Sets | University of British Columbia |
Language | English |
Detected Language | English |
Type | Text, Thesis/Dissertation |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |