Return to search

Piecewise linear Markov decision processes with an application to partially observable Markov models

This dissertation applies policy improvement and successive
approximation or value iteration to a general class of Markov decision processes with discounted costs. In particular, a class of Markov decision processes, called piecewise-linear, is studied. Piecewise-linear processes are characterized by the property that the value function of a process observed for one period and then terminated is piecewise-linear if the terminal reward function is piecewise-linear. Partially observable Markov decision processes have this property.
It is shown that there are e-optimal piecewise-linear value functions and piecewise-constant policies which are simple. Simple means that there are only finitely many pieces, each of which is defined on a convex polyhedral set. Algorithms based on policy improvement and successive approximation are developed to compute simple approximations to an optimal policy and the optimal value function. / Business, Sauder School of / Graduate

Identiferoai:union.ndltd.org:UBC/oai:circle.library.ubc.ca:2429/20920
Date January 1977
CreatorsSawaki, Katsushige
Source SetsUniversity of British Columbia
LanguageEnglish
Detected LanguageEnglish
TypeText, Thesis/Dissertation
RightsFor non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.

Page generated in 0.0018 seconds