Return to search

Regret-based Reward Elicitation for Markov Decision Processes

Markov decision processes (MDPs) have proven to be a useful model for sequential decision- theoretic reasoning under uncertainty, yet they require the specification of a reward function that can require sophisticated human judgement to assess relevant tradeoffs. This dissertation casts the problem of specifying rewards as one of preference elicitation and aims to minimize the degree of precision with which a reward function must be specified while still allowing optimal or near-optimal policies to be produced. We demonstrate how robust policies can be computed for MDPs given only partial reward information using the minimax regret criterion.
Minimax regret offers an intuitive bound on loss; however, it is computationally intractable in general. This work develops techniques for exploiting MDP structure to allow for offline precomputation that enables efficient online minimax regret computation. To complement this exact approach we develop several general approximations that offer both upper and lower bounds on minimax regret. We further show how approximations can be improved online during the elicitation procedure to balance accuracy and efficiency.
To effectively reduce regret, we investigate a spectrum of elicitation approaches that range from the computationally-demanding optimal selection of complex queries about full MDP policies (which are informative, but, we believe, cognitively difficult) to the heuristic selection of simple queries that focus on a small set of reward parameters. Results are demonstrated on MDPs drawn from the domains of assistive technology and autonomic computing.
Finally we demonstrate our framework on a realistic website optimization domain, per- forming elicitation on websites with tens of thousands of webpages. We show that minimax regret can be efficiently computed, and develop informative and cognitively reasonable queries that quickly lower minimax regret, producing policies that offer significant improvement in the design of the underlying websites.

Identiferoai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/65730
Date22 August 2014
CreatorsKevin, Regan
ContributorsCraig, Boutilier
Source SetsUniversity of Toronto
Languageen_ca
Detected LanguageEnglish
TypeThesis

Page generated in 0.0021 seconds