Global ETD Search

Regret-based Reward Elicitation for Markov Decision Processes

Markov decision processes (MDPs) have proven to be a useful model for sequential decision-theoretic reasoning under uncertainty, yet they require the specification of a reward function that can require sophisticated human judgement to assess relevant tradeoffs. This dissertation casts the problem of specifying rewards as one of preference elicitation and aims to minimize the degree of precision with which a reward function must be specified while still allowing optimal or near-optimal policies to be produced. We demonstrate how robust policies can be computed for MDPs given only partial reward information using the minimax regret criterion.
Minimax regret offers an intuitive bound on loss; however, it is computationally intractable in general. This work develops techniques for exploiting MDP structure to allow for offline precomputation that enables efficient online minimax regret computation. To complement this exact approach we develop several general approximations that offer both upper and lower bounds on minimax regret. We further show how approximations can be improved online during the elicitation procedure to balance accuracy and efficiency.
To effectively reduce regret, we investigate a spectrum of elicitation approaches that range from the computationally demanding optimal selection of complex queries about full MDP policies (which are informative but, we believe, cognitively difficult) to the heuristic selection of simple queries that focus on a small set of reward parameters. Results are demonstrated on MDPs drawn from the domains of assistive technology and autonomic computing.
Finally, we demonstrate our framework on a realistic website optimization domain, performing elicitation on websites with tens of thousands of webpages. We show that minimax regret can be efficiently computed, and develop informative and cognitively reasonable queries that quickly lower minimax regret, producing policies that offer significant improvement in the design of the underlying websites.
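
As a rough illustration of the minimax regret criterion named in the abstract, the sketch below enumerates a small finite set of candidate policies and a small finite set of candidate reward vectors. This is a simplifying assumption made only for illustration: it is not the dissertation's method, which addresses partial reward information and structured MDPs rather than brute-force enumeration. Each policy is represented by a hypothetical occupancy vector, so its value under a reward vector is a dot product; all function and variable names here are invented for the example.

```python
from typing import Dict, Hashable, Sequence, Tuple

Vector = Sequence[float]


def value(occupancy: Vector, reward: Vector) -> float:
    """Value of a policy: dot product of its occupancy vector and the reward."""
    return sum(f * r for f, r in zip(occupancy, reward))


def max_regret(name: Hashable,
               policies: Dict[Hashable, Vector],
               rewards: Sequence[Vector]) -> float:
    """Worst-case loss of a policy, over all candidate rewards,
    relative to the best policy for each reward."""
    own = policies[name]
    return max(
        max(value(other, r) for other in policies.values()) - value(own, r)
        for r in rewards
    )


def minimax_regret(policies: Dict[Hashable, Vector],
                   rewards: Sequence[Vector]) -> Tuple[Hashable, float]:
    """Return the policy with the smallest max regret, and that regret."""
    best = min(policies, key=lambda p: max_regret(p, policies, rewards))
    return best, max_regret(best, policies, rewards)


# Hypothetical toy instance: two reward parameters, three candidate policies.
toy_policies = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
toy_rewards = [[1.0, 0.0], [0.0, 1.0]]
robust_policy, regret_bound = minimax_regret(toy_policies, toy_rewards)
```

In this toy instance the policies that commit to a single reward parameter each risk a loss of 1.0 if the other parameter turns out to matter, while the balanced policy "c" loses at most 0.5 under either candidate reward, so it is the one selected.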

http://hdl.handle.net/1807/65730

Decision Making

Robust Optimization

Preference Elicitation

Artificial Intelligence

Planning under Uncertainty

Markov Decision Processes

0984

Identifier	oai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/65730
Date	22 August 2014
Creators	Regan, Kevin
Contributors	Boutilier, Craig
Source Sets	University of Toronto
Language	en_ca
Detected Language	English
Type	Thesis
