Global ETD Search

Learning in Partially Observable Markov Decision Processes

Indiana University-Purdue University Indianapolis (IUPUI) / Learning in Partially Observable Markov Decision Processes (POMDPs) is motivated by the need to address a number of realistic problems. A number of methods exist for learning in POMDPs, but learning with only a limited amount of information about the POMDP model remains highly desirable. Learning with minimal information matters in complex systems, because methods that require complete information to be shared among decision makers become impractical as problem dimensionality increases.

In this thesis we address the problem of decentralized control of POMDPs with unknown transition probabilities and rewards. We propose learning in POMDPs using a tree-based approach, in which the states of the POMDP are guessed using the tree. Each node in the tree contains an automaton that acts as a decentralized decision maker for the POMDP. The start state of the POMDP is known as the landmark state. Each automaton in the tree uses a simple learning scheme to update its action choice and requires minimal information. The principal result is that, without proper knowledge of the transition probabilities and rewards, the automata tree of decision makers converges to a set of actions that maximizes the long-term expected reward per unit time obtained by the system. The analysis is based on learning in sequential stochastic games and on properties of ergodic Markov chains. Simulation results are presented to compare the long-term rewards of the system under different decision control algorithms.
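As a rough illustration of what a "simple learning scheme" for one automaton can look like, the sketch below implements a single learning automaton with a linear reward-inaction update. The abstract does not say which update rule the thesis uses, so the rule, the class name `LearningAutomaton`, and the `step` parameter are assumptions made for this example only. The tree of automata and the landmark-state mechanism are not modeled here.

```python
import random


class LearningAutomaton:
    """One decision maker holding a probability vector over its actions.

    Hypothetical sketch: linear reward-inaction is assumed here; the
    thesis abstract does not name its update rule.
    """

    def __init__(self, n_actions, step=0.1, rng=None):
        # Start with a uniform distribution over the available actions.
        self.probs = [1.0 / n_actions] * n_actions
        self.step = step
        self.rng = rng or random.Random()

    def choose(self):
        # Sample an action index according to the current probabilities.
        return self.rng.choices(range(len(self.probs)), weights=self.probs)[0]

    def update(self, action, reward):
        # reward is expected in [0, 1]. A reward of 0 leaves the
        # probabilities unchanged (the "inaction" part of the rule).
        for j in range(len(self.probs)):
            if j == action:
                self.probs[j] += self.step * reward * (1.0 - self.probs[j])
            else:
                self.probs[j] -= self.step * reward * self.probs[j]


automaton = LearningAutomaton(n_actions=2, step=0.1)
automaton.update(action=0, reward=1.0)
print(automaton.probs)
```

The automaton only needs the action it took and a scalar reward signal, which matches the abstract's emphasis on minimal information: no transition probabilities or reward model are consulted in the update.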

Learning in POMDP

Learning automata tree

POMDP

Computer programming

Data structures (Computer science)

Stochastic systems -- Research

Game theory -- Mathematical models

Sequences (Mathematics)

Markov processes

Decision making -- Simulation methods

User interfaces (Computer systems)

Identifier	oai:union.ndltd.org:IUPUI/oai:scholarworks.iupui.edu:1805/3451
Date	21 August 2013
Creators	Sachan, Mohit
Contributors	Mukhopadhyay, Snehasis; Raje, Rajeev; Al Hasan, Mohammad; Fang, Shiaofen
Source Sets	Indiana University-Purdue University Indianapolis
Language	en_US
Detected Language	English
