Proactive Planning through Active Policy Inference in Stochastic Environments

In multi-agent Markov Decision Processes, a controllable agent must plan optimally in a dynamic and uncertain environment that includes another agent that is unknown and uncontrollable. Given a task specification for the controllable agent, its ability to complete the task can be impeded by an inaccurate model of the intent and behaviors of other agents. In this work, we introduce an active policy inference algorithm that allows a controllable agent to infer the policy of the environmental agent through interaction. Active policy inference is data-efficient and is particularly useful when data are time-consuming or costly to obtain. The controllable agent synthesizes an exploration-exploitation policy that incorporates the knowledge learned about the environment's behavior. Whenever possible, the agent also tries to elicit behavior from the other agent to improve the accuracy of the environmental model. This is done by mapping the uncertainty in the environmental model to a bonus reward, which steers the agent toward the most informative exploration and allows it to return to its main task as quickly as possible. Experiments demonstrate the improved sample efficiency of active learning and the convergence of the policy for the controllable agent.
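
The idea of mapping model uncertainty to a bonus reward can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not specify the uncertainty measure, so the count-based bonus, the Dirichlet-style prior, and the names `PolicyBelief`, `shaped_reward`, and `beta` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class PolicyBelief:
    """Count-based belief over the other agent's action in each state.

    Hypothetical helper: a symmetric Dirichlet-style prior is assumed.
    """

    def __init__(self, actions, prior=1.0):
        self.actions = list(actions)
        self.prior = prior
        # counts[state][action] = number of times the action was observed
        self.counts = defaultdict(lambda: {a: 0 for a in self.actions})

    def observe(self, state, action):
        """Record one observed action of the other agent in `state`."""
        self.counts[state][action] += 1

    def visits(self, state):
        """Total number of observations collected in `state`."""
        return sum(self.counts[state].values())

    def estimate(self, state):
        """Posterior-mean estimate of the other agent's policy in `state`."""
        c = self.counts[state]
        total = sum(c.values()) + self.prior * len(self.actions)
        return {a: (c[a] + self.prior) / total for a in self.actions}


def shaped_reward(task_reward, belief, state, beta=1.0):
    """Task reward plus an assumed bonus that shrinks as `state` is observed.

    The bonus beta / sqrt(1 + n) is one common choice, not the source's.
    """
    return task_reward + beta / math.sqrt(1 + belief.visits(state))
```

Under these assumptions the bonus is largest where the other agent has never been observed and decays as observations accumulate, so a planner maximizing the shaped reward is drawn toward informative states first and then falls back to the task reward.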

Identifier	oai:union.ndltd.org:wpi.edu/oai:digitalcommons.wpi.edu:etd-theses-2251
Date	01 May 2018
Creators	Poulin, Nolan
Contributors	Jie Fu (Advisor); Zhi (Jane) Li (Committee Member); Carlo Pinciroli (Committee Member)
Publisher	Digital WPI
Source Sets	Worcester Polytechnic Institute
Detected Language	English
Type	text
Format	application/pdf
Source	Masters Theses (All Theses, All Years)
