
POPR: Probabilistic Offline Policy Ranking with Expert Data

Existing off-policy evaluation (OPE) methods typically estimate the value of a single policy, but in real-world applications OPE is often used to compare and rank candidate policies before deploying them, a task known as the offline policy ranking problem. While policies can be ranked using point estimates from OPE, estimating the full distribution of outcomes is more informative for policy ranking and selection. This paper introduces Probabilistic Offline Policy Ranking (POPR), which works with expert trajectories. It brings rigorous statistical inference capabilities to offline evaluation, facilitating probabilistic comparisons of candidate policies before they are deployed. We empirically demonstrate that POPR is effective for evaluating RL policies across various environments.
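The following is a minimal sketch, not the thesis's actual POPR algorithm, of the general idea the abstract describes: once an offline evaluator produces a distribution over each candidate policy's value rather than a point estimate, policies can be compared and ranked probabilistically. The policy names and the synthetic "posterior samples" below are purely illustrative stand-ins for whatever distributions such a procedure would produce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior samples of each candidate policy's value.
# In practice these would come from an OPE/POPR-style inference step.
posterior_samples = {
    "policy_a": rng.normal(loc=0.80, scale=0.05, size=5000),
    "policy_b": rng.normal(loc=0.75, scale=0.02, size=5000),
    "policy_c": rng.normal(loc=0.78, scale=0.10, size=5000),
}

def pairwise_win_prob(samples_i, samples_j):
    """Probability that policy i outperforms policy j under the posteriors."""
    return float(np.mean(samples_i > samples_j))

# Rank by posterior mean, but also report credible intervals and pairwise
# win probabilities so the comparison reflects the full uncertainty.
ranking = sorted(posterior_samples,
                 key=lambda k: posterior_samples[k].mean(), reverse=True)
for name in ranking:
    s = posterior_samples[name]
    lo, hi = np.percentile(s, [2.5, 97.5])
    print(f"{name}: mean={s.mean():.3f}, 95% CI=({lo:.3f}, {hi:.3f})")

best, runner_up = ranking[0], ranking[1]
print(f"P({best} > {runner_up}) = "
      f"{pairwise_win_prob(posterior_samples[best], posterior_samples[runner_up]):.3f}")
```

A point-estimate ranking would pick the policy with the highest mean regardless of uncertainty; the distributional view above additionally exposes how confident that preference is, which is the kind of probabilistic comparison the abstract motivates.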

Identifier: oai:union.ndltd.org:BGMYU2/oai:scholarsarchive.byu.edu:etd-11359
Date: 26 April 2023
Creators: Schwantes, Trevor F.
Publisher: BYU ScholarsArchive
Source Sets: Brigham Young University
Detected Language: English
Type: text
Format: application/pdf
Source: Theses and Dissertations
Rights: https://lib.byu.edu/about/copyright/
