  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Hyperparameter Tuning for Reinforcement Learning with Bandits and Off-Policy Sampling

Hauser, Kristen 21 June 2021
No description available.
2

POPR: Probabilistic Offline Policy Ranking with Expert Data

Schwantes, Trevor F. 26 April 2023
Existing off-policy evaluation (OPE) methods typically estimate the value of a policy, but in practice OPE is often used to compare and rank candidate policies before they are deployed. This is known as the offline policy ranking problem. Although policies can be ranked by their OPE point estimates, estimating the full distribution of outcomes is more useful for policy ranking and selection. This paper introduces Probabilistic Offline Policy Ranking (POPR), which works with expert trajectories. POPR brings rigorous statistical inference capabilities to offline evaluation, enabling probabilistic comparisons of candidate policies before they are deployed. We empirically demonstrate that POPR is effective for evaluating RL policies across various environments.
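
To illustrate the difference between ranking by point estimates and making a probabilistic comparison, here is a minimal sketch. It is not the POPR method from the thesis. It swaps in a plain bootstrap over per-episode return estimates, and the function name `prob_a_beats_b` and the return values are hypothetical, chosen only for illustration.

```python
import random
from statistics import mean


def prob_a_beats_b(returns_a, returns_b, n_boot=2000, seed=0):
    """Bootstrap estimate of P(mean return of A > mean return of B).

    Assumption: each list holds independent per-episode return estimates
    for one policy (for example, produced by some OPE estimator).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_boot):
        # Resample each policy's returns with replacement.
        sample_a = rng.choices(returns_a, k=len(returns_a))
        sample_b = rng.choices(returns_b, k=len(returns_b))
        if mean(sample_a) > mean(sample_b):
            wins += 1
    return wins / n_boot


# Hypothetical return estimates for two candidate policies.
returns_a = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8]
returns_b = [1.0, 1.1, 0.7, 1.4, 0.6, 1.0]

# Point estimates give a ranking but no measure of confidence in it.
print("point estimates:", mean(returns_a), mean(returns_b))
# The bootstrap probability expresses how often A comes out ahead of B.
print("P(A > B) is roughly", prob_a_beats_b(returns_a, returns_b))
```

A probability near 0.5 suggests the data do not clearly separate the two policies even when their point estimates differ, which is the kind of information a point-estimate ranking alone cannot convey.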
