Formulating RL with MDPs typically works for a single objective; such formulations are therefore not readily applicable when a policy must optimize multiple objectives, or satisfy certain constraints while maximizing one or more objectives, which can often conflict. Further, many applications, such as robotics or autonomous driving, do not allow constraints to be violated even during training. Existing algorithms do not simultaneously combine multiple objectives with zero constraint violations, sample efficiency, and low computational complexity. To this end, we study sample-efficient reinforcement learning with a concave objective and convex constraints, where an agent maximizes a concave, Lipschitz-continuous function of multiple objectives while satisfying a convex cost constraint. For this setup, we provide a posterior sampling algorithm that solves a convex optimization problem for the stationary distribution over states and actions. Further, using a Bellman-error-based analysis, we show that the algorithm obtains a near-optimal Bayesian regret bound in the number of interactions with the environment. Moreover, under the assumption that slack policies exist, we design an algorithm that solves for conservative policies which do not violate constraints and still achieve the near-optimal regret bound. We also show that the algorithm performs significantly better than existing algorithms for MDPs with finite states and finite actions.
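The computational core described in the abstract is a convex program over the stationary state-action distribution (occupancy measure) of a sampled MDP. The sketch below, in Python with cvxpy, is only an illustration of what such a program can look like under simple assumptions: the worst-case ("max-min") choice of concave objective, the linear cost budget `C_max`, and all randomly generated problem data are hypothetical and not the thesis' exact formulation.

```python
# Minimal sketch: convex program over the stationary state-action distribution rho(s, a)
# of a sampled MDP, with a concave objective over multiple rewards and a convex
# (here linear) cost constraint.  All problem data below are illustrative.
import numpy as np
import cvxpy as cp

S, A, K = 5, 3, 2                            # states, actions, number of reward objectives
rng = np.random.default_rng(0)

P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s']: sampled transition kernel
R = rng.uniform(size=(K, S, A))              # K per-step reward objectives
c = rng.uniform(size=(S, A))                 # per-step cost
C_max = 0.6                                  # assumed cost budget (hypothetical)

rho = cp.Variable((S, A), nonneg=True)       # stationary state-action distribution

constraints = [cp.sum(rho) == 1]
# Flow conservation: occupancy of each state equals the probability of flowing into it.
for s in range(S):
    inflow = cp.sum(cp.multiply(P[:, :, s], rho))
    constraints.append(cp.sum(rho[s, :]) == inflow)
# Convex cost constraint on the expected per-step cost.
constraints.append(cp.sum(cp.multiply(c, rho)) <= C_max)

# Concave objective: worst case over the K expected rewards (one standard concave choice).
expected_rewards = cp.hstack([cp.sum(cp.multiply(R[k], rho)) for k in range(K)])
problem = cp.Problem(cp.Maximize(cp.min(expected_rewards)), constraints)
problem.solve()

# Recover a stationary policy pi(a | s) from the occupancy measure.
policy = rho.value / (rho.value.sum(axis=1, keepdims=True) + 1e-12)
print("optimal value:", problem.value)
```

In a posterior sampling loop, a program of this form would be re-solved each episode with a transition kernel drawn from the current posterior, and the resulting policy executed to collect further data.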
Identifier | oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/20399676
Date | 29 July 2022
Creators | Mridul Agarwal (13171941)
Source Sets | Purdue University
Detected Language | English
Type | Text, Thesis
Rights | CC BY 4.0
Relation | https://figshare.com/articles/thesis/REINFORCEMENT_LEARNING_FOR_CONCAVE_OBJECTIVES_AND_CONVEX_CONSTRAINTS/20399676