Global ETD Search

Return to search

MODEL-FREE ALGORITHMS FOR CONSTRAINED REINFORCEMENT LEARNING IN DISCOUNTED AND AVERAGE REWARD SETTINGS

Reinforcement learning (RL), which aims to train an agent to maximize its accumulated reward through time, has attracted much attention in recent years. Mathematically, RL is modeled as a Markov Decision Process, where the agent interacts with the environment step by step. In practice, RL has been applied to autonomous driving, robotics, recommendation systems, and financial management. Although RL has been greatly studied in the literature, most proposed algorithms are model-based, which requires estimating the transition kernel. To this end, we begin to study the sample efficient model-free algorithms under different settings.Firstly, we propose a conservative stochastic primal-dual algorithm in the infinite horizon discounted reward setting. The proposed algorithm converts the original problem from policy space to the occupancy measure space, which makes the non-convex problem linear. Then, we advocate the use of a randomized primal-dual approach to achieve O(\eps^-2) sample complexity, which matches the lower bound.However, when it comes to the infinite horizon average reward setting, the problem becomes more challenging since the environment interaction never ends and can’t be reset, which makes reward samples not independent anymore. To solve this, we design an epoch-based policy-gradient algorithm. In each epoch, the whole trajectory is divided into multiple sub-trajectories with an interval between each two of them. Such intervals are long enough so that the reward samples are asymptotically independent. By controlling the length of trajectory and intervals, we obtain a good gradient estimator and prove the proposed algorithm achieves O(T^3/4) regret bound.

10.25394/pgs.27173229.v1

Reinforcement learning

Optimisation

Stochastic analysis and modelling

Reinforcement learning

Markov Decision Process

Discounted Reward

Average Reward

Policy Gradient

Identifer	oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/27173229
Date	07 October 2024
Creators	Qinbo Bai (19804362)
Source Sets	Purdue University
Detected Language	English
Type	Text, Thesis
Rights	CC BY 4.0
Relation	https://figshare.com/articles/thesis/MODEL-FREE_ALGORITHMS_FOR_CONSTRAINED_REINFORCEMENT_LEARNING_IN_DISCOUNTED_AND_AVERAGE_REWARD_SETTINGS/27173229

Page generated in 0.0021 seconds

MODEL-FREE ALGORITHMS FOR CONSTRAINED REINFORCEMENT LEARNING IN DISCOUNTED AND AVERAGE REWARD SETTINGS

Description

Links & Downloads

Tags

Additional Fields