We consider a multi-task learning problem in which an agent is presented with N reinforcement learning tasks. To solve this problem, we study the policy gradient approach, which iteratively updates an estimate of the optimal policy using the gradients of the value functions. The classic policy gradient method, however, may be expensive to implement in the multi-task setting, as it requires access to the gradients of all the tasks at every iteration. To circumvent this issue, we propose an incremental policy gradient method, where the agent uses the gradient of only one task at each iteration. Our main contribution is a set of theoretical results characterizing the performance of the proposed method. In particular, we show that incremental policy gradient methods converge to the optimal value of the multi-task reinforcement learning objective at a sublinear rate O(1/√k), where k is the number of iterations. To illustrate its performance, we apply the proposed method to a simple multi-task variant of the GridWorld problem, where an agent seeks a policy to navigate effectively in different environments.

Master of Science

Reinforcement Learning (RL) is a popular machine learning technique in which an agent, such as a robot, uses a policy to choose an action, like moving forward, based on observations from sensors such as cameras. The agent receives a reward that indicates whether the policy is good or bad, and its objective is to find a policy that maximizes the cumulative reward it collects by repeating this process. RL has many applications, including Cruise autonomous cars, Google industrial automation, training ChatGPT language models, and Walmart inventory management. However, RL is sensitive to the task and requires a large amount of training data: if the task changes even slightly, the agent must retrain its policy from scratch. This motivates Multi-Task Reinforcement Learning (MTRL), where different tasks give different rewards and the agent maximizes the sum of the cumulative rewards across all tasks. We focus on the incremental setting, where the agent can only access the tasks one at a time, in random order. In this setting, only one agent is needed, and it does not need to know which task it is currently performing. We show that the proposed incremental policy gradient methods converge to the optimal value of the MTRL objective at a sublinear rate O(1/√k), where k is the number of iterations. To illustrate its performance, we apply the proposed method to a simple multi-task variant of the GridWorld problem, where an agent seeks a policy to navigate effectively in different environments.
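As a rough illustration of the incremental update described above, the following Python sketch samples one GridWorld-like task per iteration and takes a REINFORCE-style step using only that task's gradient, with a diminishing step size consistent with the O(1/√k) rate. The task interface (reset/step), the tabular softmax policy, and all names below are illustrative assumptions, not the thesis implementation.

import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_gradient(theta, env, gamma=0.99, horizon=50, rng=None):
    # One REINFORCE-style gradient estimate for a single task (environment).
    # theta is assumed to be a (num_states, num_actions) array of softmax logits.
    rng = rng if rng is not None else np.random.default_rng()
    s = env.reset()                                # assumed: returns an integer state index
    log_grads, rewards = [], []
    discount = 1.0
    for _ in range(horizon):
        probs = softmax(theta[s])
        a = rng.choice(len(probs), p=probs)
        g = np.zeros_like(theta)                   # gradient of log pi(a|s) w.r.t. theta
        g[s] = -probs
        g[s, a] += 1.0
        s, r, done = env.step(a)                   # assumed: returns (next_state, reward, done)
        log_grads.append(g)
        rewards.append(discount * r)
        discount *= gamma
        if done:
            break
    reward_to_go = np.cumsum(rewards[::-1])[::-1]  # discounted reward-to-go
    grad = np.zeros_like(theta)
    for g, ret in zip(log_grads, reward_to_go):
        grad += g * ret
    return grad

def incremental_policy_gradient(envs, theta, num_iters=10000, seed=0):
    # At each iteration, sample ONE task uniformly and step along its gradient only.
    rng = np.random.default_rng(seed)
    for k in range(1, num_iters + 1):
        env = envs[rng.integers(len(envs))]        # one task per iteration
        grad = reinforce_gradient(theta, env, rng=rng)
        step = 1.0 / np.sqrt(k)                    # diminishing step size, matching O(1/sqrt(k))
        theta += step * grad
    return theta

Calling incremental_policy_gradient with a list of GridWorld-style environments and a zero-initialized logit table would then produce a single shared policy trained across all tasks, one task per iteration.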
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/118699 |
Date | 05 April 2024 |
Creators | Bai, Yitao |
Contributors | Electrical and Computer Engineering, Doan, Thinh T., Stilwell, Daniel J., Jin, Ming |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertations |
Language | English |
Detected Language | English |
Type | Thesis, Text |
Format | ETD, application/pdf, application/pdf |
Rights | CC0 1.0 Universal, http://creativecommons.org/publicdomain/zero/1.0/ |