
Game Players Using Distributional Reinforcement Learning

Reinforcement learning (RL) algorithms aim to identify optimal action sequences for an agent in a given environment, traditionally by maximizing the expected reward received from taking each action and transitioning between states. This thesis explores a distributional approach to RL, replacing the expected reward function with the full distribution over the possible returns, known as the value distribution. We focus on the quantile regression deep Q-network (QR-DQN) algorithm introduced by Dabney et al. (2017), which models the value distribution by representing its quantiles. Using this quantile representation of the value distribution, we modify the QR-DQN algorithm to enhance the agent's risk sensitivity. Our risk-averse algorithm is evaluated against the original QR-DQN in the Atari 2600 and Gymnasium environments, specifically in the games Breakout, Pong, Lunar Lander and CartPole. Results indicate that the risk-averse variant achieves comparable rewards while exhibiting increased robustness and risk aversion. Potential refinements of the risk-averse algorithm are presented.
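The abstract does not state which risk measure the modified agent uses. As a purely illustrative sketch (written in Python with PyTorch, both assumptions not taken from the thesis), one common way to make a QR-DQN agent risk-averse is to score each action by a CVaR-style average of its lowest predicted quantiles instead of the full mean over quantiles:

import torch

def risk_averse_action(quantile_values: torch.Tensor, alpha: float = 0.25) -> int:
    """Pick an action from QR-DQN quantile estimates using a CVaR-style score.

    quantile_values: tensor of shape (num_actions, num_quantiles) holding the
        predicted quantiles of the return distribution for each action.
    alpha: fraction of the lowest quantiles to average; alpha = 1.0 recovers
        the usual risk-neutral mean over all quantiles.
    """
    num_quantiles = quantile_values.shape[1]
    k = max(1, int(alpha * num_quantiles))
    # Sort each action's quantiles and average only the lowest k of them, so
    # actions with heavy downside risk score worse than under the plain mean.
    sorted_quantiles, _ = torch.sort(quantile_values, dim=1)
    cvar_scores = sorted_quantiles[:, :k].mean(dim=1)
    return int(torch.argmax(cvar_scores).item())

# Example: 4 actions, 51 quantiles per action (hypothetical network output).
quantiles = torch.randn(4, 51)
action = risk_averse_action(quantiles, alpha=0.25)

With alpha = 1.0 this rule reduces to the standard risk-neutral criterion of QR-DQN; smaller values of alpha weight the lower tail of the value distribution more heavily. The thesis's exact modification may differ.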

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-348826
Date January 2024
Creators Pettersson, Adam; Pei Purroy, Francesc
Publisher KTH, Skolan för teknikvetenskap (SCI)
Source Sets DiVA Archive at Upsalla University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
Relation TRITA-SCI-GRU ; 2024:129
