Reinforcement learning (RL) algorithms aim to identify optimal action sequences for an agent in a given environment, traditionally by maximizing the expected reward received from the environment as the agent takes actions and transitions between states. This thesis explores a distributional approach to RL, replacing the expected reward with the full distribution over possible rewards, known as the value distribution. We focus on the quantile regression deep Q-network (QR-DQN) algorithm introduced by Dabney et al. (2017), which models the value distribution by representing its quantiles. Using this information about the value distribution, we modify the QR-DQN algorithm to enhance the agent's risk sensitivity. Our risk-averse algorithm is evaluated against the original QR-DQN in Atari 2600 and Gymnasium environments, specifically the games Breakout, Pong, Lunar Lander, and CartPole. Results indicate that the risk-averse variant performs comparably in terms of rewards while exhibiting increased robustness and risk aversion. Potential refinements of the risk-averse algorithm are presented.
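The abstract does not specify which risk measure the modified algorithm uses, but a common way to make QR-DQN risk-averse is to select actions by a distorted expectation of the quantile estimates, such as the conditional value-at-risk (CVaR) of the lower tail, rather than the plain mean used by standard QR-DQN. The sketch below is a minimal illustration of that idea under this assumption; the function name `cvar_action` and the example values are hypothetical and not taken from the thesis.

```python
import numpy as np

def cvar_action(quantiles, alpha=0.25):
    """Pick the action with the highest lower-tail mean (CVaR_alpha)
    of its quantile estimates, instead of the full mean used by
    risk-neutral QR-DQN.

    quantiles: array of shape (num_actions, num_quantiles), where
        quantiles[a] holds the estimated quantile values of the
        return distribution for action a.
    alpha: fraction of the lower tail to average over; alpha = 1.0
        recovers the usual risk-neutral mean over all quantiles.
    """
    num_actions, num_quantiles = quantiles.shape
    k = max(1, int(np.ceil(alpha * num_quantiles)))
    # Average only the k lowest quantile values per action (lower tail).
    tail = np.sort(quantiles, axis=1)[:, :k]
    return int(np.argmax(tail.mean(axis=1)))

# Example: two actions with equal mean return but different spread;
# the risk-averse rule prefers the less variable action 1.
q = np.array([
    [-10.0, 0.0, 5.0, 25.0],   # action 0: high variance
    [  3.0, 4.0, 6.0,  7.0],   # action 1: low variance
])
print(cvar_action(q, alpha=0.5))  # -> 1
```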
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-348826 |
Date | January 2024 |
Creators | Pettersson, Adam, Pei Purroy, Francesc |
Publisher | KTH, Skolan för teknikvetenskap (SCI) |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | TRITA-SCI-GRU ; 2024:129 |