
Time-normalised discounting in reinforcement learning

Reinforcement learning has emerged as a powerful paradigm in machine learning, with remarkable progress in recent years. Among reinforcement learning algorithms, Q-learning stands out for enabling agents to learn efficiently from past actions. This study investigates and enhances Q-learning methodologies, with a specific focus on tabular Q-learning. In particular, it addresses Q-learning with an action space in which actions require different amounts of time to execute. With such an action space, the algorithm may converge to a suboptimal solution when a constant discount factor is used, since discounting occurs per action rather than per time step. We refer to this issue as the non-temporal discounting (NTD) problem. By introducing a time-normalised discounting function, we were able to address the NTD problem. In addition, we were able to stabilise the solution by introducing a cost for specific actions. As a result, the model converged to the expected solution. Building on these results, a natural next step would be to implement time-normalised discounting in a state-of-the-art reinforcement learning model such as deep Q-learning.
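The core idea can be illustrated with a minimal sketch. The Python snippet below shows a tabular Q-learning update in which the discount factor is raised to the power of the executed action's duration, so that discounting is applied per elapsed time step rather than once per action. All names and parameters here (q_update, duration, alpha, gamma) are illustrative assumptions, not the authors' implementation; the per-action cost mentioned in the abstract could be incorporated by subtracting a penalty from the reward for the relevant actions.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, duration, alpha=0.1, gamma=0.95):
    """Tabular Q-learning update with time-normalised discounting.

    `duration` is the number of time steps the action took to execute;
    raising gamma to this power discounts per elapsed time step instead
    of once per action. (Hypothetical sketch, not the thesis code.)
    """
    target = r + (gamma ** duration) * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Example: a 5-state, 2-action table; action 1 takes 3 time steps.
Q = np.zeros((5, 2))
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2, duration=3)
```

With a constant per-action discount, a long action and a short action would be discounted equally; raising gamma to the action's duration removes that bias.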

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-348807
Date: January 2024
Creators: Akan, Oguzhan; Waara Ankarstrand, Wilmer
Publisher: KTH, Skolan för teknikvetenskap (SCI)
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
Relation: TRITA-SCI-GRU ; 2024:255
