Return to search

A review of Q-learning methods for Markov decision processes

This paper discusses how Q-Learning and Deep Q-Networks (DQN) canbe applied to state-action problems described by a Markov decision process(MDP). These are machine learning methods for finding the optimal choiceof action at each time step, resulting in the optimal policy. The limitationsand advantages for the two methods are discussed, with the main limitationbeing the fact that Q-learning is unable to be used on problems with infinitestate spaces. Q-learning, however, has an advantage in the simplicity of thealgorithm, leading to a better understanding of what the algorithm is actuallydoing. Q-Learning did manage to find the optimal policy for the simpleproblem studied in this paper, but was unable to do so for the advancedproblem. The Deep Q-Network (DQN) approach was able to solve bothproblems, with a drawback in it being harder to understand what the algorithmactually is doing.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-348638
Date January 2024
CreatorsBlizzard, Christopher, Wiktorsson, Emil
PublisherKTH, Skolan för teknikvetenskap (SCI)
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess
RelationTRITA-SCI-GRU ; 2024:247

Page generated in 0.0076 seconds