This paper discusses how Q-Learning and Deep Q-Networks (DQN) can be applied to state-action problems described by a Markov decision process (MDP). These are machine learning methods for finding the optimal choice of action at each time step, and thereby the optimal policy. The advantages and limitations of the two methods are discussed, the main limitation being that Q-Learning cannot be used on problems with infinite state spaces. Q-Learning, however, has the advantage of a simple algorithm, which makes it easier to understand what the algorithm is actually doing. Q-Learning managed to find the optimal policy for the simple problem studied in this paper, but was unable to do so for the advanced problem. The Deep Q-Network (DQN) approach was able to solve both problems, with the drawback that it is harder to understand what the algorithm is actually doing.
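The tabular Q-Learning update the abstract refers to can be summarized in a few lines. Below is a minimal sketch on a hypothetical five-state chain MDP; the environment, reward structure, and hyperparameters are illustrative assumptions and are not details taken from the thesis itself.

```python
# Minimal tabular Q-Learning sketch on a toy chain MDP (illustrative only).
# The environment and all hyperparameters are assumptions, not the thesis's setup.
import random

N_STATES = 5          # states 0..4; state 4 is terminal and pays reward +1
ACTIONS = [0, 1]      # 0 = move left, 1 = move right
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

# The Q-table stores one value per (state, action) pair. This is only feasible
# because the state space is small and finite -- the limitation noted above.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Hypothetical deterministic chain dynamics: moving right approaches the goal."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

for episode in range(500):
    s, done = 0, False
    while not done:
        if random.random() < EPS:
            a = random.choice(ACTIONS)          # epsilon-greedy exploration
        else:
            best = max(Q[(s, act)] for act in ACTIONS)
            a = random.choice([act for act in ACTIONS if Q[(s, act)] == best])
        s2, r, done = step(s, a)
        # Core update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = r + GAMMA * max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# Read the greedy policy off the learned Q-table.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
print(policy)  # expect action 1 (right) in every non-terminal state
```

A DQN replaces the table with a neural network approximating Q(s, a), which is what lets it handle the larger state spaces of the advanced problem, at the cost of the transparency this tabular version retains.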
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-348638 |
Date | January 2024 |
Creators | Blizzard, Christopher, Wiktorsson, Emil |
Publisher | KTH, Skolan för teknikvetenskap (SCI) |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | TRITA-SCI-GRU ; 2024:247 |