Comparison of deep reinforcement learning algorithms in a self-play setting

In this exciting era of artificial intelligence and machine learning, the success of AlphaGo,
AlphaZero, and MuZero has generated great interest in deep reinforcement
learning, especially under self-play settings. The methods used by AlphaZero are
proving increasingly useful in many different application areas,
such as clinical medicine, intelligent military command decision support systems, and
recommendation systems. While specific methods of reinforcement learning with self-play
have found their place in application domains, there is much to be explored from
existing reinforcement learning methods not originally intended for self-play settings.
This thesis focuses on evaluating the performance of existing reinforcement learning
techniques in self-play settings. In this research, we trained and evaluated the performance
of two deep reinforcement learning algorithms in self-play settings on game
environments such as Connect Four and Chess.
We demonstrate how a simple on-policy, policy-based method, such as REINFORCE,
shows signs of learning, whereas an off-policy value-based method such as
Deep Q-Networks does not perform well with self-play settings in the selected environments.
The results show that the REINFORCE agent wins 85% of the games after
training against a random baseline agent and 60% of the games against the greedy baseline
agent in the game Connect Four. The strength of the agents trained with both techniques was measured
and plotted against different baseline agents. We also investigate the impact
of selected significant hyper-parameters on the performance of the agents. Finally,
we provide our recommendations for these hyper-parameters’ values for training deep
reinforcement learning agents in similar environments. / Graduate
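
As a rough illustration of the on-policy, policy-based update that REINFORCE relies on, the sketch below computes discounted returns for one finished game and the corresponding Monte Carlo policy-gradient loss. It is not taken from the thesis: the function names, the reward convention (+1 for a win, delivered only on the final move), and the discount factor are all assumptions chosen for the example.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs: List[float], returns: List[float]) -> float:
    """Surrogate loss -sum_t log pi(a_t | s_t) * G_t; minimizing it follows the policy gradient."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# Hypothetical three-move game seen from one player's side, ending in a win.
# The reward convention and gamma = 0.5 are assumptions for this example only.
rewards = [0.0, 0.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)

# Placeholder log-probabilities of the moves actually played.
log_probs = [-1.0, -1.0, -1.0]
loss = reinforce_loss(log_probs, returns)
```

In a self-play setup, the same computation would typically be run once per player, with the sign of the final reward flipped for the losing side.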

http://hdl.handle.net/1828/13325

Deep Reinforcement Learning

Self-play

machine learning

Deep learning

reinforcement learning

Identifier	oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/13325
Date	30 August 2021
Creators	Kumar, Sunil
Contributors	Muller, Hausi A.
Source Sets	University of Victoria
Language	English
Detected Language	English
Type	Thesis
Format	application/pdf
Rights	Available to the World Wide Web
