In this exciting era of artificial intelligence and machine learning, the success of AlphaGo,
AlphaZero, and MuZero has generated great interest in deep reinforcement
learning, especially in self-play settings. The methods used by AlphaZero are
increasingly being applied in many different areas,
such as clinical medicine, intelligent military command decision support systems, and
recommendation systems. While specific self-play reinforcement learning methods
have found their place in application domains, there is much to be explored in
existing reinforcement learning methods not originally intended for self-play settings.
This thesis focuses on evaluating the performance of existing reinforcement learning
techniques in self-play settings. In this research, we trained and evaluated
two deep reinforcement learning algorithms in self-play settings on game
environments, such as Connect Four and Chess.
We demonstrate how a simple on-policy, policy-based method, such as REINFORCE,
shows signs of learning, whereas an off-policy value-based method such as
Deep Q-Networks does not perform well with self-play settings in the selected environments.
The results show that, after training, the REINFORCE agent wins 85% of its games
against a random baseline agent and 60% of its games against the greedy baseline
agent in Connect Four. The strength of the agents trained with each technique was measured
and plotted against different baseline agents. We also investigate the impact
of selected significant hyper-parameters on the performance of the agents. Finally,
we provide our recommended values for these hyper-parameters for training deep
reinforcement learning agents in similar environments. / Graduate
Identifier | oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/13325 |
Date | 30 August 2021 |
Creators | Kumar, Sunil |
Contributors | Muller, Hausi A. |
Source Sets | University of Victoria |
Language | English |
Detected Language | English |
Type | Thesis |
Format | application/pdf |
Rights | Available to the World Wide Web |