Applying Deep Reinforcement Learning in Playing Simplified Scenarios of StarCraft II / 應用深度強化學習進行星海爭霸II之迷你遊戲

Master's thesis / National Taipei University / Graduate Institute of Information Management / Academic year 107 / One of the objectives of deep reinforcement learning (RL) is to build intelligent agents capable of making decisions and solving problems. Recently, artificial intelligence (AI) has mastered Atari games and defeated professional Go players, so AI researchers have turned their attention to more complex tasks: real-time strategy games. This study uses the StarCraft II artificial intelligence learning environment, co-developed by DeepMind and Blizzard Entertainment, to train and develop intelligent agents capable of playing StarCraft II.
Our study uses the Asynchronous Advantage Actor-Critic (A3C) algorithm to train the RL model. We compare two action-selection methods: the probability-based exploration of the original A3C and the ε-greedy method. The results show that the ε-greedy method learns a good game strategy faster; however, given more training time, the probability-based exploration method eventually performs better. We also propose a way to expedite training: we compare the growth rate of the cumulative rewards between the previous batch and the current batch, and decrease the learning rate when the difference exceeds a predefined threshold. The results show that with a batch size of 20 and a decrement threshold of 30% to 40%, the agent can be trained more efficiently.
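A minimal sketch of the two action-selection methods and the proposed learning-rate decrement rule is given below. The function names, the decay factor, and the exact growth-rate formula are illustrative assumptions, not the thesis's implementation; only the 30%-40% threshold and batch size of 20 come from the abstract.

    import numpy as np

    rng = np.random.default_rng(0)

    def select_action_probability(policy_probs):
        """Original A3C exploration: sample an action from the policy distribution."""
        return rng.choice(len(policy_probs), p=policy_probs)

    def select_action_epsilon_greedy(policy_probs, epsilon=0.1):
        """epsilon-greedy: usually take the most probable action, occasionally explore."""
        if rng.random() < epsilon:
            return int(rng.integers(len(policy_probs)))
        return int(np.argmax(policy_probs))

    def adjust_learning_rate(lr, prev_batch_reward, curr_batch_reward,
                             threshold=0.35, decay=0.5):
        """Decrease the learning rate when the growth rate of cumulative reward
        from the previous batch to the current batch exceeds the threshold
        (the study reports 30%-40% works well with a batch size of 20 episodes).
        The decay factor of 0.5 is an assumed value for illustration."""
        if prev_batch_reward == 0:
            return lr
        growth = (curr_batch_reward - prev_batch_reward) / abs(prev_batch_reward)
        return lr * decay if growth > threshold else lr

For example, with policy probabilities [0.1, 0.7, 0.2], select_action_probability picks action 1 about 70% of the time, while select_action_epsilon_greedy picks it whenever it does not explore; adjust_learning_rate(1e-4, 120.0, 180.0) would halve the learning rate because the reward grew by 50%, above the threshold.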
Because learning the full game of StarCraft II requires tremendous computing resources, we instead use the mini-games to train the RL models. The research environment established by this study can be used to train agents on the full game in future studies.
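For reference, a minimal sketch of launching one of the SC2LE mini-games with DeepMind's PySC2 library follows. The map name, screen/minimap resolutions, and step multiplier are assumptions, since this record does not state which mini-games or settings the thesis used; running it also requires a local StarCraft II installation with the mini-game map pack.

    from pysc2.env import sc2_env
    from pysc2.lib import actions, features

    def make_minigame_env():
        # "MoveToBeacon" is one of the released SC2LE mini-game maps (assumed choice).
        return sc2_env.SC2Env(
            map_name="MoveToBeacon",
            players=[sc2_env.Agent(sc2_env.Race.terran)],
            agent_interface_format=features.AgentInterfaceFormat(
                feature_dimensions=features.Dimensions(screen=84, minimap=64)),
            step_mul=8,          # number of game steps per agent step
            visualize=False)

    if __name__ == "__main__":
        with make_minigame_env() as env:
            timesteps = env.reset()
            while not timesteps[0].last():
                # A trained A3C policy would choose a valid action here; no_op is a placeholder.
                timesteps = env.step([actions.FUNCTIONS.no_op()])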

Identifier: oai:union.ndltd.org:TW/107NTPU0396011
Date: January 2019
Creators: SHEN, YI-XING, 沈易星
Contributors: CHEN, TSUNG-TENG, 陳宗天
Source Sets: National Digital Library of Theses and Dissertations in Taiwan
Language: zh-TW
Detected Language: English
Type: thesis (學位論文)
Format: 63
