
Hierarchical reinforcement learning in adversarial environments

A well-known drawback of reinforcement learning is the time required to learn an optimal policy, particularly in environments with large state spaces or multiple agents. Standard Q-Learning also develops a deterministic policy, so in games that require a stochastic policy (such as rock-paper-scissors) a Q-Learner opponent can be defeated without much difficulty once its learning has ceased. We first investigated the impact of the MAXQ hierarchical reinforcement learning algorithm in an adversarial environment. We found state space abstraction difficult to carry out, especially when an unpredictable or co-evolving opponent was involved, and we observed that discounted learning was required to keep the domains zero-sum. We also found that the use of hierarchy yielded a learning speed increase in the adversarial environment. We then investigated whether similar speed increases could be obtained for adversarial reinforcement learning using the same hierarchical methodology. By applying the hierarchical decomposition to Bowling's Win or Learn Fast (WoLF) algorithm, we maintained the accelerated learning rate while retaining the stochastic elements of the WoLF algorithm. We assessed the impact of the adversarial component of the hierarchy at both the higher and lower tiers of the hierarchical tree. Finally, we introduce the idea of pivot points. A pivot point is the last moment to which a decision can be deferred before it must be made, thereby revealing one's strategy to the opponent. Waiting until this point maximises confusion for the opponent. Pivot points, which could only have been discovered through the use of hierarchy, also allowed us to perform improved state-space abstraction, since no decision regarding the opponent needed to be made until the pivot point was reached.
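The claim about deterministic policies in rock-paper-scissors can be illustrated with a small calculation. The sketch below is not taken from the thesis: the payoff matrix is the standard zero-sum one, and the names `PAYOFF`, `worst_case_payoff`, `deterministic` and `uniform` are hypothetical, chosen only for illustration. It compares the expected payoff of a fixed deterministic policy and of the uniform stochastic policy against an opponent who knows the policy and best-responds to it.

```python
# Payoff to our player: rows are our action, columns are the opponent's action.
# Action indices: 0 = rock, 1 = paper, 2 = scissors (illustrative encoding).
PAYOFF = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]


def worst_case_payoff(policy):
    """Expected payoff of `policy` against the opponent's best pure response."""
    return min(
        sum(policy[a] * PAYOFF[a][b] for a in range(3))
        for b in range(3)
    )


# A policy that always plays rock, standing in for a greedy policy
# that no longer changes once learning has ceased (an assumption).
deterministic = [1.0, 0.0, 0.0]

# The uniform mixed strategy over the three actions.
uniform = [1 / 3, 1 / 3, 1 / 3]

print("deterministic worst case:", worst_case_payoff(deterministic))
print("uniform worst case:", worst_case_payoff(uniform))
```

Under these assumptions the deterministic policy loses every round to an opponent who always plays paper, whereas no opponent response gives the uniform policy a negative expected payoff.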

Identifier oai:union.ndltd.org:ADTP/258637
Date January 2009
Creators Kwok, Hing-Wah, Computer Science & Engineering, Faculty of Engineering, UNSW
Publisher University of New South Wales. Computer Science & Engineering
Source Sets Australasian Digital Theses Program
Language English
Detected Language English
Rights http://unsworks.unsw.edu.au/copyright
