  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Exploring the effects of state-action space complexity on training time for AlphaZero agents / Undersökning av påverkan av spelkomplexitet på träningstiden för AlphaZero-agenter

Glimmerfors, Tobias January 2022
DeepMind’s development of AlphaGo took the world by storm in 2016 when it became the first computer program to defeat a world champion at the game of Go. Through further development, DeepMind showed that the underlying algorithm could be made more general and applied to a large set of problems. This thesis focuses on the AlphaZero algorithm and on which parameters affect the rate at which an agent is able to learn through self-play. We investigated how the neural network size and the environment complexity affect the agent’s learning. We used Connect4 as the environment for our agents, and by varying the width of the board we were able to simulate environments with different complexities. For each board width, we trained an AlphaZero agent and tracked the rate at which it improved. While we were unable to find a clear correlation between the complexity of the environment and the rate at which the agent improves, we found that a larger neural network improved both the final performance of the agent and the rate at which it learns. Along with this, we also studied what impact the number of Monte Carlo tree search iterations has on an already trained AlphaZero agent. Unsurprisingly, we found that a higher number of iterations led to improved performance. However, the difference between using only the priors of the neural network and a series of Monte Carlo tree search iterations is not very large. This suggests that using solely the priors can sometimes be useful if inferences need to be made quickly. / DeepMind's development of AlphaGo was a major advance in 2016, when it became the first computer program to defeat the world champion in Go. With the development of AlphaZero, DeepMind showed that a more general algorithm could be used to solve a larger set of problems. This report focuses on the AlphaZero algorithm and how different parameters affect training.
We investigated the effect of the neural network's size and of the game complexity on the agent's ability to improve. Using Connect4 as the test environment for our agents, and by changing the width of the game board, we were able to simulate games of differing complexity. For each width we tested, we trained an AlphaZero agent and measured its improvement. We could not find any clear correlation between the game's complexity and the agent's ability to learn. However, we showed that a larger neural network leads to the agent improving more and, moreover, learning faster. We also studied the effect of varying the number of tree searches for a fully trained agent. Our experiments show a correlation between the agent's playing strength and the number of tree searches, where more tree searches mean an improved ability to play the game. The difference made by the number of tree searches, however, turned out not to be as large as expected. This shows that time can be saved during the inference phase by lowering the number of tree searches, with a minimal penalty in performance.
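To make the "varying the width of the board" idea concrete, here is a minimal sketch of a Connect4 environment whose width is a constructor parameter. It is not the thesis code: the class name `ConnectFour`, its methods, and the default 7x6 board are assumptions made for illustration. A wider board gives more legal moves per turn and more reachable positions, which is one way to scale the state-action space.

```python
from typing import List


class ConnectFour:
    """Connect4 with a configurable board width (hypothetical helper, not the thesis code)."""

    def __init__(self, width: int = 7, height: int = 6) -> None:
        self.width = width
        self.height = height
        # columns[c] lists the pieces in column c from bottom to top (player 1 or 2).
        self.columns: List[List[int]] = [[] for _ in range(width)]
        self.player = 1

    def legal_moves(self) -> List[int]:
        """Columns that are not yet full; the action space grows with the width."""
        return [c for c in range(self.width) if len(self.columns[c]) < self.height]

    def cell(self, col: int, row: int) -> int:
        """Return the piece at (col, row), or 0 if the cell is empty or off the board."""
        if 0 <= col < self.width and 0 <= row < len(self.columns[col]):
            return self.columns[col][row]
        return 0

    def play(self, col: int) -> bool:
        """Drop a piece for the current player; return True if that move wins."""
        if col not in self.legal_moves():
            raise ValueError(f"column {col} is not a legal move")
        self.columns[col].append(self.player)
        won = self._wins(col, len(self.columns[col]) - 1)
        self.player = 3 - self.player  # switch between players 1 and 2
        return won

    def _wins(self, col: int, row: int) -> bool:
        """Check the four line directions through the piece just placed."""
        me = self.cell(col, row)
        for dc, dr in ((1, 0), (0, 1), (1, 1), (1, -1)):
            count = 1
            for sign in (1, -1):
                c, r = col + sign * dc, row + sign * dr
                while self.cell(c, r) == me:
                    count += 1
                    c, r = c + sign * dc, r + sign * dr
            if count >= 4:
                return True
        return False
```

Constructing `ConnectFour(width=5)` or `ConnectFour(width=9)` yields environments of different complexity with the same rules, which is the kind of comparison the abstract describes.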
2

Zero-Knowledge Agent Trained for the Game of Risk

Bethdavid, Simon January 2020
Recent developments in deep reinforcement learning applied to abstract strategy games such as Go, chess and Hex have sparked an interest within military planning. This Master's thesis explores whether an algorithm similar to Expert Iteration and AlphaZero can be applied to wargames. The studied wargame is Risk, a turn-based multiplayer game played on a simplified political map of the world. The algorithm consists of an expert, in the form of a Monte Carlo tree search algorithm, and an apprentice, implemented as a neural network. The neural network is trained by imitation learning to mimic expert decisions generated through self-play reinforcement learning. The apprentice is then used as a heuristic in subsequent tree searches. The results demonstrated that a Monte Carlo tree search algorithm could, to some degree, be employed on a strategy game such as Risk, dominating a randomly playing agent. The neural network, fed with a state representation in the form of a vector, had difficulty learning expert decisions and could not beat a randomly playing agent. This led to a halt in the expert/apprentice learning process. However, possible solutions are proposed as future work.
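The expert/apprentice loop described above can be illustrated with a small sketch of how expert search statistics might become an imitation-learning target. This is an assumption-laden illustration, not the thesis implementation: the function names, the use of root visit counts as the target, the temperature parameter, and the action labels are all hypothetical.

```python
import math
from typing import Dict, Hashable


def visit_counts_to_policy(
    visit_counts: Dict[Hashable, int], temperature: float = 1.0
) -> Dict[Hashable, float]:
    """Turn the expert's root visit counts into a target distribution for the apprentice."""
    if not visit_counts or sum(visit_counts.values()) <= 0:
        raise ValueError("need at least one visited action")
    if temperature == 0:
        # Greedy target: all probability mass on the most visited action.
        best = max(visit_counts, key=visit_counts.get)
        return {a: 1.0 if a == best else 0.0 for a in visit_counts}
    weights = {a: n ** (1.0 / temperature) for a, n in visit_counts.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def cross_entropy(
    target: Dict[Hashable, float], predicted: Dict[Hashable, float], eps: float = 1e-12
) -> float:
    """Imitation loss: lower when the apprentice's policy matches the expert target."""
    return -sum(
        p * math.log(predicted.get(a, 0.0) + eps) for a, p in target.items() if p > 0
    )


# Illustrative action labels only; a real Risk agent would have a far larger action set.
target = visit_counts_to_policy({"attack": 30, "fortify": 10})
loss = cross_entropy(target, {"attack": 0.5, "fortify": 0.5})
```

In such a setup the apprentice would be trained to minimise this loss over many self-play positions, and its output would then guide the next round of tree searches.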
3

Using search based methods for beamforming

Bergman Karlsson, Adam January 2024
To accommodate the growing global demand for wireless communication, Multi-User Multiple-Input and Multiple-Output (MU-MIMO) systems have been identified as a key technology. In such systems, a transmitting base station serves several users simultaneously, increasing the network capacity. However, sharing the same time-frequency physical resources can cause interference between the simultaneously scheduled users if not managed properly. One way to mitigate this interference is to direct radio power through the radio channel in specific directions, a method called beamforming. Following the successful application of the AlphaZero algorithm to another radio resource management technique, scheduling, this thesis explores the potential of using a similar search-based method for the beamforming problem, with the ultimate objective of making scheduling and beamforming decisions jointly. However, as AlphaZero only supports discrete action spaces and the action space of the beamforming problem is continuous, a modification of the algorithm is required. The proposed course of action is to extend AlphaZero into Sampled AlphaZero, using sample-based policy improvement to create an algorithm that is both more scalable for large discrete action spaces and able to handle high-dimensional continuous action spaces. To evaluate the performance of the models, test environments were simulated and solved using increasingly large so-called codebooks, containing predefined beamforming solutions. The results of the Sampled AlphaZero model demonstrated promising performance even for very large codebook sizes, indicating the model's suitability for addressing the beamforming problem in a non-codebook-based context. 
Furthermore, this thesis explores how states in the search can be represented and preprocessed for the neural network to learn efficiently, demonstrating clear benefits of using a singular value decomposition-based state preprocessing over raw states as input to the neural network.
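The sample-based policy improvement mentioned above can be sketched roughly as follows: instead of expanding every entry of a large codebook, the search considers only a small set of actions sampled from the policy prior. This is a simplified illustration under stated assumptions, not the thesis implementation. The function name, the `beam_i` labels, and the choice to sample from the prior itself and reuse the empirical frequencies as the search prior are all assumptions.

```python
import random
from collections import Counter
from typing import Dict, Hashable


def sample_candidate_actions(
    prior: Dict[Hashable, float], num_samples: int, rng: random.Random
) -> Dict[Hashable, float]:
    """Sample actions from the policy prior and return their empirical frequencies.

    Only the sampled actions would be expanded at a search node, so the cost per
    node depends on num_samples rather than on the full codebook size.
    """
    actions = list(prior)
    weights = [prior[a] for a in actions]
    draws = rng.choices(actions, weights=weights, k=num_samples)  # with replacement
    counts = Counter(draws)
    return {a: n / num_samples for a, n in counts.items()}


# Hypothetical codebook of 1000 predefined beams with a uniform prior.
rng = random.Random(0)
prior = {f"beam_{i}": 1 / 1000 for i in range(1000)}
candidates = sample_candidate_actions(prior, num_samples=16, rng=rng)
```

With a continuous action space the same pattern would apply, except that the samples would be drawn from a parametric distribution produced by the network rather than from a finite codebook.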
