1. Exploring the effects of state-action space complexity on training time for AlphaZero agents. Glimmerfors, Tobias, January 2022.
DeepMind's development of AlphaGo took the world by storm in 2016 when it became the first computer program to defeat a world champion at the game of Go. Through further development, DeepMind showed that the underlying algorithm could be made more general and applied to a large set of problems. This thesis focuses on the AlphaZero algorithm and on which parameters affect the rate at which an agent learns through self-play. We investigated the effect of neural network size on the agent's learning, as well as how environment complexity affects it. We used Connect4 as the environment for our agents, and by varying the width of the board we were able to simulate environments of different complexities. For each board width, we trained an AlphaZero agent and tracked the rate at which it improved. While we were unable to find a clear correlation between the complexity of the environment and the rate at which the agent improves, we found that a larger neural network improved both the final performance of the agent and the rate at which it learns. We also studied the impact the number of Monte Carlo tree search iterations has on an already trained AlphaZero agent. Unsurprisingly, we found that a higher number of iterations led to improved performance. However, the difference between using only the priors of the neural network and a series of Monte Carlo tree search iterations is not very large. This suggests that using solely the priors can be useful when inferences need to be made quickly.
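The prior-only versus full-search trade-off measured in this abstract can be illustrated with a small sketch. This is not the thesis code: the toy prior, the hand-set action values, and the one-ply PUCT loop are illustrative assumptions standing in for a trained Connect4 network and a full tree search.

```python
import math

def puct_select(prior, visits, values, c_puct=1.5):
    """Pick the action maximizing the PUCT score used by AlphaZero-style
    agents: Q(a) + c_puct * P(a) * sqrt(sum_b N(b)) / (1 + N(a))."""
    total = sum(visits.values())
    best, best_score = None, -float("inf")
    for a in prior:
        q = values[a] / visits[a] if visits[a] > 0 else 0.0
        u = c_puct * prior[a] * math.sqrt(total + 1e-8) / (1 + visits[a])
        if q + u > best_score:
            best, best_score = a, q + u
    return best

def choose_action(prior, action_value, n_iters):
    """With n_iters == 0, act greedily on the network prior alone; otherwise
    run a one-ply search and let simulated values reshape the visit counts."""
    if n_iters == 0:
        return max(prior, key=prior.get)
    visits = {a: 0 for a in prior}
    values = {a: 0.0 for a in prior}
    for _ in range(n_iters):
        a = puct_select(prior, visits, values)
        visits[a] += 1
        values[a] += action_value(a)  # stand-in for a simulation/value net
    return max(visits, key=visits.get)

# Toy example: the prior slightly prefers action "b", but the simulated
# values reveal that "c" is actually stronger, so search overrides the prior.
prior = {"a": 0.2, "b": 0.5, "c": 0.3}
value = {"a": 0.0, "b": 0.1, "c": 0.9}
print(choose_action(prior, value.get, 0))    # prior only -> "b"
print(choose_action(prior, value.get, 200))  # with search -> "c"
```

The gap between the two answers shrinks as the prior gets closer to the search result, which is the regime the abstract describes for a well-trained network.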
2. Kooperativní hledání cest s protivníkem (Adversarial Cooperative Path-finding). Ivanová, Marika, January 2014.
This master thesis defines and investigates the Adversarial Cooperative Path-finding problem (ACPF), a generalization of standard Cooperative Path-finding. In addition to Cooperative Path-finding, where non-colliding paths connecting the initial positions and destinations of multiple agents are searched for, ACPF also considers agents controlled by an adversary. The work focuses on both theoretical properties and practical solving techniques for the problem. ACPF is introduced formally using terms from graph theory. We study the computational complexity of the problem, showing that it is PSPACE-hard and belongs to the EXPTIME complexity class. We introduce and discuss methods suitable for solving the problem in practice, including greedy algorithms, minimax methods, Monte Carlo Tree Search, and an adaptation of an algorithm for the cooperative version of the problem. The conducted experimental evaluation indicates a surprisingly high success rate for the greedy methods and rather weaker results for Monte Carlo Tree Search.
3. MCTS with Information Sharing. Baudiš, Petr, January 2011.
We introduce our competitive implementation of a Monte Carlo Tree Search (MCTS) algorithm for the board game of Go: Pachi. The software is based both on previously published methods and on our original improvements. We then focus on improving tree search performance by collecting information about tactical situations and game status from the Monte Carlo simulations and sharing it with and within the game tree. We propose specific methods of such sharing (dynamic komi, criticality-based biasing, and liberty maps) and demonstrate their positive effect based on collected play-testing measurements. We also outline some promising future research directions related to our work.
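As background to the tree-search improvements discussed above, the baseline selection rule that MCTS implementations such as Pachi build on can be sketched as follows. The constant `c` and the toy numbers are illustrative, not Pachi's actual tuning.

```python
import math

def uct_score(wins, visits, parent_visits, c=1.4):
    """UCB1 selection score used in plain MCTS tree selection: average reward
    (exploitation) plus an exploration bonus that shrinks as a child is
    visited more often."""
    if visits == 0:
        return float("inf")  # unvisited children are always tried first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

# A well-explored strong move keeps a higher score than a weak one...
print(uct_score(60, 80, 100) > uct_score(5, 20, 100))  # True
# ...but an unvisited move outranks both, forcing exploration.
print(uct_score(0, 0, 100))  # inf
```

Information-sharing methods like those in the thesis adjust the statistics feeding this formula (or add bias terms to it) rather than replacing the selection rule itself.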
4. Monte Carlo Techniques in Planning. Trunda, Otakar, January 2013.
The Monte Carlo Tree Search (MCTS) algorithm has recently proved able to solve difficult problems in the fields of optimization and game playing, addressing several problems that no conventional technique had been able to solve efficiently. In this thesis we investigate possible ways to use MCTS in the field of planning and scheduling. We analyze the problem theoretically, identifying the difficulties that arise when using MCTS in this field, and propose solutions based on modifying the algorithm and preprocessing the planning domain. We present the techniques we have developed for these tasks and combine them into an applicable algorithm. We specialize the method for a specific kind of planning problem: transportation problems. We compare our planner with other planning systems.
5. Distribuovaný MCTS pro hry s týmem kooperujících agentů / Distributed Monte-Carlo Tree Search for Games with Team of Cooperative Agents. Filip, Ondřej, January 2013.
The aim of this work is the design, implementation, and experimental evaluation of distributed algorithms for planning the actions of a team of cooperative autonomous agents, where the particular algorithms require different amounts of communication. The work reviews related research on the Monte Carlo tree search algorithm, its parallelization and distributability, and algorithms for distributed coordination of autonomous agents. The designed algorithms are tested in the environment of the game Ms. Pac-Man. The quality of the algorithms is evaluated in dependence on computation time, the amount of communication, and robustness against communication failures, and the algorithms are compared according to these characteristics.
6. Exploring the Effect of Different Numbers of Convolutional Filters and Training Loops on the Performance of AlphaZero. Prince, Jared, 01 October 2018.
In this work, the algorithm used by AlphaZero is adapted for Dots and Boxes, a two-player game. The algorithm is explored using different numbers of convolutional filters and training loops in order to better understand the effect these parameters have on the learning of the player. Different board sizes are also tested to compare these parameters in relation to game complexity. AlphaZero originated as a Go player using an algorithm which combines Monte Carlo tree search and convolutional neural networks. This novel approach, integrating a reinforcement learning method previously applied to Go (MCTS) with a supervised learning method (neural networks), led to a player which beat all its competitors.
7. Umělá inteligence pro počítačovou hru Children of the Galaxy / Artificial Intelligence for Children of the Galaxy Computer Game. Šmejkal, Pavel, January 2018.
Even though artificial intelligence (AI) agents are now able to solve many classical games, in the field of computer strategy games the AI opponents still leave much to be desired. In this work we tackle the problem of combat in strategy video games by adapting existing search approaches: Portfolio Greedy Search (PGS) and Monte Carlo Tree Search (MCTS). We also introduce an improved version of MCTS called MCTS considering hit points (MCTS_HP). These methods are evaluated in the context of the recently released 4X strategy game Children of the Galaxy. We implement a combat simulator for the game and a benchmarking framework in which various AI approaches can be compared. We show that for small to medium combat, MCTS methods are superior to PGS. In all scenarios MCTS_HP is equal to or better than regular MCTS due to its better search guidance. In smaller scenarios, MCTS_HP with only a 100-millisecond time limit outperforms regular MCTS with a 2-second time limit. By combining fast greedy search for large combats with the more precise MCTS_HP for smaller scenarios, a universal AI player can be created.
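The abstract does not spell out how MCTS_HP uses hit points, but the general idea of hit-point-aware search guidance can be sketched as follows; the function name and the normalization are assumptions made for illustration, not the thesis implementation.

```python
def hp_eval(my_units, enemy_units):
    """Hypothetical combat-state evaluation in the spirit of MCTS_HP: score a
    position by the normalized remaining-hit-point differential in [-1, 1],
    rather than a binary win/loss. Each argument is a list of the remaining
    hit points of that side's surviving units."""
    my_hp = sum(my_units)
    enemy_hp = sum(enemy_units)
    total = my_hp + enemy_hp
    if total == 0:
        return 0.0  # mutual annihilation counts as a draw
    return (my_hp - enemy_hp) / total

# A graded signal: a healthy army beats a wounded one by a margin, not just
# a flat "win", so the search can prefer low-casualty lines of play.
print(hp_eval([10, 5], [3]))  # 0.666...
print(hp_eval([], [4]))       # -1.0 (our side wiped out)
```

A smooth evaluation like this gives the tree search a gradient to follow in mid-combat states where a binary outcome is not yet decided, which is one plausible reading of the "better search guidance" claimed above.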
8. Resource Constraint Cooperative Game with Monte Carlo Tree Search. Cheng, Chee Chian, 01 August 2016.
A hybrid methodology combining game theory and Monte Carlo Tree Search was developed and tested on various case studies of the nurse scheduling problem. The experiments show that it is able to form Pareto front dominance solutions, find feasible solutions that are optimal, and find feasible partial solutions in over-constrained problems. A performance comparison with the Genetic Algorithm on the Resident Physician Scheduling problem showed that the hybrid methodology produced better-quality solutions than the state-of-the-art approach.
9. Umělá inteligence pro hru Quoridor / Artificial Intelligence for Quoridor Board Game. Brenner, Matyáš, January 2015.
The aim of this work is to design an Artificial Intelligence for Sector 66, a board game based on Quoridor in which players can additionally use spells and fields with special effects. The Artificial Intelligence is based on Monte Carlo Tree Search and can be used for 2 to 4 players. The Artificial Intelligence introduced in this work copes with the high branching factor of the Quoridor/Sector 66 game and can also handle unknown elements represented by user-defined plug-ins. The game and the Artificial Intelligence have been developed using the .NET Framework, XNA, and C#.
10. Domain independent enhancements to Monte Carlo tree search for eurogames. Bergh, Peter, January 2020.
The Monte Carlo tree search algorithm (MCTS) has proven successful when applied to combinatorial games, a term applied to sequential games with perfect information. As the focus of MCTS research has tended to lean towards combinatorial games, general MCTS strategies for other types of board games are hard to find. On another front, board games under the name of "Eurogames" have become increasingly popular in the last decade. These games introduce yet another set of challenges for game-playing agents on top of what combinatorial games already offer. Since its initial conception, a large number of enhancements to the MCTS algorithm have been proposed. Seeing that eurogames share much of the same game mechanics with each other, MCTS enhancements that prove effective for one game could potentially be aimed at eurogames in general. In this paper, alterations to the expansion phase, the playout phase, and the backpropagation phase of the standard MCTS algorithm are made for agents playing the game of Carcassonne. To detect how the enhancements are affected by chance events, both a deterministic and a stochastic version of the game are examined. It can be concluded that a reward policy relying solely on in-game score outperforms the conventional wins-versus-losses policy. Concerning playouts, the Early Playout Termination enhancement only yields better results when the number of MCTS iterations is somewhat restricted. Lastly, delayed node expansion is shown to be preferable to conventional node expansion. None of the enhancements showed any increase or decline in performance with regard to chance events. Additional experiments on other eurogames are needed to reaffirm these findings, and subsequent studies introducing modifications to the examined enhancements are proposed as a way to further increase agent performance.
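Two of the enhancements examined in this abstract, the score-based reward policy and Early Playout Termination, can be sketched as follows. This is an illustrative sketch rather than the thesis implementation; the cutoff of 30 moves and the maximum score of 50 are made-up parameters.

```python
import random

def playout_reward(score_diff, max_score, use_score_policy=True):
    """Reward backed up after a playout. The conventional policy returns only
    win/draw/loss (+1/0/-1); the score policy returns the normalized score
    differential, distinguishing narrow wins from crushing ones."""
    if use_score_policy:
        return max(-1.0, min(1.0, score_diff / max_score))
    return float((score_diff > 0) - (score_diff < 0))

def early_terminated_playout(score_diff, random_step, cutoff=30):
    """Early Playout Termination sketch: simulate at most `cutoff` random
    moves, then score the reached position directly instead of playing the
    game out to the end."""
    for _ in range(cutoff):
        score_diff += random_step()  # stand-in for the effect of one move
    return playout_reward(score_diff, max_score=50.0)

# The score policy separates a narrow lead from a dominant one:
print(playout_reward(3, 50), playout_reward(40, 50))  # 0.06 0.8
# ...while the win/loss policy rates both leads identically:
print(playout_reward(3, 50, False), playout_reward(40, 50, False))  # 1.0 1.0

random.seed(1)
r = early_terminated_playout(0.0, lambda: random.choice([-1, 0, 1]))
print(-1.0 <= r <= 1.0)  # True: the backed-up reward stays in [-1, 1]
```

The graded reward is one plausible reason the score policy guided the Carcassonne agents better: playouts that end in similar outcomes still produce distinguishable values.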