1

Reinforcement Learning and Simulation-Based Search in Computer Go

Silver, David
Learning and planning are two fundamental problems in artificial intelligence. The learning problem can be tackled by reinforcement learning methods, such as temporal-difference learning, which update a value function from real experience, and use function approximation to generalise across states. The planning problem can be tackled by simulation-based search methods, such as Monte-Carlo tree search, which update a value function from simulated experience, but treat each state individually. We introduce a new method, temporal-difference search, that combines elements of both reinforcement learning and simulation-based search methods. In this new method the value function is updated from simulated experience, but it uses function approximation to efficiently generalise across states. We also introduce the Dyna-2 architecture, which combines temporal-difference learning with temporal-difference search. Whereas temporal-difference learning acquires general domain knowledge from its past experience, temporal-difference search acquires local knowledge that is specialised to the agent's current state, by simulating future experience. Dyna-2 combines both forms of knowledge together.

We apply our algorithms to the game of 9x9 Go. Using temporal-difference learning, with a million binary features matching simple patterns of stones, and using no prior knowledge except the grid structure of the board, we learnt a fast and effective evaluation function. Using temporal-difference search with the same representation produced a dramatic improvement: without any explicit search tree, and with equivalent domain knowledge, it achieved better performance than a vanilla Monte-Carlo tree search. When combined together using the Dyna-2 architecture, our program outperformed all handcrafted, traditional search, and traditional machine learning programs on the 9x9 Computer Go Server.

We also use our framework to extend the Monte-Carlo tree search algorithm. By forming a rapid generalisation over subtrees of the search space, and incorporating heuristic pattern knowledge that was learnt or handcrafted offline, we were able to significantly improve the performance of the Go program MoGo. Using these enhancements, MoGo became the first 9x9 Go program to achieve human master level.
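The core update the abstract describes — a linear value function over sparse binary features, adjusted by temporal-difference learning — can be sketched as follows. This is a minimal, hypothetical illustration of linear TD(0) with sparse binary features, not the thesis's actual implementation; the feature indices, step size, and discount are invented for the example.

```python
def td0_update(weights, features_s, features_s2, reward, alpha=0.1, gamma=1.0):
    """One linear TD(0) step with sparse binary features.

    A state's value is the sum of the weights of its active features:
    V(s) = sum_i w[i] for i in phi(s).  The TD error is
    delta = r + gamma * V(s') - V(s), and each active weight of s
    moves by alpha * delta.
    """
    v_s = sum(weights[i] for i in features_s)    # value of current state
    v_s2 = sum(weights[i] for i in features_s2)  # value of successor state
    delta = reward + gamma * v_s2 - v_s          # temporal-difference error
    for i in features_s:
        weights[i] += alpha * delta              # credit only active features
    return weights
```

Because the features are shared across states, a single update generalises: every state that activates feature `i` sees its estimated value change, which is exactly the property that distinguishes temporal-difference search from a plain Monte-Carlo tree search that treats each state individually.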
3

Ising Graphical Model

Kamenetsky, Dmitry, January 2010
The Ising model is an important model in statistical physics, with over 10,000 papers published on the topic. This model assumes binary variables and only local pairwise interactions between neighbouring nodes. Inference for the general Ising model is NP-hard; this includes tasks such as calculating the partition function, finding a lowest-energy (ground) state and computing marginal probabilities. Past approaches have proceeded by working with classes of tractable Ising models, such as Ising models defined on a planar graph. For such models, the partition function and ground state can be computed exactly in polynomial time by establishing a correspondence with perfect matchings in a related graph. In this thesis we continue this line of research. In particular, we simplify previous inference algorithms for the planar Ising model. The key to our construction is the complementary correspondence between graph cuts of the model graph and perfect matchings of its expanded dual. We show that our exact algorithms are effective and efficient on a number of real-world machine learning problems. We also investigate heuristic methods for approximating ground states of non-planar Ising models. We show that in this setting our approximate algorithms are superior to current state-of-the-art methods.
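To make the inference tasks concrete, here is a hypothetical brute-force sketch (not the thesis's algorithm) that computes the energy, the exact partition function, and a ground state of a tiny Ising model by enumerating all spin assignments. Its exponential cost is precisely what the polynomial-time planar-graph matching algorithms avoid.

```python
import math
from itertools import product

def ising_energy(spins, couplings):
    """Energy E(s) = -sum over edges (i, j) of J_ij * s_i * s_j,
    where each spin s_i is +1 or -1 and couplings maps (i, j) -> J_ij."""
    return -sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())

def brute_force(n, couplings, beta=1.0):
    """Exact partition function Z = sum_s exp(-beta * E(s)) and a ground
    state, by enumerating all 2^n assignments.  Feasible only for tiny n."""
    z, best = 0.0, None
    for spins in product([-1, 1], repeat=n):
        e = ising_energy(spins, couplings)
        z += math.exp(-beta * e)
        if best is None or e < best[0]:
            best = (e, spins)
    return z, best
```

For a ferromagnetic chain of three spins (`{(0, 1): 1.0, (1, 2): 1.0}`), the two aligned configurations attain the ground energy -2, and the eight terms of the partition function can be checked by hand against the enumeration.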
4

Learning Dynamics for Monte Carlo Tree Search: Applications to the Games of Go and Impartial Solitaire Clobber

Fabbri, André, 22 October 2015
Monte Carlo Tree Search (MCTS) was initially introduced for the game of Go, has since been applied successfully to other games, and has opened the way to a range of new methods such as Multiple-MCTS and Nested Monte Carlo. MCTS evaluates a set of game states through thousands of randomly generated playouts. As the simulations are carried out, the program dynamically guides its search towards the most promising moves. In particular, MCTS has attracted the community's interest because it achieves remarkable performance without requiring extensive prior expert knowledge. In this thesis, we choose to treat MCTS as a learning system in its own right. Each random simulation then becomes an experience lived by the system, and each outcome corresponds to an observed reinforcement. From this perspective, the learning of the system results from the complex interaction of two processes: the incremental acquisition of new representations and their exploitation in subsequent simulations. We propose two independent approaches, each acting on one of these processes. The first approach accumulates complementary representations in order to improve the plausibility of the simulations. The second approach focuses the search on intermediate sub-goals in order to improve the quality of the representations acquired. The methods presented in this work have been applied to the games of Go and Impartial Solitaire Clobber. The learning dynamics observed in our experiments illustrate the relation between these two key processes and suggest new ways to improve learning systems such as MCTS.
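The MCTS loop the abstract relies on — selection, expansion, random simulation, backpropagation — can be sketched on a toy game. The following is a hypothetical minimal UCT implementation for a two-player Nim variant (take 1 or 2 stones; whoever takes the last stone wins), not the Go or Clobber programs studied in the thesis; all names and parameters are invented for the example.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.wins = {}, 0, 0

def moves(state):
    """Legal moves: take 1 or 2 stones.  state = (stones_left, player_to_move)."""
    n, _ = state
    return [m for m in (1, 2) if m <= n]

def play(state, m):
    n, player = state
    return (n - m, 1 - player)

def rollout(state):
    """Random playout to the end; returns the winner (who took the last stone)."""
    n, player = state
    while n > 0:
        m = random.choice([x for x in (1, 2) if x <= n])
        n, last = n - m, player
        player = 1 - player
    return last

def mcts(root_state, n_sim=2000, c=1.4):
    root = Node(root_state)
    for _ in range(n_sim):
        node = root
        # Selection: descend via UCB1 while the node is fully expanded.
        while node.children and len(node.children) == len(moves(node.state)):
            node = max(node.children.values(),
                       key=lambda ch: ch.wins / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # Expansion: add one untried child, if any.
        untried = [m for m in moves(node.state) if m not in node.children]
        if untried:
            m = random.choice(untried)
            node.children[m] = Node(play(node.state, m), node)
            node = node.children[m]
        # Simulation: random playout (terminal nodes already have a winner).
        winner = rollout(node.state) if node.state[0] > 0 else 1 - node.state[1]
        # Backpropagation: each node's wins are counted from the perspective
        # of the player who chose the move leading into it.
        while node:
            node.visits += 1
            if node.parent and node.parent.state[1] == winner:
                node.wins += 1
            node = node.parent
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

With 4 stones on the table, taking 1 leaves the opponent in a losing position, and the visit counts concentrate on that move as the simulations accumulate, illustrating how the search dynamically reorients itself from nothing but playout outcomes.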
