Application of temporal difference learning and supervised learning in the game of Go.

by Horace Wai-Kit, Chan. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. / Includes bibliographical references (leaves 109-112). / Acknowledgement --- p.i / Abstract --- p.ii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Overview --- p.1 / Chapter 1.2 --- Objective --- p.3 / Chapter 1.3 --- Organization of This Thesis --- p.3 / Chapter 2 --- Background --- p.5 / Chapter 2.1 --- Definitions --- p.5 / Chapter 2.1.1 --- Theoretical Definition of Solving a Game --- p.5 / Chapter 2.1.2 --- Definition of Computer Go --- p.7 / Chapter 2.2 --- State of the Art of Computer Go --- p.7 / Chapter 2.3 --- A Framework for Computer Go --- p.11 / Chapter 2.3.1 --- Evaluation Function --- p.11 / Chapter 2.3.2 --- Plausible Move Generator --- p.14 / Chapter 2.4 --- Problems Tackled in this Research --- p.14 / Chapter 3 --- Application of TD in Game Playing --- p.15 / Chapter 3.1 --- Introduction --- p.15 / Chapter 3.2 --- Reinforcement Learning and TD Learning --- p.15 / Chapter 3.2.1 --- Models of Learning --- p.16 / Chapter 3.2.2 --- Temporal Difference Learning --- p.16 / Chapter 3.3 --- TD Learning and Game-playing --- p.20 / Chapter 3.3.1 --- Game-Playing as a Delay-reward Prediction Problem --- p.20 / Chapter 3.3.2 --- Previous Work of TD Learning in Backgammon --- p.20 / Chapter 3.3.3 --- Previous Works of TD Learning in Go --- p.22 / Chapter 3.4 --- Design of this Research --- p.23 / Chapter 3.4.1 --- Limitations in the Previous Researches --- p.24 / Chapter 3.4.2 --- Motivation --- p.25 / Chapter 3.4.3 --- Objective and Methodology --- p.26 /
Chapter 4 --- Deriving a New Updating Rule to Apply TD Learning in Multi-layer Perceptron --- p.28 / Chapter 4.1 --- Multi-layer Perceptron (MLP) --- p.28 / Chapter 4.2 --- Derivation of TD(λ) Learning Rule for MLP --- p.31 / Chapter 4.2.1 --- Notations --- p.31 / Chapter 4.2.2 --- A New Generalized Delta Rule --- p.31 / Chapter 4.2.3 --- Updating rule for TD(λ) Learning --- p.34 / Chapter 4.3 --- Algorithm of Training MLP using TD(λ) --- p.35 / Chapter 4.3.1 --- Definitions of Variables in the Algorithm --- p.35 / Chapter 4.3.2 --- Training Algorithm --- p.36 / Chapter 4.3.3 --- Description of the Algorithm --- p.39 / Chapter 5 --- Experiments --- p.41 / Chapter 5.1 --- Introduction --- p.41 / Chapter 5.2 --- Experiment 1 : Training Evaluation Function for 7 x 7 Go Games by TD(λ) with Self-playing --- p.42 / Chapter 5.2.1 --- Introduction --- p.42 / Chapter 5.2.2 --- 7 x 7 Go --- p.42 / Chapter 5.2.3 --- Experimental Designs --- p.43 / Chapter 5.2.4 --- Performance Testing for Trained Networks --- p.44 / Chapter 5.2.5 --- Results --- p.44 / Chapter 5.2.6 --- Discussions --- p.45 / Chapter 5.2.7 --- Limitations --- p.47 / Chapter 5.3 --- Experiment 2 : Training Evaluation Function for 9 x 9 Go Games by TD(λ) Learning from Human Games --- p.47 / Chapter 5.3.1 --- Introduction --- p.47 / Chapter 5.3.2 --- 9 x 9 Go game --- p.48 / Chapter 5.3.3 --- Training Data Preparation --- p.49 / Chapter 5.3.4 --- Experimental Designs --- p.50 / Chapter 5.3.5 --- Results --- p.52 / Chapter 5.3.6 --- Discussion --- p.54 / Chapter 5.3.7 --- Limitations --- p.56 / Chapter 5.4 --- Experiment 3 : Life Status Determination in the Go Endgame --- p.57 / Chapter 5.4.1 --- Introduction --- p.57 / Chapter 5.4.2 --- Training Data Preparation --- p.58 / Chapter 5.4.3 --- Experimental Designs --- p.60 / Chapter 5.4.4 --- Results --- p.64 / Chapter 5.4.5 --- Discussion --- p.65 / Chapter 5.4.6 --- Limitations --- p.66 / Chapter 5.5 --- A Postulated Model --- p.66 / Chapter 6 --- Conclusions --- p.69 / Chapter 6.1 --- Future Direction of Research --- p.71 /
Chapter A --- An Introduction to Go --- p.72 / Chapter A.1 --- A Brief Introduction --- p.72 / Chapter A.1.1 --- What is Go? --- p.72 / Chapter A.1.2 --- History of Go --- p.72 / Chapter A.1.3 --- Equipment used in a Go game --- p.73 / Chapter A.2 --- Basic Rules in Go --- p.74 / Chapter A.2.1 --- A Go game --- p.74 / Chapter A.2.2 --- Liberty and Capture --- p.75 / Chapter A.2.3 --- Ko --- p.77 / Chapter A.2.4 --- "Eyes, Live and Death" --- p.81 / Chapter A.2.5 --- Seki --- p.83 / Chapter A.2.6 --- Endgame and Scoring --- p.83 / Chapter A.2.7 --- Rank and Handicap Games --- p.85 / Chapter A.3 --- Strategies and Tactics in Go --- p.87 / Chapter A.3.1 --- Strategy vs Tactics --- p.87 / Chapter A.3.2 --- Open-game --- p.88 / Chapter A.3.3 --- Middle-game --- p.91 / Chapter A.3.4 --- End-game --- p.92 / Chapter B --- Mathematical Model of Connectivity --- p.94 / Chapter B.1 --- Introduction --- p.94 / Chapter B.2 --- Basic Definitions --- p.94 / Chapter B.3 --- Adjacency and Connectivity --- p.96 / Chapter B.4 --- String and Link --- p.98 / Chapter B.4.1 --- String --- p.98 / Chapter B.4.2 --- Link --- p.98 / Chapter B.5 --- Liberty and Atari --- p.99 / Chapter B.5.1 --- Liberty --- p.99 / Chapter B.5.2 --- Atari --- p.101 / Chapter B.6 --- Ko --- p.101 / Chapter B.7 --- Prohibited Move --- p.104 / Chapter B.8 --- Path and Distance --- p.105 / Bibliography --- p.109

Identifier oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_321537
Date January 1996
Contributors Chan, Horace Wai-Kit., Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
Publisher Chinese University of Hong Kong
Source Sets The Chinese University of Hong Kong
Language English
Detected Language English
Type Text, bibliography
Format print, ix, 112 leaves : ill. ; 30 cm.
Rights Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
