Application of temporal difference learning and supervised learning in the game of Go.

by Horace Wai-Kit, Chan.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1996.
Includes bibliographical references (leaves 109-112).

Contents:

Acknowledgement
Abstract
Chapter 1. Introduction
    1.1 Overview
    1.2 Objective
    1.3 Organization of This Thesis
Chapter 2. Background
    2.1 Definitions
        2.1.1 Theoretical Definition of Solving a Game
        2.1.2 Definition of Computer Go
    2.2 State of the Art of Computer Go
    2.3 A Framework for Computer Go
        2.3.1 Evaluation Function
        2.3.2 Plausible Move Generator
    2.4 Problems Tackled in this Research
Chapter 3. Application of TD in Game Playing
    3.1 Introduction
    3.2 Reinforcement Learning and TD Learning
        3.2.1 Models of Learning
        3.2.2 Temporal Difference Learning
    3.3 TD Learning and Game-playing
        3.3.1 Game-Playing as a Delay-reward Prediction Problem
        3.3.2 Previous Work of TD Learning in Backgammon
        3.3.3 Previous Works of TD Learning in Go
    3.4 Design of this Research
        3.4.1 Limitations in the Previous Researches
        3.4.2 Motivation
        3.4.3 Objective and Methodology
Chapter 4. Deriving a New Updating Rule to Apply TD Learning in Multi-layer Perceptron
    4.1 Multi-layer Perceptron (MLP)
    4.2 Derivation of TD(λ) Learning Rule for MLP
        4.2.1 Notations
        4.2.2 A New Generalized Delta Rule
        4.2.3 Updating rule for TD(λ) Learning
    4.3 Algorithm of Training MLP using TD(λ)
        4.3.1 Definitions of Variables in the Algorithm
        4.3.2 Training Algorithm
        4.3.3 Description of the Algorithm
Chapter 5. Experiments
    5.1 Introduction
    5.2 Experiment 1: Training Evaluation Function for 7 x 7 Go Games by TD(λ) with Self-playing
        5.2.1 Introduction
        5.2.2 7 x 7 Go
        5.2.3 Experimental Designs
        5.2.4 Performance Testing for Trained Networks
        5.2.5 Results
        5.2.6 Discussions
        5.2.7 Limitations
    5.3 Experiment 2: Training Evaluation Function for 9 x 9 Go Games by TD(λ) Learning from Human Games
        5.3.1 Introduction
        5.3.2 9 x 9 Go game
        5.3.3 Training Data Preparation
        5.3.4 Experimental Designs
        5.3.5 Results
        5.3.6 Discussion
        5.3.7 Limitations
    5.4 Experiment 3: Life Status Determination in the Go Endgame
        5.4.1 Introduction
        5.4.2 Training Data Preparation
        5.4.3 Experimental Designs
        5.4.4 Results
        5.4.5 Discussion
        5.4.6 Limitations
    5.5 A Postulated Model
Chapter 6. Conclusions
    6.1 Future Direction of Research
Chapter A. An Introduction to Go
    A.1 A Brief Introduction
        A.1.1 What is Go?
        A.1.2 History of Go
        A.1.3 Equipment used in a Go game
    A.2 Basic Rules in Go
        A.2.1 A Go game
        A.2.2 Liberty and Capture
        A.2.3 Ko
        A.2.4 Eyes, Live and Death
        A.2.5 Seki
        A.2.6 Endgame and Scoring
        A.2.7 Rank and Handicap Games
    A.3 Strategies and Tactics in Go
        A.3.1 Strategy vs Tactics
        A.3.2 Open-game
        A.3.3 Middle-game
        A.3.4 End-game
Chapter B. Mathematical Model of Connectivity
    B.1 Introduction
    B.2 Basic Definitions
    B.3 Adjacency and Connectivity
    B.4 String and Link
        B.4.1 String
        B.4.2 Link
    B.5 Liberty and Atari
        B.5.1 Liberty
        B.5.2 Atari
    B.6 Ko
    B.7 Prohibited Move
    B.8 Path and Distance
Bibliography

Go (Game)--Data processing

Computer games--Programming

Artificial intelligence

Identifier	oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_321537
Date	January 1996
Contributors	Chan, Horace Wai-Kit., Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
Publisher	Chinese University of Hong Kong
Source Sets	The Chinese University of Hong Kong
Language	English
Detected Language	English
Type	Text, bibliography
Format	print, ix, 112 leaves : ill. ; 30 cm.
Rights	Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
