
Meta-level reasoning in reinforcement learning

Made available in DSpace on 2015-04-14T14:50:11Z (GMT).
Previous issue date: 2014-02-24

Reinforcement learning (RL) is a technique to compute an optimal policy in stochastic
settings where actions from an initial policy are simulated (or directly executed) and the value of
a state is updated based on the immediate rewards obtained as the policy is executed. Existing
efforts model opponents in competitive games as elements of a stochastic environment and use
RL to learn policies against such opponents. In this setting, the rate of change for state values
monotonically decreases over time as learning converges. This modeling assumes that the
opponent's strategy is static over time, an assumption that is too strong for human opponents.
Consequently, in this work, we develop a meta-level RL mechanism that detects when an opponent
changes strategy and allows the state-values to deconverge in order to learn how to play against
a different strategy. We validate this approach empirically for high-level strategy selection in the
Starcraft: Brood War game.
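The mechanism described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's actual algorithm: it assumes a simple Q-learner whose learning rate decays as values converge, plus a hypothetical meta-level monitor that compares short- and long-run TD-error magnitudes and resets the learning rate when a spike suggests the opponent changed strategy. All class, method, and parameter names are invented for illustration.

```python
import random

class MetaQLearner:
    """Q-learner whose learning rate decays as values converge, but is
    reset by a meta-level monitor when a spike in TD errors suggests
    the opponent has changed strategy (hypothetical sketch)."""

    def __init__(self, n_actions, alpha0=0.5, decay=0.999,
                 window=50, spike_factor=2.0):
        self.q = [0.0] * n_actions
        self.alpha0 = alpha0          # initial learning rate
        self.alpha = alpha0
        self.decay = decay            # per-step decay, models convergence
        self.window = window          # size of the "recent" error window
        self.spike_factor = spike_factor
        self.errors = []              # history of absolute TD errors
        self.resets = 0               # how often the monitor fired

    def act(self, eps=0.1):
        # epsilon-greedy action selection
        if random.random() < eps:
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda a: self.q[a])

    def update(self, action, reward):
        td_error = reward - self.q[action]
        self.q[action] += self.alpha * td_error
        self.alpha *= self.decay      # learning rate shrinks as we converge

        # meta-level monitoring: compare short- vs long-run mean |TD error|
        self.errors.append(abs(td_error))
        if len(self.errors) >= 2 * self.window:
            recent = sum(self.errors[-self.window:]) / self.window
            n_old = len(self.errors) - self.window
            baseline = sum(self.errors[:-self.window]) / n_old
            if baseline > 0 and recent > self.spike_factor * baseline:
                # errors spiked: assume the opponent changed strategy,
                # "deconverge" by restoring the initial learning rate
                self.alpha = self.alpha0
                self.errors = self.errors[-self.window:]
                self.resets += 1

# demo: the opponent plays one strategy for 500 steps, then switches;
# action 0 beats the first strategy, action 1 beats the second
random.seed(0)
learner = MetaQLearner(n_actions=2)
for t in range(1000):
    good = 0 if t < 500 else 1
    a = learner.act()
    learner.update(a, 1.0 if a == good else 0.0)
```

After the switch, greedy play against the stale policy yields large TD errors, the monitor restores the learning rate, and the state-action values re-adapt so that the learner's preferred action flips to the one that beats the new strategy.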

Identifier oai:union.ndltd.org:IBICT/oai:tede2.pucrs.br:tede/5253
Date 24 February 2014
Creators Maissiat, Jiéverson
Contributors Meneguzzi, Felipe Rech
Publisher Pontifícia Universidade Católica do Rio Grande do Sul, Programa de Pós-Graduação em Ciência da Computação, PUCRS, BR, Faculdade de Informática
Source Sets IBICT Brazilian ETDs
Language Portuguese
Detected Language English
Type info:eu-repo/semantics/publishedVersion, info:eu-repo/semantics/masterThesis
Format application/pdf
Source reponame:Biblioteca Digital de Teses e Dissertações da PUC_RS, instname:Pontifícia Universidade Católica do Rio Grande do Sul, instacron:PUC_RS
Rights info:eu-repo/semantics/openAccess
Relation 1974996533081274470, 500, 600, 1946639708616176246
