Spelling suggestions: "subject:"counterfactual regret minimization"" "subject:"counterfactuals regret minimization""
1 |
Using counterfactual regret minimization to create a competitive multiplayer poker agentAbou Risk, Nicholas 11 1900 (has links)
Games have been used to evaluate and advance techniques in the eld of Articial Intelligence since
before computers were invented. Many of these games have been deterministic perfect information
games (e.g. Chess and Checkers). A deterministic game has no chance element and in a perfect
information game, all information is visible to all players. However, many real-world scenarios
involving competing agents can be more accurately modeled as stochastic (non-deterministic), im-
perfect information games, and this dissertation investigates such games. Poker is one such game
played by millions of people around the world; it will be used as the testbed of the research presented
in this dissertation. For a specic set of games, two-player zero-sum perfect recall games, a recent
technique called Counterfactual Regret Minimization (CFR) computes strategies that are provably
convergent to an -Nash equilibrium. A Nash equilibrium strategy is very useful in two-player games
as it maximizes its utility against a worst-case opponent. However, once we move to multiplayer
games, we lose all theoretical guarantees for CFR. Furthermore, we have no theoretical guarantees
about the performance of a strategy from a multiplayer Nash equilibrium against two arbitrary op-
ponents. Despite the lack of theoretical guarantees, my thesis is that CFR-generated agents may
perform well in multiplayer games. I created several 3-player limit Texas Holdem Poker agents
and the results of the 2009 Computer Poker Competition demonstrate that these are the strongest
3-player computer Poker agents in the world. I also contend that a good strategy can be obtained by
grafting a set of two-player subgame strategies to a 3-player base strategy when one of the players
is eliminated.
|
2 |
Using counterfactual regret minimization to create a competitive multiplayer poker agentAbou Risk, Nicholas Unknown Date
No description available.
|
3 |
Hraní her a Deepstack / General Game Playing and DeepstackSchlindenbuch, Hynek January 2019 (has links)
General game playing is an area of artificial intelligence which focuses on creating agents capable of playing many games from some class. The agents receive the rules just before the match and therefore cannot be specialized for each game. Deepstack is the first artificial intelligence to beat professional human players in heads-up no-limit Texas hold'em poker. While it is specialized for poker, at its core is a general algorithm for playing two-player zero-sum games with imperfect information - continual resolving. In this thesis we introduce a general version of continual resolving and compare its performance against Online Outcome Sampling Monte Carlo Counterfactual Regret Minimization in several games.
|
4 |
Bluffing AI in Strategy Board GameLeijonhufvud, Johan, Henriksson, Albin January 2021 (has links)
Games have been a field of interest for researchin artificial intelligence for decades. As of now, it is over 5years ago an AI for the strategy game Go, AlphaGo, beat worldchampion Lee Sedol 4-1, which was considered to be an enormousmilestone for AI. Our goal is to make an AI that can play theclassic strategy board game Stratego at a competent level. Thisis achieved by making the AI learn by repeatedly playing againstitself to figure out what strategy to use in various situations byusing CFR - counterfactual regret minimization. According toour experiments, we were able to accomplish our goal in makinga Stratego AI that could play at a sophisticated level for a smallerversion of the game. We concluded that it was able to play betterthan an amateur human player. / Spel har varit ett intresseområde inomutvecklingen av artificiell intelligens i årtionden. Det är redanfem år sedan AlphaGo slog världsmästaren Lee Sedol i Go 2016,vilket betraktas vara ett stort steg för utvecklingen av AI. Vårtmål är att skapa en AI som kan spela strategispelet Stratego på en kompetent nivå. Detta kommer att implementeras genom att AI:n spelar mot sig själv en stor mängd gånger och uppdaterarsin strategi baserat på konceptet CFR counterfactual regretminimization. Enligt våra experiment lyckades vi med vårt mål i att skapa en kompetent Stratego AI för en mindre version avStratego. Vår uppfattning är att den spelar bättre än en människapå amatörnivå. / Kandidatexjobb i elektroteknik 2021, KTH, Stockholm
|
5 |
En spelteoretisk AI för StrategoSacchi, Giorgio, Bardvall, David January 2021 (has links)
Many problems involving decision making withimperfect information can be modeled as extensive games. Onefamily of state-of-the-art algorithms for computing optimal playin such games is Counterfactual Regret Minimization (CFR).The purpose of this paper is to explore the viability of CFRalgorithms on the board game Stratego. We compare differentalgorithms within the family and evaluate the heuristic method“imperfect recall” for game abstraction. Our experiments showthat the Monte-Carlo variant External CFR and use of gametree pruning greatly reduce training time. Further, we show thatimperfect recall can reduce the memory requirements with only aminor drop in player performance. These results show that CFRis suitable for strategic decision making. However, solutions tothe long computation time in high complexity games need to beexplored. / Många beslutsproblem med dold informationkan modelleras som spel på omfattande form. En familj avledande algoritmer för att beräkna optimal strategi i sådana spelär Counterfactual Regret Minimization (CFR). Syftet med dennarapport är att undersöka effektiviteten för CFR-algoritmer ibrädspelet Stratego. Vi jämför olika algoritmer inom familjen ochutvärderar den heuristiska metoden “imperfekt minne” för spelabstraktion.Våra experiment visar att Monte-Carlo-variantenExternal CFR och användning av trimning av spelträd kraftigtminskar träningstiden. Vidare visar vi att imperfekt minne kanminska algoritmens lagringskrav med bara en mindre förlust ispelstyrka. Dessa resultat visar att CFR är lämplig för strategisktbeslutsfattande. Lösningar på den långa beräkningstiden i spelmed hög komplexitet måste dock undersökas. / Kandidatexjobb i elektroteknik 2021, KTH, Stockholm
|
6 |
Řešení koncovek ve velkých hrách s neúplnou informací jako je např. Poker / Solving Endgames in Large Imperfect-Information Games such as PokerHa, Karel January 2016 (has links)
Title: Solving Endgames in Large Imperfect-Information Games such as Poker Author: Bc. Karel Ha Department: Department of Applied Mathematics Supervisor: doc. Mgr. Milan Hladík, Ph.D., Department of Applied Mathematics Abstract: Endgames have a distinctive role for players. At the late stage of games, many aspects are finally clearly defined, deeming exhaustive analysis tractable. Specialised endgame handling is rewarding for games with perfect information (e.g., Chess databases pre-computed for entire classes of endings, or dividing Go board into separate independent subgames). An appealing idea would be to extend this approach to imperfect-information games such as the famous Poker: play the early parts of the game, and once the subgame becomes feasible, calculate an ending solution. However, the problem is much more complex for imperfect information. Subgames need to be generalized to account for information sets. Unfortunately, such a generalization cannot be solved straightaway, as it does not generally preserve optimality. As a consequence, we may end up with a far more exploitable strategy. There are currently three techniques to deal with this challenge: (a) disregard the problem entirely; (b) use a decomposition technique, which sadly retains only the same quality; (c) or formalize improvements of...
|
7 |
Temporal Abstractions in Multi-agent LearningJiayu Chen (18396687) 13 June 2024 (has links)
<p dir="ltr">Learning, planning, and representing knowledge at multiple levels of temporal abstractions provide an agent with the ability to predict consequences of different courses of actions, which is essential for improving the performance of sequential decision making. However, discovering effective temporal abstractions, which the agent can use as skills, and adopting the constructed temporal abstractions for efficient policy learning can be challenging. Despite significant advancements in single-agent settings, temporal abstractions in multi-agent systems remains underexplored. This thesis addresses this research gap by introducing novel algorithms for discovering and employing temporal abstractions in both cooperative and competitive multi-agent environments. We first develop an unsupervised spectral-analysis-based discovery algorithm, aiming at finding temporal abstractions that can enhance the joint exploration of agents in complex, unknown environments for goal-achieving tasks. Subsequently, we propose a variational method that is applicable for a broader range of collaborative multi-agent tasks. This method unifies dynamic grouping and automatic multi-agent temporal abstraction discovery, and can be seamlessly integrated into the commonly-used multi-agent reinforcement learning algorithms. Further, for competitive multi-agent zero-sum games, we develop an algorithm based on Counterfactual Regret Minimization, which enables agents to form and utilize strategic abstractions akin to routine moves in chess during strategy learning, supported by solid theoretical and empirical analyses. Collectively, these contributions not only advance the understanding of multi-agent temporal abstractions but also present practical algorithms for intricate multi-agent challenges, including control, planning, and decision-making in complex scenarios.</p>
|
Page generated in 0.1227 seconds