11 |
En spelteoretisk AI för Stratego / A Game-Theoretic AI for Stratego. Sacchi, Giorgio; Bardvall, David. January 2021
Many problems involving decision making with imperfect information can be modeled as extensive games. One family of state-of-the-art algorithms for computing optimal play in such games is Counterfactual Regret Minimization (CFR). The purpose of this paper is to explore the viability of CFR algorithms on the board game Stratego. We compare different algorithms within the family and evaluate the heuristic method “imperfect recall” for game abstraction. Our experiments show that the Monte-Carlo variant External CFR and the use of game tree pruning greatly reduce training time. Further, we show that imperfect recall can reduce the memory requirements with only a minor drop in player performance. These results show that CFR is suitable for strategic decision making. However, solutions to the long computation time in high-complexity games need to be explored. / Bachelor's degree project in electrical engineering, 2021, KTH, Stockholm
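As a rough illustration of the idea behind CFR, the sketch below runs regret matching, the per-decision update rule that CFR applies at each information set, on rock-paper-scissors. It is not taken from the thesis: the fixed opponent mix, the payoff table, and the names `strategy_from_regrets` and `regret_matching` are assumptions chosen to keep the example small and deterministic.

```python
# Payoff to the learner: PAYOFF[a][b], with 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret yet: fall back to uniform play.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def regret_matching(opponent, iterations):
    """Return the learner's average strategy against a fixed opponent mix.

    The opponent mix is an illustrative assumption; expected payoffs are used
    instead of sampled ones, so the run is deterministic.
    """
    n = len(PAYOFF)
    regrets = [0.0] * n
    strategy_sum = [0.0] * n
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        # Expected payoff of each pure action against the opponent mix.
        action_values = [
            sum(PAYOFF[a][b] * opponent[b] for b in range(n)) for a in range(n)
        ]
        expected = sum(strategy[a] * action_values[a] for a in range(n))
        for a in range(n):
            # Regret: how much better action a would have done than the mix played.
            regrets[a] += action_values[a] - expected
            strategy_sum[a] += strategy[a]
    return [s / iterations for s in strategy_sum]


average = regret_matching(opponent=[0.4, 0.3, 0.3], iterations=1000)
```

Against an opponent that slightly over-plays rock, the average strategy concentrates on paper. Full CFR differs in that it weights regrets by counterfactual reach probabilities and updates every information set of the game tree, which this single-decision sketch leaves out.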
|
12 |
Řešení koncovek ve velkých hrách s neúplnou informací jako je např. Poker / Solving Endgames in Large Imperfect-Information Games such as Poker. Ha, Karel. January 2016
Author: Bc. Karel Ha
Department: Department of Applied Mathematics
Supervisor: doc. Mgr. Milan Hladík, Ph.D., Department of Applied Mathematics
Abstract: Endgames have a distinctive role for players. At the late stage of a game, many aspects are finally clearly defined, making exhaustive analysis tractable. Specialised endgame handling is rewarding for games with perfect information (e.g., Chess databases pre-computed for entire classes of endings, or dividing a Go board into separate independent subgames). An appealing idea would be to extend this approach to imperfect-information games such as the famous Poker: play the early parts of the game, and once the subgame becomes feasible, calculate an ending solution. However, the problem is much more complex for imperfect information. Subgames need to be generalized to account for information sets. Unfortunately, such a generalization cannot be solved straightaway, as it does not generally preserve optimality. As a consequence, we may end up with a far more exploitable strategy. There are currently three techniques to deal with this challenge: (a) disregard the problem entirely; (b) use a decomposition technique, which sadly retains only the same quality; or (c) formalize improvements of...
|
13 |
DECENTRALIZED PRICE-DRIVEN DEMAND RESPONSE IN SMART ENERGY GRID. Zibo Zhao (5930495). 14 January 2021
<div>
<div>
<div>
<p>Real-time pricing (RTP) of electricity for consumers has long been argued to be
crucial for realizing the many envisioned benefits of demand flexibility in a smart
grid. However, many details of how to actually implement an RTP scheme are still
under debate. Since most of the organized wholesale electricity markets in the US
implement a two-settlement mechanism, with day-ahead electricity price forecasts
guiding financial and physical transactions in the next day and real-time ex post
prices settling any real-time imbalances, it is a natural idea to let consumers respond
to the day-ahead prices in real-time. However, if such an idea is not controlled
properly, the inherent closed-loop operation may lead consumers to all respond in
the same fashion, causing large swings of real-time demand and prices, which may
jeopardize system stability and increase consumers’ financial risks.
</p><p><br></p>
<p>To overcome the potential uncertainties and undesired demand peak caused by
“selfish” behaviors by individual consumers under RTP, in this research, we develop a fully decentralized price-driven demand response (DR) approach under game-theoretical
frameworks. In game theory, agents usually make decisions based on their
beliefs about competitors’ states, which requires maintaining a large amount of knowledge and thus can be intractable and implausible for a large population. Instead,
we propose using regret-based learning in games by focusing on each agent’s own
history and utility received. We study two learning mechanisms: bandit learning
with incomplete information feedback, and low regret learning with full information
feedback. With the learning in games, we establish performance guarantees for each individual agent (i.e., regret minimization) and the overall system (i.e., bounds on
price of anarchy).</p><p><br></p></div></div></div><div><div><div>
<p>In addition to the game-theoretical framework for price-driven demand response,
we also apply such a framework to peer-to-peer energy trading auctions. The market-based
approach can better incentivize the development of distributed energy resources
(DERs) on the demand side. However, the complexity of double-sided auctions in an
energy market and agents’ bounded rationality may invalidate many well-established
theories in auction design, and consequently, hinder market development. To address
these issues, we propose an automated bidding framework based on multi-armed
bandit learning through repeated auctions, which aims to minimize each bidder’s
cumulative regret. We also use such a framework to compare market outcomes of
three different auction designs.
</p>
</div>
</div>
</div>
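To make the bandit-based bidding idea concrete, here is a minimal sketch in which a single buyer learns which bid level to use over repeated auctions. The abstract does not say which bandit algorithm the thesis uses; UCB1 is substituted here as a standard choice, and the discrete bid levels, the fixed clearing price, the pay-as-bid payoff rule, and the name `ucb1_bidding` are all hypothetical simplifications of the double-sided auction setting.

```python
import math


def ucb1_bidding(bid_levels, valuation, clearing_price, rounds):
    """Pick a bid level each round with UCB1.

    Returns (pull counts per bid level, cumulative regret).
    Hypothetical setting: the buyer wins whenever its bid is at least the
    fixed clearing price, and then earns valuation minus its own bid.
    """
    n = len(bid_levels)
    counts = [0] * n
    totals = [0.0] * n

    def payoff(bid):
        return valuation - bid if bid >= clearing_price else 0.0

    best_payoff = max(payoff(b) for b in bid_levels)
    cumulative_regret = 0.0

    for t in range(1, rounds + 1):
        if t <= n:
            # Try every bid level once before using confidence bounds.
            arm = t - 1
        else:
            # UCB1 index: empirical mean plus an exploration bonus.
            arm = max(
                range(n),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = payoff(bid_levels[arm])
        counts[arm] += 1
        totals[arm] += reward
        cumulative_regret += best_payoff - reward
    return counts, cumulative_regret


counts, regret = ucb1_bidding(
    bid_levels=[0.2, 0.4, 0.6, 0.8],
    valuation=1.0,
    clearing_price=0.5,
    rounds=5000,
)
```

In this toy setting the lowest winning bid (0.6) ends up being chosen most often, and the cumulative regret grows much more slowly than the number of rounds. A real market would have a clearing price that depends on the other bidders, which is where the comparison of auction designs comes in.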
|
14 |
Prédiction de structure tridimensionnelle de molécules d’ARN par minimisation de regret / Prediction of three-dimensional structure of RNA molecules by regret minimization. Boudard, Mélanie. 29 April 2016
The functions of an RNA molecule in cellular processes are very closely related to its three-dimensional structure. Predicting this structure is therefore essential for understanding RNA function. Folding can be seen as a two-step process: the formation of a secondary structure, followed by the formation of the three-dimensional structure. The first step results from strong interactions between nucleotides, and the second from tertiary interactions. Secondary structure prediction is well studied and has seen numerous advances over more than thirty years. Predicting the three-dimensional structure, however, is a much harder problem because of the large number of possibilities. To address it, we treat the folding of the RNA structure as a game. The secondary structure of the RNA is represented as a graph, which corresponds to a coarse-grained model of the structure. This model allows us to fold the RNA molecule in a discrete space. Our hypothesis is that the 3D structure can be viewed as an equilibrium in the game-theoretic sense. To find this equilibrium, we use regret minimization algorithms. We also study different formalizations of the game, based on biological statistics. The objective of this work is to develop an RNA folding method that works on all types of secondary structures and gives more accurate results than current approaches. Our method, called GARN, met these objectives and allowed us to examine in more depth the factors that matter for coarse-grained structure prediction of molecules.
|
15 |
Temporal Abstractions in Multi-agent Learning. Jiayu Chen (18396687). 13 June 2024
<p dir="ltr">Learning, planning, and representing knowledge at multiple levels of temporal abstraction provide an agent with the ability to predict the consequences of different courses of action, which is essential for improving the performance of sequential decision making. However, discovering effective temporal abstractions, which the agent can use as skills, and adopting the constructed temporal abstractions for efficient policy learning can be challenging. Despite significant advancements in single-agent settings, temporal abstractions in multi-agent systems remain underexplored. This thesis addresses this research gap by introducing novel algorithms for discovering and employing temporal abstractions in both cooperative and competitive multi-agent environments. We first develop an unsupervised spectral-analysis-based discovery algorithm, aimed at finding temporal abstractions that can enhance the joint exploration of agents in complex, unknown environments for goal-achieving tasks. Subsequently, we propose a variational method that is applicable to a broader range of collaborative multi-agent tasks. This method unifies dynamic grouping and automatic multi-agent temporal abstraction discovery, and can be seamlessly integrated into commonly used multi-agent reinforcement learning algorithms. Further, for competitive multi-agent zero-sum games, we develop an algorithm based on Counterfactual Regret Minimization, which enables agents to form and utilize strategic abstractions akin to routine moves in chess during strategy learning, supported by solid theoretical and empirical analyses. Collectively, these contributions not only advance the understanding of multi-agent temporal abstractions but also present practical algorithms for intricate multi-agent challenges, including control, planning, and decision-making in complex scenarios.</p>
|