Global ETD Search

1	Empirical-evidence equilibria in stochastic games Dudebout, Nicolas 27 August 2014 (has links) The objective of this research is to develop the framework of empirical-evidence equilibria (EEEs) in stochastic games. This framework was developed while attempting to design decentralized controllers using learning in stochastic games. The overarching goal is to enable a set of agents to control a dynamical system in a decentralized fashion. To do so, the agents play a stochastic game crafted such that its equilibria are decentralized controllers for the dynamical system. Unfortunately, there exists no algorithm to compute equilibria in stochastic games. One explanation for this lack of results is the full-rationality requirement of game theory. In the case of stochastic games, full rationality imposes that two requirements be met at equilibrium. First, each agent has a perfect model of the game and of its opponents' strategies. Second, each agent plays an optimal strategy for the POMDP induced by its opponents' strategies. Both requirements are unrealistic. An agent cannot know the strategies of its opponents; it can only observe the combined effect of its own strategy interacting with its opponents. Furthermore, POMDPs are intractable; an agent cannot compute an optimal strategy in a reasonable time. In addition to these two requirements, engineered agents cannot carry out perfect analytical reasoning and have limited memory; they naturally exhibit bounded rationality. In this research, bounded rationality is not seen as a limitation and is instead used to relax the two requirements. In the EEE framework, agents formulate low-order empirical models of observed quantities called mockups. Mockups have unmodeled states and dynamic effects, but they are statistically consistent; the empirical evidence observed by an agent does not contradict its mockup. Each agent uses its mockup to derive an optimal strategy. 
Since agents are interconnected through the system, these mockups are sensitive to the specific strategies employed by other agents. In an EEE, the two requirements are weakened. First, each agent has a consistent mockup of the game and the strategies of its opponents. Second, each agent plays an optimal strategy for the MDP induced by its mockup. The main contribution of this dissertation is the use of modeling to study stochastic games. This approach, while common in engineering, had not been applied to stochastic games. Equilibrium Stochastic games Bounded rationality
2	A Model for Strategic Bidding in Combined Transmission and Wholesale Energy Markets Gupte, Sanket 01 July 2004 (has links) Motivated by deregulation in major service sectors like airlines, banking and telecommunication, the electric industry is undergoing a major transformation. However, due to design inefficiencies, restructuring of the power sector has so far not been a major success. A lack of comprehensive quantitative models has left market designers unable to evaluate market performance and develop successful market designs. A comprehensive model should include market features like a two-settlement system, transmission congestion, financial transmission rights (FTRs), demand elasticity, demand-side bidding and other market rules. The contribution of this thesis includes the development of an exhaustive modeling framework that includes the above-mentioned market features and the development of a computationally effective solution methodology. Market designers would use this methodology to develop alternative conceptual market design frameworks and to assess the impact of various market rules on market performance. The noncooperative bidding behavior of the generators in both the FTR and energy markets is modeled as nonzero-sum stochastic games. Since the bidding strategies in the FTR and energy games depend on each other and jointly impact market performance, a two-tier learning approach is developed. Players (e.g. generators) first bid in the FTR market. FTR bids are then taken into account in the process of selecting bids in the energy market. The FTR bids and the energy bids together decide the market equilibrium and the resulting performance. This performance measure is then used to evaluate the success of the FTR bidding strategy. Several example power networks are studied to illustrate the modeling and learning-based solution approach. 
restructuring market power reinforcement learning FTR stochastic games American Studies Arts and Humanities
3	Equilibria in Quitting Games and Software for the Analysis / Gleichgewichte in Quitting Games und Software für ihre Analyse Fischer, Katharina 08 August 2013 (has links) (PDF) A quitting game is an undiscounted sequential stochastic game, with finitely many players. At any stage each player has only two possible actions, continue and quit. The game ends as soon as at least one player chooses to quit. The players then receive a payoff, which depends only on the set of players that did choose to quit. If the game never ends, the payoff to each player is zero. In this thesis we give a detailed introduction to quitting games. We examine the existing results for the existence of equilibria and improve an important result from Solan and Vieille stated in their article “Quitting Games” (2001). Since there is no software for the analysis of quitting games, or for stochastic games with more than two players, we provide algorithms and programs for symmetric quitting games, for a reduction by dominance and for the detection of a pure, instant and stationary epsilon-equilibrium. Stochastische N-Personen Spiele Quitting Games N-player stochastic games quitting games ddc:510 rvk:SK 860
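The quitting-game definition in the abstract above can be illustrated with a minimal sketch. It assumes a stationary strategy profile in which player i quits with a fixed probability at every stage; the function name `stationary_payoff` and the dictionary layout of `payoff` are hypothetical and are not taken from the thesis or its software.

```python
from itertools import combinations


def stationary_payoff(quit_probs, payoff):
    """Expected payoff vector of a quitting game under a stationary profile.

    quit_probs[i] is the (assumed constant) probability that player i quits
    at any stage. payoff maps each nonempty frozenset of quitting players to
    a tuple holding one payoff per player.
    """
    n = len(quit_probs)
    players = range(n)
    total = [0.0] * n
    p_end = 0.0  # probability that at least one player quits in a given stage
    for size in range(1, n + 1):
        for quitters in combinations(players, size):
            s = frozenset(quitters)
            # Probability that exactly the players in s quit at this stage.
            prob = 1.0
            for i in players:
                prob *= quit_probs[i] if i in s else 1.0 - quit_probs[i]
            p_end += prob
            for i in players:
                total[i] += prob * payoff[s][i]
    if p_end == 0.0:
        # Nobody ever quits, so the game never ends and every payoff is zero.
        return tuple(0.0 for _ in players)
    # The game ends at the first stage where some set quits, so the payoff is
    # the stage distribution over quitting sets conditioned on ending.
    return tuple(t / p_end for t in total)
```

Because every stage looks the same under a stationary profile, conditioning on the stage at which the game ends is enough; no discounting is involved, in line with the undiscounted setting described in the abstract.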
4	Statistical Analysis of Wireless Communication Systems Using Hidden Markov Models Rouf, Ishtiaq 06 August 2009 (has links) This thesis analyzes the use of hidden Markov models (HMMs) in wireless communication systems. HMMs are probabilistic models that are useful for discrete channel modeling. The simulations done in the thesis verified a previously formulated methodology. Power delay profiles (PDP) of twelve wireless receivers were used for the experiment. To reduce the computational burden, binary HMMs were used. The PDP measurements were sampled to identify static receivers and grid-based analysis. This work is significant as it has been performed in a new environment. Stochastic game theory is analyzed to gain insight into the decision-making process of HMMs. Study of game theory is significant because it analyzes rational decisions in detail by attaching risk and reward to every possibility. Network security situation awareness has emerged as a novel application of HMMs in wireless networking. The doubly stochastic nature of HMMs is applied in this process for behavioral analysis of network intrusion. The similarity of HMMs to artificial neural networks makes them useful for such applications. This application was performed using simulations similar to the original works. / Master of Science Stochastic Games Position Location Hidden Markov Model Network Security Situation Awareness
5	Algorithms For Stochastic Games And Service Systems Prasad, H L 05 1900 (has links) (PDF) This thesis is organized into two parts, one for my main area of research in the field of stochastic games, and the other for my contributions in the area of service systems. We first provide an abstract for my work in stochastic games. The field of stochastic games has been actively pursued over the last seven decades because of several of its important applications in oligopolistic economics. In the past, zero-sum stochastic games have been modelled and solved for Nash equilibria using the standard techniques of Markov decision processes. General-sum stochastic games, on the contrary, have posed difficulty as they cannot be reduced to Markov decision processes. Over the past few decades the quest for algorithms to compute Nash equilibria in general-sum stochastic games has intensified, and several important algorithms such as the stochastic tracing procedure [Herings and Peeters, 2004], NashQ [Hu and Wellman, 2003], FFQ [Littman, 2001], etc., and their generalised representations such as the optimization problem formulations for various reward structures [Filar and Vrieze, 1997] have been proposed. However, they either lack generality, or are intractable for even medium-sized problems, or both. In our venture towards algorithms for stochastic games, we start with a non-linear optimization problem and then design a simple gradient descent procedure for the same. Though this procedure gives the Nash equilibrium for a sample problem of terrain exploration, we observe that, in general, this need not be true. We characterize the necessary conditions and define KKT-N points. KKT-N points are those Karush-Kuhn-Tucker (KKT) points which correspond to Nash equilibria. Thus, for a simple gradient-based algorithm to guarantee convergence to a Nash equilibrium, all KKT points of the optimization problem need to be KKT-N points, which restricts the applicability of such algorithms. 
We then take a step back and look for a better characterization of those points of the optimization problem which correspond to Nash equilibria of the underlying game. As a result of this exploration, we derive two sets of necessary and sufficient conditions. The first set, the KKT-SP conditions, is inspired by the KKT conditions themselves and is obtained by breaking down the main optimization problem into several sub-problems and then applying the KKT conditions to each one of those sub-problems. The second set, the SG-SP conditions, is a simplified set of conditions which characterize those Nash points more compactly. Using both the KKT-SP and SG-SP conditions, we propose three algorithms, OFF-SGSP, ON-SGSP and DON-SGSP, which we show provide Nash equilibrium strategies for general-sum discounted stochastic games. Here OFF-SGSP is an off-line algorithm while ON-SGSP and DON-SGSP are on-line algorithms. In particular, we believe that DON-SGSP is the first decentralized on-line algorithm for general-sum discounted stochastic games. We show that both our on-line algorithms are computationally efficient. In fact, we show that DON-SGSP is not only applicable to multi-agent scenarios but is also directly applicable to the single-agent case, i.e., MDPs (Markov Decision Processes). The second part of the thesis focuses on formulating and solving the problem of minimizing the labour-cost in service systems. We define the setting of service systems and then model the labour-cost problem as a constrained discrete parameter Markov-cost process. This Markov process is parametrized by the number of workers in various shifts and with various skill levels. With the number of workers as optimization variables, we provide a detailed formulation of a constrained optimization problem where the objective is the expected long-run average of the single-stage labour-costs, and the main set of constraints is the expected long-run average of aggregate SLAs (Service Level Agreements). 
For this constrained optimization problem, we provide two stochastic optimization algorithms, SASOC-SF-N and SASOC-SF-C, which use smoothed functional approaches to estimate the gradient and perform gradient descent in the aforementioned constrained optimization problem. SASOC-SF-N uses a Gaussian distribution for smoothing while SASOC-SF-C uses a Cauchy distribution for the same. SASOC-SF-C is the first Cauchy-based smoothing algorithm which requires a fixed number (two) of simulations independent of the number of optimization variables. We show that these algorithms provide an order of magnitude better performance than the existing industry-standard tool, OptQuest. We also show that SASOC-SF-C gives better overall performance. Algorithms Stochastic Games Stochastic Games - Algorithms Nash Equilibrium Computation Gradient Descent Schemes Markov Decision Processes Markov Cost Process Labour Costs - Modelling Labor Cost Optimization Nash Equilibria Game Theory
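The smoothed functional idea mentioned in the abstract above can be sketched generically. The code below is not SASOC-SF-N or SASOC-SF-C: it assumes a continuous, unconstrained, noise-free toy objective, and it uses the standard two-sided Gaussian estimator, which needs two evaluations of the objective per iteration regardless of the number of variables. All names (`sf_gradient`, `sf_descent`, `toy_cost`) and parameter values are hypothetical.

```python
import random


def sf_gradient(f, theta, beta, rng):
    """Two-sided Gaussian smoothed-functional gradient estimate.

    Perturbs theta along one random Gaussian direction and uses exactly two
    evaluations of f, however many coordinates theta has.
    """
    eta = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = [t + beta * e for t, e in zip(theta, eta)]
    minus = [t - beta * e for t, e in zip(theta, eta)]
    scale = (f(plus) - f(minus)) / (2.0 * beta)
    return [scale * e for e in eta]


def sf_descent(f, theta0, beta=0.1, step=0.05, iters=500, seed=0):
    """Plain gradient descent driven by the estimator above (no constraints)."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(iters):
        grad = sf_gradient(f, theta, beta, rng)
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta


def toy_cost(x):
    """Assumed stand-in for a simulated cost; minimised at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


theta_hat = sf_descent(toy_cost, [0.0, 0.0])
```

In the setting of the thesis each evaluation of the objective would be a simulation of the service system, and the SLA constraints and discrete worker counts would need additional handling that this sketch leaves out.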
6	Automatic verification of competitive stochastic systems Simaitis, Aistis January 2014 (has links) In this thesis we present a framework for automatic formal analysis of competitive stochastic systems, such as sensor networks, decentralised resource management schemes or distributed user-centric environments. We model such systems as stochastic multi-player games, which are turn-based models where an action in each state is chosen by one of the players or according to a probability distribution. The specifications, such as “sensors 1 and 2 can collaborate to detect the target with probability 1, no matter what other sensors in the network do” or “the controller can ensure that the energy used is less than 75 mJ, and the algorithm terminates with probability at least 0.5”, are provided as temporal logic formulae. We introduce a branching-time temporal logic rPATL and its multi-objective extension to specify such probabilistic and reward-based properties of stochastic multi-player games. We also provide algorithms for these logics that can either verify such properties against the model, providing a yes/no answer, or perform strategy synthesis by constructing the strategy for the players that satisfies the specification. We conduct a detailed complexity analysis of the model checking problem for rPATL and its multi-objective extension and provide efficient algorithms for verification and strategy synthesis. We also implement the proposed techniques in the PRISM-games tool and apply them to the analysis of several case studies of competitive stochastic systems. 519
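The turn-based model described in the abstract above can be illustrated with a small value-iteration sketch for a reachability objective. This is a textbook-style illustration under assumed conventions, not the rPATL model checking algorithm or the PRISM-games implementation: one coalition maximises the probability of reaching a target, its opponents minimise it, and the function name `reach_value` and the state encoding are hypothetical.

```python
def reach_value(states, target, iters=200):
    """Approximate the max-min probability of reaching `target`.

    states maps a state name to (kind, successors), where
      kind == "max":  successors is a list of states chosen by the coalition,
      kind == "min":  successors is a list of states chosen by the opponents,
      kind == "prob": successors is a list of (probability, state) pairs.
    """
    # Start from 0 outside the target so the iteration converges from below.
    value = {s: (1.0 if s in target else 0.0) for s in states}
    for _ in range(iters):
        new_value = {}
        for s, (kind, succ) in states.items():
            if s in target:
                new_value[s] = 1.0
            elif kind == "max":
                new_value[s] = max(value[t] for t in succ)
            elif kind == "min":
                new_value[s] = min(value[t] for t in succ)
            else:
                new_value[s] = sum(p * value[t] for p, t in succ)
        value = new_value
    return value


# Hypothetical six-state game used only to exercise the function.
example = {
    "s0": ("max", ["a", "b"]),
    "a": ("prob", [(0.5, "goal"), (0.5, "fail")]),
    "b": ("min", ["c", "goal"]),
    "c": ("prob", [(0.9, "goal"), (0.1, "fail")]),
    "goal": ("prob", [(1.0, "goal")]),
    "fail": ("prob", [(1.0, "fail")]),
}
values = reach_value(example, {"goal"})
```

A specification such as "the coalition can reach the target with probability at least 0.5" would then be checked by comparing the value of the initial state against the bound.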
7	Les mécanismes d'incitation à la coopération dans les réseaux tolérants aux délais / Incentive Mechanisms For Cooperation In Delay Tolerant Networks Nguyen, Thi Thu Hang 04 December 2018 (has links) Delay-Tolerant Networks (DTNs) were designed to provide a sustainable means of communication between mobile terminals in regions without cellular infrastructure. In such networks, the set of neighbors of every node changes over time due to the mobility of nodes, resulting in intermittent connectivity and unstable routes in the network. We analyze the performance of an incentive scheme for two-hop DTNs in which a backlogged source proposes a fixed reward to the relays to deliver a message. Only one message at a time is proposed by the source. For a given message, only the first relay to deliver it gets the reward corresponding to this message, thereby inducing a competition between the relays. The relays seek to maximize the expected reward for each message, whereas the objective of the source is to satisfy a given constraint on the probability of message delivery. We consider two different settings: one in which the source tells the relays for how long a message has been in circulation, and one in which the source does not give this information. In the first setting, we show that the optimal policy of a relay is of threshold type: it accepts a message until a first threshold and then keeps the message until it either meets the destination or reaches the second threshold. Formulas for computing the thresholds as well as the probability of message delivery are derived for a backlogged source. We then investigate the asymptotic performance of this setting in the mean-field limit. When the second threshold is infinite, we give the mean-field ODE and show that all the messages have the same probability of successful delivery. When the second threshold is finite, we only give an ODE approximation, since in this case the dynamics are not Markovian. For the second setting, we assume that the source proposes each message for a fixed period of time and that a relay decides to accept a message according to a randomized policy upon encounter with the source. If it accepts the message, a relay keeps it until it reaches the destination. We establish under which condition the acceptance probability of the relays is strictly positive and show that, under this condition, there exists a unique symmetric Nash equilibrium, in which no relay has anything to gain by unilaterally changing its acceptance probability. Explicit expressions for the probability of message delivery and the mean time to deliver a message at the symmetric Nash equilibrium are derived, as well as an expression for the asymptotic value of message delivery. Finally, we present numerous simulation results to compare the performance of the threshold-type strategy and the randomized strategy, in order to determine under which condition it is profitable for the source to give the relays the information on the age of a message. Jeux stochastiques Réseaux de communication mobile Réseaux tolérants aux délais Stochastic games Mobile communication networks Delay Tolerant Networks 621.382 629.8 004.65
8	Playing is believing: the role of beliefs in multi-agent learning Chang, Yu-Han, Kaelbling, Leslie P. 01 1900 (has links) We propose a new classification for multi-agent learning algorithms, with each league of players characterized by both their possible strategies and possible beliefs. Using this classification, we review the optimality of existing algorithms and discuss some insights that can be gained. We propose an incremental improvement to the existing algorithms that seems to achieve average payoffs that are at least the Nash equilibrium payoffs in the long-run against fair opponents. / Singapore-MIT Alliance (SMA) multi-agent learning algorithm repeated games belief game theory Matrix games Nash equilibrium Stochastic games Reinforcement learning PHC-Exploiter
9	Infinite-state Stochastic and Parameterized Systems Ben Henda, Noomene January 2008 (has links) A major current challenge consists in extending formal methods in order to handle infinite-state systems. Infiniteness stems from the fact that the system operates on unbounded data structures such as stacks, queues, clocks and integers, as well as from parameterization. Systems with unbounded data structures are natural models for reasoning about communication protocols, concurrent programs, real-time systems, etc. Parameterized systems, in contrast, are more suitable if the system consists of an arbitrary number of identical processes, which is the case for cache coherence protocols, distributed algorithms and so forth. In this thesis, we consider model checking problems for certain fundamental classes of probabilistic infinite-state systems, as well as the verification of safety properties in parameterized systems. First, we consider probabilistic systems with unbounded data structures. In particular, we study probabilistic extensions of Lossy Channel Systems (PLCS), Vector Addition Systems with States (PVASS) and Noisy Turing Machines (PNTM). We show how we can describe the semantics of such models by infinite-state Markov chains, and then define certain abstract properties which allow model checking of several qualitative and quantitative problems. Then, we consider parameterized systems and provide a method which allows checking safety for several classes that differ in their topologies (linear or tree) and semantics (atomic or non-atomic). The method is based on deriving an over-approximation which allows the use of a symbolic backward reachability scheme. For each class, the over-approximation we define guarantees monotonicity of the induced approximate transition system with respect to an appropriate order. This property is convenient in the sense that it preserves upward closedness when computing sets of predecessors. 
program verification model checking stochastic games infinite-state systems Markov chains reachability repeated reachability parameterized systems approximation safety tree systems
10	Some links between discrete and continuous aspects in dynamic games / Quelques liens entre aspects discrets et continus dans jeux dynamiques Maldonado Lopez, Juan Pablo 04 November 2014 (has links) In this thesis we describe some links between a) discrete and continuous time games and b) games with finitely many players and games with a continuum of players. The motivation for the subject and the main contributions are outlined in Chapter 2. The rest of the thesis is organized in three parts. Part I is devoted to two-player, zero-sum differential games. Chapter 3 describes the different approaches for establishing the existence of the value of such games and points out the connections between them. Chapter 4 provides a proof of the existence of the value using an explicit description of ε-optimal strategies, and Chapter 5 proves the equivalence of minimax solutions and viscosity solutions for Hamilton-Jacobi-Isaacs equations. Part II concerns discrete time mean field games. We study two models with different assumptions: in Chapter 6 the action space is compact, while in Chapter 7 it is finite. In both cases we derive the existence of an ε-Nash equilibrium for a stochastic game with finitely many identical players, where the approximation error vanishes as the number of players increases. In Chapter 7 we obtain explicit error bounds, as well as the existence of an ε-Nash equilibrium for a stochastic game with short stage duration and finitely many identical players, with the approximation error depending both on the number of players and on the duration of the stage. Part III is concerned with two-player, zero-sum stochastic games with short stage duration, described in Chapter 8. These are games where a parameter evolves following a continuous time Markov chain, while the players choose their actions at the nodes of a given partition of the positive real axis. The continuous time dynamics of the parameter depend on the actions of the players. We consider three different evaluations for the payoff and two different information structures: one in which players observe the past actions and the parameter, and one in which players observe past actions but not the parameter. Jeux à somme nulle Théorie des jeux Jeux différentiels Jeux à champs moyen Jeux stochastiques Jeux dynamiques Differential games Stochastic games 510
