11

Planification multi-agents dans un cadre markovien : les jeux stochastiques à somme générale

Hamila, Mohammed Amine 03 April 2012 (has links)
Planifier les actions d’un agent dans un environnement dynamique et incertain a été largement étudié et le cadre des processus décisionnels de Markov offre les outils permettant de modéliser et de résoudre de tels problèmes. Le domaine de la théorie des jeux a permis l’étude des interactions stratégiques entre plusieurs agents pour un jeu donné. Le cadre des jeux stochastiques est considéré comme une généralisation du domaine des processus décisionnels de Markov et du champ de la théorie des jeux et permet de modéliser des systèmes ayant plusieurs agents et plusieurs états. Cependant, planifier dans un système multi-agents est considéré comme difficile, car la politique d’actions de l’agent dépend non seulement de ses choix mais aussi des politiques des autres agents. Le travail que nous présentons dans cette thèse porte sur la prise de décision distribuée dans les systèmes multi-agents. Les travaux existants dans le domaine permettent la résolution théorique des jeux stochastiques mais imposent de fortes restrictions et font abstraction de certains problèmes cruciaux du modèle. Nous proposons un algorithme de planification décentralisée pour le modèle des jeux stochastiques, d’une part basé sur l’algorithme Value-Iteration et d’autre part basé sur la notion d’équilibre issue de la résolution des jeux matriciels. Afin d’améliorer le processus de résolution et de traiter des problèmes de taille importante, nous cherchons à faciliter la prise de décision et à limiter les possibilités d’actions à chaque étape d’interaction. L’algorithme que nous avons proposé a été validé sur un exemple d’interaction incluant plusieurs agents et différentes expérimentations ont été menées afin d’évaluer la qualité de la solution obtenue. / Planning an agent’s actions in a dynamic and uncertain environment has been extensively studied, and the framework of Markov decision processes provides tools to model and solve such problems. The field of game theory has allowed the study of strategic interactions between multiple agents for a given game. The framework of stochastic games is considered a generalization of Markov decision processes and game theory; it makes it possible to model systems with multiple agents and multiple states. However, planning in a multi-agent system is considered difficult: an agent's decisions depend not only on its own actions but also on the actions of the other agents. The work presented in this thesis focuses on decision making in distributed multi-agent systems. Existing works in this field allow the theoretical resolution of stochastic games but place severe restrictions and ignore some crucial problems of the model. We propose a decentralized planning algorithm for the model of stochastic games. Our proposal is based on the Value-Iteration algorithm and on the concept of Nash equilibrium. To improve the resolution process and to deal with large problems, we sought to ease decision making and limit the set of joint actions at each stage. The proposed algorithm was validated on a coordination problem including several agents, and various experiments were conducted to assess the quality of the resulting solution.
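The abstract above combines Value-Iteration with equilibrium computation in stage matrix games. As a point of reference only, here is a minimal Python sketch of the zero-sum special case (Shapley-style value iteration), in which the stage game at each state can be solved by a small linear program; the general-sum algorithm of the thesis would replace this LP with a matrix-game equilibrium computation, and all names and data below are illustrative.

```python
# Hedged sketch, not the thesis algorithm: Shapley-style value iteration for the
# zero-sum special case of a discounted stochastic game.  The stage matrix game
# at each state is solved with a small LP; a general-sum variant would compute a
# matrix-game (Nash) equilibrium here instead.
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (row player maximizes), via an LP."""
    m, n = M.shape
    A = M - M.min() + 1.0                      # shift so the game value is positive
    # minimize sum(x) subject to A^T x >= 1, x >= 0; the value of A is 1 / sum(x).
    res = linprog(np.ones(m), A_ub=-A.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m, method="highs")
    return 1.0 / res.x.sum() + M.min() - 1.0   # undo the shift

def shapley_value_iteration(R, P, gamma=0.9, tol=1e-6):
    """R[s]: (m x n) stage payoff matrix; P[s]: (m x n x S) transition probabilities."""
    S = len(R)
    V = np.zeros(S)
    while True:
        V_new = np.array([matrix_game_value(R[s] + gamma * P[s] @ V) for s in range(S)])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Tiny illustrative instance: 2 states, 2 actions per player.
rng = np.random.default_rng(0)
R = [rng.standard_normal((2, 2)) for _ in range(2)]
P = [rng.random((2, 2, 2)) for _ in range(2)]
P = [p / p.sum(axis=-1, keepdims=True) for p in P]
print(shapley_value_iteration(R, P))
```

The discounted operator is a contraction, which is why the simple sweep above converges regardless of the starting values.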
12

Essays in Cooperation and Competition

Mouli Modak (12476466) 29 April 2022 (has links)
This dissertation is a collection of three papers, each one being a chapter. The running subject of interest in all the papers is the strategic behavior of individuals in different environments. In the first chapter, I experimentally investigate collusive behavior under simultaneous interaction in multiple strategic settings, a phenomenon which I call multiple contacts. I investigate how multiple contacts impact collusive behavior when the players are symmetric or asymmetric. The second chapter is joint work with Dr. Brian Roberson. In this chapter, we examine the role of cognitive diversity in teams on performance in a large innovation contest setting. We use a theoretical model to derive conditions under which increasing diversity can improve performance in the large contest. Finally, in the third chapter, joint work with Dr. Yaroslav Rosokha and Dr. Masha Shunko, we experimentally study players' behavior when they interact in an infinitely repeated environment, where the state of the world in each period is stochastic and depends on a transition rule. Our main questions are how the transition rule impacts behavior and whether asymmetry in players affects this.

In the first chapter, I study the phenomenon of multiple contacts using a laboratory experiment with multiple symmetric or asymmetric prisoners' dilemma games. When agents interact in multiple settings, even if defection or deviation from collusion in one setting cannot be credibly punished in the same setting, it may be punishable in other settings. This can increase the incentive to collude. I observe a statistically significant increase in the probability of punishment in one game after defection in another game under multiple contacts, but only when the games are asymmetric in payoffs. While punishment of defection increases in some situations, I do not find any significant increase in collusion due to multiple contacts in either the symmetric or the asymmetric environment. In addition, to find further support for the theory suggesting that agents should use different strategies under multiple contacts, I estimate the underlying strategies that subjects use in my experiment. To this end, I modify popular strategies (e.g., Grim Trigger, Tit-for-Tat) to condition on the history observed in multiple strategic settings. I find that only for games with asymmetric payoffs do subjects use these modified strategies in the presence of multiple contacts.

The second chapter is a theoretical work. In our model of a large team innovation contest, teams develop an innovation using the skills or perspectives (tools) belonging to individual team members and the costly effort they provide. Prizes are awarded based on the values of the teams' innovations. Within a team, the team members possess different skills or perspectives (tools) which may be applied to innovation problems. For a given innovation problem and a given level of team effort, different combinations of tools within a team may generate different values for the team innovation. In this context, we examine individual team performance as a function of a team's own composition, and the overall performance of the contest as a function of the compositions of the teams. We find that whether increasing diversity leads to an increase in expected performance, for both an individual team and the overall contest, depends on the efficiency with which teams are able to apply diverse sets of tools to innovation problems. Thus, our paper provides a channel, other than a direct cost of diversity, through which diversity can be beneficial or detrimental depending on how efficient teams are at utilizing diverse sets of team member tools.

The final chapter is another experimental study. We study an environment where individuals interact with each other in a prisoners' dilemma game repeatedly over time. However, the payoffs of the prisoners' dilemma game are decided stochastically using a transition rule. We vary the transition rule from alternation to random and study the change in subject behavior when the interaction is either symmetric or asymmetric. Our results show that in the asymmetric environment, alternation can improve cooperation rates. With a random transition rule, the symmetric environment is more conducive to cooperation. We find that the asymmetric environment with a random transition rule performs the worst in terms of cooperation rates.
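The modified strategies mentioned in the first chapter condition on play across several simultaneous games. As a purely illustrative sketch (the naming and trigger rule here are assumptions, not the estimates from the chapter), a Grim-Trigger variant for two simultaneously played prisoners' dilemmas might punish in both games once a defection is observed in either:

```python
# Hedged sketch of one "multiple contact" strategy of the kind described above:
# a Grim-Trigger variant for two prisoners' dilemmas played simultaneously,
# which punishes in BOTH games once a defection is observed in EITHER game.
COOPERATE, DEFECT = "C", "D"

def multigame_grim_trigger(history):
    """history: list of per-period tuples ((my_g1, opp_g1), (my_g2, opp_g2))."""
    triggered = any(opp1 == DEFECT or opp2 == DEFECT
                    for (_, opp1), (_, opp2) in history)
    action = DEFECT if triggered else COOPERATE
    return action, action   # the same action is played in both games

# Example: the opponent defected in game 2 last period -> punish in both games.
print(multigame_grim_trigger([((COOPERATE, COOPERATE), (COOPERATE, DEFECT))]))
```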
13

Equilibria in Quitting Games and Software for the Analysis

Fischer, Katharina 10 July 2013 (has links)
A quitting game is an undiscounted sequential stochastic game with finitely many players. At any stage each player has only two possible actions: continue or quit. The game ends as soon as at least one player chooses to quit. The players then receive a payoff, which depends only on the set of players that chose to quit. If the game never ends, the payoff to each player is zero. In this thesis we give a detailed introduction to quitting games. We examine the existing results on the existence of equilibria and improve an important result of Solan and Vieille stated in their article “Quitting Games” (2001). Since there is no software for the analysis of quitting games, or for stochastic games with more than two players, we provide algorithms and programs for symmetric quitting games, for a reduction by dominance, and for the detection of a pure, instant and stationary epsilon-equilibrium.
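Because the payoff of a quitting game depends only on the set of players that quit, a pure, instant and stationary profile can be identified with that set, and checking whether it is an (epsilon-)equilibrium reduces to comparing finitely many payoff vectors. The following Python sketch illustrates this check on made-up payoff data; it is not the software described in the thesis.

```python
# Hedged sketch: checking which pure, instant, stationary profiles of a quitting
# game are epsilon-equilibria.  A profile is identified with the set S of players
# who quit at every stage: if S is nonempty the game ends at the first stage with
# payoff vector r[S]; if S is empty, every player receives 0.  Payoffs are made up.
from itertools import combinations

def payoff(r, quitters, i):
    """Payoff to player i when exactly `quitters` quit (0 if nobody ever quits)."""
    return r[quitters][i] if quitters else 0.0

def is_pure_stationary_equilibrium(r, S, n, eps=0.0):
    """No player can gain more than eps by switching between 'quit' and 'continue'."""
    for i in range(n):
        deviation = frozenset(S - {i} if i in S else S | {i})
        if payoff(r, deviation, i) > payoff(r, S, i) + eps:
            return False
    return True

# Three-player example: a lone quitter gets 1, everyone else gets 0.5 once the game stops.
n = 3
r = {frozenset(S): tuple(1.0 if i in S and len(S) == 1 else 0.5 for i in range(n))
     for k in range(1, n + 1) for S in combinations(range(n), k)}
profiles = [frozenset(S) for k in range(n + 1) for S in combinations(range(n), k)]
print([set(S) for S in profiles if is_pure_stationary_equilibrium(r, S, n)])
```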
14

Infinite-state Stochastic and Parameterized Systems

Ben Henda, Noomene January 2008 (has links)
A major current challenge consists in extending formal methods in order to handle infinite-state systems. Infiniteness stems from the fact that the system operates on unbounded data structures such as stacks, queues, clocks, or integers, as well as from parameterization.

Systems with unbounded data structures are natural models for reasoning about communication protocols, concurrent programs, real-time systems, etc. Parameterized systems, on the other hand, are more suitable when the system consists of an arbitrary number of identical processes, which is the case for cache coherence protocols, distributed algorithms, and so forth.

In this thesis, we consider model checking problems for certain fundamental classes of probabilistic infinite-state systems, as well as the verification of safety properties in parameterized systems. First, we consider probabilistic systems with unbounded data structures. In particular, we study probabilistic extensions of Lossy Channel Systems (PLCS), Vector Addition Systems with States (PVASS) and Noisy Turing Machines (PNTM). We show how the semantics of such models can be described by infinite-state Markov chains, and then define certain abstract properties which allow model checking of several qualitative and quantitative problems.

Then, we consider parameterized systems and provide a method for checking safety for several classes that differ in their topologies (linear or tree) and semantics (atomic or non-atomic). The method is based on deriving an over-approximation which allows the use of a symbolic backward reachability scheme. For each class, the over-approximation we define guarantees monotonicity of the induced approximate transition system with respect to an appropriate order. This property is convenient in the sense that it preserves upward closedness when computing sets of predecessors.
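The backward scheme mentioned in the abstract computes predecessors of upward-closed sets, which monotonicity keeps representable by finitely many minimal elements. The toy Python sketch below illustrates the idea on an assumed VASS-like counter model (it is not the thesis's construction for parameterized topologies): upward-closed sets are stored as their minimal elements and the predecessor basis is iterated to a fixpoint.

```python
# Hedged sketch of symbolic backward reachability for a monotone system,
# specialized to a toy VASS-like model (counters; a transition fires if the
# counter vector is >= `guard` and then adds `delta`).  Upward-closed sets are
# represented by their minimal elements.  Data and names are illustrative.
def leq(x, y):
    return all(a <= b for a, b in zip(x, y))

def minimize(vectors):
    """Keep only the minimal elements of a set of counter vectors."""
    vs = list(vectors)
    return {v for v in vs if not any(w != v and leq(w, v) for w in vs)}

def pre_basis(target, transitions):
    """Minimal predecessors of the upward closure of `target` under all transitions."""
    preds = set()
    for guard, delta in transitions:
        # x is a predecessor iff x >= guard and x + delta >= target.
        preds.add(tuple(max(g, t - d) for g, t, d in zip(guard, target, delta)))
    return preds

def backward_coverable(init, bad_basis, transitions):
    """Can some state covering an element of `bad_basis` be reached from `init`?"""
    basis = minimize(set(bad_basis))
    while True:
        new = minimize(basis | {p for b in basis for p in pre_basis(b, transitions)})
        if new == basis:                      # fixpoint reached
            return any(leq(b, init) for b in basis)
        basis = new

# Toy example: two counters, one transition moving a token from counter 0 to 1.
transitions = [((1, 0), (-1, 1))]
print(backward_coverable(init=(2, 0), bad_basis=[(0, 2)], transitions=transitions))  # True
print(backward_coverable(init=(1, 0), bad_basis=[(0, 2)], transitions=transitions))  # False
```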
15

Algorithmes distribués d'allocation de ressources dans les réseaux sans fil

Akbarzadeh, Sara 20 September 2010 (has links) (PDF)
The full connectivity offered by wireless communication brings a large number of advantages and challenges for the designers of the next generation of wireless networks. One of the main challenges is the interference experienced at the receivers. It is well recognized that this challenge lies in designing resource allocation schemes that offer the best trade-off between efficiency and complexity. Exploring this trade-off requires careful choices of performance metrics and mathematical models. In this respect, this thesis is devoted to certain technical and mathematical aspects of resource allocation in wireless networks. In particular, we show that efficient resource allocation in wireless networks must take into account the following parameters: (i) the rate of change of the environment, (ii) the traffic model, and (iii) the amount of information available at the transmitters. As mathematical tools for this study, we use optimization theory and game theory. We are particularly interested in distributed resource allocation in networks with slow-fading channels and partial channel information at the transmitters. Transmitters with partial information have exact knowledge of their own channel together with statistical knowledge of the other channels. In such a setting, the system is fundamentally limited by a non-zero outage probability. We propose low-complexity distributed algorithms for joint rate and power allocation that aim to maximize the individual throughput.
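To illustrate the setting described above (under assumed, simplified channel models that are not those of the thesis), the short Python sketch below shows the decision faced by one transmitter with partial information: it knows its own gain exactly, only the statistics of the interfering links, and it selects a (power, rate) pair that maximizes its expected goodput, i.e. rate times the probability of avoiding outage.

```python
# Hedged sketch of outage-aware rate and power selection with partial channel
# information.  The channel model, grids, and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def expected_goodput(power, rate, own_gain, interferer_powers, noise=1.0, samples=2000):
    # Interfering links are known only statistically: Rayleigh fading => exponential gains.
    interferer_gains = rng.exponential(1.0, size=(samples, len(interferer_powers)))
    interference = interferer_gains @ np.asarray(interferer_powers)
    sinr = power * own_gain / (noise + interference)
    outage = np.mean(np.log2(1.0 + sinr) < rate)
    return rate * (1.0 - outage)

own_gain = 0.8                      # known exactly at the transmitter
interferer_powers = [1.0, 0.5]      # powers currently used by the other links
powers = np.linspace(0.1, 2.0, 20)  # candidate powers within the average budget
rates = np.linspace(0.1, 3.0, 30)   # candidate transmission rates
best = max(((p, r) for p in powers for r in rates),
           key=lambda pr: expected_goodput(pr[0], pr[1], own_gain, interferer_powers))
print("chosen (power, rate):", best)
```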
16

Méthodes multigrilles pour les jeux stochastiques à deux joueurs et somme nulle, en horizon infini

Detournay, Sylvie 25 September 2012 (has links) (PDF)
In this thesis, we propose algorithms and present numerical results for solving two-player zero-sum repeated stochastic games with large state spaces. In particular, we consider the class of games with complete information and infinite horizon. Within this class, we distinguish on the one hand games with discounted payoff and on the other hand games with mean payoff. Our algorithms, implemented in C, are mainly based on policy-iteration-type algorithms and multigrid methods. These algorithms are applied either to dynamic programming equations arising from two-player games with finite state spaces, or to discretizations of Isaacs-type equations associated with stochastic differential games. In the first part of this thesis, we propose an algorithm that combines the policy iteration algorithm for games with discounted payoff with algebraic multigrid methods used to solve the linear systems. We present numerical results for Isaacs equations and variational inequalities. We also present a policy iteration algorithm with grid refinement in the style of the FMG method. Examples on variational inequalities show that this algorithm significantly improves the time needed to solve these inequalities. For games with mean payoff, we propose a policy iteration algorithm for two-player games with finite state and action spaces, in the general multichain case (that is, without any irreducibility assumption on the Markov chains associated with the strategies of the two players). This algorithm uses an idea developed in Cochet-Terrasson and Gaubert (2006). It is based on the notion of nonlinear spectral projector of dynamic programming operators of one-player games (which are monotone and convex). We show that the sequence of values and relative values satisfies a lexicographic monotonicity property, which implies that the algorithm terminates in finite time. We present numerical results for discrete games arising from a variant of Richman games and for pursuit-game problems. Finally, we present new algebraic multigrid algorithms for solving particular singular linear systems. These arise, for example, in the policy iteration algorithm for two-player zero-sum stochastic games with mean payoff described above. We also introduce a new method for computing invariant measures of irreducible Markov chains based on a stochastic control approach. We present an algorithm that combines Howard's policy iterations with algebraic multigrid iterations for singular linear systems.
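The computational core of the discounted-game part described above is the policy-evaluation step of policy iteration: once both players' stationary strategies are fixed, the value solves a linear system, and it is this solve that the thesis accelerates with algebraic multigrid on large, discretized problems. The Python sketch below shows only that step, with a dense solver and random data as stand-ins.

```python
# Hedged sketch of the policy-evaluation step inside policy iteration for a
# discounted two-player zero-sum stochastic game: once stationary (mixed)
# strategies sigma and tau are fixed, the value solves the linear system
# (I - gamma * P_{sigma,tau}) v = r_{sigma,tau}.  The dense solve and the random
# data below are only illustrative; at scale an AMG solver would replace it.
import numpy as np

def evaluate_policies(R, P, sigma, tau, gamma=0.95):
    """R[s,a,b]: stage payoff; P[s,a,b,s']: transition probabilities;
    sigma[s,a], tau[s,b]: the two players' stationary mixed strategies."""
    S = R.shape[0]
    r = np.einsum("sa,sb,sab->s", sigma, tau, R)          # expected stage payoff
    Pst = np.einsum("sa,sb,sabt->st", sigma, tau, P)      # induced Markov chain
    return np.linalg.solve(np.eye(S) - gamma * Pst, r)    # AMG would be used here at scale

# Tiny random instance with 3 states and 2 actions per player.
rng = np.random.default_rng(1)
S, A, B = 3, 2, 2
R = rng.standard_normal((S, A, B))
P = rng.random((S, A, B, S)); P /= P.sum(axis=-1, keepdims=True)
sigma = np.full((S, A), 1.0 / A)
tau = np.full((S, B), 1.0 / B)
print(evaluate_policies(R, P, sigma, tau))
```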
17

Vector-Valued Markov Games / Vektorwertige Markov-Spiele

Piskuric, Mojca 16 April 2001 (has links) (PDF)
The subject of the thesis is vector-valued Markov games. Chapter 1 presents the idea that has led to the development of the theory of general stochastic games. The work of Lloyd S. Shapley is outlined, and the most important authors and bibliography are stated. Also, the motivation behind the research of vector-valued game-theoretic problems is presented. Chapter 2 develops a rigorous mathematical model of vector-valued N-person Markov games. The corresponding definitions are stated, and the notations, as well as the notion of a strategy, are explained in detail. On the basis of these definitions a probability measure is constructed, in an appropriate probability space, which controls the stochastic game process. Furthermore, as in all models of stochastic control, a payoff is specified, in our case the expected discounted payoff. The principles of vector optimization are stated in Chapter 3, and the concept of optimality with respect to some convex cone is developed. This leads to the generalization of Nash equilibria from scalar- to vector-valued games, the so-called D-equilibria. Examples are provided to show that this definition really is a generalization of the existing definitions for scalar-valued games. For a given convex cone D, necessary and sufficient conditions are found that show when a strategy is also a D-equilibrium. Furthermore it is shown that a D-equilibrium in stationary strategies exists, as one could expect from the known results of the theory of scalar-valued stochastic games. The main result of this chapter is a generalization of an existing result for 2-person vector-valued Markov games to N-person Markov games, namely that a D-equilibrium of an N-person Markov game is a subgradient of specially constructed support functions of the original payoff functions. To be able to develop solution procedures in the simplest case, that is, the 2-person zero-sum case, Chapter 4 introduces the Denardo dynamic programming formalism. In the space of all p-dimensional functions a dynamic programming operator is defined to describe the solutions of Markov games. The first of the two main results in this chapter is the following: for a fixed stationary strategy, the expected overall payoff to player 1 is the fixed point of this operator. The second theorem then shows that the latter result is exactly the vector-valued generalization of the famous Shapley result. These theorems are fundamental for the subsequent development of two algorithms, successive approximations and the Hoffman-Karp algorithm. A numerical example is presented for both algorithms. Chapter 4 finishes with a discussion of other significant results and an outline of further research. The Appendix finally presents the main results from general game theory, most of which were used for developing both the theoretic and algorithmic parts of this thesis. / Das Thema der vorliegenden Arbeit sind vektorwertige Markov-Spiele. Im Kapitel 1 wird die Idee vorgestellt, die zur Entwicklung genereller stochastischer Spiele geführt hat. Die Arbeit von Lloyd S. Shapley wird kurz dargestellt, und die wichtigsten Autoren und Literaturquellen werden genannt. Es wird weiter die Motivation für das Studium der vektorwertigen Spiele erklärt. Kapitel 2 entwickelt ein allgemeines mathematisches Modell vektorwertiger N-Personen-Markov-Spiele. Die entsprechenden Definitionen werden angegeben, und es wird auf die Bezeichnungen sowie den Begriff einer Strategie eingegangen. Weiter wird im entsprechenden Wahrscheinlichkeitsraum ein Wahrscheinlichkeitsmaß konstruiert, das den zugrunde liegenden stochastischen Prozeß steuert. Wie bei allen Modellen gesteuerter stochastischer Prozesse wird eine Auszahlung spezifiziert, konkret der erwartete diskontierte Gesamtertrag. Im Kapitel 3 werden die Prinzipien der Vektoroptimierung erläutert. Es wird der Begriff der Optimalität bezüglich gegebener konvexer Kegel entwickelt. Dieser Begriff wird weiter benutzt, um die Definition der Nash-Gleichgewichte für skalarwertige Spiele auf unser vektorwertiges Modell, die sogenannten D-Gleichgewichte, zu erweitern. Anhand mehrerer Beispiele wird gezeigt, dass diese Definition eine Verallgemeinerung der existierenden Definitionen für skalarwertige Spiele ist. Weiter werden notwendige und hinreichende Bedingungen hinsichtlich des Optimierungskegels D angegeben, wann eine Strategie ein D-Gleichgewicht ist. Anschließend wird gezeigt, dass man sich - wie bei Markov'schen Entscheidungsprozessen und skalarwertigen stochastischen Spielen - beim Suchen der D-Gleichgewichte auf stationäre Strategien beschränken kann. Das Hauptresultat dieses Kapitels ist die Verallgemeinerung einer schon bekannten Aussage für 2-Personen-Markov-Spiele auf N-Personen-Markov-Spiele: Ein D-Gleichgewicht im N-Personen-Markov-Spiel ist ein Subgradient speziell konstruierter Trägerfunktionen des Gesamtertrags der Spieler. Um im einfachsten Fall der Markov-Spiele, den Zwei-Personen-Nullsummenspielen, ein Lösungskonzept entwickeln zu können, wird im Kapitel 4 die Methode des Dynamischen Programmierens benutzt. Es wird der Denardo-Formalismus übernommen, um einen Operator der dynamischen Programmierung im Raum aller p-dimensionalen vektorwertigen Funktionen zu entwickeln. Die Hauptresultate dieses Kapitels sind zwei Sätze über optimale Lösungen bzw. D-Gleichgewichte. Der erste Satz zeigt, dass für eine fixierte stationäre Strategie der erwartete diskontierte Gesamtertrag der Fixpunkt dieses Operators ist. Anschließend zeigt der zweite Satz, dass diese Lösung genau der vektorwertigen Erweiterung des Resultats von Shapley entspricht. Anhand dieser Resultate werden nun zwei Algorithmen entwickelt: sukzessive Approximationen und der Hoffman-Karp-Algorithmus. Es wird ein numerisches Beispiel für beide Algorithmen berechnet. Kapitel 4 schließt mit einem Abschnitt über weitere Resultate und Ansätze für weitere Forschung. Im Anhang werden die Hauptresultate der statischen Spieltheorie vorgestellt, von denen viele in der vorliegenden Arbeit benutzt werden.
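In standard notation (assumed here, since the operator symbols did not survive in the abstract above), the fixed-point property and the Shapley equation that Chapter 4 generalizes to the vector-valued case read as follows, with discount factor β and the operator acting componentwise on p-dimensional value functions:

```latex
% Assumed notation: \pi a fixed stationary strategy profile, \beta the discount factor.
\[
  (H_{\pi} f)(s) \;=\; r(s,\pi) + \beta \sum_{s' \in S} p(s' \mid s,\pi)\, f(s'),
  \qquad f(\pi) \;=\; H_{\pi} f(\pi) \quad\text{(fixed point).}
\]
% Scalar zero-sum special case (Shapley), of which the above is the vector-valued analogue:
\[
  v(s) \;=\; \operatorname{val}_{a \in A,\, b \in B}
  \Bigl[\, r(s,a,b) + \beta \sum_{s' \in S} p(s' \mid s,a,b)\, v(s') \,\Bigr].
\]
```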
18

Asymmetric information games and cyber security

Jones, Malachi G. 13 January 2014 (has links)
A cyber-security problem is a conflict-resolution scenario that typically consists of a security system and at least two decision makers (e.g., attacker and defender) that can each have competing objectives. In this thesis, we are interested in cyber-security problems where one decision maker has superior or better information. Game theory is a well-established mathematical tool that can be used to analyze such problems and will be our tool of choice. In particular, we formulate cyber-security problems as stochastic games with asymmetric information, where game-theoretic methods can then be applied to derive optimal policies for each decision maker. A severe limitation of considering optimal policies is that computing them is prohibitively expensive. We address the complexity issues by introducing methods, based on the ideas of model predictive control, to compute suboptimal policies. Specifically, we first prove that the methods generate suboptimal policies that have tight performance bounds. We then show that the suboptimal policies can be computed by solving a linear program online, and that the complexity of the linear program remains constant with respect to the game length. Finally, we demonstrate how the suboptimal policy methods can be applied to cyber-security problems to reduce the computational complexity of forecasting cyber-attacks.
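As a rough illustration of the receding-horizon idea (the payoff matrices, attacker-type models, and belief update below are assumptions made for this sketch, not the thesis's construction), a defender can repeatedly solve one fixed-size LP, namely the maximin LP of the stage game averaged over its current belief about the attacker's private type, play the resulting mixed strategy, and update the belief by Bayes' rule; the LP's size does not grow with the horizon.

```python
# Hedged sketch of an MPC-style suboptimal defense policy: at every stage solve
# one constant-size LP (the maximin LP of the belief-averaged stage game), play
# the resulting mixed strategy, then update the belief about the attacker's type.
import numpy as np
from scipy.optimize import linprog

def maximin_strategy(M):
    """Defender's maximin mixed strategy for the zero-sum matrix game M."""
    m, n = M.shape
    A = M - M.min() + 1.0                       # shift so the game value is positive
    res = linprog(np.ones(m), A_ub=-A.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m, method="highs")
    return res.x / res.x.sum()

# Two attacker types with different stage payoff matrices (defender is the row
# player and maximizes) and assumed stage strategies used for the belief update.
M = {"opportunist": np.array([[2.0, -1.0], [0.0, 1.0]]),
     "persistent":  np.array([[1.0, -3.0], [0.5, 2.0]])}
q = {"opportunist": np.array([0.7, 0.3]), "persistent": np.array([0.2, 0.8])}
belief = {"opportunist": 0.5, "persistent": 0.5}

for observed_attack in [1, 1, 0]:               # a short, illustrative play sequence
    M_avg = sum(belief[t] * M[t] for t in M)    # belief-averaged stage game
    x = maximin_strategy(M_avg)                 # constant-size LP, solved online
    print("defense mix:", np.round(x, 3), "belief:", {t: round(belief[t], 3) for t in belief})
    posterior = {t: belief[t] * q[t][observed_attack] for t in belief}  # Bayes update
    total = sum(posterior.values())
    belief = {t: posterior[t] / total for t in belief}
```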
19

Distributed Algorithms for Power Allocation Games on Gaussian Interference Channels

Krishnachaitanya, A January 2016 (has links) (PDF)
We consider a wireless communication system in which there are N transmitter-receiver pairs and each transmitter wants to communicate with its corresponding receiver. This is modelled as an interference channel. We propose power allocation algorithms for increasing the sum rate of two- and three-user interference channels. The channels experience fast fading and there is an average power constraint on each transmitter. In this case receivers use successive decoding under strong interference, instead of treating interference as noise all the time. Next, we use a game-theoretic approach for power allocation where each receiver treats interference as noise. Each transmitter-receiver pair aims to maximize its long-term average transmission rate subject to an average power constraint. We formulate a stochastic game for this system in three different scenarios. First, we assume that each user knows all direct and cross-link channel gains. Next, we assume that each user knows the channel gains of only the links that are incident on its receiver. Finally, we assume that each user knows only its own direct-link channel gain. In all cases, we formulate the problem of finding the Nash equilibrium (NE) as a variational inequality (VI) problem. For the game with complete channel knowledge, we present an algorithm to solve the VI and we provide weaker sufficient conditions for uniqueness of the NE than the sufficient conditions available in the literature. Later, we present a novel heuristic for solving the VI under general channel conditions. We also provide a distributed algorithm to compute Pareto optimal solutions for the proposed games. We use Bayesian learning that guarantees convergence to an ε-Nash equilibrium for the incomplete-information game with direct-link channel gain knowledge only; it does not require knowledge of the power policies of other users but requires feedback of the interference power values from a receiver to its corresponding transmitter. Later, we consider a more practical scenario in which each transmitter transmits data at a certain rate using a power that depends on the channel gain to its receiver. If a receiver can successfully receive the message, it sends an acknowledgement (ACK); otherwise it sends a negative ACK (NACK). Each user aims to maximize its probability of successful transmission. We formulate this problem as a stochastic game and propose a fully distributed learning algorithm to find a correlated equilibrium (CE). In addition, we use a no-regret algorithm to find a coarse correlated equilibrium (CCE) for our power allocation game. We also propose a fully distributed learning algorithm to find a Pareto optimal solution. In general, Pareto points do not guarantee fairness among the users. Therefore we also propose an algorithm to compute a Nash bargaining solution, which is Pareto optimal and provides fairness among the users. Finally, we extend these results to the case where each transmitter sends data at multiple rates rather than at a fixed rate.
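For the scenario in which every receiver treats interference as noise, a common way to picture the resulting power allocation game (used here only as an assumed illustration, not as the algorithms of the thesis) is iterative water-filling: each transmitter best-responds by water-filling its power over fading states given the interference created by the others, and the best responses are iterated toward a Nash equilibrium.

```python
# Hedged sketch of best-response (iterative water-filling) dynamics on a Gaussian
# interference channel where interference is treated as noise.  The channel
# samples, bisection water-filling, and stopping rule are illustrative.
import numpy as np

rng = np.random.default_rng(2)
N_users, K = 3, 64                                # users and sampled fading states
G = rng.exponential(1.0, (N_users, N_users, K))   # G[j, i, k]: gain from tx j to rx i
noise, P_avg = 1.0, 1.0

def waterfill(inv_snr, P_avg):
    """Powers max(0, mu - inv_snr[k]) whose average equals P_avg (mu found by bisection)."""
    lo, hi = 0.0, inv_snr.max() + P_avg + 1.0
    for _ in range(60):
        mu = 0.5 * (lo + hi)
        if np.maximum(0.0, mu - inv_snr).mean() > P_avg:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.0, mu - inv_snr)

P = np.full((N_users, K), P_avg)                  # start from flat power allocations
for _ in range(50):                               # best-response dynamics
    for i in range(N_users):
        interference = sum(P[j] * G[j, i] for j in range(N_users) if j != i)
        P[i] = waterfill((noise + interference) / G[i, i], P_avg)

rates = [np.mean(np.log2(1 + P[i] * G[i, i] / (noise + sum(P[j] * G[j, i]
         for j in range(N_users) if j != i)))) for i in range(N_users)]
print("long-term average rates at the (approximate) equilibrium:", np.round(rates, 3))
```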
