The purpose of this dissertation is to apply behavior-learning concepts to incomplete-information continuous-time games. Realistic game scenarios are often incomplete-information games in which the players withhold information: a player may not know its opponent's objectives and strategies before the game begins, and this lack of information can prevent the player from playing optimally. If the player can observe the opponent's actions, however, it can improve its payoff by taking corrective actions.
In this research, a framework is developed that allows a player to learn an opponent's behavior and take corrective actions. The player observes the opponent's actions and formulates a behavior model, which is then used to find the actions that optimize the player's own objective function. In addition, the framework proposes that the player open with a safe strategy, defined in this research as a strategy that guarantees a minimum payoff to the player regardless of the other player's actions. The player follows the safe strategy during the initial part of the game, until it has learned the opponent's behavior.
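To make the framework concrete, the following minimal sketch (in Python, with illustrative names; none of the identifiers come from the dissertation) shows the intended play loop: the player holds to the safe strategy while it accumulates observations of the opponent, and switches to a best response against the fitted behavior model only once the model's fit error is acceptable.

```python
def play_with_learning(step, safe_strategy, best_response, fit_model,
                       observe_opponent, x0, horizon, tol):
    """Play safe until the opponent model fits well, then best-respond.

    All callables are placeholders for the game at hand:
      step(x, u, v)        -> next state under both players' actions
      safe_strategy(x)     -> action guaranteeing the minimum payoff
      best_response(x, m)  -> optimal action against behavior model m
      fit_model(history)   -> (model, fit_error) from (state, action) pairs
      observe_opponent(x)  -> opponent's observed action at state x
    """
    x, history, model = x0, [], None
    for _ in range(horizon):
        u = safe_strategy(x) if model is None else best_response(x, model)
        v = observe_opponent(x)        # opponent's action is observed, not known
        history.append((x, v))
        candidate, err = fit_model(history)
        model = candidate if err < tol else None  # trust the model only when it fits
        x = step(x, u, v)
    return x, model
```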
Two methods of developing behavior models, differing in how the model is formulated, are proposed. The first is the Cost-Strategy Recognition (CSR) method, in which the player formulates both an objective function and a strategy for the opponent. The opponent is presumed to be rational and therefore to play so as to optimize its objective function. The opponent's strategy depends on the information available to it about the other players in the game, so a given strategy formulation presumes a certain level of information available to the opponent. Previous observations of the opponent's actions are used to estimate the parameters of the formulated behavior model, and the estimated model predicts the opponent's future actions.
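As a hedged illustration of the CSR idea, suppose the player hypothesizes that the opponent plays linear state feedback v = Kx; this structural form is an assumption made here for illustration, not a prescription of the dissertation. The unknown gain K, which encodes the objective function and information level attributed to the opponent, can then be estimated from observed state-action pairs by least squares.

```python
import numpy as np

def fit_feedback_gain(states, actions):
    """Least-squares estimate of K in the hypothesized strategy v = K x.

    states  : list of state vectors x_t (each of length n)
    actions : list of observed opponent actions v_t (each of length m)
    """
    X = np.vstack(states)                        # (N, n)
    V = np.vstack(actions)                       # (N, m)
    K_T, *_ = np.linalg.lstsq(X, V, rcond=None)  # solves X @ K.T ~= V
    residual = float(np.linalg.norm(X @ K_T - V))
    return K_T.T, residual                       # gain estimate and fit error

def predict_action(K, x):
    """Predict the opponent's next action from the fitted model."""
    return K @ x
```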
The second method is the Direct Approximation of Value Function (DAVF) method. Here, unlike in the CSR method, the player formulates an objective function for the opponent but does not formulate a strategy directly; instead, it assumes the opponent is playing optimally. A value function satisfying the Hamilton-Jacobi-Bellman (HJB) equation corresponding to the opponent's cost function therefore exists. The DAVF method finds an approximate solution for this value function based on previous observations of the opponent's control, and the approximate solution is then used to predict the opponent's future behavior. Game examples in which only a single player learns its opponent's behavior are simulated, followed by examples in which both players in a two-player game learn each other's behavior.
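A sketch of the DAVF fit is given below, under structural assumptions chosen here for illustration: control-affine opponent dynamics xdot = f(x) + g(x)v, a quadratic control penalty v'Rv, and a value function approximated in a basis, V(x) ≈ Σ_i c_i φ_i(x). Optimality then implies v(x) = -(1/2) R⁻¹ g(x)ᵀ ∇V(x), which is linear in the coefficients c, so the observed controls determine c by least squares.

```python
import numpy as np

def fit_value_coeffs(states, actions, grad_basis, g, R_inv):
    """Fit value-function coefficients c from observed opponent controls.

    grad_basis(x) -> (n_basis, n) matrix whose rows are grad phi_i(x)
    g(x)          -> (n, m) input matrix of the assumed dynamics
    R_inv         -> (m, m) inverse of the assumed control weight R
    """
    blocks, targets = [], []
    for x, v in zip(states, actions):
        G = grad_basis(x)
        M = -0.5 * R_inv @ g(x).T @ G.T   # (m, n_basis): predicts v = M @ c
        blocks.append(M)
        targets.append(v)
    A = np.vstack(blocks)                 # (N*m, n_basis)
    b = np.concatenate(targets)           # (N*m,)
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return c

def predict_control(x, c, grad_basis, g, R_inv):
    """Predicted opponent control implied by the fitted value function."""
    return -0.5 * R_inv @ g(x).T @ (grad_basis(x).T @ c)
```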
In the second part of this research, a reorientation control maneuver for a spinning spacecraft will be developed, supporting the application of behavior-learning and differential-game concepts to scenarios involving multiple spinning spacecraft. An impulsive reorientation maneuver with coasting will be designed analytically to reorient the spacecraft's spin axis using a single body-fixed thruster. Cooperative maneuvers of multiple spacecraft, optimizing fuel and relative orientation, will be designed, and Pareto-optimality concepts will be used to arrive at mutually agreeable reorientation maneuvers for the cooperating spinning spacecraft.
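The Pareto-optimality step can be sketched by scalarization: combine the two spacecraft's fuel costs J1 and J2 with a weight alpha and sweep alpha to trace candidate mutually agreeable maneuvers. The cost functions, the parameterization theta of the maneuver, and the use of scipy's general-purpose minimizer are illustrative assumptions here, not the dissertation's formulation.

```python
import numpy as np
from scipy.optimize import minimize

def pareto_front(J1, J2, theta0, n_weights=21):
    """Trace a weighted-sum Pareto front between two maneuver costs.

    J1, J2 : fuel-cost functions of the maneuver parameters theta
    theta0 : initial guess for the maneuver parameters
    """
    front = []
    for a in np.linspace(0.0, 1.0, n_weights):
        res = minimize(lambda th: a * J1(th) + (1.0 - a) * J2(th), theta0)
        front.append((float(J1(res.x)), float(J2(res.x)), res.x))
    return front  # (J1, J2, theta) triples; each is a candidate agreement
```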
Identifier: oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/149467
Date: 03 October 2013
Creators: Satak, Neha
Contributors: Hurtado, John E., Junkins, John L., Vadali, Srinivas R., Bhattacharyya, Shankar P.
Source Sets: Texas A and M University
Language: English
Type: Thesis, text
Format: application/pdf