21 |
Troubleshooting Trucks : Automated Planning and Diagnosis / Felsökning av lastbilar : automatiserad planering och diagnos. Warnquist, Håkan. January 2015 (has links)
This thesis considers computer-assisted troubleshooting of heavy vehicles such as trucks and buses. In this setting, the person troubleshooting a vehicle problem is assisted by a computer capable of listing the possible faults that can explain the problem and of recommending which actions to take so that the expected cost of restoring the vehicle is low. Such a system must solve two problems: the diagnosis problem of finding the possible faults, and the decision problem of deciding which action to take. The diagnosis problem is approached using Bayesian network models, with frameworks developed both for the case when the vehicle is in the workshop and for remote diagnosis when the vehicle is monitored over longer periods of time. The decision problem is solved by creating planners that select actions so as to minimize the expected cost of repairing the vehicle. New methods, algorithms, and models are developed to improve the performance of the planner. The theory is evaluated on models of an auxiliary braking system, a fuel injection system, and an engine temperature control and monitoring system.
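As a rough, self-contained illustration of the decision problem described above, the sketch below ranks repair actions by cost divided by the probability of fixing the vehicle, a classic myopic troubleshooting heuristic. The faults, probabilities, and costs are hypothetical, and the thesis's planners go well beyond this one-step rule.

```python
# Myopic troubleshooting sketch: rank repair actions by cost divided by the
# probability that the action fixes the problem. Purely illustrative; the
# thesis uses full decision-theoretic planners, not this one-step rule.

fault_posterior = {"injector": 0.5, "fuel_pump": 0.3, "wiring": 0.2}  # hypothetical

actions = {  # action -> (cost, set of faults it repairs); hypothetical numbers
    "replace_injector": (400.0, {"injector"}),
    "replace_fuel_pump": (900.0, {"fuel_pump"}),
    "rewire_harness": (250.0, {"wiring"}),
}

def efficiency(action):
    cost, fixes = actions[action]
    p_fix = sum(fault_posterior[f] for f in fixes)  # chance the action resolves it
    return cost / p_fix  # lower is better: expected cost per unit success probability

plan = sorted(actions, key=efficiency)
print(plan)  # ['replace_injector', 'rewire_harness', 'replace_fuel_pump']
```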
|
22 |
Learning in a state of confusion : employing active perception and reinforcement learning in partially observable worlds. Crook, Paul A. January 2007 (has links)
In applying reinforcement learning to agents acting in the real world we are often faced with tasks that are non-Markovian in nature. Much work has been done using state estimation algorithms to try to uncover Markovian models of tasks in order to allow the learning of optimal solutions using reinforcement learning. Unfortunately these algorithms, which attempt to simultaneously learn a Markov model of the world and how to act, have proved very brittle. Our focus differs. In considering embodied, embedded and situated agents we have a preference for simple learning algorithms which reliably learn satisficing policies. The learning algorithms we consider do not try to uncover the underlying Markovian states; instead they aim to learn successful deterministic reactive policies, such that agents' actions are based directly upon the observations provided by their sensors. Existing results have shown that such reactive policies can be arbitrarily worse than a policy that has access to the underlying Markov process, and in some cases no satisficing reactive policy can exist. Our first contribution is to show that providing agents with alternative actions and viewpoints on the task through the addition of active perception can provide a practical solution in such circumstances. We demonstrate empirically that: (i) adding arbitrary active perception actions to agents which can only learn deterministic reactive policies can allow the learning of satisficing policies where none were originally possible; (ii) active perception actions allow the learning of better satisficing policies than those that existed previously; and (iii) our approach converges more reliably to satisficing solutions than existing state estimation algorithms such as U-Tree and the Lion Algorithm. Our other contributions focus on issues which affect the reliability with which deterministic reactive satisficing policies can be learnt in non-Markovian environments. We show that greedy action selection may be a necessary condition for the existence of stable deterministic reactive policies on partially observable Markov decision processes (POMDPs). We also set out the concept of Consistent Exploration: the idea of estimating state-action values by acting as though the policy has been changed to incorporate the action being explored. We demonstrate that this concept can be used to develop better algorithms for learning reactive policies to POMDPs by presenting a new reinforcement learning algorithm, the Consistent Exploration Q(λ) algorithm (CEQ(λ)). We demonstrate on a significant number of problems that CEQ(λ) is more reliable at learning satisficing solutions than SARSA(λ), the algorithm currently regarded as the best for learning deterministic reactive policies.
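The Consistent Exploration idea lends itself to a compact sketch: once an exploratory action is chosen for an observation, the agent keeps choosing it whenever that observation recurs, so the returns it observes are those of the modified reactive policy. The sketch below shows only this action-selection rule; the eligibility traces and value updates of the full CEQ(λ) algorithm are omitted, and the observation and action names are hypothetical.

```python
# Conceptual sketch of Consistent Exploration action selection: exploratory
# choices are committed to for the rest of the episode, so each episode
# consistently evaluates one deterministic reactive policy.
import random

random.seed(1)
epsilon = 0.1
Q = {}            # (observation, action) -> value estimate
committed = {}    # observation -> action committed to for the current episode

def select_action(obs, action_space):
    if obs in committed:                      # stay consistent: reuse the choice
        return committed[obs]
    if random.random() < epsilon:             # explore, and commit to it
        a = random.choice(action_space)
    else:                                     # exploit the current greedy action
        a = max(action_space, key=lambda act: Q.get((obs, act), 0.0))
    committed[obs] = a
    return a

print(select_action("corridor", ["left", "right"]))
print(select_action("corridor", ["left", "right"]))  # same answer within episode
# committed.clear() would be called at each episode boundary.
```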
|
23 |
Oligopolies in private spectrum commons: analysis and regulatory implications. Kavurmacioglu, Emir. 17 February 2016
In an effort to make more spectrum available, recent initiatives by the FCC let mobile providers offer spot service of their licensed spectrum to secondary users, hence paving the way to dynamic secondary spectrum markets. This dissertation investigates secondary spectrum markets under different regulatory regimes by identifying profitability conditions and possible competitive outcomes in an oligopoly model. We consider pricing in a market where multiple providers compete for secondary demand.
First, we analyze the market outcomes when providers adopt a coordinated access policy, where, besides pricing, a provider can elect to apply admission control on secondary users based on the state of its network. We next consider competition when providers implement an uncoordinated access policy (i.e., no admission control). Through our analysis, we identify profitability conditions and fundamental price thresholds, including break-even and market-sharing prices. We prove that regardless of the specific form of the secondary demand function, competition under coordinated access always leads to a price-war outcome. In contrast, under uncoordinated access, market sharing becomes a viable outcome if the intervals of prices for which the providers are willing to share the market overlap.
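As a hedged illustration of a break-even price, the sketch below finds the price at which expected profit from serving secondary demand crosses zero, using bisection. The demand and opportunity-cost functions are hypothetical stand-ins; the dissertation derives its thresholds for general demand functions and specific network cost structures.

```python
# Illustrative break-even price: the price at which expected profit from
# admitting secondary demand is zero. All functions here are hypothetical.

def demand(p):            # hypothetical decreasing secondary demand
    return max(0.0, 10.0 - 2.0 * p)

def opportunity_cost(p):  # hypothetical cost of degrading primary service
    return 1.5 * demand(p)

def profit(p):
    return p * demand(p) - opportunity_cost(p)

# Bisection on an interval where profit changes sign.
lo, hi = 0.01, 4.9
for _ in range(60):
    mid = (lo + hi) / 2.0
    if profit(mid) > 0.0:
        hi = mid          # profitable: the break-even price lies below
    else:
        lo = mid
print(round((lo + hi) / 2.0, 4))  # ~1.5: price must cover unit opportunity cost
```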
We then turn our attention to how a network provider can use carrier (spectrum) aggregation to lower its break-even price and gain an edge over its competition. To this end, we determine the optimal (minimum) level of carrier aggregation that a smaller provider needs. Under a quality-driven (QD) regime, we establish an efficient way of numerically calculating the optimal carrier aggregation and derive scaling laws. We extend the results to delay-related metrics and show their applications to profitable pricing in secondary spectrum markets.
Finally, we consider the problem of profitability over a spatial topology, where identifying system behavior suffers from the curse of dimensionality. Hence, we propose an approximation model that captures system behavior to first order, derive an expression for the break-even price at each network location, and provide simulation results for accuracy comparison. All of our results hold for general forms of demand, thus avoiding restrictive assumptions about customer preferences and the valuation of the spectrum.
|
24 |
Representações compactas para processos de decisão de Markov e sua aplicação na administração de impressoras. / Compact representations of Markov decision processes and their application to printer management. Torres, João Vitor. 02 June 2006 (has links)
Markov Decision Processes (MDPs) are an important tool for planning and optimization in environments under uncertainty, but the specification and computational representation of the probability distributions underlying MDPs are central difficulties for their application. This work proposes two strategies for representing these probabilities compactly and efficiently, using Bayesian networks and regularities among states and variables. The proposed strategies are particularly useful in systems whose variables have many categories and are strongly interrelated. The models are applied to the management of clusters of printers, a real industrial problem that motivated this work, allowing the printers to act collectively rather than individually. The final topic is a comparative analysis of the same application using Fuzzy Logic.
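A minimal sketch of the kind of compact representation involved: instead of one joint transition table, each next-state variable gets its own small conditional distribution, as in a dynamic Bayesian network. The printer-themed variables and probabilities below are hypothetical, not taken from the thesis.

```python
# Factored (Bayesian-network-style) transition model: P(s'|s,a) is a product
# of small per-variable conditional distributions. All numbers hypothetical.

def p_toner(next_t, toner, action):   # P(toner' | toner, action)
    if action == "replace_toner":
        return 1.0 if next_t == "full" else 0.0
    drop = {"full":  {"full": 0.7, "low": 0.3, "empty": 0.0},
            "low":   {"full": 0.0, "low": 0.6, "empty": 0.4},
            "empty": {"full": 0.0, "low": 0.0, "empty": 1.0}}
    return drop[toner].get(next_t, 0.0)

def p_queue(next_q, queue, action):   # P(queue' | queue, action)
    if action == "print_job":
        return ({"short": 0.8, "long": 0.2} if queue == "long"
                else {"short": 0.9, "long": 0.1})[next_q]
    return 1.0 if next_q == queue else 0.0

def transition(next_state, state, action):
    (nt, nq), (t, q) = next_state, state
    return p_toner(nt, t, action) * p_queue(nq, q, action)  # product of factors

# The joint table over all state pairs never has to be enumerated or stored:
print(transition(("low", "short"), ("full", "long"), "print_job"))  # 0.3*0.8 = 0.24
```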
|
25 |
Markov Decision Processes and ARIMA models to analyze and predict Ice Hockey player's performance. Sans Fuentes, Carles. January 2019 (has links)
In this thesis, players' performance in ice hockey is modelled to create new per-match and per-season metrics. AD-trees are used to summarize ice hockey matches through state variables, which combine context and action variables, and Markov Decision Processes are used to estimate the impact of each action under a specific state. From this, an impact measure is described and four player metrics are derived per match for the regular seasons 2007-2008 and 2008-2009. A general analysis is performed on these metrics, and ARIMA models are used to analyze and predict player performance. The best prediction achieved in the modelling is the mean of the previous matches. Several metrics, including the ones created in this thesis, could be combined to evaluate players' performance against salary ranges, indicating whether a player is worth hiring, keeping, or firing.
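A minimal sketch of the prediction step, assuming a synthetic per-match impact series and the statsmodels ARIMA implementation (the thesis's exact tooling is not stated): fit an ARIMA model and compare its one-step forecast against the mean of previous matches, the baseline the thesis found hardest to beat.

```python
# Fit ARIMA to a player's per-match impact metric and compare its one-step
# forecast with the mean of previous matches. The series below is synthetic.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
impact = 0.5 + 0.1 * rng.standard_normal(60)   # hypothetical per-match metric

train, actual_next = impact[:-1], impact[-1]

arima_forecast = ARIMA(train, order=(1, 0, 0)).fit().forecast(1)[0]
mean_forecast = train.mean()                   # the baseline that won out

print(abs(arima_forecast - actual_next), abs(mean_forecast - actual_next))
```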
|
26 |
Learning Average Reward Irreducible Stochastic Games: Analysis and Applications. Li, Jun. 13 November 2003
A large class of sequential decision-making problems under uncertainty with multiple competing decision makers/agents can be modeled as stochastic games. Stochastic games having Markov properties are called Markov games or competitive Markov decision processes. This dissertation presents an approach to solving non-cooperative stochastic games, in which each decision maker makes her/his own decision independently and each has an individual payoff function. In stochastic games, the environment is nonstationary and each agent's payoff is affected by the joint decisions of all agents, which results in conflicts of interest among the decision makers.
In this research, the theory of Markov decision processes (MDPs) is combined with game theory to analyze the structure of Nash equilibria for stochastic games. In particular, the Laurent series expansion technique is used to extend results for discounted-reward stochastic games to average-reward stochastic games. As a result, auxiliary matrix games are developed that have equilibrium points and values equivalent to those of a class of stochastic games that are irreducible and have an average-reward performance metric.
R-learning is a well-known machine learning algorithm that deals with average-reward MDPs. The R-learning algorithm is extended to develop a Nash-R reinforcement learning algorithm for obtaining the equivalent auxiliary matrices. A convergence analysis of the Nash-R algorithm is developed from the study of the asymptotic behavior of its two-timescale stochastic approximation scheme and the stability of the associated ordinary differential equations (ODEs). The Nash-R learning algorithm is tested and then benchmarked against MDP-based learning methods using a well-known grid game.
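For context, the single-agent R-learning update that Nash-R builds on can be sketched as below; the Nash-R extension to joint actions and the matrix-game equilibrium step is omitted, and the step sizes and states are hypothetical. Note the two step sizes, echoing the two-timescale scheme analyzed in the convergence study.

```python
# Sketch of the R-learning update for average-reward MDPs (Schwartz-style).
# alpha (values) is faster than beta (average reward): two timescales.

alpha, beta = 0.1, 0.01
R = {}          # (state, action) -> relative value estimate
rho = 0.0       # running estimate of the average reward per step

def r_learning_update(s, a, reward, s_next, actions):
    global rho
    best_next = max(R.get((s_next, b), 0.0) for b in actions)
    greedy_here = max(R.get((s, b), 0.0) for b in actions)
    td = reward - rho + best_next - R.get((s, a), 0.0)
    R[(s, a)] = R.get((s, a), 0.0) + alpha * td
    if R[(s, a)] >= greedy_here:   # roughly: only non-exploratory steps move rho
        rho += beta * (reward - rho + best_next - greedy_here)

r_learning_update("s0", "a0", 1.0, "s1", ["a0", "a1"])
print(R, rho)   # {('s0', 'a0'): 0.1} 0.01
```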
Subsequently, a real-life application of stochastic games in deregulated power markets is explored. According to the current literature, Cournot, Bertrand, and Supply Function Equilibrium (SFE) are the three primary equilibrium models used to evaluate power market designs, with SFE being the most realistic for pool-type power markets. However, for a complicated power system, the convexity assumption of the underlying optimization problems is violated in most cases, which makes the problems more difficult to solve. The SFE concept is adopted in this research, and the generators' behaviors are modeled as a stochastic game instead of a one-shot game. The power market is considered to have features such as a multi-settlement structure (bilateral, day-ahead, and spot markets, and transmission congestion contracts) and demand elasticity. Such a market, consisting of multiple competing suppliers (generators), is modeled as a competitive Markov decision process and studied using the Nash-R algorithm.
|
27 |
Decision-Theoretic Planning under Risk-Sensitive Planning Objectives. Liu, Yaxin. 18 April 2005
Risk attitudes are important for human decision making, especially in scenarios where huge wins or losses are possible, as exemplified by planetary rover navigation, oil-spill response, and business applications. Decision-theoretic planners therefore need to take risk aspects into account to serve their users better. However, most existing decision-theoretic planners use simplistic planning objectives that are risk-neutral. This thesis is the first comprehensive study of how to incorporate risk attitudes into decision-theoretic planners and solve large-scale planning problems represented as Markov decision process models. The thesis consists of three parts.
The first part of the thesis studies risk-sensitive planning in the case where exponential utility functions are used to model risk attitudes. I show that existing decision-theoretic planners can be transformed to take risk attitudes into account. Moreover, different versions of the transformation are needed when the transition probabilities are given implicitly, namely as temporally extended probabilities or probabilities in factored form.
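A minimal sketch of the exponential-utility case on a hypothetical two-state MDP: value iteration becomes multiplicative, with V(s) tracking the expected value of exp(gamma * return), and the certainty equivalent recovered as log(V)/gamma. For gamma < 0 (risk aversion) the utility is decreasing in the return, so the optimization inside the backup becomes a minimization.

```python
# Risk-sensitive value iteration under exponential utility. The two-state MDP,
# rewards, and horizon are hypothetical; only the transformation is the point.
import math

gamma_risk = -0.5                  # negative risk parameter = risk-averse
states, acts, horizon = ["s0", "s1"], ["a0", "a1"], 20

def P(s, a):                       # hypothetical transition kernels
    return {"s0": 0.8, "s1": 0.2} if a == "a0" else {"s0": 0.3, "s1": 0.7}

def reward(s, a, s2):              # hypothetical rewards
    return 1.0 if s2 == "s1" else 0.2

# V(s) tracks E[exp(gamma_risk * future return)]; since exp(gamma * R) is
# decreasing in R for gamma < 0, the risk-averse optimum minimizes V.
V = {s: 1.0 for s in states}       # exp(0) = 1 at the horizon
for _ in range(horizon):
    V = {s: min(sum(p * math.exp(gamma_risk * reward(s, a, s2)) * V[s2]
                    for s2, p in P(s, a).items())
                for a in acts)
         for s in states}

# Certainty-equivalent values: log(V)/gamma penalizes variance in the return.
print({s: round(math.log(V[s]) / gamma_risk, 3) for s in states})
```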
The second part of the thesis studies risk-sensitive planning in the case where general nonlinear utility functions are used to model risk attitudes. I show that a state-augmentation approach can be used to reduce a risk-sensitive planning problem to a risk-neutral planning problem with an augmented state space. I further use a functional interpretation of value functions and approximation methods to solve the planning problems efficiently with value iteration. I also give an exact method for solving risk-sensitive planning problems where one-switch utility functions are used to model risk attitudes.
The third part of the thesis studies risk-sensitive planning in the case where arbitrary rewards are used. I propose a spectrum of conditions that can be used to constrain the utility function and the planning problem so that the optimal expected utilities exist and are finite. I prove that the existence and finiteness properties hold for stationary plans, where the action to perform in each state does not change over time, under different sets of conditions.
|
28 |
Computational techniques for reasoning about and shaping player experiences in interactive narratives. Roberts, David L. 06 April 2010
Interactive narratives are marked by two characteristics: 1) a space of player interactions, some subset of which are specified as aesthetic goals for the system; and 2) the affordance for players to express self-agency and have meaningful interactions. As a result, players are (often unknowing) participants in the creation of the experience. They cannot be assumed to be either cooperative or adversarial. Thus, we must provide paradigms that enable designers to work with players to co-create experiences, without transferring the system's goals (specified by authors) to players and without the system having a model of players' behaviors. This dissertation formalizes compact representations and efficient algorithms that enable computer systems to represent, reason about, and shape player experiences in interactive narratives.
Early work on interactive narratives relied heavily on "script-and-trigger" systems, requiring sizable engineering efforts from designers to provide concrete instructions for when and how systems can modify an environment to provide a narrative experience for players. While there have been advances in techniques for representing and reasoning about narratives at an abstract level that automate the trigger side of script-and-trigger systems, few techniques have reduced the need for scripting system adaptations or reconfigurations; reducing that need is one of the contributions of this dissertation.
We first describe a decomposition of the design process for interactive narratives into three technical problems: goal selection, action/plan selection/generation, and action/plan refinement. This decomposition allows techniques to be developed for reasoning about the complete implementation of an interactive narrative. We then describe representational and algorithmic solutions to these problems: a Markov Decision Process-based formalism for goal selection; a schema-based planning architecture, using theories of influence from social psychology, for action/plan selection/generation; and a natural-language template system for action/plan refinement. To evaluate these techniques, we conduct simulation experiments and human-subjects experiments in an interactive story.
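As a hedged sketch of the goal-selection layer only: abstract story states, candidate authorial goals as actions, and author-specified aesthetic rewards, solved by plain value iteration. All structure and numbers below are hypothetical stand-ins, not the dissertation's actual formalism.

```python
# Toy value iteration for MDP-based goal selection. Everything here is a
# hypothetical stand-in for the dissertation's richer representation.

states = ["setup", "conflict", "climax"]
goals = ["raise_tension", "reveal_clue"]
gamma = 0.9

# T[s][g] -> {next_state: prob}; R[s][g] -> aesthetic reward.
T = {"setup":    {"raise_tension": {"conflict": 0.9, "setup": 0.1},
                  "reveal_clue":   {"setup": 0.6, "conflict": 0.4}},
     "conflict": {"raise_tension": {"climax": 0.7, "conflict": 0.3},
                  "reveal_clue":   {"climax": 0.4, "conflict": 0.6}},
     "climax":   {"raise_tension": {"climax": 1.0},
                  "reveal_clue":   {"climax": 1.0}}}
R = {"setup":    {"raise_tension": 0.1, "reveal_clue": 0.5},
     "conflict": {"raise_tension": 0.6, "reveal_clue": 0.3},
     "climax":   {"raise_tension": 0.0, "reveal_clue": 0.0}}

V = {s: 0.0 for s in states}
for _ in range(100):               # value iteration to near-convergence
    V = {s: max(R[s][g] + gamma * sum(p * V[s2] for s2, p in T[s][g].items())
                for g in goals)
         for s in states}

policy = {s: max(goals, key=lambda g: R[s][g] +
                 gamma * sum(p * V[s2] for s2, p in T[s][g].items()))
          for s in states}
print(policy)                      # which authorial goal to pursue per state
```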
These techniques realize three goals: 1) providing efficient algorithmic support for authoring interactive narratives; 2) giving AI systems a paradigm for reasoning and acting to shape player experiences based on author-specified aesthetic goals; and 3) accomplishing (1) and (2) while players feel more engaged and perceive no decrease in self-agency.
|
29 |
Planning and scheduling problems in manufacturing systems with high degree of resource degradation. Agrawal, Rakshita. 07 August 2009
The term resource is used to refer to a machine, tool group, piece of equipment, or personnel. Optimization models for resource maintenance are obtained in conjunction with other production-related decisions such as production planning, production scheduling, resource allocation, and job inspection. Emphasis is laid on integrating these interdependent decisions into a unified optimization framework. This is accomplished for large stationary resources, for small non-stationary resources with high breakage rates, and for resources that form part of a network.
Owing to the large problem sizes and high uncertainty, the optimal decisions are determined by formulating and solving the above problems as Markov decision processes (MDPs). Approximate dynamic programming (ADP) based algorithms are used for solving the resulting large optimization problems. In all cases, the performance of the near-optimal policies obtained is compared with that of traditional formulations, which treat resource maintenance decisions independently of other manufacturing-related decisions.
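A minimal sketch of one fitted (approximate) dynamic-programming loop, under strong simplifying assumptions: a deterministic hypothetical wear/maintenance model, quadratic features, and a least-squares refit of the value function against one-step Bellman backups at sampled states.

```python
# Fitted value iteration: refit a linear value function to Bellman targets on
# sampled states instead of storing a table. Model and features hypothetical.
import numpy as np

gamma = 0.95

def phi(s):                        # quadratic features of resource wear level s
    return np.array([1.0, s, s * s])

def step(s, a):                    # hypothetical deterministic model
    if a == "maintain":            # maintenance resets wear at a cost
        return -5.0, 0.0
    return 1.0, min(s + 1.0, 10.0) # production earns reward but adds wear

def adp_iteration(samples, acts, w):
    X, y = [], []
    for s in samples:
        target = max(r + gamma * float(w @ phi(s2))   # one-step Bellman backup
                     for a in acts for r, s2 in [step(s, a)])
        X.append(phi(s))
        y.append(target)
    # Least-squares refit of the value function to the backup targets.
    return np.linalg.lstsq(np.array(X), np.array(y), rcond=None)[0]

w = np.zeros(3)
for _ in range(50):                # repeated fitted backups on sampled states
    w = adp_iteration([0.0, 2.0, 4.0, 6.0, 8.0, 10.0], ["produce", "maintain"], w)
print(w)                           # weights of the approximate value function
```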
In certain formulations, the resource state is not completely observable, which results in a partially observable MDP (POMDP). An alternative algorithm for the solution of POMDPs is developed, in which several mixed-integer linear programs (MILPs) are solved during each ADP iteration. This yields better-quality solutions, obtained efficiently, for POMDPs with very large or continuous action spaces.
|
30 |
Scaling solutions to Markov Decision Problems. Zang, Peng. 14 November 2011
The Markov Decision Problem (MDP) is a widely applied mathematical model for describing real-world decision problems, ranging from navigation to scheduling to robotics. Existing methods for solving MDPs scale poorly when applied to large domains with many components and factors to consider.
In this dissertation, I study the use of non-tabular representations and human input as scaling techniques. I show that the joint approach has desirable optimality and convergence guarantees and yields speedups of several orders of magnitude over conventional tabular methods. Empirical studies of the speedup were performed in several domains, including a clone of the classic video game Super Mario Bros. In the course of this work, I address several issues, including how approximate representations can be used without losing convergence and optimality properties, how human input can be solicited to maximize speedup and user engagement, and how that input should be used so as to insulate against possible errors.
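One standard way to fold human input into an MDP solver without changing which policy is optimal is potential-based reward shaping (Ng, Harada and Russell, 1999); whether the dissertation uses exactly this mechanism is not stated in the abstract, so the sketch below, with its toy corridor potential, is purely illustrative.

```python
# Potential-based shaping: F(s, s') = gamma*phi(s') - phi(s) provably leaves
# the optimal policy unchanged while injecting advice. Toy corridor example.

GOAL = 10
gamma = 0.99

def distance_to_goal(state):       # toy 1-D corridor
    return abs(GOAL - state)

def human_potential(state):        # hypothetical advice: closer to goal is better
    return -float(distance_to_goal(state))

def shaped_reward(s, s2, env_reward):
    return env_reward + gamma * human_potential(s2) - human_potential(s)

print(shaped_reward(3, 4, 0.0))    # moving toward the goal earns a positive bonus
```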
|