31

Markov Decision Processes and ARIMA models to analyze and predict Ice Hockey player’s performance

Sans Fuentes, Carles January 2019 (has links)
In this thesis, ice hockey players' performance is modelled to create new per-match and per-season metrics. AD-trees have been used to summarize ice hockey matches using state variables, which combine context and action variables, and Markov Decision Processes are used to estimate the impact of each action under a specific state. From this, an impact measure has been defined and four per-match player metrics have been derived for the 2007-2008 and 2008-2009 regular seasons. A general analysis has been performed for these metrics, and ARIMA models have been used to analyze and predict player performance. The best predictor achieved in the modelling is the mean of the previous matches. Several metrics, including the ones created in this thesis, could be combined with salary ranges to evaluate a player's performance and indicate whether a player is worth hiring, retaining, or releasing.
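As a rough illustration of the prediction comparison described above, the following Python sketch (not from the thesis; the per-match metric series is synthetic and the ARIMA order is chosen arbitrarily) pits a one-step ARIMA forecast against the mean-of-previous-matches baseline that the abstract reports as the best predictor:

```python
# Hedged sketch: synthetic per-match impact metric; ARIMA order (1,0,1)
# is illustrative, not the order used in the thesis.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
impact = rng.normal(loc=0.5, scale=0.2, size=40)   # fake per-match metric

train, actual = impact[:-1], impact[-1]

mean_pred = train.mean()                           # mean of previous matches
arima_pred = ARIMA(train, order=(1, 0, 1)).fit().forecast(steps=1)[0]

print(f"mean baseline error: {abs(mean_pred - actual):.4f}")
print(f"ARIMA(1,0,1) error : {abs(arima_pred - actual):.4f}")
```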
32

Learning Average Reward Irreducible Stochastic Games: Analysis and Applications

Li, Jun, 13 November 2003 (has links)
A large class of sequential decision-making problems under uncertainty with multiple competing decision makers/agents can be modeled as stochastic games. Stochastic games having Markov properties are called Markov games or competitive Markov decision processes. This dissertation presents an approach to solving noncooperative stochastic games, in which each decision maker makes her/his own decision independently and each has an individual payoff function. In stochastic games, the environment is nonstationary and each agent's payoff is affected by the joint decisions of all agents, which results in a conflict of interest among the decision makers. In this research, the theory of Markov decision processes (MDPs) is combined with game theory to analyze the structure of Nash equilibria for stochastic games. In particular, the Laurent series expansion technique is used to extend the results for discounted-reward stochastic games to average-reward stochastic games. As a result, auxiliary matrix games are developed that have equilibrium points and values equivalent to those of a class of stochastic games that are irreducible and have an average-reward performance metric. R-learning is a well-known machine learning algorithm that deals with average-reward MDPs. The R-learning algorithm is extended to develop a Nash-R reinforcement learning algorithm for obtaining the equivalent auxiliary matrices. A convergence analysis of the Nash-R algorithm is developed from the study of the asymptotic behavior of its two-timescale stochastic approximation scheme and the stability of the associated ordinary differential equations (ODEs). The Nash-R learning algorithm is tested and then benchmarked against MDP-based learning methods using a well-known grid game. Subsequently, a real-life application of stochastic games in a deregulated power market is explored. According to the current literature, Cournot, Bertrand, and Supply Function Equilibrium (SFE) are the three primary equilibrium models used to evaluate power market designs. SFE is more realistic for pool-type power markets. However, for a complicated power system, the convexity assumption of the optimization problems is violated in most cases, which makes the problems more difficult to solve. The SFE concept is adopted in this research, and the generators' behaviors are modeled as a stochastic game instead of a one-shot game. The power market is considered to have features such as multi-settlement (bilateral, day-ahead, and spot markets, and transmission congestion contracts) and demand elasticity. Such a market, consisting of multiple competing suppliers (generators), is modeled as a competitive Markov decision process and is studied using the Nash-R algorithm.
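For readers unfamiliar with R-learning, the average-reward algorithm the dissertation builds on, here is a minimal tabular sketch on an invented two-state MDP (this is plain R-learning, not the Nash-R extension, and all transition/reward numbers are made up):

```python
# Hedged sketch of tabular R-learning (Schwartz-style) on a toy MDP.
import numpy as np

n_states, n_actions = 2, 2
P = np.array([[0, 1], [1, 0]])          # P[s, a] -> next state (toy)
R = np.array([[1.0, 0.0], [2.0, 0.5]])  # R[s, a] -> reward (toy)

Q = np.zeros((n_states, n_actions))     # relative action values
rho = 0.0                               # running average-reward estimate
alpha, beta, eps = 0.1, 0.01, 0.1
rng = np.random.default_rng(1)

s = 0
for _ in range(20_000):
    a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = P[s, a], R[s, a]
    # R-learning TD error: rho replaces the discount factor
    delta = r - rho + Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * delta
    if a == Q[s].argmax():              # update rho only on greedy steps
        rho += beta * (r - rho + Q[s_next].max() - Q[s].max())
    s = s_next

print("estimated average reward:", round(rho, 3))   # ~2.0 for this toy MDP
```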
33

Post-decision Processes : Consolidation and value conflicts in decision making

Shamoun, Sanny January 2004 (has links)
The studies in the present thesis focus on post-decision processes, using the theoretical framework of Differentiation and Consolidation Theory. The thesis consists of three studies. In all of them, pre-decision evaluations are compared with post-decision evaluations in order to explore differences in the evaluations of decision alternatives before and after a decision. The main aim was to describe and gain a clearer understanding of how people re-evaluate information after they have made a decision and experienced its outcome. The studies examine how attractiveness evaluations of important attributes are restructured from the pre-decision to the post-decision phase, particularly the restructuring of value conflicts. Value-conflict attributes are those for which information speaks against the chosen alternative. The first study investigates an important real-life decision and illustrates different post-decision (consolidation) processes following the decision. The second study tests whether decisions with value conflicts follow the same consolidation (post-decision restructuring) processes when the conflict is controlled experimentally as in earlier studies of less controlled real-life decisions. The third study investigates consolidation and value conflicts in decisions whose consequences are controlled and of different magnitudes. The studies have shown how attractiveness restructuring of attributes in conflict occurs in the post-decision phase. Results from the three studies indicated that this restructuring was stronger for important real-life decisions (Study 1) and in situations in which real consequences followed the decision (Study 3) than in more controlled, hypothetical decision situations (Study 2). Finally, some proposals for future research are suggested, including studies of the effects of outcomes and consequences on the consolidation of prior decisions, and of how a decision maker's involvement affects his or her pre- and post-decision processes.
34

Decision-Theoretic Planning under Risk-Sensitive Planning Objectives

Liu, Yaxin 18 April 2005 (has links)
Risk attitudes are important for human decision making, especially in scenarios where huge wins or losses are possible, as exemplified by planetary rover navigation, oil-spill response, and business applications. Decision-theoretic planners therefore need to take risk aspects into account to serve their users better. However, most existing decision-theoretic planners use simplistic planning objectives that are risk-neutral. This thesis is the first comprehensive study of how to incorporate risk attitudes into decision-theoretic planners and solve large-scale planning problems represented as Markov decision process models. The thesis consists of three parts. The first part studies risk-sensitive planning in the case where exponential utility functions are used to model risk attitudes. I show that existing decision-theoretic planners can be transformed to take risk attitudes into account. Moreover, different versions of the transformation are needed when the transition probabilities are given implicitly, namely as temporally extended probabilities or in factored form. The second part studies risk-sensitive planning in the case where general nonlinear utility functions are used to model risk attitudes. I show that a state-augmentation approach can be used to reduce a risk-sensitive planning problem to a risk-neutral planning problem with an augmented state space. I further use a functional interpretation of value functions and approximation methods to solve the planning problems efficiently with value iteration. I also present an exact method for solving risk-sensitive planning problems where one-switch utility functions are used to model risk attitudes. The third part studies risk-sensitive planning in the case where arbitrary rewards are used. I propose a spectrum of conditions that can be used to constrain the utility function and the planning problem so that the optimal expected utilities exist and are finite. I prove that the existence and finiteness properties hold for stationary plans, in which the action to perform in each state does not change over time, under different sets of conditions.
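A minimal sketch of the exponential-utility case may help. Under U(w) = -exp(-k·w), the classic transformation turns rewards into multiplicative factors exp(-k·r) inside an otherwise standard backup; the toy finite-horizon MDP below is invented and is not the thesis's transformation for implicitly given transition probabilities:

```python
# Hedged sketch: finite-horizon value iteration under exponential utility.
import numpy as np

k = 0.5  # risk-aversion coefficient (k > 0 is risk-averse)
# Toy 2-state, 2-action MDP; P[a, s, s'] and r[a, s] are invented.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])

V = np.ones(2)                     # terminal scale: exp(-k * 0) = 1
for _ in range(20):                # 20-step horizon
    Q = np.exp(-k * r) * (P @ V)   # multiplicative Bellman backup
    V = Q.min(axis=0)              # minimizing disutility = maximizing utility
policy = Q.argmin(axis=0)

# Certainty equivalents recover the reward scale: CE = -ln(V) / k
print("certainty equivalents:", (-np.log(V) / k).round(3), "policy:", policy)
```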
35

Computational techniques for reasoning about and shaping player experiences in interactive narratives

Roberts, David L. 06 April 2010 (has links)
Interactive narratives are marked by two characteristics: 1) a space of player interactions, some subset of which are specified as aesthetic goals for the system; and 2) the affordance for players to express self-agency and have meaningful interactions. As a result, players are (often unknowing) participants in the creation of the experience. They can be assumed to be neither cooperative nor adversarial. Thus, we must provide paradigms to designers that enable them to work with players to co-create experiences, without transferring the system's goals (specified by authors) to players and without systems having a model of players' behaviors. This dissertation formalizes compact representations and efficient algorithms that enable computer systems to represent, reason about, and shape player experiences in interactive narratives. Early work on interactive narratives relied heavily on "script-and-trigger" systems, requiring sizable engineering efforts from designers to provide concrete instructions for when and how systems can modify an environment to provide a narrative experience for players. While there have been advances in techniques for representing and reasoning about narratives at an abstract level that automate the trigger side of script-and-trigger systems, few techniques have reduced the need for scripting system adaptations or reconfigurations---one of the contributions of this dissertation. We first describe a decomposition of the design process for interactive narratives into three technical problems: goal selection, action/plan selection/generation, and action/plan refinement. This decomposition allows techniques to be developed for reasoning about the complete implementation of an interactive narrative. We then describe representational and algorithmic solutions to these problems: a Markov Decision Process-based formalism for goal selection, a schema-based planning architecture using theories of influence from social psychology for action/plan selection/generation, and a natural language-based template system for action/plan refinement. To evaluate these techniques, we conduct simulation experiments and human subjects experiments in an interactive story. Using these techniques realizes the following three goals: 1) provide efficient algorithmic support for authoring interactive narratives; 2) design a paradigm for AI systems to reason and act to shape player experiences based on author-specified aesthetic goals; and 3) accomplish (1) and (2) while players feel more engaged and perceive no decrease in self-agency.
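As a purely generic illustration of MDP-based goal selection (a hedged sketch with invented dynamics, not the dissertation's formalism), one can treat narrative states as MDP states, authorial goals as actions, and author-specified aesthetic rewards as the objective:

```python
# Hedged sketch: pick which story goal to pursue in each narrative state
# by value iteration over author-specified aesthetic rewards.
import numpy as np

n_states, n_goals = 5, 3
gamma = 0.9
rng = np.random.default_rng(3)
# P[g, s, s']: how pursuing goal g moves the story (invented numbers)
P = rng.dirichlet(np.ones(n_states), size=(n_goals, n_states))
# Author-specified aesthetic reward for each narrative state
R = np.array([0.0, 0.2, 0.5, 0.1, 1.0])

V = np.zeros(n_states)
for _ in range(300):
    V = R + gamma * (P @ V).max(axis=0)   # Bellman backup over goals
goal_choice = (P @ V).argmax(axis=0)
print("goal to pursue in each narrative state:", goal_choice)
```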
36

Planning and scheduling problems in manufacturing systems with high degree of resource degradation

Agrawal, Rakshita 07 August 2009 (has links)
The term resource is used to refer to a machine, tool group, piece of equipment, or personnel. Optimization models for resource maintenance are obtained in conjunction with other production-related decisions such as production planning, production scheduling, resource allocation, and job inspection. Emphasis is laid on integrating these interdependent decisions into a unified optimization framework. This is accomplished for large stationary resources, for small non-stationary resources with high failure rates, and for resources that form part of a network. Owing to the large problem sizes and high uncertainty, the optimal decisions are determined by formulating and solving these problems as Markov decision processes (MDPs). Approximate dynamic programming (ADP) based algorithms are used for solving the large optimization problems at hand. The performance of the resulting near-optimal policies is compared with that of traditional formulations in all cases; the latter treat resource maintenance decisions independently of other manufacturing decisions. In certain formulations, the resource state is not completely observable, which results in a partially observable MDP (POMDP). An alternative algorithm for the solution of POMDPs is developed, in which several mixed-integer linear programs (MILPs) are solved during each ADP iteration. This helps obtain better-quality solutions efficiently for POMDPs with very large or continuous action spaces.
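The flavor of such an integrated model can be conveyed by a toy example (a minimal sketch, not the thesis's formulation): value iteration on a machine-deterioration MDP where the run-versus-repair decision is optimized jointly with the production reward.

```python
# Hedged sketch: maintenance MDP with wear levels 0..3 (3 = failed).
# All profits, costs, and probabilities are invented for illustration.
import numpy as np

n = 4
gamma = 0.95
profit = np.array([10.0, 8.0, 5.0, 0.0])   # reward for running at each level
repair_cost = 6.0
p_degrade = 0.3                            # chance wear increases when running

def run_value(s, V):
    return profit[s] + gamma * ((1 - p_degrade) * V[s]
                                + p_degrade * V[min(s + 1, n - 1)])

V = np.zeros(n)
for _ in range(500):
    V = np.array([max(run_value(s, V), -repair_cost + gamma * V[0])
                  for s in range(n)])      # repair restores to "as new"

policy = ["repair" if -repair_cost + gamma * V[0] > run_value(s, V)
          else "run" for s in range(n)]
print(dict(zip(range(n), policy)))         # repair kicks in at high wear
```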
37

Scaling solutions to Markov Decision Problems

Zang, Peng 14 November 2011 (has links)
The Markov Decision Problem (MDP) is a widely applied mathematical model for describing a broad array of real-world decision problems, ranging from navigation to scheduling to robotics. Existing methods for solving MDPs scale poorly when applied to large domains with many components and factors to consider. In this dissertation, I study the use of non-tabular representations and human input as scaling techniques. I show that the joint approach has desirable optimality and convergence guarantees and demonstrates a speedup of several orders of magnitude over conventional tabular methods. Empirical studies of speedup were performed in several domains, including a clone of the classic video game Super Mario Bros. In the course of this work, I address several issues, including: how approximate representations can be used without losing convergence and optimality properties; how human input can be solicited to maximize speedup and user engagement; and how that input should be used so as to insulate against possible errors.
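As one hedged example of a non-tabular representation (a common choice, though not necessarily the dissertation's), fitted value iteration with a small linear feature basis replaces the per-state table with a handful of weights:

```python
# Hedged sketch: fitted value iteration on a toy 50-state chain, using a
# 3-weight polynomial basis instead of a 50-entry table.
import numpy as np

n = 50
gamma = 0.95
rewards = np.zeros(n); rewards[-1] = 1.0   # goal reward at the right end

def features(s):
    x = s / (n - 1)
    return np.array([1.0, x, x * x])       # tiny polynomial basis

Phi = np.vstack([features(s) for s in range(n)])
w = np.zeros(3)                            # V(s) ~ features(s) @ w

for _ in range(200):
    V = Phi @ w
    # Deterministic "move right" dynamics; last state is absorbing
    targets = rewards + gamma * np.append(V[1:], V[-1])
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)   # fit the backup

print("approx V at states 0, 25, 49:", (Phi @ w)[[0, 25, 49]].round(3))
```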
38

TEXPLORE: temporal difference reinforcement learning for robots and time-constrained domains

Hester, Todd 30 January 2013 (has links)
Robots have the potential to solve many problems in society because of their ability to work in dangerous places, doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision-making processes and could solve the problems of learning and adaptation on robots. This dissertation identifies four key challenges that must be addressed for an RL algorithm to be practical for robotic control tasks. These RL for Robotics Challenges are: 1) it must learn in very few samples; 2) it must learn in domains with continuous state features; 3) it must handle sensor and/or actuator delays; and 4) it should continually select actions in real time. This dissertation focuses on addressing all four of these challenges, in particular on time-constrained domains where the first challenge is critically important: the agent's lifetime is not long enough for it to explore the domain thoroughly, so it must learn in very few samples. Although existing RL algorithms successfully address one or more of the RL for Robotics Challenges, no prior algorithm addresses all four. To fill this gap, this dissertation introduces TEXPLORE, the first algorithm to address all four challenges. TEXPLORE is a model-based RL method that learns a random forest model of the domain, which generalizes dynamics to unseen states. Each tree in the random forest represents a hypothesis of the domain's true dynamics, and the agent uses these hypotheses to explore states that are promising for the final policy while ignoring states that do not appear promising. With sample-based planning and a novel parallel architecture, TEXPLORE can select actions continually in real time whenever necessary. We empirically evaluate each component of TEXPLORE in comparison with other state-of-the-art approaches. In addition, we present modifications of TEXPLORE's exploration mechanism for different types of domains. The key result of this dissertation is a demonstration of TEXPLORE learning to control the velocity of an autonomous vehicle on-line, in real time, while running on board the robot. After controlling the vehicle for only two minutes, TEXPLORE is able to learn to move the pedals of the vehicle so as to drive at the desired velocities. The work presented here represents an important step towards applying RL to robotics and enabling robots to perform more tasks in society. By enabling robots to learn in few actions while acting on-line in real time, with continuous state and actuator delays, TEXPLORE significantly broadens the applicability of RL to robots.
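The model-learning idea can be sketched as follows (inspired by the abstract, not TEXPLORE's actual implementation; the one-dimensional velocity dynamics are invented): fit a random forest to one-step transitions and treat each tree as a hypothesis of the true dynamics, with the spread across trees signaling where exploration may be worthwhile.

```python
# Hedged sketch: random-forest transition model on toy velocity data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
# Toy 1-D dynamics: v' = v + 0.1 * a + noise (invented)
states = rng.uniform(0, 5, size=(500, 1))
actions = rng.choice([-1.0, 0.0, 1.0], size=(500, 1))
next_states = states + 0.1 * actions + rng.normal(0, 0.01, size=(500, 1))

X = np.hstack([states, actions])
deltas = (next_states - states).ravel()     # predict state change, not state
model = RandomForestRegressor(n_estimators=10).fit(X, deltas)

# Each tree is one hypothesis of the dynamics; their spread is a crude
# uncertainty signal that could drive exploration.
query = np.array([[2.0, 1.0]])              # state v=2.0, action a=+1
per_tree = np.array([t.predict(query)[0] for t in model.estimators_])
print("mean predicted delta:", per_tree.mean().round(3),
      "spread:", per_tree.std().round(4))
```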
39

Multi-state Bayesian Process Control

Wang, Jue 14 January 2014 (has links)
Bayesian process control is a statistical process control (SPC) scheme that uses the posterior state probabilities as the control statistic. The key issue is to decide when to restore the process based on real-time observations. Such problems have been extensively studied in the framework of partially observable Markov decision processes (POMDPs), with particular emphasis on the structure of the optimal control policy. Almost all existing structural results on optimal policies are limited to two-state processes, where the class of control-limit policies is optimal. However, the two-state model is a gross simplification, as real production processes almost always involve multiple states. For example, a machine in a production system often has multiple failure modes differing in their effects; a deterioration process can often be divided into multiple stages with different degradation levels; and the condition of a complex multi-unit system also requires a multi-state representation. We investigate the optimal control policies for multi-state processes with a fixed sampling scheme, in which information about the process is represented by a belief vector within a high-dimensional probability simplex. It is well known that obtaining structural results for such high-dimensional POMDPs is challenging. First, we prove that for an infinite-horizon process subject to multiple competing assignable causes, a so-called conditional control-limit policy is optimal. The optimal policy divides the belief space into two individually connected regions, which have analytical bounds. Next, we address a finite-horizon process with at least one absorbing state and show that a structured optimal policy can be established by transforming the belief space into a polar coordinate system, in which a so-called polar control-limit policy is optimal. Our model is general enough to include many existing models in the literature as special cases. The structural results also lead to significantly more efficient algorithms for computing the optimal policies. In addition, we characterize the conditions under which some out-of-control state is more desirable than the in-control state. The existence of such a counterintuitive situation indicates that multi-state process control is drastically different from the two-state case.
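The control statistic at the heart of this scheme is the posterior belief vector, updated by a standard Bayes filter. A minimal sketch (a three-state model with invented numbers, not the thesis's formulation):

```python
# Hedged sketch: Bayesian belief update for a 3-state process
# (in-control plus two absorbing assignable causes).
import numpy as np

# P[i, j]: probability the process moves from state i to j between samples
P = np.array([[0.90, 0.07, 0.03],
              [0.00, 1.00, 0.00],
              [0.00, 0.00, 1.00]])
# Gaussian observation model: state-dependent process mean, unit variance
means = np.array([0.0, 1.5, 3.0])

def update(belief, x):
    """One filtering step: predict through P, then condition on sample x."""
    predicted = belief @ P
    likelihood = np.exp(-0.5 * (x - means) ** 2)
    posterior = predicted * likelihood
    return posterior / posterior.sum()

b = np.array([1.0, 0.0, 0.0])           # start in control
for x in [0.1, 1.8, 2.9, 3.1]:          # incoming samples
    b = update(b, x)
    print(b.round(3))                   # restore when b leaves the
                                        # "continue" region of the simplex
```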
40

A constrained MDP-based vertical handoff decision algorithm for wireless networks

Sun, Chi (has links)
Fourth-generation (4G) wireless communication systems aim to provide users with the convenience of seamless roaming among heterogeneous wireless access networks. To achieve this goal, support for vertical handoff is important in mobility management. This thesis focuses on the vertical handoff decision algorithm, which determines the criteria under which vertical handoff should be performed. The problem is formulated as a constrained Markov decision process. The objective is to maximize the expected total reward of a connection subject to a constraint on the expected total access cost. In our model, a benefit function is used to assess the quality of the connection, and a penalty function is used to model the signaling overhead incurred and call dropping. The user's velocity and location information are also considered when making handoff decisions. The policy iteration and Q-learning algorithms are employed to determine the optimal policy. Structural results on the optimal vertical handoff policy are derived using the concept of supermodularity. We show that the optimal policy is a threshold policy in bandwidth, delay, and velocity. Numerical results show that our proposed vertical handoff decision algorithm outperforms other decision schemes over a wide range of conditions, including variations in connection duration, user velocity, user budget, traffic type, signaling cost, and monetary access cost.
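One standard way to handle such a constrained MDP (a hedged sketch of the general Lagrangian technique, not necessarily the thesis's exact formulation) is to price the access-cost constraint into a scalarized reward that Q-learning or policy iteration can then maximize:

```python
# Hedged sketch: Lagrangian scalarization of the constrained objective
#   max E[total reward]  subject to  E[total access cost] <= C_max.
# All numbers are invented; lam would normally be tuned (raised when the
# realized cost exceeds C_max, lowered otherwise).
lam = 0.5      # Lagrange multiplier pricing the access-cost constraint
C_max = 2.0    # access-cost budget (used when tuning lam)

def scalarized_reward(benefit, signaling_penalty, access_cost):
    # QoS benefit minus handoff signaling, with cost priced in via lam
    return benefit - signaling_penalty - lam * access_cost

# One illustrative decision: stay on WLAN vs. switch to cellular
stay = scalarized_reward(benefit=1.0, signaling_penalty=0.0, access_cost=0.2)
switch = scalarized_reward(benefit=1.4, signaling_penalty=0.3, access_cost=0.8)
print("decision:", "stay" if stay >= switch else "switch")
```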
