Spelling suggestions: "subject:"markov decision process"" "subject:"parkov decision process""
1 
Modeling exotic options with maturity extensions by stochastic dynamic programmingTapeinos, Socratis January 2009 (has links)
The exotic options that are examined in this thesis have a combination of nonstandard characteristics which can be found in shout, multicallable, pathdependent and Bermudan options. These options are called reset options. A reset option is an option which allows the holder to reset, one or more times, certain terms of the contract based on prespecified rules during the life of the option. Overall in this thesis, an attempt has been made to tackle the modeling challenges that arise from the exotic properties of the reset option embedded in segregated funds. Initially, the relevant literature was reviewed and the lack of published work, advanced enough to deal with the complexities of the reset option, was identified. Hence, there appears to be a clear and urgent need to have more sophisticated approaches which will model the reset option. The reset option on the maturity guarantee of segregated funds is formulated as a nonstationary finite horizon Markov Decision Process. The returns from the underlying asset are modeled using a discrete time approximation of the lognormal model. An Optimal Exercise Boundary of the reset option is derived where a threshold value is depicted such that if the value of the underlying asset price exceeds it then it is optimal for the policyholder to reset his maturity guarantee. Otherwise, it is optimal for the policyholder to rollover his maturity guarantee. It is noteworthy that the model is able to depict the Optimal Exercise Boundary of not just the first but of all the segregated fund contracts which can be issued throughout the planning horizon of the policyholder. The main finding of the model is that as the segregated fund contract approaches its maturity, the threshold value in the Optimal Exercise Boundary increases. However, in the last period before the maturity of the segregated fund, the threshold value decreases. The reason for this is that if the reset option is not exercised it will expire worthless. The model is then extended to re ect on the characteristics of the range of products which are traded in the market. Firstly, the issuer of the segregated fund contract is allowed to charge a management fee to the policyholder. The effect from incorporating this fee is that the policyholder requires a higher return in order to optimally reset his maturity guarantee while the total value of the segregated fund is diminished. Secondly, the maturity guarantee becomes a function of the number of times that the reset option has been exercised. The effect is that the policyholder requires a higher return in order to choose to reset his maturity guarantee while the total value of the segregated fund is diminished. Thirdly, the policyholder is allowed to reset the maturity guarantee at any point in time within each year from the start of the planning horizon, but only once. The effect is that the total value of the segregated fund is increased since the policyholder may lock in higher market gains as he has more reset decision points. In response to the well documented deficiencies of the lognormal model to capture the jumps experienced by stock markets, extensions were built which incorporate such jumps in the original model. The effect from incorporating such jumps is that the policyholder requires a higher return in order to choose to reset his maturity guarantee while the total value of the segregated fund is diminished due to the adverse effect of the negative jumps on the value of the underlying asset.

2 
Optimal threshold policy for opportunistic network coding under phase type arrivalsGunasekara, Charith 01 September 2016 (has links)
Network coding allows each node in a network to perform some coding operations on the data packets and improve the overall throughput of communication. However, network coding cannot be done unless there are enough packets to be coded so at times it may be advantageous to wait for packets to arrive.
We consider a scenario in which two wireless nodes each with its own buffer communicate via a single access point using network coding. The access point first pairs each data packet being sent from each node and then performs the network coding operation. Packets arriving at the access point that are unable to be paired are instead loaded into one of the two buffers at the access point. In the case where one of the buffers is empty and the other is not network coding is not possible. When this happens the access point must either wait for a network coding opportunity, or transmit the unpaired packet without coding. Delaying packet transmission is associated with an increased waiting cost but also allows for an increase in the overall efficiency of wireless spectrum usage, thus a decrease in packet transmission cost. Conversely, sending packets uncoded is associated with a decrease in waiting cost but also a decrease in the overall efficiency of the wireless spectrum usage. Hence, there is a tradeoff between decreasing packet delay time, and increasing the efficiency of the wireless spectrum usage.
We show that the optimal waiting policy for this system with respect to total cost, under phasetype packet arrivals, is to have a separate threshold for the buffer size that is dependent on the current phase of each arrival. We then show that the solution to this optimization problem can be obtained by treating it as a double ended pushout queueing theory problem. We develop a new technique to keep track of the packet waiting time and the number of packets waiting in the two ended pushout queue. We use the resulting queueing model to resolve the optimal threshold policy and then analyze the performance of the system using numerical approach. / October 2016

3 
Qualitative analysis of synchronizing probabilistic systems / Analyse qualitative des systèmes probabilistes synchronisantsShirmohammadi, Mahsa 10 December 2014 (has links)
Les Markov Decision Process (MDP) sont des systèmes finis probabilistes avec à la fois des choix aléatoires et des stratégies, et sont ainsi reconnus comme de puissants outils pour modéliser les interactions entre un contrôleur et les réponses aléatoires de l'environment. Mathématiquement, un MDP peut être vu comme un jeu stochastique à un joueur et demi où le contrôleur choisit à chaque tour une action et l'environment répond en choisissant un successeur selon une distribution de probabilités fixée.Il existe deux incomparables représentations du comportement d'un MDP une fois les choix de la stratégie fixés.Dans la représentation classique, un MDP est un générateur de séquences d'états, appelées stateoutcome; les conditions gagnantes du joueur sont ainsi exprimées comme des ensembles de séquences désirables d'états qui sont visités pendant le jeu, e.g. les conditions de Borel telles que l'accessibilité. La complexité des problèmes de décision ainsi que la capacité mémoire requise des stratégies gagnantes pour les conditions dites stateoutcome ont été déjà fortement étudiées.Depuis peu, les MDPs sont également considérés comme des générateurs de séquences de distributions de probabilités sur les états, appelées distributionoutcome. Nous introduisons des conditions de synchronisation sur les distributionsoutcome, qui intuitivement demandent à ce que la masse de probabilité s'accumule dans un (ensemble d') état, potentiellement de façon asymptotique.Une distribution de probabilités est psynchrone si la masse de probabilité est d'au moins p dans un état; et la séquence de distributions de probabilités est toujours, éventuellement, faiblement, ou fortement psynchrone si, respectivement toutes, certaines, infiniment plusieurs ou toutes sauf un nombre fini de distributions dans la séquence sont psynchrones.Pour chaque type de synchronisation, un MDP peut être(i) assurément gagnant si il existe une stratégie qui génère une séquence 1synchrone;(ii) presqueassurément gagnant si il existe une stratégie qui génère une séquence (1epsilon)synchrone et cela pour tout epsilon strictement positif;(iii) asymptotiquement gagnant si pour tout epsilon strictement positif, il existe une stratégie produisant une séquence (1epsilon)synchrone.Nous considérons le problème consistant à décider si un MDP est gagnant, pour chaque type de synchronisation et chaque mode gagnant: nous établissons les limites supérieures et inférieures de la complexité de ces problèmes ainsi que la capacité mémoire requise pour une stratégie gagnante optimale.En outre, nous étudions les problèmes de synchronisation pour les automates probabilistes (PAs) qui sont en fait des instances de MDP où les contrôleurs sont restreint à utiliser uniquement des stratégiesmots; c'est à dire qu'ils n'ont pas la possibilité d'observer l'historique de l'exécution du système et ne peuvent connaitre que le nombre de choix effectués jusque là. Les langages synchrones d'un PA sont donc l'ensemble des stratégiesmots synchrones: nous établissons la complexité des problèmes des langages synchrones vides et universels pour chaque mode gagnant.Nous répercutons nos résultats obtenus pour les problèmes de synchronisation sur les MDPs et PAs aux jeux tour à tour à deux joueurs ainsi qu'aux automates finis nondéterministes. En plus de nos résultats principaux, nous établissons de nouveaux résultats de complexité sur les automates finis alternants avec des alphabets à une lettre. Enfin, nous étudions plusieurs variations de synchronisation sur deux instances de systèmes infinis que sont les automates temporisés et pondérés. / Markov decision processes (MDPs) are finitestate probabilistic systems with bothstrategic and random choices, hence wellestablished to model the interactions between a controller and its randomly responding environment.An MDP can be mathematically viewed as a one and half player stochastic game played in rounds when the controller chooses an action,and the environment chooses a successor according to a fixedprobability distribution.There are two incomparable views on the behavior of an MDP, when thestrategic choices are fixed. In the traditional view, an MDP is a generator of sequence of states, called the stateoutcome; the winning condition of the player is thus expressed as a set of desired sequences of states that are visited during the game, e.g. Borel condition such as reachability.The computational complexity of related decision problems and memory requirement of winning strategies for the stateoutcome conditions are wellstudied.Recently, MDPs have been viewed as generators of sequences of probability distributions over states, calledthe distributionoutcome. We introduce synchronizing conditions defined on distributionoutcomes,which intuitively requires that the probability mass accumulates insome (group of) state(s), possibly in limit.A probability distribution is psynchronizing if the probabilitymass is at least p in some state, anda sequence of probability distributions is always, eventually,weakly, or strongly psynchronizing if respectively all, some, infinitely many, or all but finitely many distributions in the sequence are psynchronizing.For each synchronizing mode, an MDP can be (i) sure winning if there is a strategy that produces a 1synchronizing sequence; (ii) almostsure winning if there is a strategy that produces a sequence that is, for all epsilon > 0, a (1epsilon)synchronizing sequence; (iii) limitsure winning if for all epsilon > 0, there is a strategy that produces a (1epsilon)synchronizing sequence.We consider the problem of deciding whether an MDP is winning, for each synchronizing and winning mode: we establish matching upper and lower complexity bounds of the problems, as well as the memory requirementfor optimal winning strategies.As a further contribution, we study synchronization in probabilistic automata (PAs), that are kind of MDPs where controllers are restricted to use only wordstrategies; i.e. no ability to observe the history of the system execution, but the number of choices made so far.The synchronizing languages of a PA is then the set of all synchronizing wordstrategies: we establish the computational complexity of theemptiness and universality problems for all synchronizing languages in all winning modes.We carry over results for synchronizing problems from MDPs and PAs to twoplayer turnbased games and nondeterministic finite state automata. Along with the main results, we establish new complexity results foralternating finite automata over a oneletter alphabet.In addition, we study different variants of synchronization for timed andweighted automata, as two instances of infinitestate systems.

4 
Crosslayer adaptive transmission scheduling in wireless networksNgo, Minh Hanh 05 1900 (has links)
A new promising approach for wireless network optimization is from a crosslayer perspective. This thesis focuses on exploiting channel state information (CSI) from the physical layer for optimal transmission scheduling at the medium access control (MAC) layer. The first part of the thesis considers exploiting CSI via a distributed channelaware MAC protocol. The MAC protocol is analysed using a centralized design approach and a noncooperative game theoretic approach. Structural results are obtained and provably convergent stochastic approximation algorithms that can estimate the optimal transmission policies are proposed. Especially, in the game theoretic MAC formulation, it is proved that the best response transmission policies are threshold in the channel state and there exists a Nash equilibrium at which every user deploys a threshold transmission policy. This threshold result leads to a particularly efficient stochasticapproximationbased adaptive learning algorithm and a simple distributed implementation of the MAC protocol. Simulations show that the channelaware MAC protocols result in system throughputs that increase with the number of users.
The thesis also considers opportunistic transmission scheduling from the perspective of a single user using Markov Decision Process (MDP) approaches. Both channel state information and channel memory are exploited for opportunistic transmission. First, a finite horizon MDP transmission scheduling problem is considered. The finite horizon formulation is suitable for shortterm delay constraints. It is proved for the finite horizon opportunistic transmission scheduling problem that the optimal transmission policy is threshold in the buffer occupancy state and the transmission time. This twodimensional threshold structure substantially reduces the computational complexity required to compute and implement the optimal policy. Second, the opportunistic transmission scheduling problem is formulated as an infinite horizon average cost MDP with a constraint on the average waiting cost. An advantage of the infinite horizon formulation is that the optimal policy is stationary. Using the Lagrange dynamic programming theory and the supermodularity method, it is proved that the stationary optimal transmission scheduling policy is a randomized mixture of two policies that are threshold in the buffer occupancy state. A stochastic approximation algorithm and a Qlearning based algorithm that can adaptively estimate the optimal transmission scheduling policies are then proposed.

5 
Crosslayer adaptive transmission scheduling in wireless networksNgo, Minh Hanh 05 1900 (has links)
A new promising approach for wireless network optimization is from a crosslayer perspective. This thesis focuses on exploiting channel state information (CSI) from the physical layer for optimal transmission scheduling at the medium access control (MAC) layer. The first part of the thesis considers exploiting CSI via a distributed channelaware MAC protocol. The MAC protocol is analysed using a centralized design approach and a noncooperative game theoretic approach. Structural results are obtained and provably convergent stochastic approximation algorithms that can estimate the optimal transmission policies are proposed. Especially, in the game theoretic MAC formulation, it is proved that the best response transmission policies are threshold in the channel state and there exists a Nash equilibrium at which every user deploys a threshold transmission policy. This threshold result leads to a particularly efficient stochasticapproximationbased adaptive learning algorithm and a simple distributed implementation of the MAC protocol. Simulations show that the channelaware MAC protocols result in system throughputs that increase with the number of users.
The thesis also considers opportunistic transmission scheduling from the perspective of a single user using Markov Decision Process (MDP) approaches. Both channel state information and channel memory are exploited for opportunistic transmission. First, a finite horizon MDP transmission scheduling problem is considered. The finite horizon formulation is suitable for shortterm delay constraints. It is proved for the finite horizon opportunistic transmission scheduling problem that the optimal transmission policy is threshold in the buffer occupancy state and the transmission time. This twodimensional threshold structure substantially reduces the computational complexity required to compute and implement the optimal policy. Second, the opportunistic transmission scheduling problem is formulated as an infinite horizon average cost MDP with a constraint on the average waiting cost. An advantage of the infinite horizon formulation is that the optimal policy is stationary. Using the Lagrange dynamic programming theory and the supermodularity method, it is proved that the stationary optimal transmission scheduling policy is a randomized mixture of two policies that are threshold in the buffer occupancy state. A stochastic approximation algorithm and a Qlearning based algorithm that can adaptively estimate the optimal transmission scheduling policies are then proposed.

6 
A Simulation Based Approximate Dynamic Programming Approach to Multiclass, Multiresource Surgical SchedulingAstaraky, Davood 09 January 2013 (has links)
The thesis focuses on a model that seeks to address patient scheduling step of the surgical scheduling process to determine the number of surgeries to perform in a given day. Specifically, provided a master schedule that provides a cyclic breakdown of total OR availability into specific daily allocations to each surgical specialty, we look to provide a scheduling policy for all surgeries that minimizes a combination of the lead time between patient request and surgery date, overtime in the ORs and congestion in the wards. We cast the problem of generating optimal control strategies into the framework of Markov Decision Process (MDP). The Approximate Dynamic Programming (ADP) approach has been employed to solving the model which would otherwise be intractable due to the size of the state space. We assess performance of resulting policy and quality of the driven policy through simulation and we provide our policy insights and conclusions.

7 
Evolutionarily Stable Learning and Foraging StrategiesCOWNDEN, DANIEL 01 February 2012 (has links)
This thesis examines a series of problems with the goal of better understanding the fundamental dilemma of whether to invest effort in obtaining information that may lead to better opportunities in the future versus exploiting immediately available opportunities. In particular this work investigates how this dilemma is affected by competition in an evolutionary setting. To achieve this requires both the use of evolutionary game theory, and Markov decision procesess or stochastic dynamic programming. This thesis grows directly out of earlier work on the Social Learning Strategies Tournament. Although I cast the problem in the biological setting of optimal foraging theory, where it fills an obvious gap, this fundamental dilemma should also be of some interest to economists, operations researchers, as well as those working in ecology, evolution and behaviour. / Thesis (Ph.D, Mathematics & Statistics)  Queen's University, 20120131 19:55:25.11

8 
A Simulation Based Approximate Dynamic Programming Approach to Multiclass, Multiresource Surgical SchedulingAstaraky, Davood 09 January 2013 (has links)
The thesis focuses on a model that seeks to address patient scheduling step of the surgical scheduling process to determine the number of surgeries to perform in a given day. Specifically, provided a master schedule that provides a cyclic breakdown of total OR availability into specific daily allocations to each surgical specialty, we look to provide a scheduling policy for all surgeries that minimizes a combination of the lead time between patient request and surgery date, overtime in the ORs and congestion in the wards. We cast the problem of generating optimal control strategies into the framework of Markov Decision Process (MDP). The Approximate Dynamic Programming (ADP) approach has been employed to solving the model which would otherwise be intractable due to the size of the state space. We assess performance of resulting policy and quality of the driven policy through simulation and we provide our policy insights and conclusions.

9 
An average cost Markov decision process model to decide when to challenge a call in a tennis matchNadimpalli, Vamsi Krishna 16 February 2011 (has links)
In a standard tennis match each player has an unlimited opportunity
to challenge an umpire’s call, but if three incorrect challenges are made in a set he is not allowed to challenge anymore in that set. If the set goes into a tie break the limit on incorrect challenges increases by one. These limited
incorrect challenges are not carried over from one set to another. So this is kind of a limited resource available to the player and if he knows how to use
this resource in a best possible way, there is a scope for increasing his overall chances of winning a match. With the motive of gaining insight on when to challenge a call, we have modeled a single game in a tennis match as a Markov decision process. We have also studied the impact of variables like player’s probability of winning a point, the player’s perception of the challengability
of a call and proportion of challengable calls on the decision making process. / text

10 
Refuse or reuse : managing the quality of returns in product recovery systemsMarshall, Sarah Elizabeth January 2012 (has links)
Increasing legislative and societal pressures are forcing manufacturers to become environmentallyconscious and take responsibility for the fate of their goods after they have been used by consumers. As a result, some manufacturers operate hybrid systems which produce new goods and recover used goods. Product recovery describes the process by which used products are returned to their manufacturers or sent to a specialised facility for recovery, before being sold on the original or a secondary market. The quality of the returned goods is a significant issue in product recovery systems as it can affect both the type of recovery and costs associated with it. Quality in product recovery systems has not been adequately studied, with many authors either ignoring the possibility of receiving lower quality returns, or assuming they are disposed of rather than recovered. However, such assumptions ignore the possibility that the firm might be able to salvage value from lower quality returns by using them for parts or materials. This thesis presents four models that investigate the importance of considering the quality of returns in the management of inventory in a product recovery system, by examining the costeffectiveness of recovering both high quality and low quality returns. The first model is a deterministic lotsizing model of a product recovery system. It was found that performing both high and low quality recovery reduced the sensitivity of the optimal cost to operational restrictions on the choice of decision variables. The second model is a discretetime, periodicreview model formulated as a Markov decision process (MDP) and introduces uncertainty in demand, returns, and the quality of the returns. It was found that performing both types of recovery can lead to cost savings and better customer service for firms through an increased fill rate. The third model addresses those industries where produced and recovered goods cannot be sold on the same market due to customers’ perceptions and environmental legalisation. Using an MDP formulation, the model examines a product recovery system in which produced and recovered goods are sold on separate markets. The profitability of offering twoway substitution between these markets was investigated. It was found that offering substitution can allow firms to increase both their profits and fill rates. The fourth model examines the issue of separate markets and substitution in the continuous time domain using a semiMarkov decision process. The continuous nature of the model allows more detailed examination of the substitution decision. It was found that offering substitution can allow firms to increase their profit and in some cases also increase their fill rate. In some cases, production is performed less frequently when downward substitution can be offered, and recovery is performed less often when upward substitution can be offered. The findings of this thesis could be used to help a firm that is currently recovering high quality returns assess the costeffectiveness of also recovering lower quality returns. Recovering lowquality items, rather than disposing of them, may allow a firm to increase the amount it recycles. The findings highlight the importance of considering the quality of returns when managing a product recovery system as they show that economic gains can be achieved by reusing rather than refusing low quality returns.

Page generated in 0.2213 seconds