Spelling suggestions: "subject:"markov codecision canprocess"" "subject:"markov codecision 3.3vprocess""
11 |
A Simulation Based Approximate Dynamic Programming Approach to Multi-class, Multi-resource Surgical SchedulingAstaraky, Davood 09 January 2013 (has links)
The thesis focuses on a model that seeks to address patient scheduling step of the surgical scheduling process to determine the number of surgeries to perform in a given day. Specifically, provided a master schedule that provides a cyclic breakdown of total OR availability into specific daily allocations to each surgical specialty, we look to provide a scheduling policy for all surgeries that minimizes a combination of the lead time between patient request and surgery date, overtime in the ORs and congestion in the wards. We cast the problem of generating optimal control strategies into the framework of Markov Decision Process (MDP). The Approximate Dynamic Programming (ADP) approach has been employed to solving the model which would otherwise be intractable due to the size of the state space. We assess performance of resulting policy and quality of the driven policy through simulation and we provide our policy insights and conclusions.
|
12 |
Cross-layer adaptive transmission scheduling in wireless networksNgo, Minh Hanh 05 1900 (has links)
A new promising approach for wireless network optimization is from a cross-layer perspective. This thesis focuses on exploiting channel state information (CSI) from the physical layer for optimal transmission scheduling at the medium access control (MAC) layer. The first part of the thesis considers exploiting CSI via a distributed channel-aware MAC protocol. The MAC protocol is analysed using a centralized design approach and a non-cooperative game theoretic approach. Structural results are obtained and provably convergent stochastic approximation algorithms that can estimate the optimal transmission policies are proposed. Especially, in the game theoretic MAC formulation, it is proved that the best response transmission policies are threshold in the channel state and there exists a Nash equilibrium at which every user deploys a threshold transmission policy. This threshold result leads to a particularly efficient stochastic-approximation-based adaptive learning algorithm and a simple distributed implementation of the MAC protocol. Simulations show that the channel-aware MAC protocols result in system throughputs that increase with the number of users.
The thesis also considers opportunistic transmission scheduling from the perspective of a single user using Markov Decision Process (MDP) approaches. Both channel state information and channel memory are exploited for opportunistic transmission. First, a finite horizon MDP transmission scheduling problem is considered. The finite horizon formulation is suitable for short-term delay constraints. It is proved for the finite horizon opportunistic transmission scheduling problem that the optimal transmission policy is threshold in the buffer occupancy state and the transmission time. This two-dimensional threshold structure substantially reduces the computational complexity required to compute and implement the optimal policy. Second, the opportunistic transmission scheduling problem is formulated as an infinite horizon average cost MDP with a constraint on the average waiting cost. An advantage of the infinite horizon formulation is that the optimal policy is stationary. Using the Lagrange dynamic programming theory and the supermodularity method, it is proved that the stationary optimal transmission scheduling policy is a randomized mixture of two policies that are threshold in the buffer occupancy state. A stochastic approximation algorithm and a Q-learning based algorithm that can adaptively estimate the optimal transmission scheduling policies are then proposed.
|
13 |
Global dual-sourcing strategy : is it effective in mitigating supply disruption?Ahmad Mustaffa, Nurakmal January 2015 (has links)
Most firms are still failing to think strategically and systematically about managing supply disruption risk and most of the supply chain management efforts are focused on reducing supply chain operation costs rather than managing disruption. Some innovative firms have taken steps to implement supply chain risk management (SCRM). Inventory management is part of SCRM because supply disruptions negatively affect the reliability of deliveries from suppliers and the costs associated with the ordering process. The complexity of existing inventory models makes it challenging to combine the management of the supply process and inventory in a single model due, for example, to the difficulty of including the characteristics of the disruption process in the supply chain network structure. Therefore, there is a need for a simple flexible model that can incorporate the key elements of supply disruption in an inventory model. This thesis presents a series of models that investigate the importance of information on disruption discovery and recovery for a firm’s supply and inventory management. A simple two-echelon supply chain with one firm and two suppliers (i.e., referred to as the onshore and offshore suppliers) in a single product/component setting has been considered in this thesis for the purpose of experimental analyses. The sourcing decisions that the firm faces during periods of supply disruption are examined leading to an assessment of how information about the risk and length of disruption and recovery can be used to facilitate the firm’s sourcing decisions and monitor the performance of stock control during the disruption. The first part of this thesis analyses basic ordering models (Model 1 and Model 2 respectively) without the risk of supply disruption and with the risk of supply disruption. The second part analyses the value of supply disruption information, using a model with advance information on the length of disruption (Model 3) and a model with learning about the length of disruption (Model 4). The third part explores a quantitative recovery model and the analyses in this part consider of three models. Model 5 assumes a basic phased recovery model, Model 6 assumes advance information about the phased recovery process and Model 7 assumes learning about the phased recovery process. The last part of this thesis investigates the order pressure scenario that exists in the firm’s supply chain. Under this scenario, disruption to one part of the supply chain network increases demand on the remainder resulting in a lower service levels than normal. This scenario is applied to all the previous models apart from Model 1. The models in this thesis are examined under finite and infinite planning horizons and with constant and stochastic demand. The objective of the models is to minimise the expected inventory cost and optimise the order quantity from the suppliers given the different assumptions with respect to the length of supply disruption and information about the recovery process. The models have been developed using the discrete time Markov decision process (DTMDP) technique and implemented using the Java programming language. The findings of this thesis could be used to help a firm that is facing the risk supply disruption to develop its SCRM program. The findings highlight the importance of considering quantitative measures of the disruption and recovery processes, something which is still not popular within SCRM in some organisations.
|
14 |
The Optimal Control of Child Delivery for Women with Hypertensive Disorders of PregnancyJanuary 2018 (has links)
abstract: Hypertensive disorders of pregnancy (HDP) affect up to 5%-15% of pregnancies around the globe, and form a leading cause of maternal and neonatal morbidity and mortality. HDP are progressive disorders for which the only cure is to deliver the baby. An increasing trend in the prevalence of HDP has been observed in the recent years. This trend is anticipated to continue due to the rise in the prevalence of diseases that strongly influence hypertension such as obesity and metabolic syndrome. In order to lessen the adverse outcomes due to HDP, we need to study (1) the natural progression of HDP, (2) the risks of adverse outcomes associated with these disorders, and (3) the optimal timing of delivery for women with HDP.
In the first study, the natural progression of HDP in the third trimester of pregnancy is modeled with a discrete-time Markov chain (DTMC). The transition probabilities of the DTMC are estimated using clinical data with an order restricted inference model that maximizes the likelihood function subject to a set of order restrictions between the transition probabilities. The results provide useful insights on the progression of HDP, and the estimated transition probabilities are used to parametrize the decision models in the third study.
In the second study, the risks of maternal and neonatal adverse outcomes for women with HDP are quantified with a composite measure of childbirth morbidity, and the estimated risks are compared with respect to type of HDP at delivery, gestational age at delivery, and type of delivery in a retrospective cohort study. Furthermore, the safety of child delivery with respect to the same variables is assessed with a provider survey and technique for order performance by similarity to ideal solution (TOPSIS). The methods and results of this study are used to parametrize the decision models in the third study.
In the third study, the decision problem of timing of delivery for women with HDP is formulated as a discrete-time Markov decision process (MDP) model that minimizes the risks of maternal and neonatal adverse outcomes. We additionally formulate a robust MDP model that gives the worst-case optimal policy when transition probabilities are allowed to vary within their confidence intervals. The results of the decision models are assessed within a probabilistic sensitivity analysis (PSA) that considers the uncertainty in the estimated risk values. In our PSA, the performance of candidate delivery policies is evaluated using a large number of problem instances that are constructed according to the orders between model parameters to incorporate physicians' intuition. / Dissertation/Thesis / Doctoral Dissertation Industrial Engineering 2018
|
15 |
Cross-layer adaptive transmission scheduling in wireless networksNgo, Minh Hanh 05 1900 (has links)
A new promising approach for wireless network optimization is from a cross-layer perspective. This thesis focuses on exploiting channel state information (CSI) from the physical layer for optimal transmission scheduling at the medium access control (MAC) layer. The first part of the thesis considers exploiting CSI via a distributed channel-aware MAC protocol. The MAC protocol is analysed using a centralized design approach and a non-cooperative game theoretic approach. Structural results are obtained and provably convergent stochastic approximation algorithms that can estimate the optimal transmission policies are proposed. Especially, in the game theoretic MAC formulation, it is proved that the best response transmission policies are threshold in the channel state and there exists a Nash equilibrium at which every user deploys a threshold transmission policy. This threshold result leads to a particularly efficient stochastic-approximation-based adaptive learning algorithm and a simple distributed implementation of the MAC protocol. Simulations show that the channel-aware MAC protocols result in system throughputs that increase with the number of users.
The thesis also considers opportunistic transmission scheduling from the perspective of a single user using Markov Decision Process (MDP) approaches. Both channel state information and channel memory are exploited for opportunistic transmission. First, a finite horizon MDP transmission scheduling problem is considered. The finite horizon formulation is suitable for short-term delay constraints. It is proved for the finite horizon opportunistic transmission scheduling problem that the optimal transmission policy is threshold in the buffer occupancy state and the transmission time. This two-dimensional threshold structure substantially reduces the computational complexity required to compute and implement the optimal policy. Second, the opportunistic transmission scheduling problem is formulated as an infinite horizon average cost MDP with a constraint on the average waiting cost. An advantage of the infinite horizon formulation is that the optimal policy is stationary. Using the Lagrange dynamic programming theory and the supermodularity method, it is proved that the stationary optimal transmission scheduling policy is a randomized mixture of two policies that are threshold in the buffer occupancy state. A stochastic approximation algorithm and a Q-learning based algorithm that can adaptively estimate the optimal transmission scheduling policies are then proposed. / Applied Science, Faculty of / Electrical and Computer Engineering, Department of / Graduate
|
16 |
A Simulation Based Approximate Dynamic Programming Approach to Multi-class, Multi-resource Surgical SchedulingAstaraky, Davood January 2013 (has links)
The thesis focuses on a model that seeks to address patient scheduling step of the surgical scheduling process to determine the number of surgeries to perform in a given day. Specifically, provided a master schedule that provides a cyclic breakdown of total OR availability into specific daily allocations to each surgical specialty, we look to provide a scheduling policy for all surgeries that minimizes a combination of the lead time between patient request and surgery date, overtime in the ORs and congestion in the wards. We cast the problem of generating optimal control strategies into the framework of Markov Decision Process (MDP). The Approximate Dynamic Programming (ADP) approach has been employed to solving the model which would otherwise be intractable due to the size of the state space. We assess performance of resulting policy and quality of the driven policy through simulation and we provide our policy insights and conclusions.
|
17 |
Integrating Deterministic Planning and Reinforcement Learning for Complex Sequential Decision MakingErnsberger, Timothy S. 07 March 2013 (has links)
No description available.
|
18 |
Inventory and Pricing Management of Perishable Products with Fixed and Random Shelf lifeMoshtagh, Mohammad January 2024 (has links)
In this dissertation, we study inventory and revenue management problems for perishable products with customer choice considerations. This dissertation is composed of six chapters. In Chapter 1, we provide an overview and the motivation of problems. Subsequently, in Chapter 2, we propose a joint inventory and pricing problem for a perishable product with two freshness levels. After a stochastic time, a fresh item turns into a non-fresh item, which will expire after another random duration. Under an (r, Q) ordering policy and a markdown pricing strategy for non-fresh items, we formulate a model that maximizes the long-run average profit rate. We then reduce the model to a mixed-integer bilinear program (MIBLP), which can be solved efficiently by state-of-the-art commercial solvers. We also investigate the value of using a markdown strategy by establishing bounds on it under limiting regimes of some parameters such as large market demand. Further, we consider an Economic Order Quantity (EOQ)-type heuristic and bound the optimality gap asymptotically. Our results reveal that although the clearance strategy is always beneficial for the retailer, it may hurt customers who are willing to buy fresh products.
In Chapter 3, we extend this model to the dynamic setting with multiple freshness levels of perishable products. Due to the complexity of the problem, we study the structural properties of value function and characterize the structure of the optimal policies by using the concept of anti-multimodularity. The structural analysis enables us to devise three novel and efficient heuristic policies. We further extend the model by considering donation policy and replenishment system. Our results imply that freshness-dependent pricing and dynamic pricing are two substitute strategies, while freshness-dependent pricing and donation strategy are two complement strategies for matching supply with demand. Also, high variability in product quality under dynamic pricing benefits the firm, but it may result in significant losses with a static pricing strategy.
In Chapter 4, we study a joint inventory-pricing model for perishable items with fixed shelf lives to examine the effectiveness of different markdown policies, including single-stage, multiple-stage, and dynamic markdown policies both theoretically and numerically. We show that the value of multiple-stage markdown policies over single-stage ones asymptotically vanishes as the shelf life, market demand, or customers’ maximum willingness-to-pay increase.
In chapter 5, with a focus on blood products, we optimize blood supply chain structure along with the operations optimization. Specifically, we study collection, production, replenishment, issuing, inventory, wastage, and substitution decisions under three different blood supply chain channel structures, i.e., the decentralized, centralized, and coordinated. We propose a bi-level optimization program to model the decentralized system and use the Karush–Kuhn–Tucker (KKT) optimality conditions to solve that. Although centralized systems result in a higher performance than decentralized systems, it is challenging to implement them. Thus, we design a novel coordination mechanism to motivate hospitals to operate in a centralized system. We also extend the model to the case with demand uncertainty and compare different issuing and replenishment policies. Analysis of a realistic case-study indicates that integration can significantly improve the performance of the system. Finally, Chapter 6 concludes this dissertation and proposes future research directions. / Dissertation / Doctor of Philosophy (PhD)
|
19 |
Basis Construction and Utilization for Markov Decision Processes Using GraphsJohns, Jeffrey Thomas 01 February 2010 (has links)
The ease or difficulty in solving a problemstrongly depends on the way it is represented. For example, consider the task of multiplying the numbers 12 and 24. Now imagine multiplying XII and XXIV. Both tasks can be solved, but it is clearly more difficult to use the Roman numeral representations of twelve and twenty-four. Humans excel at finding appropriate representations for solving complex problems. This is not true for artificial systems, which have largely relied on humans to provide appropriate representations. The ability to autonomously construct useful representations and to efficiently exploit them is an important challenge for artificial intelligence. This dissertation builds on a recently introduced graph-based approach to learning representations for sequential decision-making problems modeled as Markov decision processes (MDPs). Representations, or basis functions, forMDPs are abstractions of the problem’s state space and are used to approximate value functions, which quantify the expected long-term utility obtained by following a policy. The graph-based approach generates basis functions capturing the structure of the environment. Handling large environments requires efficiently constructing and utilizing these functions. We address two issues with this approach: (1) scaling basis construction and value function approximation to large graphs/data sets, and (2) tailoring the approximation to a specific policy’s value function. We introduce two algorithms for computing basis functions from large graphs. Both algorithms work by decomposing the basis construction problem into smaller, more manageable subproblems. One method determines the subproblems by enforcing block structure, or groupings of states. The other method uses recursion to solve subproblems which are then used for approximating the original problem. Both algorithms result in a set of basis functions from which we employ basis selection algorithms. The selection algorithms represent the value function with as few basis functions as possible, thereby reducing the computational complexity of value function approximation and preventing overfitting. The use of basis selection algorithms not only addresses the scaling problem but also allows for tailoring the approximation to a specific policy. This results in a more accurate representation than obtained when using the same subset of basis functions irrespective of the policy being evaluated. To make effective use of the data, we develop a hybrid leastsquares algorithm for setting basis function coefficients. This algorithm is a parametric combination of two common least-squares methods used for MDPs. We provide a geometric and analytical interpretation of these methods and demonstrate the hybrid algorithm’s ability to discover improved policies. We also show how the algorithm can include graphbased regularization to help with sparse samples from stochastic environments. This work investigates all aspects of linear value function approximation: constructing a dictionary of basis functions, selecting a subset of basis functions from the dictionary, and setting the coefficients on the selected basis functions. We empirically evaluate each of these contributions in isolation and in one combined architecture.
|
20 |
A Mathematical Model for the Energy Allocation Function of SleepSwang, Theodore W., II 07 July 2017 (has links)
No description available.
|
Page generated in 0.0778 seconds