New Markov Decision Process Formulations and Optimal Policy Structure for Assemble-to-Order and New Product Development Problems

This thesis examines two complex, dynamic problems by employing the theory of Markov Decision Processes (MDPs). Chapters 2 and 3 consider assemble-to-order (ATO) inventory systems. An ATO system consists of several components and several products, and assembles products as demand is realized; it is becoming increasingly popular since it provides greater flexibility in manufacturing at a reasonable cost. This work contributes to the ATO research stream by characterizing optimal inventory replenishment and allocation policies. Chapter 4 examines the new product development (NPD) process with scarce resources and many projects in parallel, each lasting several periods, in the face of uncertainty. This study advances the NPD literature by revealing that optimal project selection and resource allocation decisions are congestion-dependent. Below, I elaborate on the novel optimal policies and structural results I obtain using MDP formulations, which is the overarching theme of the thesis.
In Chapter 2, I consider generalized ATO “M-systems” with multiple components and multiple products. These systems involve a single “master” product, which uses multiple units of each component, and multiple individual products, each of which consumes multiple units of a different component. Such systems are common for manufacturers selling an assembled product as well as individual spare parts.
I model these systems as infinite-horizon MDPs under the discounted cost criterion. Each component is produced in batches of fixed size in a make-to-stock fashion; batch sizes are determined by individual product sizes. Production times are independent and exponentially distributed. Demand for each product arrives as an independent Poisson process. Demands that are not satisfied immediately upon arrival are lost. Therefore, the state of the system can be described by the component inventory levels.
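The modeling assumptions above (Poisson demand, exponential production times, lost sales, discounted cost) can be illustrated with a minimal value-iteration sketch. It is not the thesis's M-system model: it assumes a toy system with one component and two products, a linear holding cost, lost-sale penalties, and an inventory cap `x_max` used only to truncate the state space. All parameter names and values are hypothetical.

```python
def solve_toy_ato(q=2, need=(1, 2), lam=(1.0, 1.0), lost=(10.0, 50.0),
                  mu=2.0, h=1.0, alpha=0.1, x_max=20, tol=1e-8):
    """Value iteration for a toy single-component ATO system (illustrative only).

    q     : batch size of the component (assumed)
    need  : units of the component each product consumes (assumed)
    lam   : Poisson demand rate of each product (assumed)
    lost  : lost-sale penalty of each product (assumed)
    mu    : exponential production rate (assumed)
    h     : holding cost rate per unit of inventory (assumed)
    alpha : continuous-time discount rate (assumed)
    x_max : inventory cap that truncates the state space (assumed)
    """
    # Uniformization: every event competes at total rate alpha + mu + sum(lam).
    rate = alpha + mu + sum(lam)
    v = [0.0] * (x_max + 1)
    while True:
        new_v = []
        for x in range(x_max + 1):
            # Replenishment decision: produce a batch or idle.
            after_batch = v[x + q] if x + q <= x_max else v[x]
            total = h * x + mu * min(v[x], after_batch)
            # Allocation decision: fulfil or reject each arriving demand.
            for a, l, c in zip(need, lam, lost):
                reject = v[x] + c
                fulfil = v[x - a] if x >= a else reject  # stockout forces a lost sale
                total += l * min(fulfil, reject)
            new_v.append(total / rate)
        gap = max(abs(n - o) for n, o in zip(new_v, v))
        v = new_v
        if gap < tol:
            break
    # States in which producing a batch strictly lowers the cost-to-go.
    produce_states = [x for x in range(x_max - q + 1) if v[x + q] < v[x] - 1e-9]
    return v, produce_states


values, produce_states = solve_toy_ato()
```

Inspecting `produce_states` shows where replenishment is worthwhile in this toy instance. With batch production and multi-unit demands, the resulting policy need not be a simple threshold in the inventory level.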
A control policy specifies when a batch of components should be produced (i.e., inventory replenishment), and whether an arriving demand for each product should be satisfied (i.e., inventory allocation). The convexity property that has been widely used to characterize optimal policies in the MDP literature may fail to hold in this setting. Therefore, I introduce new functional characterizations for submodularity and supermodularity restricted to certain lattices of the state space. The optimal cost function satisfies these new characterizations: the state space of the problem can be partitioned into disjoint lattices such that, on each lattice, (a) it is optimal to produce a batch of a particular component if and only if the state vector is less than a certain threshold associated with that component, and (b) it is optimal to fulfill a demand for a particular product if and only if the state vector is greater than or equal to a certain threshold associated with that product. I refer to this policy as a lattice-dependent base-stock and lattice-dependent rationing (LBLR) policy. I also show that LBLR remains optimal if the optimization criterion is changed to the average cost rate.
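For reference, the textbook definitions of submodularity and supermodularity on the integer lattice are given below. They are background only: the lattice-restricted characterizations introduced in the thesis are not reproduced here, and the symbols $f$, $x$, $y$, and $e_i$ are notation assumed for this sketch.

```latex
% Standard (unrestricted) definitions for f : \mathbb{Z}^n \to \mathbb{R},
% with \wedge and \vee the componentwise minimum and maximum.
\begin{align*}
  \text{submodular:}   \quad & f(x \wedge y) + f(x \vee y) \le f(x) + f(y)
      && \forall\, x, y \in \mathbb{Z}^n, \\
  \text{supermodular:} \quad & f(x \wedge y) + f(x \vee y) \ge f(x) + f(y)
      && \forall\, x, y \in \mathbb{Z}^n.
\end{align*}
% Equivalent pairwise form of submodularity, with e_i the i-th unit vector:
\[
  f(x + e_i + e_j) - f(x + e_j) \le f(x + e_i) - f(x)
  \qquad \forall\, x \in \mathbb{Z}^n,\ i \ne j.
\]
```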
Chapter 2 makes three important contributions. First, this is the first study that establishes the optimal inventory replenishment and allocation policies for M-systems. Second, this study is the first to characterize the optimal policies for any ATO problem when different products may use the same component in different quantities. Third, I introduce new functional characterizations restricted to certain lattices of the state space, giving rise to an LBLR policy.
In Chapter 3, I evaluate the use of an LBLR policy as a heuristic for general ATO systems. I numerically compare the globally optimal policy to LBLR and two other heuristics from the literature: a state-dependent base-stock and state-dependent rationing (SBSR) policy, and a fixed base-stock and fixed rationing (FBFR) policy. Taking the average cost rate as the performance criterion, I develop a linear program to find the globally optimal cost, and mixed integer programming formulations to find the optimal cost within each heuristic class. I generate more than 1800 instances of the general ATO problem, which is not restricted to the assumptions of Chapter 2, such as the M-system product structure. Interestingly, LBLR yields the globally optimal cost in all instances, while SBSR and FBFR provide solutions within 2.7% and 4.8% of the globally optimal cost, respectively. These numerical results also provide several insights into the performance of LBLR relative to the other heuristics: LBLR and SBSR perform significantly better than FBFR when replenishment batch sizes imperfectly match the component requirements of the most valuable or most highly demanded product. In addition, LBLR substantially outperforms SBSR if it is crucial to hold a significant amount of inventory that must be rationed.
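As background on the linear programming approach, the standard LP for the average cost of a finite, unichain MDP is sketched below. This is the generic textbook formulation, not necessarily the exact program developed in Chapter 3; the state set $S$, action sets $A(s)$, one-step costs $c(s,a)$, transition probabilities $p(j \mid s,a)$ (after uniformization and truncation to a finite state space), and variables $x(s,a)$ are all notation assumed here.

```latex
% x(s,a): long-run fraction of (uniformized) time spent in state s taking action a.
\begin{align*}
  \min_{x} \quad & \sum_{s \in S} \sum_{a \in A(s)} c(s,a)\, x(s,a) \\
  \text{s.t.} \quad
    & \sum_{a \in A(j)} x(j,a)
      - \sum_{s \in S} \sum_{a \in A(s)} p(j \mid s,a)\, x(s,a) = 0
      && \forall\, j \in S, \\
    & \sum_{s \in S} \sum_{a \in A(s)} x(s,a) = 1, \\
    & x(s,a) \ge 0 && \forall\, s \in S,\ a \in A(s).
\end{align*}
```

Restricting the policy to a heuristic class such as FBFR typically requires binary variables that tie the actions to shared thresholds, which is one way to understand why the within-class problems become mixed integer programs.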
Based on the numerical findings in Chapter 3, future research could investigate the optimality of LBLR for ATO systems with general product structures. However, as I construct counter-examples showing that submodularity and supermodularity, which are used to prove the optimality of LBLR in Chapter 2, need not hold for general ATO systems, showing the optimality of LBLR for general ATO systems will likely require alternative proof techniques.
In Chapter 4, I study the problem of project selection and resource allocation in a multistage new product development (NPD) process with stage-dependent resource constraints. As in Chapters 2 and 3, I model the problem as an infinite-horizon MDP, specifically under the discounted cost criterion. Each NPD project undergoes a different experiment at each stage of the NPD process; these experiments generate signals about the true nature of the project. Experimentation times are independent and exponentially distributed. Beliefs about the ultimate outcome of each project are updated after each experiment according to a Bayesian rule. Projects thus become differentiated through their signals, and all available signals for a project determine its category. The state of the system is described by the number of projects in each category. A control policy specifies, given the system state, how to utilize the resources at each stage, i.e., which projects (i) to experiment on at each stage and (ii) to terminate.
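The Bayesian updating step can be sketched as follows. This is a minimal illustration that assumes a project is either "good" or "bad" and that each experiment returns a binary signal. The function names, the binary outcome model, and the reuse of the same likelihoods for every experiment are assumptions made for brevity; in the thesis, each stage has its own experiment.

```python
def update_belief(prior_good, p_signal_if_good, p_signal_if_bad):
    """Posterior probability that a project is good after observing one signal.

    Applies Bayes' rule with a hypothetical two-outcome project model.
    """
    numerator = prior_good * p_signal_if_good
    evidence = numerator + (1.0 - prior_good) * p_signal_if_bad
    if evidence == 0.0:
        raise ValueError("observed signal has zero probability under the model")
    return numerator / evidence


def belief_after_signals(prior_good, signals, p_pos_if_good, p_pos_if_bad):
    """Fold a sequence of binary signals (True = positive) into the belief."""
    belief = prior_good
    for positive in signals:
        if positive:
            belief = update_belief(belief, p_pos_if_good, p_pos_if_bad)
        else:
            belief = update_belief(belief, 1.0 - p_pos_if_good, 1.0 - p_pos_if_bad)
    return belief


# One positive, then one negative signal, starting from an even prior.
posterior = belief_after_signals(0.5, [True, False], 0.8, 0.3)
```

In this sketch, a project's signal history is what would determine its category. An uninformative experiment is one whose signal is equally likely for good and bad projects, so it leaves the belief unchanged.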
I characterize the optimal control policy as following a new type of strategy, state-dependent non-congestive promotion (SDNCP), for two special cases of the general problem: (a) when there is a single informative experiment and projects are not terminated, or (b) when there are multiple uninformative experiments. An SDNCP policy implies that, at each stage, it is optimal to advance a project with the highest expected reward to the next stage if and only if the number of projects in each successor category is less than a state-dependent threshold. In addition, I show that threshold values decrease in a non-strict sense as a later stage becomes more congested or as an earlier stage becomes less congested. (A stage becomes “more congested” with an increase in the number of projects at this stage or with an increase in the expected reward of any project at this stage.) An SDNCP policy can be used as a heuristic for the general problem. I demonstrate the outstanding performance of an SDNCP policy in the general case through a numerical study. These findings highlight the importance of taking congestion into account in optimal portfolio strategies.
Date: 09 May 2012
Creators: Nadar, Emre
Publisher: Research Showcase @ CMU
Source Sets: Carnegie Mellon University
Detected Language: English
