1 |
Contributions to the theory of Gittins indices : with applications in pharmaceutical research and clinical trials
Wang, You-Gan January 1991
No description available.
|
2 |
Nuosekliųjų statistinių sprendimų teorijos taikymas inžineriniuose ir ekonominiuose uždaviniuose / Theory of consistent statistical solutions applying in engineering and economic tasks
Vaitekėnas, Giedrius 25 May 2006
In practical human activity (economics, control, engineering design, and so on), one often has to choose one decision from several, which naturally raises the problem of finding the best one. Most decisions in the lives of people and organisations are made sequentially: the final decision can be taken at any moment or postponed in the hope of better conditions. Sequential statistical decisions are shaped by uncertainty about future events and by the limited reliability of our observations. Dynamic programming is a simple way to optimize sequential decisions, but applying it to a specific problem requires developing problem-specific algorithms. Two types of problems are analyzed in this work: one-off sequential decisions and repeated sequential decisions. Two applied problems are treated: one of engineering type (stopping a technological process) and one of economic type (purchasing a flat).
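A one-off sequential decision such as the flat purchase can be sketched as an optimal stopping problem solved by backward induction. The sketch below is only an illustration and is not taken from the thesis: it assumes one independent price offer per period drawn from a known discrete distribution, a finite horizon, and an obligation to accept the last offer. The function name `optimal_stopping_thresholds` and its parameters are hypothetical.

```python
def optimal_stopping_thresholds(prices, probs, horizon):
    """Backward induction for buying at the lowest expected price.

    Assumptions (illustrative, not from the thesis): one i.i.d. offer per
    period with P(price = prices[i]) = probs[i]; the buyer either accepts
    the current offer or waits; the last offer must be accepted.

    Returns (values, thresholds):
      values[t]     expected price paid under the optimal rule, evaluated
                    before the offer of period t is seen;
      thresholds[t] highest offer worth accepting in period t.
    """
    expected_price = sum(p * q for p, q in zip(prices, probs))
    values = [0.0] * horizon
    thresholds = [0.0] * horizon

    # Last period: no option to wait, so any offer is accepted.
    values[horizon - 1] = expected_price
    thresholds[horizon - 1] = max(prices)

    # Earlier periods: accept only if the offer beats the value of waiting.
    for t in range(horizon - 2, -1, -1):
        continuation = values[t + 1]
        thresholds[t] = continuation
        values[t] = sum(q * min(p, continuation) for p, q in zip(prices, probs))
    return values, thresholds


if __name__ == "__main__":
    vals, thr = optimal_stopping_thresholds([100.0, 200.0], [0.5, 0.5], 3)
    print(vals, thr)
```

The acceptance threshold drops as more periods remain, which reflects the value of being able to postpone the decision.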
|
3 |
Group sequential and adaptive methods : topics with applications for clinical trials
Öhrn, Carl Fredrik January 2011
This thesis deals with sequential and adaptive methods for clinical trials, and how such methods can be used to achieve efficient clinical trial designs. The efficiency gains that can be achieved through non-adaptive group sequential methods are well established, while the newer adaptive methods seek to combine the best of the classical group sequential framework with an approach that gives increased flexibility. Our results show that the adaptive methods can provide some additional efficiency, as well as increased possibilities to respond to new internal and external information. Care is, however, needed when applying adaptive methods. While sub-optimal rules for adaptation can lead to inefficiencies, the logistical challenges can also be considerable. Efficient non-adaptive group sequential designs are often easier to implement in practice, and have for the cases we have considered been quite competitive in terms of efficiency. The four problems that are presented in this thesis are very relevant to how clinical trials are run in practice. The solutions that we present are either new approaches to problems that have not previously been solved, or methods that are more efficient than the ones currently available in the literature. Several challenging optimisation problems are solved through numerical computations. The optimal designs that are achieved can be used to benchmark new methods proposed in this thesis as well as methods available in the statistical literature. The problem that is solved in Chapter 5 can be viewed as a natural extension to the other problems. It brings together methods that we have used for the design of individual trials, to solve the more complex problem of designing a sequence of trials that are the core part of a clinical development program. The expected utility that is maximised is motivated by how the development of new medicines works in practice.
|
4 |
Machine Learning Solution Methods for Multistage Stochastic Programming
Defourny, Boris 20 December 2010
This thesis investigates the following question: Can supervised learning techniques be successfully used for finding better solutions to multistage stochastic programs? A similar question had already been posed in the context of reinforcement learning, and had led to algorithmic and conceptual advances in the field of approximate value function methods over the years. This thesis identifies several ways to exploit the combination "multistage stochastic programming/supervised learning" for sequential decision making under uncertainty.
Multistage stochastic programming is essentially the extension of stochastic programming to several recourse stages. After an introduction to multistage stochastic programming and a summary of existing approximation approaches based on scenario trees, this thesis mainly focusses on the use of supervised learning for building decision policies from scenario-tree approximations.
Two ways of exploiting learned policies in the context of the practical issues posed by the multistage stochastic programming framework are explored: the fast evaluation of performance guarantees for a given approximation, and the selection of good scenario trees. The computational efficiency of the approach allows novel investigations relative to the construction of scenario trees, from which novel insights, solution approaches and algorithms are derived. For instance, we generate and select scenario trees with random branching structures for problems over large planning horizons. Our experiments on the empirical performances of learned policies, compared to gold-standard policies, suggest that the combination of stochastic programming and machine learning techniques could also constitute a method per se for sequential decision making under uncertainty, inasmuch as learned policies are simple to use, and come with performance guarantees that can actually be quite good.
Finally, limitations of approaches that build an explicit model to represent an optimal solution mapping are studied in a simple parametric programming setting, and various insights regarding this issue are obtained.
|
5 |
Semi-Cooperative Learning in Smart Grid Agents
Reddy, Prashant P. 01 December 2013
Striving to reduce the environmental impact of our growing energy demand creates tough new challenges in how we generate and use electricity. We need to develop Smart Grid systems in which distributed sustainable energy resources are fully integrated and energy consumption is efficient. Customers, i.e., consumers and distributed producers, require agent technology that automates much of their decision-making to become active participants in the Smart Grid. This thesis develops models and learning algorithms for such autonomous agents in an environment where customers operate in modern retail power markets and thus have a choice of intermediary brokers with whom they can contract to buy or sell power. In this setting, customers face a learning and multiscale decision-making problem – they must manage contracts with one or more brokers and simultaneously, on a finer timescale, manage their consumption or production levels under existing contracts. On a contextual scale, they can optimize their isolated self-interest or consider their shared goals with other agents. We advance the idea that a Learning Utility Management Agent (LUMA), or a network of such agents, deployed on behalf of a Smart Grid customer can autonomously address that customer’s multiscale decision-making responsibilities. We study several relationships between a given LUMA and other agents in the environment. These relationships are semi-cooperative and the degree of expected cooperation can change dynamically with the evolving state of the world. We exploit the multiagent structure of the problem to control the degree of partial observability. Since a large portion of relevant hidden information is visible to the other agents in the environment, we develop methods for Negotiated Learning, whereby a LUMA can offer incentives to the other agents to obtain information that sufficiently reduces its own uncertainty while trading off the cost of offering those incentives.
The thesis first introduces pricing algorithms for autonomous broker agents, time series forecasting models for long range simulation, and capacity optimization algorithms for multi-dwelling customers. We then introduce Negotiable Entity Selection Processes (NESP) as a formal representation where partial observability is negotiable amongst certain classes of agents. We then develop our ATTRACTION-BOUNDED-LEARNING algorithm, which leverages the variability of hidden information for efficient multiagent learning. We apply the algorithm to address the variable-rate tariff selection and capacity aggregate management problems faced by Smart Grid customers. We evaluate the work on real data using Power TAC, an agent-based Smart Grid simulation platform and substantiate the value of autonomous Learning Utility Management Agents in the Smart Grid.
|
6 |
Online Combinatorial Optimization under Bandit Feedback
Talebi Mazraeh Shahi, Mohammad Sadegh January 2016
Multi-Armed Bandits (MAB) constitute the most fundamental model for sequential decision making problems with an exploration vs. exploitation trade-off. In such problems, the decision maker selects an arm in each round and observes a realization of the corresponding unknown reward distribution. Each decision is based on past decisions and observed rewards. The objective is to maximize the expected cumulative reward over some time horizon by balancing exploitation (arms with higher observed rewards should be selected often) and exploration (all arms should be explored to learn their average rewards). Equivalently, the performance of a decision rule or algorithm can be measured through its expected regret, defined as the gap between the expected reward achieved by the algorithm and that achieved by an oracle algorithm always selecting the best arm. This thesis investigates stochastic and adversarial combinatorial MAB problems, where each arm is a collection of several basic actions taken from a set of $d$ elements, in a way that the set of arms has a certain combinatorial structure. Examples of such sets include the set of fixed-size subsets, matchings, spanning trees, paths, etc. These problems are specific forms of online linear optimization, where the decision space is a subset of $d$-dimensional hypercube.
Due to the combinatorial nature, the number of arms generically grows exponentially with $d$. Hence, treating arms as independent and applying classical sequential arm selection policies would yield a prohibitive regret. It may then be crucial to exploit the combinatorial structure of the problem to design efficient arm selection algorithms.
As the first contribution of this thesis, in Chapter 3 we investigate combinatorial MABs in the stochastic setting and with Bernoulli rewards. We derive asymptotic (i.e., when the time horizon grows large) lower bounds on the regret of any algorithm under bandit and semi-bandit feedback.
The proposed lower bounds are problem-specific and tight in the sense that there exists an algorithm that achieves these regret bounds. Our derivation leverages some theoretical results in adaptive control of Markov chains. Under semi-bandit feedback, we further discuss the scaling of the proposed lower bound with the dimension of the underlying combinatorial structure. For the case of semi-bandit feedback, we propose ESCB, an algorithm that efficiently exploits the structure of the problem and provide a finite-time analysis of its regret. ESCB has better performance guarantees than existing algorithms, and significantly outperforms these algorithms in practice. In the fourth chapter, we consider stochastic combinatorial MAB problems where the underlying combinatorial structure is a matroid. Specializing the results of Chapter 3 to matroids, we provide explicit regret lower bounds for this class of problems. For the case of semi-bandit feedback, we propose KL-OSM, a computationally efficient greedy-based algorithm that exploits the matroid structure. Through a finite-time analysis, we prove that the regret upper bound of KL-OSM matches the proposed lower bound, thus making it the first asymptotically optimal algorithm for this class of problems. Numerical experiments validate that KL-OSM outperforms state-of-the-art algorithms in practice, as well.
In the fifth chapter, we investigate the online shortest-path routing problem which is an instance of combinatorial MABs with geometric rewards. We consider and compare three different types of online routing policies, depending (i) on where routing decisions are taken (at the source or at each node), and (ii) on the received feedback (semi-bandit or bandit). For each case, we derive the asymptotic regret lower bound.
These bounds help us to understand the performance improvements we can expect when (i) taking routing decisions at each hop rather than at the source only, and (ii) observing per-link delays rather than end-to-end path delays. In particular, we show that (i) is of no use while (ii) can have a spectacular impact.
For source routing under semi-bandit feedback, we then propose two algorithms with a trade-off between computational complexity and performance. The regret upper bounds of these algorithms improve over those of the existing algorithms, and they significantly outperform state-of-the-art algorithms in numerical experiments. Finally, we discuss combinatorial MABs in the adversarial setting and under bandit feedback. We concentrate on the case where arms have the same number of basic actions but are otherwise arbitrary. We propose CombEXP, an algorithm that has the same regret scaling as state-of-the-art algorithms. Furthermore, we show that CombEXP admits lower computational complexity for some combinatorial problems. / QC 20160201
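The exploration vs. exploitation trade-off and the notion of regret described in this abstract can be illustrated on the basic, non-combinatorial bandit with Bernoulli rewards. The sketch below uses the classical UCB1 index policy, which is a stand-in and not one of the thesis's algorithms (ESCB, KL-OSM, CombEXP). The function name `ucb1`, the arm means, and the seed are assumptions made for illustration.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit with the given (hidden) arm means.

    Returns (counts, pseudo_regret), where counts[a] is the number of pulls
    of arm a and pseudo_regret is the sum over arms of
    (best mean - arm mean) * pulls, i.e. the expected reward lost
    relative to an oracle that always plays the best arm.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialisation: pull every arm once
        else:
            # Index = empirical mean (exploitation) + confidence bonus (exploration).
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    best = max(means)
    pseudo_regret = sum((best - m) * c for m, c in zip(means, counts))
    return counts, pseudo_regret


if __name__ == "__main__":
    pulls, regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
    print(pulls, regret)
```

Treating every combinatorial arm as an independent arm in such a policy is exactly what the abstract warns against, since the number of arms grows exponentially with $d$.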
|
7 |
Design of Joint Verification-Correction Strategies for Engineered Systems
Xu, Peng 28 June 2022
System verification is a critical process in the development of engineered systems. Engineers gain confidence in the correct functionality of the system by executing system verification. Traditionally, system verification is implemented by conducting a verification strategy (VS) consisting of verification activities (VA). A VS can be generated using industry standards, expert experience, or quantitative-based methods. However, two limitations exist in these previous studies. First, as an essential part of system verification, correction activities (CA) are used to correct system errors or defects identified by VAs. However, CAs are usually simplified and treated as a component associated with VAs instead of independent decisions. Even though this simplification may accelerate the VS design, it results in inferior VSs because the optimization of correction decisions is ignored. Second, current methods have not handled the issue of complex engineered systems. As the number of activities increases, the magnitude of the possible VSs becomes so large that finding the optimal VS is impossible or impractical. Therefore, these limitations leave room for improving the VS design, especially for complex engineered systems.
This dissertation presents a joint verification-correction model (JVCM) to address these gaps. The basic idea of this model is to provide an engineering paradigm for complex engineered systems that simultaneously considers decisions about VAs and CAs. The accompanying research problem is to develop a modeling and analysis framework to solve for joint verification-correction strategies (JVCS). This dissertation aims to address them in three steps. First, verification processes (VP) are modeled mathematically to capture the impacts of VAs and CAs. Second, a JVCM with small strategy spaces is established with all conditions of a VP. A modified backward induction method is proposed to solve for an optimal JVCS in small strategy spaces. Third, a UCB-based tree search approach is designed to find near-optimal JVCSs in large strategy spaces. A case study is conducted and analyzed in each step to show the feasibility of the proposed models and methods. / Doctor of Philosophy / System verification is a critical step in the life cycle of system development. It is used to check that a system conforms to its design requirements. Traditionally, system verification is implemented by conducting a verification strategy (VS) consisting of verification activities (VA). A VS can be generated using industry standards, expert experience, or quantitative-based methods. However, two limitations exist in these methods. First, as an essential part of system verification, correction activities (CA) are used to correct system errors or defects identified by VAs. However, CAs are usually simplified and treated as remedial measures that depend on the results of VAs instead of independent decision choices. Even though this simplification may accelerate the VS design, it results in inferior VSs because the optimization of correction decisions is ignored. Second, current methods have not handled the issue of large systems.
As the number of activities increases, the total number of possible VSs becomes so large that it is impossible to find the optimal solution. Therefore, these limitations leave room for improving the VS design, especially for large systems.
This dissertation presents a joint verification-correction model (JVCM) to address these gaps. The basic idea of this model is to provide a paradigm for large systems that simultaneously considers decisions about VAs and CAs. The accompanying research problem is to develop a modeling and analysis framework to solve for joint verification-correction strategies (JVCS). This dissertation aims to address them in three steps. First, verification processes (VP) are modeled mathematically to capture the impacts of VAs and CAs. Second, a JVCM with small strategy spaces is established with all conditions of a VP. A modified backward induction method is proposed to solve for an optimal JVCS in small strategy spaces. Third, a UCB-based tree search approach is designed to find near-optimal JVCSs in large strategy spaces. A case study is conducted and analyzed in each step to show the feasibility of the proposed models and methods.
|
8 |
Resource Allocation Decision-Making in Sequential Adaptive Clinical Trials
Rojas Cordova, Alba Claudia 19 June 2017
Adaptive clinical trials for new drugs or treatment options promise substantial benefits to both the pharmaceutical industry and the patients, but complicate resource allocation decisions. In this dissertation, we focus on sequential adaptive clinical trials with binary response, which allow for early termination of drug testing for benefit or futility at interim analysis points. The option to stop the trial early enables the trial sponsor to mitigate investment risks on ineffective drugs, and to shorten the development time line of effective drugs, hence reducing expenditures and expediting patient access to these new therapies. In this setting, decision makers need to determine a testing schedule, or the number of patients to recruit at each interim analysis point, and stopping criteria that inform their decision to continue or stop the trial, considering performance measures that include drug misclassification risk, time-to-market, and expected profit. In the first manuscript, we model current practices of sequential adaptive trials, so as to quantify the magnitude of drug misclassification risk. Towards this end, we build a simulation model to realistically represent the current decision-making process, including the utilization of the triangular test, a widely implemented sequential methodology. We find that current practices lead to a high risk of incorrectly terminating the development of an effective drug, thus, to unrecoverable expenses for the sponsor, and unfulfilled patient needs. In the second manuscript, we study the sequential resource allocation decision, in terms of a testing schedule and stopping criteria, so as to quantify the impact of interim analyses on the aforementioned performance measures. Towards this end, we build a stochastic dynamic programming model, integrated with a Bayesian learning framework for updating the drug’s estimated efficacy. 
The resource allocation decision is characterized by endogenous uncertainty, and a trade-off between the incentive to establish that the drug is effective early on (exploitation), due to a time-decreasing market revenue, and the benefit from collecting some information on the drug’s efficacy prior to committing a large budget (exploration). We derive important structural properties of an optimal resource allocation strategy and perform a numerical study based on realistic data, and show that sequential adaptive trials with interim analyses substantially outperform traditional trials. Finally, the third manuscript integrates the first two models, and studies the benefits of an optimal resource allocation decision over current practices. Our findings indicate that our optimal testing schedules outperform different types of fixed testing schedules under both perfect and imperfect information. / Ph. D. / Adaptive clinical trials for new drugs or treatment options have the potential to reduce pharmaceutical research and development costs, and to expedite patient access to new therapies. Sequential adaptive clinical trials allow investigators and trial sponsors to terminate drug testing “early,” at interim analysis points, either for benefit or futility reasons. In the first manuscript, we model current practices of sequential adaptive trials, so as to quantify the risk of terminating the development of an effective drug incorrectly. Towards this end, we build a simulation model to realistically represent the current decision-making process. In the second manuscript, we study the financial investment decisions made by the trial sponsor, such as pharmaceutical firms, so as to quantify the impact of interim analyses on a series of performance measures relevant to the firm and the patients. 
Towards this end, we build a mathematical optimization model that incorporates elements representing the knowledge gained by decision makers on the drug’s efficacy, which is unknown to them at the beginning of the trial. As a result of our analysis, we obtain an optimal strategy to allocate financial resources in a sequential adaptive trial. In the third and final manuscript, we compare the performance of our optimal resource allocation strategy against the performance of the triangular test, a well-known and widely implemented sequential testing methodology, as measured by the aforementioned performance measures.
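The idea of updating the drug's estimated efficacy at interim analysis points in a trial with binary response can be sketched with a conjugate Beta-Bernoulli update. This is a simplified illustration and not the dissertation's model: the uniform Beta(1, 1) prior, the decision rule based on the posterior mean, and the `futility` and `benefit` cut-offs are all hypothetical choices.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior plus binary responses."""
    return alpha + successes, beta + failures


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution (estimated efficacy)."""
    return alpha / (alpha + beta)


def interim_decision(alpha, beta, futility=0.2, benefit=0.6):
    """Toy stopping rule on the posterior mean (cut-offs are hypothetical)."""
    mean = posterior_mean(alpha, beta)
    if mean <= futility:
        return "stop for futility"
    if mean >= benefit:
        return "stop for benefit"
    return "continue"


if __name__ == "__main__":
    # Start from a uniform prior and observe 4 responders out of 10 patients.
    a, b = update_beta(1, 1, successes=4, failures=6)
    print(a, b, posterior_mean(a, b), interim_decision(a, b))
```

A full treatment would embed such an update in a dynamic program over testing schedules, as the abstract describes; the sketch only shows the learning step at a single interim look.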
|
9 |
A Novel Control Engineering Approach to Designing and Optimizing Adaptive Sequential Behavioral Interventions
January 2014
Control engineering offers a systematic and efficient approach to optimizing the effectiveness of individually tailored treatment and prevention policies, also known as adaptive or "just-in-time" behavioral interventions. These types of interventions represent promising strategies for addressing many significant public health concerns. This dissertation explores the development of decision algorithms for adaptive sequential behavioral interventions using dynamical systems modeling, control engineering principles and formal optimization methods. A novel gestational weight gain (GWG) intervention involving multiple intervention components and featuring a pre-defined, clinically relevant set of sequence rules serves as an excellent example of a sequential behavioral intervention; it is examined in detail in this research.
A comprehensive dynamical systems model for the GWG behavioral interventions is developed, which demonstrates how to integrate a mechanistic energy balance model with dynamical formulations of behavioral models, such as the Theory of Planned Behavior and self-regulation. Self-regulation is further improved with different advanced controller formulations. These model-based controller approaches enable the user to have significant flexibility in describing a participant's self-regulatory behavior through the tuning of controller adjustable parameters. The dynamic simulation model demonstrates proof of concept for how self-regulation and adaptive interventions influence GWG, how intra-individual and inter-individual variability play a critical role in determining intervention outcomes, and the evaluation of decision rules.
Furthermore, a novel intervention decision paradigm using a Hybrid Model Predictive Control framework is developed to generate sequential decision policies in the closed-loop. Clinical considerations are systematically taken into account through a user-specified dosage sequence table corresponding to the sequence rules, constraints enforcing the adjustment of one input at a time, and a switching time strategy accounting for the difference in frequency between intervention decision points and sampling intervals. Simulation studies illustrate the potential usefulness of the intervention framework.
The final part of the dissertation presents a model scheduling strategy relying on gain-scheduling to address nonlinearities in the model, and a cascade filter design for dual-rate control systems is introduced to address scenarios with variable sampling rates. These extensions are important for addressing real-life scenarios in the GWG intervention. / Dissertation/Thesis / Doctoral Dissertation Chemical Engineering 2014
|
10 |
Dynamic computational models of risk and effort discounting in sequential decision making
Cuevas Rivera, Dario 30 June 2021
Dissertation based on my publications in the field of risky behavior in dynamic, sequential decision making tasks.
1.- Introduction
2.- Context-dependent risk aversion: a model-based approach
3.- Modeling dynamic allocation of effort in a sequential task using discounting models
4.- General discussion
|