1 |
Contributions to the theory of Gittins indices : with applications in pharmaceutical research and clinical trials. Wang, You-Gan, January 1991 (has links)
No description available.
|
2 |
Nuosekliųjų statistinių sprendimų teorijos taikymas inžineriniuose ir ekonominiuose uždaviniuose / Application of sequential statistical decision theory to engineering and economic problems. Vaitekėnas, Giedrius, 25 May 2006 (has links)
In practical activity (in economics, control, structural design, and so on) one must often choose one decision from among several alternatives, and the task of finding the best decision naturally arises. Most decisions made by individuals and organisations are made sequentially: a final decision can be made at any moment, or it can be postponed until later in the hope of better conditions. Sequential statistical decision theory accounts for uncertain future events and for the limited reliability of our observations. Dynamic programming is a simple way to optimise sequential decisions, although its application raises specific problems, such as the development of dedicated algorithms. Two types of tasks are analysed in this work: a one-off sequential decision and repeated sequential decisions. Two applied tasks are considered, one of an engineering type (stopping a technological process) and one of an economic type (the purchase of a flat).
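The flat-purchase task mentioned above is a classic optimal-stopping problem that dynamic programming solves by backward induction. The following minimal Python sketch is an illustration only; the uniform offer distribution, the twelve-week horizon, and the waiting cost are assumptions for the example, not values from the thesis.

```python
import numpy as np

# Flat purchase as optimal stopping: each week one price offer arrives, drawn
# i.i.d. uniformly on [low, high]; the buyer either accepts it or waits one
# more week at a fixed cost. Backward induction over the number of offers
# still to come yields an acceptance threshold for every stage.
low, high = 80_000.0, 120_000.0   # assumed price range
wait_cost = 500.0                 # assumed cost of waiting one more week
horizon = 12                      # assumed number of weekly offers

prices = np.linspace(low, high, 2001)          # discretised offer grid
p = np.full(prices.size, 1.0 / prices.size)    # uniform offer distribution

V = np.zeros(horizon + 1)        # V[k] = expected cost with k offers still to come
V[1] = prices @ p                # only one offer left: it must be accepted
thresholds = {1: high}
for k in range(2, horizon + 1):
    continue_cost = wait_cost + V[k - 1]
    V[k] = np.minimum(prices, continue_cost) @ p   # accept iff cheaper than waiting
    thresholds[k] = min(continue_cost, high)

for k in sorted(thresholds, reverse=True):
    print(f"{k:2d} offer(s) remaining: accept any price below {thresholds[k]:,.0f}")
```

With many offers still to come the acceptance threshold is low, so the buyer is picky early in the search and accepts almost anything near the deadline.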
|
3 |
Group sequential and adaptive methods : topics with applications for clinical trials. Öhrn, Carl Fredrik, January 2011 (has links)
This thesis deals with sequential and adaptive methods for clinical trials, and how such methods can be used to achieve efficient clinical trial designs. The efficiency gains that can be achieved through non-adaptive group sequential methods are well established, while the newer adaptive methods seek to combine the best of the classical group sequential framework with an approach that gives increased flexibility. Our results show that the adaptive methods can provide some additional efficiency, as well as increased possibilities to respond to new internal and external information. Care is, however, needed when applying adaptive methods. While sub-optimal rules for adaptation can lead to inefficiencies, the logistical challenges can also be considerable. Efficient non-adaptive group sequential designs are often easier to implement in practice, and, for the cases we have considered, have been quite competitive in terms of efficiency. The four problems that are presented in this thesis are very relevant to how clinical trials are run in practice. The solutions that we present are either new approaches to problems that have not previously been solved, or methods that are more efficient than the ones currently available in the literature. Several challenging optimisation problems are solved through numerical computations. The optimal designs that are achieved can be used to benchmark new methods proposed in this thesis as well as methods available in the statistical literature. The problem that is solved in Chapter 5 can be viewed as a natural extension to the other problems. It brings together methods that we have applied to the design of individual trials, to solve the more complex problem of designing a sequence of trials that are the core part of a clinical development program. The expected utility that is maximised is motivated by how the development of new medicines works in practice.
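As an illustration of the non-adaptive group sequential designs discussed above, the short Monte Carlo sketch below checks the operating characteristics of a simple two-stage, one-sided design; the Pocock-type boundary constant, group sizes, and effect size are assumptions for the example and are not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-stage, one-sided group sequential z-test (sketch): after the first group
# the trial stops for efficacy if the z-statistic crosses a common boundary c;
# otherwise it continues to the full sample and tests again at the same boundary.
n1, n2 = 50, 50    # assumed group sizes
c = 2.178          # approximate Pocock constant for two looks, one-sided alpha = 0.025
n_sims = 200_000

def rejection_rate(mean):
    # Sums of n i.i.d. N(mean, 1) observations are drawn directly.
    s1 = rng.normal(n1 * mean, np.sqrt(n1), size=n_sims)
    s2 = rng.normal(n2 * mean, np.sqrt(n2), size=n_sims)
    z1 = s1 / np.sqrt(n1)
    z2 = (s1 + s2) / np.sqrt(n1 + n2)
    stop_early = z1 > c
    return np.mean(stop_early | (z2 > c)), np.mean(stop_early)

alpha, _ = rejection_rate(0.0)        # type-I error under H0: mean = 0
power, early = rejection_rate(0.3)    # power under an assumed effect of 0.3 SD
print(f"type-I error ~ {alpha:.4f}, power ~ {power:.3f}, early-stop probability ~ {early:.3f}")
```

The early-stopping probability under the alternative is what drives the expected-sample-size savings that make such designs competitive.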
|
4 |
Semi-Cooperative Learning in Smart Grid Agents. Reddy, Prashant P., 01 December 2013
Striving to reduce the environmental impact of our growing energy demand creates tough new challenges in how we generate and use electricity. We need to develop Smart Grid systems in which distributed sustainable energy resources are fully integrated and energy consumption is efficient. Customers, i.e., consumers and distributed producers, require agent technology that automates much of their decision-making to become active participants in the Smart Grid. This thesis develops models and learning algorithms for such autonomous agents in an environment where customers operate in modern retail power markets and thus have a choice of intermediary brokers with whom they can contract to buy or sell power. In this setting, customers face a learning and multiscale decision-making problem – they must manage contracts with one or more brokers and simultaneously, on a finer timescale, manage their consumption or production levels under existing contracts. On a contextual scale, they can optimize their isolated self-interest or consider their shared goals with other agents. We advance the idea that a Learning Utility Management Agent (LUMA), or a network of such agents, deployed on behalf of a Smart Grid customer can autonomously address that customer’s multiscale decision-making responsibilities. We study several relationships between a given LUMA and other agents in the environment. These relationships are semi-cooperative and the degree of expected cooperation can change dynamically with the evolving state of the world. We exploit the multiagent structure of the problem to control the degree of partial observability. Since a large portion of relevant hidden information is visible to the other agents in the environment, we develop methods for Negotiated Learning, whereby a LUMA can offer incentives to the other agents to obtain information that sufficiently reduces its own uncertainty while trading off the cost of offering those incentives. The thesis first introduces pricing algorithms for autonomous broker agents, time series forecasting models for long range simulation, and capacity optimization algorithms for multi-dwelling customers. We then introduce Negotiable Entity Selection Processes (NESP) as a formal representation where partial observability is negotiable amongst certain classes of agents. We then develop our ATTRACTION-BOUNDED-LEARNING algorithm, which leverages the variability of hidden information for efficient multiagent learning. We apply the algorithm to address the variable-rate tariff selection and capacity aggregate management problems faced by Smart Grid customers. We evaluate the work on real data using Power TAC, an agent-based Smart Grid simulation platform, and substantiate the value of autonomous Learning Utility Management Agents in the Smart Grid.
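The core trade-off in Negotiated Learning, paying other agents for information that reduces one's own uncertainty, can be pictured with a small value-of-information calculation. The sketch below is only an illustration of that trade-off; it is not the ATTRACTION-BOUNDED-LEARNING algorithm, and all rates, beliefs, and costs are assumed.

```python
import numpy as np

# Value-of-information sketch for tariff selection: a customer agent chooses
# between a known fixed tariff and a variable-rate tariff whose rate is hidden.
# Paying an incentive to a neighbouring agent reveals the variable rate before
# the choice is made; the incentive is worth paying only if the expected saving
# from the revealed information exceeds its cost.
fixed_rate = 0.14                              # assumed $/kWh, known
variable_rates = np.array([0.10, 0.13, 0.18])  # assumed possible hidden rates
probs = np.array([0.3, 0.4, 0.3])              # agent's belief over those rates
usage_kwh = 600.0                              # assumed monthly consumption
incentive = 5.0                                # assumed cost of buying the information

# Without the information: commit to whichever tariff is cheaper in expectation.
cost_uninformed = usage_kwh * min(fixed_rate, probs @ variable_rates)
# With the information: pick the cheaper tariff after seeing the realised rate.
cost_informed = usage_kwh * (probs @ np.minimum(fixed_rate, variable_rates)) + incentive

print(f"expected cost without information: {cost_uninformed:.2f}")
print(f"expected cost with purchased information: {cost_informed:.2f}")
print("pay the incentive" if cost_informed < cost_uninformed else "do not pay")
```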
|
5 |
Online Combinatorial Optimization under Bandit Feedback. Talebi Mazraeh Shahi, Mohammad Sadegh, January 2016 (has links)
Multi-Armed Bandits (MAB) constitute the most fundamental model for sequential decision making problems with an exploration vs. exploitation trade-off. In such problems, the decision maker selects an arm in each round and observes a realization of the corresponding unknown reward distribution. Each decision is based on past decisions and observed rewards. The objective is to maximize the expected cumulative reward over some time horizon by balancing exploitation (arms with higher observed rewards should be selected often) and exploration (all arms should be explored to learn their average rewards). Equivalently, the performance of a decision rule or algorithm can be measured through its expected regret, defined as the gap between the expected reward achieved by the algorithm and that achieved by an oracle algorithm always selecting the best arm. This thesis investigates stochastic and adversarial combinatorial MAB problems, where each arm is a collection of several basic actions taken from a set of $d$ elements, in a way that the set of arms has a certain combinatorial structure. Examples of such sets include the set of fixed-size subsets, matchings, spanning trees, paths, etc. These problems are specific forms of online linear optimization, where the decision space is a subset of the $d$-dimensional hypercube. Due to the combinatorial nature, the number of arms generically grows exponentially with $d$. Hence, treating arms as independent and applying classical sequential arm selection policies would yield a prohibitive regret. It may then be crucial to exploit the combinatorial structure of the problem to design efficient arm selection algorithms. As the first contribution of this thesis, in Chapter 3 we investigate combinatorial MABs in the stochastic setting and with Bernoulli rewards. We derive asymptotic (i.e., when the time horizon grows large) lower bounds on the regret of any algorithm under bandit and semi-bandit feedback. The proposed lower bounds are problem-specific and tight in the sense that there exists an algorithm that achieves these regret bounds. Our derivation leverages some theoretical results in adaptive control of Markov chains. Under semi-bandit feedback, we further discuss the scaling of the proposed lower bound with the dimension of the underlying combinatorial structure. For the case of semi-bandit feedback, we propose ESCB, an algorithm that efficiently exploits the structure of the problem and provide a finite-time analysis of its regret. ESCB has better performance guarantees than existing algorithms, and significantly outperforms these algorithms in practice. In the fourth chapter, we consider stochastic combinatorial MAB problems where the underlying combinatorial structure is a matroid. Specializing the results of Chapter 3 to matroids, we provide explicit regret lower bounds for this class of problems. For the case of semi-bandit feedback, we propose KL-OSM, a computationally efficient greedy-based algorithm that exploits the matroid structure. Through a finite-time analysis, we prove that the regret upper bound of KL-OSM matches the proposed lower bound, thus making it the first asymptotically optimal algorithm for this class of problems. Numerical experiments validate that KL-OSM outperforms state-of-the-art algorithms in practice, as well. In the fifth chapter, we investigate the online shortest-path routing problem which is an instance of combinatorial MABs with geometric rewards.
We consider and compare three different types of online routing policies, depending (i) on where routing decisions are taken (at the source or at each node), and (ii) on the received feedback (semi-bandit or bandit). For each case, we derive the asymptotic regret lower bound. These bounds help us to understand the performance improvements we can expect when (i) taking routing decisions at each hop rather than at the source only, and (ii) observing per-link delays rather than end-to-end path delays. In particular, we show that (i) is of no use while (ii) can have a spectacular impact. For source routing under semi-bandit feedback, we then propose two algorithms with a trade-off between computational complexity and performance. The regret upper bounds of these algorithms improve over those of the existing algorithms, and they significantly outperform state-of-the-art algorithms in numerical experiments. Finally, we discuss combinatorial MABs in the adversarial setting and under bandit feedback. We concentrate on the case where arms have the same number of basic actions but are otherwise arbitrary. We propose CombEXP, an algorithm that has the same regret scaling as state-of-the-art algorithms. Furthermore, we show that CombEXP admits lower computational complexity for some combinatorial problems.
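To make the semi-bandit setting concrete, the sketch below runs a standard CUCB-style index policy on the fixed-size-subset structure mentioned in the abstract; it is not the ESCB, KL-OSM, or CombEXP algorithm, and the dimensions, horizon, and Bernoulli means are assumed. The point is that for this structure a linear UCB index is maximised exactly by taking the top-m basic actions, so the combinatorial structure is exploited without enumerating the exponentially many arms.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stochastic combinatorial semi-bandit with fixed-size-subset arms: each round
# the learner plays m of d basic actions, observes every played action's
# Bernoulli reward (semi-bandit feedback), and earns their sum.
d, m, T = 8, 3, 20_000
means = np.linspace(0.2, 0.75, d)          # assumed Bernoulli means of the basic actions
best_reward = np.sort(means)[-m:].sum()    # oracle plays the m best actions

counts, sums, regret = np.zeros(d), np.zeros(d), 0.0
for t in range(1, T + 1):
    if t <= int(np.ceil(d / m)):           # first rounds: cycle through all basic actions
        arm = ((t - 1) * m + np.arange(m)) % d
    else:
        ucb = sums / counts + np.sqrt(1.5 * np.log(t) / counts)
        arm = np.argsort(ucb)[-m:]         # top-m indices maximise the linear UCB objective
    rewards = rng.random(m) < means[arm]   # semi-bandit feedback on the played actions
    counts[arm] += 1
    sums[arm] += rewards
    regret += best_reward - means[arm].sum()

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```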
|
6 |
Design of Joint Verification-Correction Strategies for Engineered Systems. Xu, Peng, 28 June 2022 (has links)
System verification is a critical process in the development of engineered systems. Engineers gain confidence in the correct functionality of the system by executing system verification. Traditionally, system verification is implemented by conducting a verification strategy (VS) consisting of verification activities (VA). A VS can be generated using industry standards, expert experience, or quantitative-based methods. However, two limitations exist in these previous studies. First, as an essential part of system verification, correction activities (CA) are used to correct system errors or defects identified by VAs. However, CAs are usually simplified and treated as a component associated with VAs instead of independent decisions. Even though this simplification may accelerate the VS design, it results in inferior VSs because the optimization of correction decisions is ignored. Second, current methods have not handled the issue of complex engineered systems. As the number of activities increases, the magnitude of the possible VSs becomes so large that finding the optimal VS is impossible or impractical. Therefore, these limitations leave room for improving the VS design, especially for complex engineered systems.
This dissertation presents a joint verification-correction model (JVCM) to address these gaps. The basic idea of this model is to provide an engineering paradigm for complex engineered systems that simultaneously consider decisions about VAs and CAs. The accompanying research problem is to develop a modeling and analysis framework to solve for joint verification-correction strategies (JVCS). This dissertation aims to address them in three steps. First, verification processes (VP) are modeled mathematically to capture the impacts of VAs and CAs. Second, a JVCM with small strategy spaces is established with all conditions of a VP. A modified backward induction method is proposed to solve for an optimal JVCS in small strategy spaces. Third, a UCB-based tree search approach is designed to find near-optimal JVCSs in large strategy spaces. A case study is conducted and analyzed in each step to show the feasibility of the proposed models and methods. / Doctor of Philosophy / System verification is a critical step in the life cycle of system development. It is used to check that a system conforms to its design requirements. Traditionally, system verification is implemented by conducting a verification strategy (VS) consisting of verification activities (VA). A VS can be generated using industry standards, expert experience, or quantitative-based methods. However, two limitations exist in these methods. First, as an essential part of system verification, correction activities (CA) are used to correct system errors or defects identified by VAs. However, CAs are usually simplified and treated as remedial measures that depend on the results of VAs instead of independent decision choices. Even though this simplification may accelerate the VS design, it results in inferior VSs because the optimization of correction decisions is ignored. Second, current methods have not handled the issue of large systems. As the number of activities increases, the total number of possible VSs becomes so large that it is impossible to find the optimal solution. Therefore, these limitations leave room for improving the VS design, especially for large systems.
This dissertation presents a joint verification-correction model (JVCM) to address these gaps. The basic idea of this model is to provide a paradigm for large systems that simultaneously consider decisions about VAs and CAs. The accompanying research problem is to develop a modeling and analysis framework to solve for joint verification-correction strategies (JVCS). This dissertation aims to address them in three steps. First, verification processes (VP) are modeled mathematically to capture the impacts of VAs and CAs. Second, a JVCM with small strategy spaces is established with all conditions of a VP. A modified backward induction method is proposed to solve for an optimal JVCS in small strategy spaces. Third, a UCB-based tree search approach is designed to find near-optimal JVCSs in large strategy spaces. A case study is conducted and analyzed in each step to show the feasibility of the proposed models and methods.
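A stripped-down example of costing verification and correction activities jointly is sketched below; it is a toy enumeration over one-parameter strategies, not the dissertation's JVCM or its backward-induction and UCB-based tree-search solvers, and every probability and cost in it is assumed.

```python
import numpy as np

# Toy joint verification-correction costing: a system contains a defect with
# prior probability p0. Each verification activity (VA) costs c_v and detects
# an existing defect with probability q; a detected defect is fixed by a
# correction activity (CA) costing c_c, after which verification stops. A
# defect that survives all VAs incurs a field-failure penalty L. The strategy
# "run at most k VAs" is scored by its expected total cost.
p0, q = 0.30, 0.60            # assumed defect prior and per-VA detection probability
c_v, c_c, L = 1.0, 4.0, 50.0  # assumed activity costs and failure penalty

def expected_cost(k):
    cost_no_defect = k * c_v                      # all k VAs run, nothing found
    cost_defect = 0.0
    for i in range(1, k + 1):                     # detection on the i-th VA
        cost_defect += q * (1 - q) ** (i - 1) * (i * c_v + c_c)
    cost_defect += (1 - q) ** k * (k * c_v + L)   # defect never detected
    return (1 - p0) * cost_no_defect + p0 * cost_defect

costs = {k: expected_cost(k) for k in range(0, 9)}
best_k = min(costs, key=costs.get)
for k, c in costs.items():
    print(f"k = {k}: expected cost {c:5.2f}")
print(f"best strategy: run at most {best_k} verification activities")
```

Even in this toy setting the optimum balances verification cost against the correction and failure costs, which is the kind of trade-off the full strategy spaces of the dissertation blow up combinatorially.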
|
7 |
Resource Allocation Decision-Making in Sequential Adaptive Clinical Trials. Rojas Cordova, Alba Claudia, 19 June 2017 (has links)
Adaptive clinical trials for new drugs or treatment options promise substantial benefits to both the pharmaceutical industry and the patients, but complicate resource allocation decisions. In this dissertation, we focus on sequential adaptive clinical trials with binary response, which allow for early termination of drug testing for benefit or futility at interim analysis points. The option to stop the trial early enables the trial sponsor to mitigate investment risks on ineffective drugs, and to shorten the development time line of effective drugs, hence reducing expenditures and expediting patient access to these new therapies. In this setting, decision makers need to determine a testing schedule, or the number of patients to recruit at each interim analysis point, and stopping criteria that inform their decision to continue or stop the trial, considering performance measures that include drug misclassification risk, time-to-market, and expected profit. In the first manuscript, we model current practices of sequential adaptive trials, so as to quantify the magnitude of drug misclassification risk. Towards this end, we build a simulation model to realistically represent the current decision-making process, including the utilization of the triangular test, a widely implemented sequential methodology. We find that current practices lead to a high risk of incorrectly terminating the development of an effective drug, thus, to unrecoverable expenses for the sponsor, and unfulfilled patient needs. In the second manuscript, we study the sequential resource allocation decision, in terms of a testing schedule and stopping criteria, so as to quantify the impact of interim analyses on the aforementioned performance measures. Towards this end, we build a stochastic dynamic programming model, integrated with a Bayesian learning framework for updating the drug’s estimated efficacy. The resource allocation decision is characterized by endogenous uncertainty, and a trade-off between the incentive to establish that the drug is effective early on (exploitation), due to a time-decreasing market revenue, and the benefit from collecting some information on the drug’s efficacy prior to committing a large budget (exploration). We derive important structural properties of an optimal resource allocation strategy and perform a numerical study based on realistic data, and show that sequential adaptive trials with interim analyses substantially outperform traditional trials. Finally, the third manuscript integrates the first two models, and studies the benefits of an optimal resource allocation decision over current practices. Our findings indicate that our optimal testing schedules outperform different types of fixed testing schedules under both perfect and imperfect information. / Ph. D. / Adaptive clinical trials for new drugs or treatment options have the potential to reduce pharmaceutical research and development costs, and to expedite patient access to new therapies. Sequential adaptive clinical trials allow investigators and trial sponsors to terminate drug testing “early,” at interim analysis points, either for benefit or futility reasons. In the first manuscript, we model current practices of sequential adaptive trials, so as to quantify the risk of terminating the development of an effective drug incorrectly. Towards this end, we build a simulation model to realistically represent the current decision-making process. 
In the second manuscript, we study the financial investment decisions made by the trial sponsor, such as a pharmaceutical firm, so as to quantify the impact of interim analyses on a series of performance measures relevant to the firm and the patients. Towards this end, we build a mathematical optimization model that incorporates elements representing the knowledge gained by decision makers on the drug’s efficacy, which is unknown to them at the beginning of the trial. As a result of our analysis, we obtain an optimal strategy to allocate financial resources in a sequential adaptive trial. In the third and final manuscript, we compare the performance of our optimal resource allocation strategy against the performance of the triangular test, a well-known and widely implemented sequential testing methodology, as measured by the aforementioned performance measures.
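For intuition about interim analyses with binary response, the sketch below simulates a simple Bayesian monitoring rule that stops early for efficacy or futility; it is an illustration only, not the stochastic dynamic program or the triangular test studied in the manuscripts, and the target rate, thresholds, and cohort sizes are assumed.

```python
import numpy as np

rng = np.random.default_rng(2)

# Bayesian interim monitoring of a single-arm trial with binary response:
# after each cohort the Beta posterior on the response rate is updated, and
# the trial stops early for efficacy or futility based on the posterior
# probability that the rate exceeds a clinically relevant target.
p0 = 0.30                        # assumed target response rate
cohorts, cohort_size = 5, 20     # assumed testing schedule
eff_cut, fut_cut = 0.975, 0.10   # assumed efficacy / futility thresholds
a, b = 1.0, 1.0                  # Beta(1, 1) prior on the response rate
true_rate = 0.45                 # assumed true efficacy, unknown to the sponsor

for k in range(1, cohorts + 1):
    responses = rng.binomial(cohort_size, true_rate)
    a, b = a + responses, b + cohort_size - responses
    prob_effective = np.mean(rng.beta(a, b, size=100_000) > p0)  # posterior tail probability
    print(f"interim analysis {k}: P(rate > {p0}) = {prob_effective:.3f}")
    if prob_effective >= eff_cut:
        print("stop early for efficacy")
        break
    if prob_effective <= fut_cut:
        print("stop early for futility")
        break
else:
    print("trial continued to its maximum sample size")
```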
|
8 |
A Novel Control Engineering Approach to Designing and Optimizing Adaptive Sequential Behavioral Interventions. January 2014 (has links)
Control engineering offers a systematic and efficient approach to optimizing the effectiveness of individually tailored treatment and prevention policies, also known as adaptive or “just-in-time” behavioral interventions. These types of interventions represent promising strategies for addressing many significant public health concerns. This dissertation explores the development of decision algorithms for adaptive sequential behavioral interventions using dynamical systems modeling, control engineering principles and formal optimization methods. A novel gestational weight gain (GWG) intervention involving multiple intervention components and featuring a pre-defined, clinically relevant set of sequence rules serves as an excellent example of a sequential behavioral intervention; it is examined in detail in this research.
A comprehensive dynamical systems model for the GWG behavioral intervention is developed, which demonstrates how to integrate a mechanistic energy balance model with dynamical formulations of behavioral models such as the Theory of Planned Behavior and self-regulation. The self-regulation model is further improved with advanced controller formulations. These model-based controller approaches give the user significant flexibility in describing a participant's self-regulatory behavior through the tuning of adjustable controller parameters. The dynamic simulation model provides proof of concept for how self-regulation and adaptive interventions influence GWG, how intra-individual and inter-individual variability play a critical role in determining intervention outcomes, and how decision rules can be evaluated.
Furthermore, a novel intervention decision paradigm using a Hybrid Model Predictive Control framework is developed to generate sequential decision policies in closed loop. Clinical considerations are systematically taken into account through a user-specified dosage sequence table corresponding to the sequence rules, constraints enforcing the adjustment of one input at a time, and a switching-time strategy accounting for the difference in frequency between intervention decision points and sampling intervals. Simulation studies illustrate the potential usefulness of the intervention framework.
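The following toy simulation gives a flavour of a closed-loop adaptive intervention with discrete dosages; it is not the energy-balance model or the hybrid model predictive controller developed in the dissertation, and the weekly gain, dosage effect, and step-up rule are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy closed-loop adaptive intervention: weekly weight gain is driven by an
# energy surplus, a dosage d in {0, 1, 2, 3} attenuates that surplus, and a
# simple sequence rule steps the dosage up or down depending on drift from a
# goal trajectory.
weeks = 20
goal_rate = 0.35       # assumed target gain per week (kg)
surplus_kg = 0.55      # assumed uncontrolled gain per week (kg)
dose_effect = 0.12     # assumed reduction in weekly gain per dosage step (kg)
noise_sd = 0.10        # assumed week-to-week variability (kg)

weight_gain, dose = 0.0, 0
for week in range(1, weeks + 1):
    weight_gain += surplus_kg - dose_effect * dose + rng.normal(0.0, noise_sd)
    deviation = weight_gain - goal_rate * week
    if deviation > 0.2 and dose < 3:      # drifting above goal: step the dosage up
        dose += 1
    elif deviation < -0.2 and dose > 0:   # falling below goal: step it back down
        dose -= 1
    print(f"week {week:2d}: cumulative gain {weight_gain:5.2f} kg, next dosage {dose}")
```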
The final part of the dissertation presents a model scheduling strategy relying on gain scheduling to address nonlinearities in the model, and a cascade filter design for a dual-rate control system is introduced to address scenarios with variable sampling rates. These extensions are important for addressing real-life scenarios in the GWG intervention. / Dissertation/Thesis / Doctoral Dissertation Chemical Engineering 2014
|
9 |
Markovian sequential decision-making in non-stationary environments : application to argumentative debates / Décision séquentielle markovienne en environnements non-stationnaires : application aux débats d'argumentation. Hadoux, Emmanuel, 26 November 2015 (has links)
Les problèmes de décision séquentielle dans l’incertain requièrent qu’un agent prenne des décisions, les unes après les autres, en fonction de l’état de l’environnement dans lequel il se trouve. Dans la plupart des travaux, l’environnement dans lequel évolue l’agent est supposé stationnaire, c’est-à-dire qu’il n’évolue pas avec le temps. Toutefois, l’hypothèse de stationnarité peut ne pas être vérifiée quand, par exemple, des évènements exogènes au problème interviennent. Dans cette thèse, nous nous intéressons à la prise de décision séquentielle dans des environnements non-stationnaires. Nous proposons un nouveau modèle appelé HS3MDP permettant de représenter les problèmes non-stationnaires dont les dynamiques évoluent parmi un ensemble fini de contextes. Afin de résoudre efficacement ces problèmes, nous adaptons l’algorithme POMCP aux HS3MDP. Dans le but d’apprendre les dynamiques des problèmes de cette classe, nous présentons RLCD avec SCD, une méthode utilisable sans connaître à priori le nombre de contextes. Nous explorons ensuite le domaine de l’argumentation où peu de travaux se sont intéressés à la décision séquentielle. Nous étudions deux types de problèmes : les débats stochastiques (APS) et les problèmes de médiation face à des agents non-stationnaires (DMP). Nous présentons dans ce travail un modèle formalisant les APS et permettant de les transformer en MOMDP afin d’optimiser la séquence d’arguments d’un des agents du débat. Nous étendons cette modélisation aux DMP afin de permettre à un médiateur de répartir stratégiquement la parole dans un débat. / In sequential decision-making problems under uncertainty, an agent makes decisions, one after another, considering the current state of the environment where she evolves. In most work, the environment the agent evolves in is assumed to be stationary, i.e., its dynamics do not change over time. However, the stationarity hypothesis can be invalid if, for instance, exogenous events can occur. In this document, we are interested in sequential decision-making in non-stationary environments. We propose a new model named HS3MDP, allowing us to represent non-stationary problems whose dynamics evolve among a finite set of contexts. In order to efficiently solve those problems, we adapt the POMCP algorithm to HS3MDPs. We also present RLCD with SCD, a new method to learn the dynamics of the environments, without knowing a priori the number of contexts. We then explore the field of argumentation problems, where few works consider sequential decision-making. We address two types of problems: stochastic debates (APS) and mediation problems with non-stationary agents (DMP). In this work, we present a model formalizing APS and allowing us to transform them into an MOMDP in order to optimize the sequence of arguments of one agent in the debate. We then extend this model to DMPs to allow a mediator to strategically organize speaking turns in a debate.
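To illustrate what "dynamics that evolve among a finite set of contexts" means, the toy simulation below lets the reward probabilities of two actions depend on a hidden Markov-switching context, with a sliding-window greedy agent adapting to the changes; it is not the HS3MDP model nor the adapted POMCP or RLCD-with-SCD methods, and the transition matrix, means, and window length are assumed.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy non-stationary environment in the spirit of a hidden-mode model: the
# reward probabilities of two actions depend on a hidden context that switches
# according to a Markov chain, and a sliding-window epsilon-greedy agent
# adapts by discarding old observations.
contexts = np.array([[0.8, 0.2],     # context 0: action 0 pays off more often
                     [0.3, 0.7]])    # context 1: action 1 pays off more often
switch = np.array([[0.98, 0.02],     # assumed context transition matrix
                   [0.02, 0.98]])
T, window, eps = 5_000, 100, 0.05

ctx, history, total = 0, [], 0.0
for t in range(T):
    recent = history[-window:]
    est = [np.mean([r for a, r in recent if a == k] or [0.5]) for k in (0, 1)]
    action = int(np.argmax(est)) if rng.random() > eps else int(rng.integers(2))
    reward = float(rng.random() < contexts[ctx, action])
    history.append((action, reward))
    total += reward
    ctx = int(rng.choice(2, p=switch[ctx]))   # hidden context evolves over time
print(f"average reward over {T} steps: {total / T:.3f}")
```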
|
10 |
Statistical Methods for Offline Deep Reinforcement Learning. Danyang Wang (18414336), 20 April 2024
Reinforcement learning (RL) has been a rapidly evolving field of research over the past years, driving developments in areas such as artificial intelligence, healthcare, and education, to name a few. Despite the success of RL, its inherent online learning nature presents obstacles for its real-world applications, since in many settings, online data collection with the latest learned policy can be expensive and/or dangerous (such as robotics, healthcare, and autonomous driving). This challenge has catalyzed research into offline RL, which involves reinforcement learning from previously collected static datasets, without the need for further online data collection. However, most existing offline RL methods depend on two key assumptions: unconfoundedness and positivity (also known as the full-coverage assumption), which frequently do not hold in the context of static datasets.
In the first part of this dissertation, we simultaneously address these two challenges by proposing a novel policy learning algorithm: PESsimistic CAusal Learning (PESCAL). We utilize a mediator variable, based on the front-door criterion, to remove the confounding bias. Additionally, we adopt the pessimistic principle to tackle the distributional shift problem induced by the under-coverage issue, i.e., the mismatch between the action distributions induced by candidate policies and that of the policy that generated the observational data (known as the behavior policy). Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function, to partially mitigate the issue of distributional shift. This insight significantly simplifies our algorithm by circumventing the challenging task of sequential uncertainty quantification for the estimated Q-function. Moreover, we provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
In the second part of this dissertation, in contrast to the first part, which approaches the distributional shift issue implicitly by penalizing the value function as a whole, we explicitly constrain the learned policy to not deviate significantly from the behavior policy, while still enabling flexible adjustment of the degree of constraint. Building upon the offline reinforcement learning algorithm TD3+BC (Fujimoto and Gu, 2021), we propose a model-free actor-critic algorithm with an adjustable behavior cloning (BC) term. We employ an ensemble of networks to quantify the uncertainty of the estimated value function, thus addressing the issue of overestimation. Moreover, we introduce a method that is both convenient and intuitively simple for controlling the degree of BC, through a Bernoulli random variable based on the user-specified confidence level for different offline datasets. Our proposed algorithm, named Ensemble-based Actor Critic with Adaptive Behavior Cloning (EABC), is straightforward to implement, exhibits low variance, and achieves strong performance across all D4RL benchmarks.
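A rough sketch of an actor update with an adjustable, Bernoulli-gated behavior-cloning term and a pessimistic critic ensemble is given below, loosely following the TD3+BC recipe; it is an assumed illustration rather than the dissertation's exact EABC algorithm, and the ensemble aggregation, gating scheme, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

# Sketch of an actor update with an adjustable behavior-cloning (BC) term and a
# pessimistic critic ensemble, loosely following the TD3+BC recipe. The min
# aggregation over critics, the Bernoulli gating of the BC term, and every
# hyperparameter below are placeholders; critic training is omitted here.
state_dim, action_dim, n_critics = 17, 6, 5
actor = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                      nn.Linear(256, action_dim), nn.Tanh())
critics = [nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                         nn.Linear(256, 1)) for _ in range(n_critics)]
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

confidence = 0.7   # user-specified trust in the behavior policy (assumed)
alpha = 2.5        # TD3+BC-style weight on the Q term (assumed)

def actor_update(states, dataset_actions):
    actions = actor(states)
    qs = torch.stack([c(torch.cat([states, actions], dim=1)) for c in critics])
    q = qs.min(dim=0).values                          # pessimistic ensemble aggregate
    lam = alpha / q.abs().mean().detach()             # normalise the Q term as in TD3+BC
    gate = torch.bernoulli(torch.tensor(confidence))  # switch the BC term on or off
    bc_loss = ((actions - dataset_actions) ** 2).mean()
    loss = -(lam * q).mean() + gate * bc_loss
    opt.zero_grad()
    loss.backward()            # only the actor's parameters are stepped
    opt.step()
    return loss.item()

# Example call on placeholder batch data standing in for a hypothetical offline dataset.
states = torch.randn(256, state_dim)
behavior_actions = torch.randn(256, action_dim).clamp(-1.0, 1.0)
print(actor_update(states, behavior_actions))
```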
|