Global ETD Search

1	Contributions to the theory of Gittins indices : with applications in pharmaceutical research and clinical trials Wang, You-Gan January 1991 No description available. 519.5 Sequential decision processes
2	Search behaviour : an analysis of information collection and usage during the decision process Fletcher, Keith January 1986 The purpose of this research was to investigate the nature of consumer decision making. It considered the purchase of a video cassette recorder and investigated whether the assumptions of a model based on satisficing behaviour could be justified. It considered the nature of search behaviour and evaluation during the decision process and the factors which might influence it. The research therefore studied the stages of the decision process from the nature of Problem Recognition and Problem Classification, including the development of evoked sets during the decision process, the preference for and use of different information sources, the nature of search behaviour, the importance of choice criteria and the decision rules used while employing these choice criteria. This was investigated using three separate but linked research approaches. A sample of the population in the West of Scotland was analysed to investigate differences between video owners and non-video owners, while qualitative interviews were conducted to study the decision process itself. Conjoint Analysis was used to consider the relative importance of choice criteria. The study confirmed the sequential nature of the decision process and found a phased sequence of choice and search. Despite the nature of the good (expensive and innovative), the decision was generally considered of a low involvement nature. While the predictions of low involvement learning that a satisficing decision would be taken were found to be true, our findings disagreed with the accepted theory on the use of information sources. It was also considered that it would be wrong to assume no cognitive processes were taking place, as various choice heuristics were found which simplified the decision for the consumer. 330 Consumer decision processes
3	Acceleration of Iterative Methods for Markov Decision Processes Shlakhter, Oleksandr 21 April 2010 This research focuses on Markov Decision Processes (MDP). MDP is one of the most important and challenging areas of Operations Research. Every day people make many decisions: today's decisions impact tomorrow's and tomorrow's will impact the ones made the day after. Problems in Engineering, Science, and Business often pose similar challenges: a large number of options and uncertainty about the future. MDP is one of the most powerful tools for solving such problems. There are several standard methods for finding optimal or approximately optimal policies for MDP. Approaches widely employed to solve MDP problems include value iteration and policy iteration. Although simple to implement, these approaches are, nevertheless, limited in the size of problems that can be solved, due to excessive computation required to find close-to-optimal solutions. My thesis proposes new value iteration and modified policy iteration methods for classes of expected discounted MDPs and average cost MDPs. We establish a class of operators that can be integrated into value iteration and modified policy iteration algorithms for Markov Decision Processes, so as to speed up the convergence of the iterative search. Application of these operators requires a little additional computation per iteration but reduces the number of iterations significantly. The development of the acceleration operators relies on two key properties of the Markov operator, namely contraction mapping and monotonicity in a restricted region. Since the Markov operators of the classical value iteration and modified policy iteration methods for average cost MDPs do not possess the contraction mapping property, for these models we restrict our study to average cost problems that can be formulated as the stochastic shortest path problem.
The performance improvement is significant, while the implementation of the operator into the value iteration is trivial. Numerical studies show that the accelerated methods can be hundreds of times more efficient for solving MDP problems than the other known approaches. The computational savings can be significant especially when the discount factor approaches 1 and the transition probability matrix becomes dense, in which case the standard iterative algorithms suffer from slow convergence. Markov Decision Processes 0546
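For orientation, the baseline that this abstract sets out to accelerate can be sketched as follows. This is a minimal illustration of classical value iteration for a discounted-cost MDP, not the accelerated operators proposed in the thesis. The function name, the nested-list data layout, the tolerance, and the two-state example are all assumptions made for illustration.

```python
def value_iteration(P, c, gamma, tol=1e-8, max_iter=100_000):
    """Classical value iteration for a finite discounted-cost MDP.

    P[a][s][t] is the probability of moving from state s to state t under
    action a, c[a][s] is the immediate cost, and gamma in (0, 1) is the
    discount factor. Returns (values, greedy_policy, iterations_used).
    """
    n_actions, n_states = len(P), len(P[0])
    v = [0.0] * n_states
    for k in range(1, max_iter + 1):
        # Bellman backup: q[a][s] = c(s, a) + gamma * E[v(next state)]
        q = [[c[a][s] + gamma * sum(P[a][s][t] * v[t] for t in range(n_states))
              for s in range(n_states)] for a in range(n_actions)]
        new_v = [min(q[a][s] for a in range(n_actions)) for s in range(n_states)]
        diff = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if diff < tol:
            break
    # Policy that is greedy with respect to the last backup computed above.
    policy = [min(range(n_actions), key=lambda a: q[a][s]) for s in range(n_states)]
    return v, policy, k


# Hypothetical two-state example: action 0 stays put, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
c = [[1.0, 0.0],   # cost of staying in state 0 / state 1
     [0.5, 0.5]]   # cost of switching from state 0 / state 1
values, policy, iterations = value_iteration(P, c, gamma=0.9)
print(values, policy, iterations)
```

Each sweep costs on the order of (actions × states²) operations with a dense transition matrix, which is why the number of sweeps matters for large problems.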
5	A minimum cost and risk mitigation approach for blood collection Zeng, Chenxi 27 May 2016 Due to the limited supply and perishable nature of blood products, effective management of blood collection is critical for high quality healthcare delivery. Whole blood is typically collected over a 6 to 8 hour collection window from volunteer donors at sites, e.g., schools, universities, churches, companies, that are a significant distance from the blood products processing facility and then transported from collection site to processing facility by a blood mobile. The length of time between collecting whole blood and processing it into cryoprecipitate ("cryo"), a critical blood product for controlling massive hemorrhaging, cannot exceed 8 hours (the 8 hour collection to completion constraint), while the collection to completion constraint for other blood products is 24 hours. In order to meet the collection to completion constraint for cryo, it is often necessary to have a "mid-drive collection"; i.e., for a vehicle other than the blood mobile to pick up and transport, at extra cost, whole blood units collected early in the collection window to the processing facility. In this dissertation, we develop analytical models to: (1) analyze which collection sites should be designated as cryo collection sites to minimize total collection costs while satisfying the collection to completion constraint and meeting the weekly production target (the non-split case), (2) analyze the impact of changing the current process to allow collection windows to be split into two intervals and then determining which intervals should be designated as cryo collection intervals (the split case), (3) ensure that the weekly production target is met with high probability. These problems lead to MDP models with large state and action spaces and constraints to guarantee that the weekly production target is met with high probability.
These models are computationally intractable for problems having state and action spaces of realistic cardinality. We consider two approaches to guarantee that the weekly production target is met with high probability: (1) a penalty function approach and (2) a chance constraint approach. For the MDP with penalty function approach, we first relax a constraint that significantly reduces the cardinality of the state space and provides a lower bound on the optimal expected weekly cost of collecting whole blood for cryo while satisfying the collection to completion constraint. We then present an action elimination procedure that coupled with the constraint relaxation leads to a computationally tractable lower bound. We then develop several heuristics that generate sub-optimal policies and provide an analytical description of the difference between the upper and lower bounds in order to determine the quality of the heuristics. For the multiple decision epoch MDP model with chance constraint approach, we first note by example that a straightforward application of dynamic programming can lead to a sub-optimal policy. We then restrict the model to a single decision epoch. We then use a computationally tractable rolling horizon procedure for policy determination. We also present a simple greedy heuristic (another rolling horizon decision making procedure) based on ranking the collection intervals by mid-drive pickup cost per unit of expected cryo collected, which results in a competitive sub-optimal solution and leads to the development of a practical decision support tool (DST). Using real data from the American Red Cross (ARC), we estimate that this DST reduces total cost by about 30% for the non-split case and 70% for the split case, compared to the current practice. Initial implementation of the DST at the ARC Southern regional manufacturing and service center supports our estimates and indicates the potential for significant improvement in current practice. 
Cryoprecipitate Cryo Markov decision processes (MDP)
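The greedy heuristic mentioned in this abstract ranks collection intervals by mid-drive pickup cost per unit of expected cryo collected. A minimal sketch of that ranking idea follows. The stopping rule (select intervals until the expected units reach the weekly target), the function name, and the sample figures are assumptions for illustration and are not taken from the dissertation.

```python
def greedy_cryo_intervals(intervals, weekly_target):
    """Pick intervals in increasing order of pickup cost per expected cryo unit.

    intervals: list of (name, pickup_cost, expected_units) tuples (hypothetical data).
    Selection stops once the expected units reach weekly_target, which is an
    assumed stopping rule rather than the dissertation's exact procedure.
    Returns (chosen_names, total_cost, total_expected_units).
    """
    # Intervals with no expected yield cannot be ranked by cost per unit.
    ranked = sorted((iv for iv in intervals if iv[2] > 0),
                    key=lambda iv: iv[1] / iv[2])
    chosen, cost, units = [], 0.0, 0.0
    for name, pickup_cost, expected_units in ranked:
        if units >= weekly_target:
            break
        chosen.append(name)
        cost += pickup_cost
        units += expected_units
    return chosen, cost, units


# Made-up example: three candidate intervals and a target of 45 expected units.
example = [("A", 100.0, 20.0), ("B", 60.0, 30.0), ("C", 90.0, 10.0)]
print(greedy_cryo_intervals(example, weekly_target=45.0))
```

Because this sketch works only with expected units, it does not by itself enforce a chance constraint on the weekly target.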
6	Reinforcement learning by incremental patching Kim, Min Sub, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 This thesis investigates how an autonomous reinforcement learning agent can improve on an approximate solution by augmenting it with a small patch, which overrides the approximate solution at certain states of the problem. In reinforcement learning, many approximate solutions are smaller and easier to produce than "flat" solutions that maintain distinct parameters for each fully enumerated state, but the best solution within the constraints of the approximation may fall well short of global optimality. This thesis proposes that the remaining gap to global optimality can be efficiently minimised by learning a small patch over the approximate solution. In order to improve the agent's behaviour, algorithms are presented for learning the overriding patch. The patch is grown around particular regions of the problem where the approximate solution is found to be deficient. Two heuristic strategies are proposed for concentrating resources to those areas where inaccuracies in the approximate solution are most costly, drawing a compromise between solution quality and storage requirements. Patching also handles problems with continuous state variables, by two alternative methods: Kuhn triangulation over a fixed discretisation and nearest neighbour interpolation with a variable discretisation. As well as improving the agent's behaviour, patching is also applied to the agent's model of the environment. Inaccuracies in the agent's model of the world are detected by statistical testing, using a selective sampling strategy to limit storage requirements for collecting data. The patching algorithms are demonstrated in several problem domains, illustrating the effectiveness of patching under a wide range of conditions.
A scenario drawn from a real-time strategy game demonstrates the ability of patching to handle large complex tasks. These contributions combine to form a general framework for patching over approximate solutions in reinforcement learning. Complex problems cannot be solved by brute force alone, and some form of approximation is necessary to handle large problems. However, this does not mean that the limitations of approximate solutions must be accepted without question. Patching demonstrates one way in which an agent can leverage approximation techniques without losing the ability to handle fine yet important details. Reinforcement learning. Markov decision processes. Patchwork.
7	Systems Medicine: An Integrated Approach with Decision Making Perspective Faryabi, Babak 14 January 2010 Two models are proposed to describe interactions among genes, transcription factors, and signaling cascades involved in regulating a cellular sub-system. These models fall within the class of Markovian regulatory networks, and can accommodate different biological time scales. These regulatory networks are used to study pathological cellular dynamics and discover treatments that beneficially alter those dynamics. The salient translational goal is to design effective therapeutic actions that desirably modify a pathological cellular behavior via external treatments that vary the expressions of targeted genes. The objective of therapeutic actions is to reduce the likelihood of the pathological phenotypes related to a disease. The task of finding effective treatments is formulated as sequential decision making processes that discriminate the gene-expression profiles with high pathological competence versus those with low pathological competence. Thereby, the proposed computational frameworks provide tools that facilitate the discovery of effective drug targets and the design of potent therapeutic actions on them. Each of the proposed system-based therapeutic methods in this dissertation is motivated by practical and analytical considerations. First, it is determined how asynchronous regulatory models can be used as a tool to search for effective therapeutic interventions. Then, a constrained intervention method is introduced to incorporate the side-effects of treatments while searching for a sequence of potent therapeutic actions. Lastly, to bypass the impediment of model inference and to mitigate the numerical challenges of exhaustive search algorithms, a heuristic method is proposed for designing system-based therapies. The presentation of the key ideas in each method is facilitated with the help of several case studies.
Regulatory Networks Markov Decision Processes Computational Biology
9	Sample Complexity of Incremental Policy Gradient Methods for Solving Multi-Task Reinforcement Learning Bai, Yitao 05 April 2024 We consider a multi-task learning problem, where an agent is presented with N reinforcement learning tasks. To solve this problem, we are interested in studying the gradient approach, which iteratively updates an estimate of the optimal policy using the gradients of the value functions. The classic policy gradient method, however, may be expensive to implement in the multi-task settings as it requires access to the gradients of all the tasks at every iteration. To circumvent this issue, in this paper we propose to study an incremental policy gradient method, where the agent uses the gradient of only one task at each iteration. Our main contribution is to provide theoretical results to characterize the performance of the proposed method. In particular, we show that incremental policy gradient methods converge to the optimal value of the multi-task reinforcement learning objectives at a sublinear rate O(1/√k), where k is the number of iterations. To illustrate its performance, we apply the proposed method to solve a simple multi-task variant of GridWorld problems, where an agent seeks to find a policy to navigate effectively in different environments. / Master of Science / First, we introduce a popular machine learning technique called Reinforcement Learning (RL), where an agent, such as a robot, uses a policy to choose an action, like moving forward, based on observations from sensors like cameras. The agent receives a reward that helps judge if the policy is good or bad. The objective of the agent is to find a policy that maximizes the cumulative reward it receives by repeating the above process. RL has many applications, including Cruise autonomous cars, Google industry automation, training ChatGPT language models, and Walmart inventory management.
However, RL suffers from task sensitivity and requires a lot of training data. For example, if the task changes slightly, the agent needs to train the policy from the beginning. This motivates the technique called Multi-Task Reinforcement Learning (MTRL), where different tasks give different rewards and the agent maximizes the sum of cumulative rewards of all the tasks. We focus on the incremental setting where the agent can only access the tasks one by one randomly. In this case, we only need one agent and it is not required to know which task it is performing. We show that the incremental policy gradient methods we proposed converge to the optimal value of the MTRL objectives at a sublinear rate O(1/√k), where k is the number of iterations. To illustrate its performance, we apply the proposed method to solve a simple multi-task variant of GridWorld problems, where an agent seeks to find a policy to navigate effectively in different environments. Markov decision processes Multi-task reinforcement learning
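The incremental idea in this abstract, updating the policy with the gradient of only one task per iteration, can be illustrated on a toy problem. The sketch below is a strong simplification and not the thesis's algorithm or its GridWorld experiment: it assumes a single-state (bandit-style) problem, a softmax policy, exact per-task gradients, and a step size proportional to 1/√(k+1). All names and reward values are hypothetical.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def total_value(theta, rewards):
    """Sum over tasks of the expected reward under the softmax policy."""
    pi = softmax(theta)
    return sum(sum(p * r for p, r in zip(pi, task)) for task in rewards)


def incremental_policy_gradient(rewards, iterations=5000, step=1.0, seed=0):
    """Ascend the multi-task objective using one randomly drawn task per step.

    rewards[i][a] is the reward of action a in task i (toy, single-state setting).
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards[0])
    for k in range(iterations):
        task = rng.choice(rewards)  # only one task's gradient is used
        pi = softmax(theta)
        baseline = sum(p * r for p, r in zip(pi, task))
        alpha = step / math.sqrt(k + 1)  # assumed diminishing step size
        # Exact gradient of this task's expected reward with respect to theta.
        theta = [t + alpha * p * (r - baseline) for t, p, r in zip(theta, pi, task)]
    return theta


# Hypothetical rewards: each task prefers a different action, but action 2
# has the largest total reward across both tasks.
rewards = [[1.0, 0.0, 0.8],
           [0.0, 1.0, 0.8]]
theta = incremental_policy_gradient(rewards)
print(softmax(theta), total_value(theta, rewards))
```

Sampling a task uniformly at random makes each update an unbiased estimate (up to a factor of N) of the gradient of the summed objective, which is the property the incremental setting relies on.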
10	What if you are not Bayesian? The consequences for decisions involving risk Goodwin, P., Onkal, Dilek, Stekler, H.O. 2017 September 1922 Many studies have examined the extent to which individuals’ probability judgments depart from Bayes’ theorem when revising probability estimates in the light of new information. Generally, these studies have not considered the implications of such departures for decisions involving risk. We identify when such departures will occur in two common types of decisions. We then report on two experiments where people were asked to revise their own prior probabilities of a forthcoming economic recession in the light of new information. When the reliability of the new information was independent of the state of nature, people tended to overreact to it if their prior probability was low and underreact if it was high. When it was not independent, they tended to display conservatism. We identify the circumstances where discrepancies in decisions arising from a failure to use Bayes’ theorem were most likely to occur in the decision context we examined. We found that these discrepancies were relatively rare and, typically, were not serious. Decision processes Bayes' theorem Judgmental biases Risk
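The normative benchmark referred to in this abstract, revising a prior probability with Bayes' theorem after receiving information of known reliability, can be written in a few lines. This is a generic sketch for a binary event; the function name and the numbers in the example are illustrative assumptions and do not come from the study's experiments.

```python
def bayes_posterior(prior, p_signal_given_event, p_signal_given_no_event):
    """Posterior P(event | signal) from Bayes' theorem for a binary event.

    prior: P(event) before the signal is observed.
    p_signal_given_event: P(signal | event).
    p_signal_given_no_event: P(signal | no event).
    """
    numerator = p_signal_given_event * prior
    evidence = numerator + p_signal_given_no_event * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("the signal has zero probability under these inputs")
    return numerator / evidence


# Hypothetical case: prior of 0.2 for a recession, and a forecast that is
# correct 80% of the time regardless of the true state.
print(bayes_posterior(0.2, 0.8, 0.2))
```

When the two conditional probabilities are tied to a single reliability figure in this way, the reliability is independent of the state of nature; letting them differ freely models the case where it is not.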
