Global ETD Search

41	Managing populations in the face of uncertainty: adaptive management, partial observability and the dynamic value of information. Moore, Alana L. January 2008 The work presented in this thesis falls naturally into two parts. The first part (Chapter 2) is concerned with the benefit of perturbing a population into an immediately undesirable state in order to improve estimates of a static probability, which may improve long-term management. We consider finding the optimal harvest policy for a theoretical harvested population when a key parameter is unknown. We employ an adaptive management framework to study when it is worth sacrificing short-term rewards in order to increase long-term profits. / Active adaptive management has been increasingly advocated in natural resource management and conservation biology as a methodology for resolving key uncertainties about population dynamics and responses to management. However, when comparing management policies it is traditional to weigh future rewards geometrically (at a constant discount rate), which results in far-distant rewards making a negligible contribution to the total benefit. Under such a discounting scheme active adaptive management is rarely of much benefit, especially if learning is slow. In Chapter 2, we consider two proposed alternative forms of discounting for evaluating optimal policies for long-term decisions which have a social component. / We demonstrate that discount functions which weigh future rewards more heavily result in more conservative harvesting strategies, but do not necessarily encourage active learning. Furthermore, the optimal management strategy is not equivalent to employing geometric discounting at a lower rate. If alternative discount functions are made mandatory in calculating optimal management policies for environmental management, then this will affect the structure of optimal management regimes and change when and how much we are willing to invest in learning. 
/ The second part of this thesis is concerned with how to account for partial observability when calculating optimal management policies. We consider the problem of controlling an invasive pest species when only partial observations are available at each time step. In the model considered, the monitoring data available are binomial observations of a probability which is an index of the population size. We are again concerned with estimating a probability; however, in this model the probability is changing over time. / Before including partial observability explicitly, we consider a model in which perfect observations of the population are available at each time step (Chapter 3). It is intuitive that monitoring will be beneficial only if the management decision depends on the outcome. Hence, a necessary condition for monitoring to be worthwhile is that control policies which are specified in terms of the system state outperform simpler time-based control policies. Consequently, in addition to providing a benchmark against which we can compare the optimal management policy in the case of partial observations, analysing the perfect observation case also provides insight into when monitoring is likely to be most valuable. / In Chapters 4 and 5 we include partial observability by modelling the control problem as a partially observable Markov decision process (POMDP). We outline several tests which stem from a property of conservation of expected utility under monitoring, which aid in validating the model. We discuss the optimal management policy prescribed by the POMDP for a range of model scenarios, and use simulation to compare the POMDP management policy to several alternative policies, including controlling with perfect observations and no observations. / In Chapter 6 we propose an alternative model, developed in the spirit of a POMDP, that does not strictly satisfy the definition of a POMDP. 
We find that although the second model has some conceptually appealing attributes, it makes an undesirable implicit assumption about the underlying population dynamics.
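The remark in entry 41 that geometric discounting makes far-distant rewards negligible can be checked numerically. The sketch below is only an illustration, not the thesis's model: the function names, the 5% rate, and the hyperbolic form used as a slower-decaying alternative are all assumptions made for this example (the abstract does not name the alternative discount functions it studies).

```python
def geometric_weight(t, rate=0.05):
    """Weight of a reward received t steps ahead at a constant discount rate."""
    return 1.0 / (1.0 + rate) ** t


def hyperbolic_weight(t, k=0.05):
    """Illustrative slower-decaying weight (an assumption, not from the thesis)."""
    return 1.0 / (1.0 + k * t)


# Compare how much a reward t steps in the future counts under each scheme.
for t in (1, 10, 50, 100, 200):
    print(f"t={t:>3}  geometric={geometric_weight(t):.4f}  "
          f"hyperbolic={hyperbolic_weight(t):.4f}")
```

With these assumed parameters the two weights coincide at t = 1, but by t = 100 the geometric weight has fallen below 0.01 while the hyperbolic weight is still about 0.17, which is the sense in which a heavier-tailed discount function keeps long-term rewards in play.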
42	Design and Analysis of Ambulance Diversion Policies January 2011 abstract: Overcrowding of Emergency Departments (EDs) puts the safety of patients at risk. Decision makers implement Ambulance Diversion (AD) as a way to relieve congestion and ensure timely treatment delivery. However, ineffective design of AD policies reduces access to emergency care and may lead to adverse events. The objective of this dissertation is to propose methods to design and analyze effective AD policies that consider performance measures that are related to patient safety. First, a simulation-based methodology is proposed to evaluate the mean performance and variability of single-factor AD policies in a single hospital environment considering the trade-off between average waiting time and percentage of time spent on diversion. Regression equations are proposed to obtain parameters of AD policies that yield a desired performance level. The results suggest that policies based on the total number of patients waiting are more consistent and provide high precision in predicting policy performance. Then, a Markov Decision Process model is proposed to obtain the optimal AD policy assuming that information to start treatment in a neighboring hospital is available. The model is designed to minimize the average tardiness per patient in the long run. Tardiness is defined as the time that patients have to wait beyond a safety time threshold to start receiving treatment. Theoretical and computational analyses show that there exists an optimal policy that is of threshold type, and diversion can be a good alternative to decrease tardiness when ambulance patients cause excessive congestion in the ED. Furthermore, implementation of AD policies in a simulation model that accounts for several relaxations of the assumptions suggests that the model provides consistent policies under multiple scenarios. 
Finally, a genetic algorithm is combined with simulation to design effective policies for multiple hospitals simultaneously. The model has the objective of minimizing the time that patients spend in non-value-added activities, including transportation, waiting and boarding in the ED. Moreover, the AD policies are combined with simple ambulance destination policies to create ambulance flow control mechanisms. Results show that effective ambulance management can significantly reduce the time that patients have to wait to receive the appropriate level of care. / Dissertation/Thesis / Ph.D. Industrial Engineering 2011 Health care management Operations research Ambulance Diversion Discrete-Event Simulation Genetic Algorithms Markov Decision Process
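The tardiness measure and the threshold-type policy described in entry 42 can be written down in a few lines. This is a minimal sketch under stated assumptions: the function names, the minute units, and the use of the number of waiting patients as the diversion trigger are illustrative choices, not the dissertation's actual model.

```python
def average_tardiness(waiting_times, safety_threshold):
    """Mean time waited beyond the safety threshold, per patient.

    A patient who starts treatment within the threshold has zero tardiness.
    """
    if not waiting_times:
        return 0.0
    total = sum(max(0.0, w - safety_threshold) for w in waiting_times)
    return total / len(waiting_times)


def should_divert(patients_waiting, diversion_threshold):
    """Threshold-type rule (hypothetical form): divert once the queue is long enough."""
    return patients_waiting >= diversion_threshold


# Waits of 10, 45 and 70 minutes against a 30-minute safety threshold
# give tardiness values of 0, 15 and 40 minutes.
print(average_tardiness([10, 45, 70], safety_threshold=30))
print(should_divert(patients_waiting=12, diversion_threshold=10))
```

Only waiting beyond the threshold is penalized, so a policy can score well on tardiness even when the plain average wait is not minimal.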
43	Elicitation and planning in Markov decision processes with unknown rewards / Elicitation et planification dans les processus décisionnel de MARKOV avec récompenses inconnues Alizadeh, Pegah 09 December 2016 Markov decision processes (MDPs) are models for solving sequential decision problems where a user interacts with the environment and adapts her policy by taking numerical reward signals into account. The solution of an MDP reduces to formulating the user's behavior in the environment with a policy function that specifies which action to choose in each situation. In many real-world decision problems, users have different preferences, so the gains of actions on states differ and should be re-decoded for each user. In this dissertation, we are interested in solving MDPs for users with different preferences. We use a model named Vector-valued MDP (VMDP) with vector rewards. We propose a propagation-search algorithm that assigns a vector-valued function to each policy and identifies each user with a preference vector on the existing set of preferences, where the preference vector satisfies the user's priorities. 
Since the user's preference vector is not known, we present several methods for solving VMDPs while approximating the user's preference vector. We introduce two algorithms that reduce the number of queries needed to find the optimal policy of a user: 1) a propagation-search algorithm, where we propagate a set of possible optimal policies for the given MDP without knowing the user's preferences; 2) an interactive value iteration (IVI) algorithm on VMDPs, namely the Advantage-based Value Iteration (ABVI) algorithm, which uses clustering and regrouping of advantages. We also demonstrate how the ABVI algorithm works properly for two different types of users: confident and uncertain. Finally, we work on a minimax regret approximation method for finding the optimal policy w.r.t. the limited information about the user's preferences. All possible objectives in the system are simply bounded between an upper and a lower bound, while the system is not aware of the user's preferences among them. We propose a heuristic minimax regret approximation method for solving MDPs with unknown rewards that is faster and less complex than the existing methods in the literature. Markov decision process Vector-valued MDP Policy iteration Reward elicitation
44	Data-Driven and Game-Theoretic Approaches for Privacy January 2018 abstract: In the past few decades, there has been a remarkable shift in the boundary between public and private information. The application of information technology and electronic communications allows service providers (businesses) to collect a large amount of data. However, this "data collection" process can put the privacy of users at risk and also lead to user reluctance in accepting services or sharing data. This dissertation first investigates privacy-sensitive consumer-retailer/service provider interactions under different scenarios, and then focuses on a unified framework for various information-theoretic privacy and privacy mechanisms that can be learned directly from data. Existing approaches such as differential privacy or information-theoretic privacy try to quantify privacy risk but do not capture the subjective experience and heterogeneous expression of privacy-sensitivity. The first part of this dissertation introduces models to study consumer-retailer interaction problems and to better understand how retailers/service providers can balance their revenue objectives while being sensitive to user privacy concerns. This dissertation considers the following three scenarios: (i) the consumer-retailer interaction via personalized advertisements; (ii) incentive mechanisms that electrical utility providers need to offer for privacy-sensitive consumers with alternative energy sources; (iii) the market viability of offering privacy-guaranteed free online services. We use game-theoretic models to capture the behaviors of both consumers and retailers, and provide insights for retailers to maximize their profits when interacting with privacy-sensitive consumers. Preserving the utility of published datasets while simultaneously providing provable privacy guarantees is a well-known challenge. 
In the second part, a novel context-aware privacy framework called generative adversarial privacy (GAP) is introduced. Inspired by recent advancements in generative adversarial networks, GAP allows the data holder to learn the privatization mechanism directly from the data. Under GAP, finding the optimal privacy mechanism is formulated as a constrained minimax game between a privatizer and an adversary. For appropriately chosen adversarial loss functions, GAP provides privacy guarantees against strong information-theoretic adversaries. Both synthetic and real-world datasets are used to show that GAP can greatly reduce the adversary's capability of inferring private information at a small cost of distorting the data. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2018 Electrical engineering Game Theory Generative Adversarial Networks Information Theory Markov Decision Process Privacy Retailer-Consumer Interaction
45	Decision-making algorithms for autonomous robots / Algorithmes de prise de décision stratégique pour robots autonomes Hofer, Ludovic 27 November 2017 The autonomy of robots heavily relies on their ability to make decisions based on the information provided by their sensors. In this dissertation, decision-making in robotics is modeled as a Markov decision process with continuous state and action spaces. This choice allows modeling of uncertainty on the results of the actions chosen by the robots. The new learning algorithms proposed in this thesis focus on producing policies which can be used online at a low computational cost. They are applied to real-world problems in the RoboCup context, an international robotic competition held annually. In those problems, humanoid robots have to choose either the direction and power of kicks in order to maximize the probability of scoring a goal or the parameters of a walk engine to move towards a kickable position. 
Markov decision process Autonomous robotics Machine learning
46	On Hierarchical Goal Based Reinforcement Learning Denis, Nicholas 27 August 2019 Discrete time sequential decision processes require that an agent select an action at each time step. As humans, we plan over long time horizons and use temporal abstraction by selecting temporally extended actions such as “make lunch” or “get a master's degree”, each of which is composed of more granular actions. This thesis concerns itself with such hierarchical temporal abstractions in the form of macro actions and options, as they apply to goal-based Markov Decision Processes. A novel algorithm for discovering hierarchical macro actions in goal-based MDPs, as well as a novel algorithm utilizing landmark options for transfer learning in multi-task goal-based reinforcement learning settings, are introduced. Theoretical properties regarding the life-long regret of an agent executing the latter algorithm are also discussed. Markov decision process Reinforcement learning Options framework Temporal abstraction Macro actions
47	Optimal Mammography Schedule Estimates Under Varying Disease Burden, Infrastructure Availability, and Other Cause Mortality: A Comparative Analyses of Six Low- and Middle- Income Countries Shifali, Shifali 18 December 2020 Low-and-middle-income countries (LMICs) have a higher mortality-to-incidence ratio for breast cancer compared to high-income countries (HICs) because of late-stage diagnosis. Mammography screening is recommended for early diagnosis; however, current screening guidelines are only generalized by economic disparities, and are based on extrapolation of data from randomized controlled trials in HICs, which have different disease burdens and all-cause mortality compared to LMICs. Moreover, the infrastructure capacity in LMICs is far below that needed for adopting current screening guidelines. This study analyzes the impact of disease burden, infrastructure availability, and other cause mortality on optimal mammography screening schedules for LMICs. Further, these key features are analyzed in the context of overdiagnosis, epidemiologic/clinical uncertainty in pathways of the initial stage of cancer, and variability in technological availability for diagnosis and treatment. It uses a Markov decision process (MDP) model to estimate optimal schedules under varying assumptions of resource availability, applying it to six LMICs. Results suggest that screening schedules should change with disease burden and life-expectancy. For countries with similar life-expectancy but different disease burden, the model suggests screening age groups with higher incidence rates. For countries with similar incidence rate and different life expectancy, the model suggests screening younger age groups in countries with lower life-expectancy. Overdiagnosis and differences in screening technology had minimal impact on optimal schedules. Optimality of screening schedules was sensitive to epidemiologic/clinical uncertainty. 
Results from this study suggest that, instead of generalized screening schedules, those tailored to disease burden and infrastructure capacity could help optimize resources. Results from this study can help inform current screening guidelines and future health investment plans. Markov decision process Public health policy making Optimal screening schedules Industrial Engineering Operational Research
48	Increasing the Value of Information During Planning in Uncertain Environments Pokharel, Gaurab 30 July 2021 No description available. Artificial Intelligence Computer Science POMDPs Planning Uncertainty Domain Markov Decision Process POMCP
49	Plánování cesty robota pomocí dynamického programování / Robot path planning by means of dynamic programming Stárek, Ivo January 2009 This work is dedicated to robot path planning using the principles of dynamic programming in a discrete state space. The theoretical part covers the current state of the field and the principle of applying a Markov decision process to path planning. The practical part covers the implementation of two algorithms based on MDP principles.
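The idea in entry 49, dynamic programming over a discrete state space, can be sketched as repeated Bellman backups on a grid. This is an illustrative sketch, not either of the two algorithms the thesis implements: it assumes deterministic moves, a unit cost per step, a 4-connected grid where 0 marks a free cell and 1 an obstacle, and a free goal cell. The name plan_path and its parameters are hypothetical.

```python
def plan_path(grid, start, goal, max_sweeps=1000):
    """Return a shortest list of cells from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    def neighbors(cell):
        r, c = cell
        for dr, dc in moves:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                yield (nr, nc)

    inf = float("inf")
    # Cost-to-go for every free cell; only the goal is known at the start.
    cost = {(r, c): inf for r in range(rows) for c in range(cols) if grid[r][c] == 0}
    cost[goal] = 0.0

    # Bellman backups: cost(s) = 1 + min over neighbours s' of cost(s').
    for _ in range(max_sweeps):
        changed = False
        for cell in cost:
            if cell == goal:
                continue
            best = min((1.0 + cost[n] for n in neighbors(cell)), default=inf)
            if best < cost[cell]:
                cost[cell] = best
                changed = True
        if not changed:
            break

    if cost.get(start, inf) == inf:
        return None

    # Greedy policy: always step to the neighbour with the lowest cost-to-go.
    path, cell = [start], start
    while cell != goal:
        cell = min(neighbors(cell), key=cost.get)
        path.append(cell)
    return path


grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
print(plan_path(grid, start=(0, 0), goal=(2, 0)))
```

With stochastic transitions the backup would average over successor cells instead of taking a single neighbour's cost, which is where the full Markov decision process formulation comes in.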
50	Performance analysis of access control and resource management methods in heterogeneous networks Pacheco Páramo, Diego Felipe 07 January 2014 The current mobile network landscape is characterized by users' growing demand for data services, a trend driven by the popularity of smartphones and the rise of applications that need a permanent internet connection, such as those that use "cloud" resources or video streaming services. Data consumption is growing exponentially in both developed and developing countries, which has led operators to consider solutions that can provide such access conditions. Heterogeneous networks use different technologies in a coherent and organized way to provide users with the quality of service their communications require, so that the communication is "transparent" to them. This heterogeneity can occur at the access level, with the coexistence of technologies such as 802.11, WiMAX and the different generations of mobile networks, or at the layer level within mobile networks, with the coexistence of macro, micro, pico and femto cells. Organized use of these multiple resources makes it possible to optimize network performance and provide users with a better quality of service. But the possibility of improving network performance does not come only from the simultaneous use of these access technologies. To improve efficiency in the use of the electromagnetic spectrum, a limited resource that is underused according to various studies, cognitive radio technology was proposed. 
With this technology, a user can detect the moments when part of the electromagnetic spectrum is not being used to send information, while always avoiding interference with the communications of the users who regularly use that spectrum. This work provides different solutions within the context of heterogeneous networks that seek to optimize the use of the resources available in the network to provide users with the expected quality of service, whether through access control or resource management. On the one hand, it studies the effect that reserving channels for spectral handoff has on performance for secondary users in a cognitive radio system. On the other hand, it studies access policies for a network in which two access technologies are available, TDMA and WCDMA, and users have access to voice and data services. On the other hand / Performance requirements on mobile networks are tighter than ever as a result of the adoption of mobile devices such as smartphones or tablets, and the QoS levels that mobile applications demand for their correct operation. The data traffic volume carried in mobile networks for 2012 is the same as the total internet traffic in 2000, and this exponential growth tendency will continue in years to come. In order to fulfill users' expectations, it is imperative for mobile networks to make the best use of the available resources. Heterogeneous networks (Hetnets) have the ability to integrate several technologies in a coherent and efficient manner in order to enhance users' experience. The first challenge of heterogeneous networks is to integrate several radio access technologies, which exist as a result of simultaneous technology developments and a paced replacement of legacy technology. A joint management of several RATs enhances the network's efficiency, and this influences the user's experience. 
Another challenge of heterogeneous networks is the improvement of current macrocells through an efficient use of the electromagnetic spectrum. Some approaches aim to optimize the antennas or use higher-order modulation techniques, but a more disruptive approach is the use of dynamic spectrum techniques through a technology known as cognitive radio. Finally, heterogeneous networks should be able to integrate several layers. In addition to the well studied micro and pico cells, a new generation of cheaper and easily configurable small cell networks has been proposed. However, their success is tied to their ability to adapt to the current context of mobile networks. / Pacheco Páramo, DF. (2013). Performance analysis of access control and resource management methods in heterogeneous networks [Doctoral thesis]. Editorial Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/34782 / Alfresco Heterogeneous networks Resource management Markov decision process Wireless networks Telematics engineering
