About: The Global ETD Search service is a free service for researchers to find electronic theses and dissertations, provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Metadata is collected from universities around the world; managers of university, consortium, or country archives who want to be added can find details on the NDLTD website.
101

Processos de decisão Markovianos com probabilidades imprecisas e representações relacionais: algoritmos e fundamentos. / Markov decision processes with imprecise probabilities and relational representations: foundations and algorithms.

Shirota Filho, Ricardo 03 May 2012
This work is devoted to the theoretical and algorithmic development of Markov Decision Processes with Imprecise Probabilities and relational representations. In the literature, this configuration has been important within artificial intelligence planning, where relational representations allow compact descriptions and imprecise probabilities yield a more general form of uncertainty. There are three main contributions. First, we present a discussion of the foundations of sequential decision making with imprecise probabilities, pointing toward key questions that remain open.
These results directly affect (though are not restricted to) the model of interest in this work, Markov Decision Processes with Imprecise Probabilities. Second, we propose three algorithms for Markov Decision Processes with Imprecise Probabilities based on mathematical programming. Third, we develop ideas proposed by Trevizan, Cozman, and de Barros (2008) on the use of variants of Real-Time Dynamic Programming to solve probabilistic planning problems described by an extension of the Probabilistic Planning Domain Definition Language (PPDDL).
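The maximin (robust) Bellman backup underlying such models can be illustrated with a short sketch. This is not one of the thesis's mathematical-programming algorithms; it is a generic interval-probability value iteration, assuming transitions are given by per-successor probability intervals `[P_lo, P_hi]` (with `P_lo` sums at most 1 and `P_hi` sums at least 1) and the worst-case distribution is chosen greedily within those bounds:

```python
import numpy as np

def worst_case_expectation(lo, hi, v):
    # Minimize E[v] over the interval credal set: start from the lower
    # bounds, then push the remaining probability mass onto the
    # lowest-value successor states first.
    p = lo.astype(float)
    slack = 1.0 - p.sum()
    for s in np.argsort(v):          # cheapest successors first
        add = min(hi[s] - p[s], slack)
        p[s] += add
        slack -= add
    return float(p @ v)

def robust_value_iteration(R, P_lo, P_hi, gamma=0.9, iters=200):
    # R[s, a]: immediate reward; P_lo/P_hi[s, a, s2]: transition bounds.
    n_s, n_a = R.shape
    v = np.zeros(n_s)
    for _ in range(iters):
        q = np.array([[R[s, a] + gamma *
                       worst_case_expectation(P_lo[s, a], P_hi[s, a], v)
                       for a in range(n_a)] for s in range(n_s)])
        v = q.max(axis=1)            # maximin: best action vs. worst model
    return v, q.argmax(axis=1)
```

Because an interval credal set has this simple structure, the inner minimization reduces to a sort over successor values; more general imprecise-probability models may require solving a linear program at each backup. With `P_lo == P_hi` the routine reduces to ordinary value iteration.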
102

Automated Hierarchy Discovery for Planning in Partially Observable Domains

Charlin, Laurent January 2006
Planning in partially observable domains is a notoriously difficult problem. However, in many real-world scenarios, planning can be simplified by decomposing the task into a hierarchy of smaller planning problems, which can then be solved independently of one another. Several approaches, mainly dealing with fully observable domains, have been proposed to optimize a plan that decomposes according to a hierarchy specified a priori. Some researchers have also proposed to discover hierarchies in fully observable domains. In this thesis, we investigate the problem of automatically discovering planning hierarchies in partially observable domains. The main advantage of discovering hierarchies is that, for a plan of a fixed size, hierarchical plans can be more expressive than non-hierarchical ones. Our solution frames the discovery and optimization of a hierarchical policy as a non-convex optimization problem. By encoding the hierarchical structure as variables of the optimization problem, we can automatically discover a hierarchy. Successfully solving the optimization problem therefore yields an optimal hierarchy and an optimal policy. We describe several techniques to solve the optimization problem: general non-linear solvers, mixed-integer linear and non-linear solvers, and a form of bounded hierarchical policy iteration. Our method is flexible enough to allow any part of the hierarchy to be specified from prior knowledge while letting the optimization discover the unknown parts. It can also discover hierarchical policies, including recursive policies, that are more compact, potentially with infinitely fewer parameters.
104

Efficient Partially Observable Markov Decision Process Based Formulation Of Gene Regulatory Network Control Problem

Erdogdu, Utku 01 April 2012
The need to analyze and closely study gene-related mechanisms motivated research on the modeling and control of gene regulatory networks (GRN). Different approaches exist to model GRNs; they are mostly simulated as mathematical models that represent relationships between genes. Though it turns into a more challenging problem, we argue that partial observability is a more natural and realistic setting for handling the control of GRNs. Partial observability is a fundamental aspect of the problem; it is mostly ignored and substituted by the assumption that the states of the GRN are known precisely, prescribed as full observability. On the other hand, current work addressing partial observability focuses on formulating algorithms for the finite-horizon GRN control problem. In this work we therefore explore the feasibility of realizing the problem in a partially observable setting, mainly with Partially Observable Markov Decision Processes (POMDP). We propose a POMDP formulation for the infinite-horizon version of the problem. Knowing that POMDP problems suffer from the curse of dimensionality, we also propose a POMDP solution method that automatically decomposes the problem by isolating unrelated parts, and then solves the reduced subproblems. We also propose a method to enrich the gene expression data sets given as input to the POMDP control task, because available data sets contain thousands of genes but only tens or, rarely, hundreds of samples. The method is based on the idea of generating more than one model from the available data sets, sampling data from each model, and finally filtering the generated samples with the help of metrics that measure compatibility, diversity, and coverage of the newly generated samples.
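The belief tracking that makes a partially observable formulation work can be sketched in a few lines. This is the generic POMDP Bayes filter, not the decomposition method proposed in the thesis; the transition and observation matrices in the usage example are illustrative placeholders, not a real GRN model:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    # One step of POMDP belief tracking (Bayes filter).
    #   b: belief over hidden states (e.g. gene activity profiles)
    #   a: index of the applied intervention
    #   o: index of the received observation (e.g. a noisy expression readout)
    #   T[a][s, s2]: state-transition model under action a
    #   O[a][s2, o]: observation model after action a
    b_pred = b @ T[a]               # predict: propagate belief through dynamics
    b_new = b_pred * O[a][:, o]     # correct: weight by observation likelihood
    norm = b_new.sum()
    if norm == 0.0:
        raise ValueError("observation has zero likelihood under current belief")
    return b_new / norm
```

A controller would carry this belief vector forward after every intervention and readout; the curse of dimensionality the abstract mentions shows up here as the length of `b`, which grows exponentially with the number of genes.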
105

On the control of airport departure operations.

Burgain, Pierrick Antoine 15 November 2010
This thesis is focused on airport departure operations; its objective is to assign a value to surface surveillance information within a collaborative framework. The research develops a cooperative concept that improves the control of departure operations at busy airports and evaluates its merit using a classical and widely accepted airport departure model. The research then assumes departure operations are collaboratively controlled and develops a stochastic model of taxi operations on the airport surface. Finally, this study investigates the effect of feeding back different levels of surface surveillance information to the departure control process. More specifically, it examines the environmental and operational impact of aircraft surface location information on the taxi clearance process. Benefits are evaluated by measuring and comparing engine emissions for given runway utilization rates.
106

A reinforcement learning approach to obtain treatment strategies in sequential medical decision problems

Poolla, Radhika. January 2003
Thesis (M.S.I.E.)--University of South Florida, 2003. Includes bibliographical references.
ABSTRACT: Medical decision problems are extremely complex owing to their dynamic nature, large number of variable factors, and the associated uncertainty. Decision support technology entered the medical field long after other areas such as the airline and manufacturing industries, yet it is rapidly becoming an indispensable tool in medical decision making, including the class of sequential decision problems in which physicians decide on a treatment plan that optimizes a benefit measure such as treatment cost or the patient's quality of life. The last decade saw the emergence of many decision support applications in medicine. However, existing models are limited to decision problems with very few states and actions, and the medical research community has an urgent need to extend these applications to more complex dynamic problems with large state and action spaces.
This thesis proposes a methodology that models the class of sequential medical decision problems as a Markov decision process and solves the model using a simulation-based reinforcement learning (RL) algorithm. Such a methodology can obtain near-optimal treatment strategies for problems with large state and action spaces, overcoming, to a large extent, the computational complexity of the value-iteration and policy-iteration algorithms of dynamic programming. An average-reward reinforcement learning algorithm is developed and applied to a sample problem of treating hereditary spherocytosis. The application demonstrates the ability of the proposed methodology to obtain effective treatment strategies for sequential medical decision problems.
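An average-reward RL scheme of the kind described can be sketched as follows. This is Schwartz-style R-learning on a hypothetical two-state treatment problem, not the thesis's actual algorithm or its hereditary-spherocytosis model; the toy dynamics and hyperparameters are assumptions for illustration:

```python
import random

def r_learning(step, n_states, n_actions, steps=20000,
               alpha=0.1, beta=0.01, epsilon=0.1, seed=0):
    # Average-reward RL (R-learning): learns relative action values
    # Q[s][a] and an estimate rho of the long-run reward per step.
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    rho, s = 0.0, 0
    for _ in range(steps):
        greedy_a = max(range(n_actions), key=lambda x: Q[s][x])
        a = rng.randrange(n_actions) if rng.random() < epsilon else greedy_a
        s2, r = step(s, a, rng)
        if a == greedy_a:            # update rho only on greedy steps
            rho += beta * (r + max(Q[s2]) - max(Q[s]) - rho)
        Q[s][a] += alpha * (r - rho + max(Q[s2]) - Q[s][a])
        s = s2
    return Q, rho

# Toy two-state "patient" MDP: state 0 = healthy, state 1 = sick.
# Action 0 = wait (reward 1 if healthy but may fall sick, -1 if sick);
# action 1 = treat (reward 0, always returns the patient to healthy).
def toy_step(s, a, rng):
    if a == 1:
        return 0, 0.0
    if s == 0:
        return (1 if rng.random() < 0.2 else 0), 1.0
    return 1, -1.0
```

On this toy problem the learned greedy policy waits while healthy and treats when sick, and `rho` approaches that policy's average reward per step; no clinical realism is claimed.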
107

Multimedia Delivery over Heterogeneous Wireless Networks

Xing, Min 29 April 2015
There is an increasing demand for multimedia services in heterogeneous wireless networks. Given highly dynamic wireless channels and the relatively large size of multimedia data, how to support efficient and reliable multimedia delivery is a pressing issue. In this dissertation, we investigate multimedia delivery algorithms in heterogeneous wireless networks from three different aspects. First, we study single-flow rate adaptation for video streaming over multiple wireless interfaces. In order to maintain high video streaming quality while reducing the wireless service cost, the optimal video streaming process with multiple links is formulated as a Markov Decision Process (MDP). The reward function is designed to capture the quality-of-service (QoS) requirements of video traffic, such as startup latency, playback fluency, average playback quality, playback smoothness, and wireless service cost. To solve the MDP in real time, we propose an adaptive best-action search algorithm that obtains a sub-optimal solution. To evaluate the proposed adaptation algorithm, we implemented a testbed using an Android mobile phone and the Scalable Video Coding (SVC) codec and conducted experiments with real video flows. Second, with multiple multimedia flows competing for limited wireless resources, we propose a utility-based scheduling algorithm for multimedia transmission in Drive-thru Internet. A utility model maps throughput to the user's satisfaction level in terms of multimedia data quality, such as the Peak Signal-to-Noise Ratio (PSNR) of video. The objective of the scheduling problem is to maximize total utility. The optimization problem is formulated as a finite-state decision problem under the assumption that future arrival information is known, and it is solved by a search algorithm as a benchmark.
To obtain a real-time solution, we devise a practical heuristic algorithm based on the concept of utility potential. We implemented the solution and conducted extensive simulations using NS-3. Third, the multimedia dissemination problem in large-scale VANETs is investigated. We utilize a hybrid-network framework to address the mobility and scalability issues in large-scale VANET content distribution, and formulate a utility-maximization problem to find the best delivery strategy and select an optimal path for multimedia data dissemination, where the utility function takes delivery delay, quality of service (QoS), and storage cost into consideration. We obtain the closed form of the utility function and then the optimal solution of the problem via convex optimization theory. We conducted extensive trace-driven simulations to evaluate the proposed algorithm using real traces collected by taxis in Shanghai. In summary, this dissertation contributes to three aspects of multimedia delivery in heterogeneous wireless networks: a real-time rate adaptation algorithm for video streaming with multiple wireless interfaces that maintains high quality while reducing wireless service cost; a scheduling algorithm that maximizes total satisfaction for multimedia transmission in Drive-thru Internet; and a theoretical analysis of utility functions covering delivery delay, QoS, and storage cost, yielding an optimal solution for multimedia data dissemination in large-scale VANETs.
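The dissertation's utility-potential heuristic is not reproduced here, but the core idea of utility-based scheduling with a concave satisfaction curve can be sketched. The logarithmic utility and the identical-slot model below are assumptions for illustration, not the dissertation's actual model:

```python
import math

def schedule_slots(rates, total_slots):
    # Greedy marginal-utility scheduler: each transmission slot goes to
    # the flow whose concave utility u(x) = log(1 + x) of cumulative
    # throughput would increase the most. Because marginal gains are
    # diminishing under a concave utility, this greedy allocation of
    # identical slots maximizes the total utility.
    alloc = [0.0] * len(rates)       # cumulative throughput per flow
    for _ in range(total_slots):
        gains = [math.log1p(alloc[i] + rates[i]) - math.log1p(alloc[i])
                 for i in range(len(rates))]
        best = max(range(len(rates)), key=gains.__getitem__)
        alloc[best] += rates[best]
    return alloc
```

For example, with per-slot rates `[2, 1]` and three slots, the fast flow gets two slots and the slow flow one: the concave utility keeps the fast flow from monopolizing the channel once its marginal satisfaction flattens.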
108

Contributions to Simulation-based High-dimensional Sequential Decision Making

Hoock, Jean-Baptiste 10 April 2013
My thesis is entitled "Contributions to Simulation-based High-dimensional Sequential Decision Making". The context of the thesis is games, planning, and Markov Decision Processes. An agent interacts with its environment by successively making decisions. The agent starts from an initial state and proceeds until a final state in which it can no longer make decisions. At each timestep, the agent receives an observation of the state of the environment. From this observation and its knowledge, the agent makes a decision which modifies the state of the environment. The agent then receives a reward and a new observation. The goal is to maximize the sum of rewards obtained during a simulation from an initial state to a final state. The policy of the agent is the function which, from the history of observations, returns a decision. We work in a context where (i) the number of states is huge, (ii) the reward carries little information, (iii) the probability of quickly reaching a good final state is low, and (iv) prior knowledge is either nonexistent or hard to exploit. Both applications described in this thesis present these constraints: the game of Go and a 3D simulator from the European project MASH (Massive Sets of Heuristics). To make satisfactory decisions in this context, several solutions are proposed: (1) simulating with the exploration/exploitation trade-off (MCTS); (2) reducing complexity by local solving (GoldenEye); (3) building a policy which improves itself (RBGP); (4) learning prior knowledge (CluVo+GMCTS). Monte-Carlo Tree Search (MCTS) is the state of the art for the game of Go. From a model of the environment, MCTS incrementally and asymmetrically builds a tree of possible futures by performing Monte-Carlo simulations. The tree starts from the current observation of the agent. The agent switches between exploration of the model and exploitation of decisions which statistically give a good cumulative reward.
We discuss two ways of improving MCTS: parallelization and the addition of prior knowledge. Parallelization does not address some weaknesses of MCTS; in particular, some local problems remain challenging. We propose an algorithm (GoldenEye) composed of two parts: detection of a local problem and then its resolution. The resolution algorithm reuses concepts of MCTS and solves difficult problems from a classical database. Adding prior knowledge by hand is laborious and tedious, so we propose a method called Racing-based Genetic Programming (RBGP) to add prior knowledge automatically. Its strong point is that RBGP rigorously validates each addition of prior knowledge, and RBGP can be used to build a policy rather than only to optimize an algorithm. In some applications such as MASH, simulations are too expensive in time, and there is no prior knowledge and no model of the environment, so Monte-Carlo Tree Search cannot be used directly. To make MCTS usable in this context, we propose a method for learning prior knowledge (CluVo). We then use the learned knowledge to speed up the agent's learning and to build a model, from which we run an adapted version of Monte-Carlo Tree Search (GMCTS). This method solves difficult problems from MASH and gives good results in an application to a word game.
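A minimal sketch of the MCTS/UCT loop described above (selection, expansion, rollout, backpropagation) may make the mechanism concrete. This is textbook UCT, not GMCTS or any of the thesis's modified variants, and the toy environment in the test is an assumption for illustration:

```python
import math
import random

class Node:
    def __init__(self, state, reward=0.0, parent=None):
        self.state, self.reward, self.parent = state, reward, parent
        self.children = {}                 # action -> Node
        self.visits, self.value = 0, 0.0

def uct_child(node, c=1.4):
    # UCB1 score: mean value (exploitation) + exploration bonus.
    def score(child):
        if child.visits == 0:
            return float("inf")
        return (child.value / child.visits
                + c * math.sqrt(math.log(node.visits) / child.visits))
    return max(node.children.values(), key=score)

def mcts(root_state, actions, step, iters=400, rollout_depth=5, seed=0):
    # actions(s) -> list of legal actions; step(s, a) -> (s2, reward).
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iters):
        node, total = root, 0.0
        # 1. Selection: descend through fully expanded nodes.
        while node.children and len(node.children) == len(actions(node.state)):
            node = uct_child(node)
            total += node.reward
        # 2. Expansion: add one child for an untried action.
        untried = [a for a in actions(node.state) if a not in node.children]
        if untried:
            a = rng.choice(untried)
            s2, r = step(node.state, a)
            node.children[a] = Node(s2, r, node)
            node = node.children[a]
            total += r
        # 3. Rollout: random playout from the new node.
        s = node.state
        for _ in range(rollout_depth):
            s, r = step(s, rng.choice(actions(s)))
            total += r
        # 4. Backpropagation of the simulated return.
        while node is not None:
            node.visits += 1
            node.value += total
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda a: root.children[a].visits)
```

The asymmetric growth the abstract mentions comes from step 1: branches with good average returns are revisited and deepened far more often than poor ones, with the `c` constant controlling the exploration/exploitation trade-off.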
109

Load allocation for optimal risk management in systems with incipient failure modes

Bole, Brian McCaslyn 13 January 2014
This thesis investigates the development and implementation challenges of a proposed load allocation paradigm for fault risk assessment and system health management based on uncertain fault diagnostic and failure prognostic information. Health management actions are formulated in terms of a value associated with improving system reliability and a cost associated with inducing deviations from a system's nominal performance. Three simulated case-study systems are considered to highlight some of the fundamental challenges of formulating and solving an optimization over the space of available supervisory control actions in the described health management architecture. Repeated simulation studies on the three case-study systems illustrate an empirical approach for tuning the conservatism of health management policies by adjusting risk assessment metrics in the proposed paradigm. The implementation and testing of a real-world prognostic system is presented to illustrate model development challenges not directly addressed in the analysis of the simulated case studies: real-time battery charge depletion prediction for a small unmanned aerial vehicle. An architecture for offline testing of prognostics and decision-making algorithms is explained, facilitating empirical tuning of risk assessment metrics and health management policies, as demonstrated for the three simulated case-study systems.
110

Optimal Control and Estimation of Stochastic Systems with Costly Partial Information

Kim, Michael J. 31 August 2012
Stochastic control problems that arise in sequential decision making applications typically assume that information used for decision-making is obtained according to a predetermined sampling schedule. In many real applications however, there is a high sampling cost associated with collecting such data. It is therefore of equal importance to determine when information should be collected as it is to decide how this information should be utilized for optimal decision-making. This type of joint optimization has been a long-standing problem in the operations research literature, and very few results regarding the structure of the optimal sampling and control policy have been published. In this thesis, the joint optimization of sampling and control is studied in the context of maintenance optimization. New theoretical results characterizing the structure of the optimal policy are established, which have practical interpretation and give new insight into the value of condition-based maintenance programs in life-cycle asset management. Applications in other areas such as healthcare decision-making and statistical process control are discussed. Statistical parameter estimation results are also developed with illustrative real-world numerical examples.
