Spelling suggestions: "subject:"markov decision canprocess"" "subject:"markov decision 3.3vprocess""
101 |
Planejamento probabilístico usando programação dinâmica assíncrona e fatorada / Probabilistic planning using asynchronous and factored dynamic programming.Holguin, Mijail Gamarra 03 April 2013 (has links)
Processos de Decisão Markovianos (Markov Decision Process - MDP) modelam problemas de tomada de decisão sequencial em que as possíveis ações de um agente possuem efeitos probabilísticos sobre os estados sucessores (que podem ser definidas por matrizes de transição de estados). Programação dinâmica em tempo real (Real-time dynamic programming - RTDP), é uma técnica usada para resolver MDPs quando existe informação sobre o estado inicial. Abordagens tradicionais apresentam melhor desempenho em problemas com matrizes esparsas de transição de estados porque podem alcançar eficientemente a convergência para a política ótima, sem ter que visitar todos os estados. Porém essa vantagem pode ser perdida em problemas com matrizes densas de transição, nos quais muitos estados podem ser alcançados em um passo (por exemplo, problemas de controle com eventos exógenos). Uma abordagem para superar essa limitação é explorar regularidades existentes na dinâmica do domínio através de uma representação fatorada, isto é, uma representação baseada em variáveis de estado. Nesse trabalho de mestrado, propomos um novo algoritmo chamado de FactRTDP (RTDP Fatorado), e sua versão aproximada aFactRTDP (RTDP Fatorado e Aproximado), que é a primeira versão eficiente fatorada do algoritmo clássico RTDP. Também propomos outras 2 extensões desses algoritmos, o FactLRTDP e aFactLRTDP, que rotulam estados cuja função valor convergiu para o ótimo. Os resultados experimentais mostram que estes novos algoritmos convergem mais rapidamente quando executados em domínios com matrizes de transição densa e tem bom comportamento online em domínios com matrizes de transição densa com pouca dependência entre as variáveis de estado. / Markov Decision Process (MDP) model problems of sequential decision making, where the possible actions have probabilistic effects on the successor states (defined by state transition matrices). Real-time dynamic programming (RTDP), is a technique for solving MDPs when there exists information about the initial state. Traditional approaches show better performance in problems with sparse state transition matrices, because they can achieve the convergence to optimal policy efficiently, without visiting all states. But, this advantage can be lose in problems with dense state transition matrices, in which several states can be achieved in a step (for example, control problems with exogenous events). An approach to overcome this limitation is to explore regularities existing in the domain dynamics through a factored representation, i.e., a representation based on state variables. In this master thesis, we propose a new algorithm called FactRTDP (Factored RTDP), and its approximate version aFactRTDP (Approximate and Factored RTDP), that are the first factored efficient versions of the classical RTDP algorithm. We also propose two other extensions, FactLRTDP and aFactLRTDP, that label states for which the value function has converged to the optimal. The experimental results show that when these new algorithms are executed in domains with dense transition matrices, they converge faster. And they have a good online performance in domains with dense transition matrices and few dependencies among state variables.
|
102 |
Processos de decisão Markovianos com probabilidades imprecisas e representações relacionais: algoritmos e fundamentos. / Markov decision processes with imprecise probabilities and relational representations: foundations and algorithms.Shirota Filho, Ricardo 03 May 2012 (has links)
Este trabalho é dedicado ao desenvolvimento teórico e algorítmico de processos de decisão markovianos com probabilidades imprecisas e representações relacionais. Na literatura, essa configuração tem sido importante dentro da área de planejamento em inteligência artificial, onde o uso de representações relacionais permite obter descrições compactas, e o emprego de probabilidades imprecisas resulta em formas mais gerais de incerteza. São três as principais contribuições deste trabalho. Primeiro, efetua-se uma discussão sobre os fundamentos de tomada de decisão sequencial com probabilidades imprecisas, em que evidencia-se alguns problemas ainda em aberto. Esses resultados afetam diretamente o (porém não restrito ao) modelo de interesse deste trabalho, os processos de decisão markovianos com probabilidades imprecisas. Segundo, propõe-se três algoritmos para processos de decisão markovianos com probabilidades imprecisas baseadas em programação (otimização) matemática. E terceiro, desenvolvem-se ideias propostas por Trevizan, Cozman e de Barros (2008) no uso de variantes do algoritmo Real-Time Dynamic Programming para resolução de problemas de planejamento probabilístico descritos através de versões estendidas da linguagem de descrição de domínios de planejamento (PPDDL). / This work is devoted to the theoretical and algorithmic development of Markov Decision Processes with Imprecise Probabilities and relational representations. In the literature, this configuration is important within artificial intelligence planning, where the use of relational representations allow compact representations and imprecise probabilities result in a more general form of uncertainty. There are three main contributions. First, we present a brief discussion of the foundations of decision making with imprecise probabilities, pointing towards key questions that remain unanswered. These results have direct influence upon the model discussed within this text, that is, Markov Decision Processes with Imprecise Probabilities. Second, we propose three algorithms for Markov Decision Processes with Imprecise Probabilities based on mathematical programming. And third, we develop ideas proposed by Trevizan, Cozman e de Barros (2008) on the use of variants of Real-Time Dynamic Programming to solve problems of probabilistic planning described by an extension of the Probabilistic Planning Domain Definition Language (PPDDL).
|
103 |
Automated Hierarchy Discovery for Planning in Partially Observable DomainsCharlin, Laurent January 2006 (has links)
Planning in partially observable domains is a notoriously difficult problem. However, in many real-world scenarios, planning can be simplified by decomposing the task into a hierarchy of smaller planning problems which, can then be solved independently of one another. Several approaches, mainly dealing with fully observable domains, have been proposed to optimize a plan that decomposes according to a hierarchy specified a priori. Some researchers have also proposed to discover hierarchies in fully observable domains.
In this thesis, we investigate the problem of automatically discovering planning hierarchies in partially observable domains. The main advantage of discovering hierarchies is that, for a plan of a fixed size, hierarchical plans can be more expressive than non-hierarchical ones.
Our solution frames the discovery and optimization of a hierarchical policy as a non-convex optimization problem. By encoding the hierarchical structure as variables of the optimization problem, we can automatically discover a hierarchy. Successfully solving the optimization problem therefore yields an optimal hierarchy and an optimal policy. We describe several techniques to solve the optimization problem. Namely, we provide results using general non-linear solvers, mixed-integer linear and non-linear solvers or a form of bounded hierarchical policy iteration. Our method is flexible enough to allow any parts of the hierarchy to be specified based on prior knowledge while letting the optimization discover the unknown parts. It can also discover hierarchical policies, including recursive policies, that are more compact (potentially infinitely fewer parameters).
|
104 |
Automated Hierarchy Discovery for Planning in Partially Observable DomainsCharlin, Laurent January 2006 (has links)
Planning in partially observable domains is a notoriously difficult problem. However, in many real-world scenarios, planning can be simplified by decomposing the task into a hierarchy of smaller planning problems which, can then be solved independently of one another. Several approaches, mainly dealing with fully observable domains, have been proposed to optimize a plan that decomposes according to a hierarchy specified a priori. Some researchers have also proposed to discover hierarchies in fully observable domains.
In this thesis, we investigate the problem of automatically discovering planning hierarchies in partially observable domains. The main advantage of discovering hierarchies is that, for a plan of a fixed size, hierarchical plans can be more expressive than non-hierarchical ones.
Our solution frames the discovery and optimization of a hierarchical policy as a non-convex optimization problem. By encoding the hierarchical structure as variables of the optimization problem, we can automatically discover a hierarchy. Successfully solving the optimization problem therefore yields an optimal hierarchy and an optimal policy. We describe several techniques to solve the optimization problem. Namely, we provide results using general non-linear solvers, mixed-integer linear and non-linear solvers or a form of bounded hierarchical policy iteration. Our method is flexible enough to allow any parts of the hierarchy to be specified based on prior knowledge while letting the optimization discover the unknown parts. It can also discover hierarchical policies, including recursive policies, that are more compact (potentially infinitely fewer parameters).
|
105 |
Efficient Partially Observable Markov Decision Process Based Formulation Of Gene Regulatory Network Control ProblemErdogdu, Utku 01 April 2012 (has links) (PDF)
The need to analyze and closely study the gene related mechanisms motivated the
research on the modeling and control of gene regulatory networks (GRN). Dierent
approaches exist to model GRNs / they are mostly simulated as mathematical models
that represent relationships between genes. Though it turns into a more challenging
problem, we argue that partial observability would be a more natural and realistic
method for handling the control of GRNs. Partial observability is a fundamental
aspect of the problem / it is mostly ignored and substituted by the assumption that
states of GRN are known precisely, prescribed as full observability. On the other hand,
current works addressing partially observability focus on formulating algorithms for
the nite horizon GRN control problem. So, in this work we explore the feasibility of
realizing the problem in a partially observable setting, mainly with Partially Observable
Markov Decision Processes (POMDP). We proposed a POMDP formulation for
the innite horizon version of the problem. Knowing the fact that POMDP problems
suer from the curse of dimensionality, we also proposed a POMDP solution method
that automatically decomposes the problem by isolating dierent unrelated parts of
the problem, and then solves the reduced subproblems. We also proposed a method
to enrich gene expression data sets given as input to POMDP control task, because
in available data sets there are thousands of genes but only tens or rarely hundreds of
samples. The method is based on the idea of generating more than one model using
the available data sets, and then sampling data from each of the models and nally
ltering the generated samples with the help of metrics that measure compatibility,
diversity and coverage of the newly generated samples.
|
106 |
On the control of airport departure operations.Burgain, Pierrick Antoine 15 November 2010 (has links)
This thesis is focused on airport departure operations; its objective is to assign a value to surface surveillance information within a collaborative framework.
The research develops a cooperative concept that improves the control of departure operations at busy airports and evaluates its merit using a classical and widely accepted airport departure model. The research then assumes departure operations are collaboratively controlled and develops a stochastic model of taxi operations on the airport surface. Finally, this study investigates the effect of feeding back different levels of surface surveillance information to the departure control process. More specifically, it examines the environmental and operational impact of aircraft surface location information on the taxi clearance process. Benefits are evaluated by measuring and comparing engine emissions for given runway utilization rates.
|
107 |
A reinforcement learning approach to obtain treatment strategies in sequential medical decision problems [electronic resource] / by Radhika Poolla.Poolla, Radhika. January 2003 (has links)
Title from PDF of title page. / Document formatted into pages; contains 104 pages. / Thesis (M.S.I.E.)--University of South Florida, 2003. / Includes bibliographical references. / Text (Electronic thesis) in PDF format. / ABSTRACT: Medical decision problems are extremely complex owing to their dynamic nature, large number of variable factors, and the associated uncertainty. Decision support technology entered the medical field long after other areas such as the airline industry and the manufacturing industry. Yet, it is rapidly becoming an indispensable tool in medical decision making problems including the class of sequential decision problems. In these problems, physicians decide on a treatment plan that optimizes a benefit measure such as the treatment cost, and the quality of life of the patient. The last decade saw the emergence of many decision support applications in medicine. However, the existing models have limited applications to decision problems with very few states and actions. An urgent need is being felt by the medical research community to expand the applications to more complex dynamic problems with large state and action spaces. / ABSTRACT: This thesis proposes a methodology which models the class of sequential medical decision problems as a Markov decision process, and solves the model using a simulation based reinforcement learning (RL) algorithm. Such a methodology is capable of obtaining near optimal treatment strategies for problems with large state and action spaces. This methodology overcomes, to a large extent, the computational complexity of the value-iteration and policy-iteration algorithms of dynamic programming. An average reward reinforcement-learning algorithm is developed. The algorithm is applied on a sample problem of treating hereditary spherocytosis. The application demonstrates the ability of the proposed methodology to obtain effective treatment strategies for sequential medical decision problems. / System requirements: World Wide Web browser and PDF reader. / Mode of access: World Wide Web.
|
108 |
Multimedia Delivery over Heterogeneous Wireless NetworksXing, Min 29 April 2015 (has links)
There is an increasing demand for multimedia services in heterogeneous wireless networks. Considering the highly dynamic wireless channels and the relatively large size of the multimedia data, how to support efficient and reliable multimedia delivery is a pressing issue. In this dissertation, we investigate the multimedia delivery algorithms in heterogeneous wireless networks from three different aspects.
First, we study the single-flow rate adaptation of video streaming algorithm over multiple wireless interfaces. In order to maintain high video streaming quality while reducing the wireless service cost, the optimal video streaming process with multiple links is formulated as a Markov Decision Process (MDP). The reward function is designed to consider the quality of service (QoS) requirements for video traffic, such as the startup latency, playback fluency, average playback quality, playback smoothness and wireless service cost. To solve the MDP in real time, we propose an adaptive, best-action search algorithm to obtain a sub-optimal solution. To evaluate the performance of the proposed adaptation algorithm, we implemented a testbed using the Android mobile phone and the Scalable Video Coding (SVC) codec and conducted experiments with real video flow.
Then, with the multiple multimedia flows competing for limited wireless resources, we propose a utility-based scheduling algorithm for multimedia transmission in Drive-thru Internet. A utility model is devised to map the throughput to user's satisfaction level in terms of multimedia data quality, such as Peak Signal-to-Noise Ratio (PSNR) of video. The objective of the scheduling problem is to maximize the total utility. Then the optimization problem is formulated as a finite-state decision problem with the assumption that future arrival information is known, and it is solved by a searching algorithm as the benchmark. To obtain a real-time solution, a practical heuristic algorithm based on the concept of utility potential is devised. We further implemented the solution and conducted extensive simulations using NS-3.
Finally, the multimedia dissemination problem in large-scale VANETs is investigated. We first utilize a hybrid-network framework to address the mobility and scalability issues in large-scale VANETs content distribution. Then, we formulate a utility-based maximization problem to find the best delivery strategy and select an optimal path for the multimedia data dissemination, where the utility function has taken the delivery delay, the Quality of Services (QoS) and the storage cost into consideration. We obtain the closed-form of the utility function, and then obtain the optimal solution of the problem with the convex optimization theory. Finally, we conducted extensive trace-driven simulations to evaluate the performance of the proposed algorithm with real traces collected by taxis in Shanghai.
In summary, the research outcomes of the dissertation can contribute to three different aspects of multimedia delivery in heterogeneous wireless networks. First, we have proposed a real-time rate adaptation algorithm for video streaming with multiple wireless interfaces, to maintain the high quality while reducing the wireless services cost. Second, we have presented an optimal scheduling algorithm which can maximize the total satisfaction for multimedia transmission in Drive-thru Internet. Third, we have derived the theoretical analysis of the utility functions including delivery delay, QoS and the storage cost, and have obtained an optimal solution for multimedia data dissemination in large-scale VANETs to achieve the highest utility. / Graduate / 0984 / 0544
|
109 |
Contributions to Simulation-based High-dimensional Sequential Decision MakingHoock, Jean-Baptiste 10 April 2013 (has links) (PDF)
My thesis is entitled "Contributions to Simulation-based High-dimensional Sequential Decision Making". The context of the thesis is about games, planning and Markov Decision Processes. An agent interacts with its environment by successively making decisions. The agent starts from an initial state until a final state in which the agent can not make decision anymore. At each timestep, the agent receives an observation of the state of the environment. From this observation and its knowledge, the agent makes a decision which modifies the state of the environment. Then, the agent receives a reward and a new observation. The goal is to maximize the sum of rewards obtained during a simulation from an initial state to a final state. The policy of the agent is the function which, from the history of observations, returns a decision. We work in a context where (i) the number of states is huge, (ii) reward carries little information, (iii) the probability to reach quickly a good final state is weak and (iv) prior knowledge is either nonexistent or hardly exploitable. Both applications described in this thesis present these constraints : the game of Go and a 3D simulator of the european project MASH (Massive Sets of Heuristics). In order to take a satisfying decision in this context, several solutions are brought : 1. Simulating with the compromise exploration/exploitation (MCTS) 2. Reducing the complexity by local solving (GoldenEye) 3. Building a policy which improves itself (RBGP) 4. Learning prior knowledge (CluVo+GMCTS) Monte-Carlo Tree Search (MCTS) is the state of the art for the game of Go. From a model of the environment, MCTS builds incrementally and asymetrically a tree of possible futures by performing Monte-Carlo simulations. The tree starts from the current observation of the agent. The agent switches between the exploration of the model and the exploitation of decisions which statistically give a good cumulative reward. We discuss 2 ways for improving MCTS : the parallelization and the addition of prior knowledge. The parallelization does not solve some weaknesses of MCTS; in particular some local problems remain challenges. We propose an algorithm (GoldenEye) which is composed of 2 parts : detection of a local problem and then its resolution. The algorithm of resolution reuses some concepts of MCTS and it solves difficult problems of a classical database. The addition of prior knowledge by hand is laborious and boring. We propose a method called Racing-based Genetic Programming (RBGP) in order to add automatically prior knowledge. The strong point is that RBGP rigorously validates the addition of a prior knowledge and RBGP can be used for building a policy (instead of only optimizing an algorithm). In some applications such as MASH, simulations are too expensive in time and there is no prior knowledge and no model of the environment; therefore Monte-Carlo Tree Search can not be used. So that MCTS becomes usable in this context, we propose a method for learning prior knowledge (CluVo). Then we use pieces of prior knowledge for improving the rapidity of learning of the agent and for building a model, too. We use from this model an adapted version of Monte-Carlo Tree Search (GMCTS). This method solves difficult problems of MASH and gives good results in an application to a word game.
|
110 |
Load allocation for optimal risk management in systems with incipient failure modesBole, Brian McCaslyn 13 January 2014 (has links)
The development and implementation challenges associated with a proposed load allocation paradigm for fault risk assessment and system health management based on uncertain fault diagnostic and failure prognostic information are investigated. Health management actions are formulated in terms of a value associated with improving system reliability, and a cost associated with inducing deviations from a system's nominal performance. Three simulated case study systems are considered to highlight some of the fundamental challenges of formulating and solving an optimization on the space of available supervisory control actions in the described health management architecture. Repeated simulation studies on the three case-study systems are used to illustrate an empirical approach for tuning the conservatism of health management policies by way of adjusting risk assessment metrics in the proposed health management paradigm. The implementation and testing of a real-world prognostic system is presented to illustrate model development challenges not directly addressed in the analysis of the simulated case study systems. Real-time battery charge depletion prediction for a small unmanned aerial vehicle is considered in the real-world case study. An architecture for offline testing of prognostics and decision making algorithms is explained to facilitate empirical tuning of risk assessment metrics and health management policies, as was demonstrated for the three simulated case study systems.
|
Page generated in 0.0881 seconds