51

Intelligent Knowledge Distribution for Multi-Agent Communication, Planning, and Learning

Fowler, Michael C. 06 May 2020
This dissertation addresses a fundamental question of multi-agent coordination: what information should be sent to whom and when, with the limited resources available to each agent? Communication requirements for multi-agent systems can be rather high when an accurate picture of the environment and the state of other agents must be maintained. To reduce the impact of multi-agent coordination on networked systems, e.g., power and bandwidth, this dissertation introduces new concepts to enable Intelligent Knowledge Distribution (IKD), including constrained-action POMDPs (CA-POMDPs) and concurrent decentralized (CoDec) POMDPs for an agnostic plug-and-play capability for fully autonomous systems. Each agent runs a CoDec POMDP in which all decision making (motion planning, task allocation, asset monitoring, and communication) is separated into concurrent individual MDPs to reduce the combinatorial explosion of the action and state space while maintaining the dependencies between the models. We also introduce the CA-POMDP with action-based constraints on partially observable Markov decision processes, rewards driven by the value of information, and probabilistic constraint satisfaction through discrete optimization and Markov chain Monte Carlo analysis. IKD is adapted in real time through machine learning of the actual environmental impacts on the behavior of the system, including collaboration strategies between autonomous agents, the true value of information between heterogeneous systems, observation probabilities, and resource utilization. / Doctor of Philosophy / This dissertation addresses a fundamental question behind when multiple autonomous systems, like drone swarms, in the field need to coordinate and share data: what information should be sent to whom and when, with the limited resources available to each agent? Intelligent Knowledge Distribution is a framework that answers these questions. Communication requirements for multi-agent systems can be rather high when an accurate picture of the environment and the state of other agents must be maintained. To reduce the impact of multi-agent coordination on networked systems, e.g., power and bandwidth, this dissertation introduces new concepts to enable Intelligent Knowledge Distribution (IKD), including constrained-action POMDPs and concurrent decentralized (CoDec) POMDPs for an agnostic plug-and-play capability for fully autonomous systems. The IKD model demonstrated its validity as a "plug-and-play" library that manages communications between agents, ensuring that the right information is transmitted at the right time to the right agent for mission success.
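To see the combinatorial relief the CoDec decomposition is after, consider a minimal Python sketch. The action names and counts below are hypothetical, not taken from the dissertation; the point is only that factoring one joint decision problem into concurrent per-concern MDPs replaces a product of action spaces with a sum.

```python
import itertools

# Hypothetical action sets for one agent (illustrative only).
motion_actions = ["north", "south", "east", "west", "hold"]
comm_actions = ["send_full_state", "send_summary", "stay_silent"]

# Monolithic formulation: one MDP over the joint action space.
joint = list(itertools.product(motion_actions, comm_actions))

# CoDec-style formulation: one small MDP per concern, executed concurrently
# and coupled only through shared state features (e.g., battery, position).
print(len(joint))                               # 15 joint actions per state
print(len(motion_actions) + len(comm_actions))  # 8 actions across two MDPs
```

With more concerns (task allocation, asset monitoring) the gap grows multiplicatively, which is the explosion the concurrent formulation avoids.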
52

Probabilistic Causality in Markovian Models

Ziemek, Robin 23 September 2024
The complexity of modern computer and software systems still seems to grow exponentially, while the human user is largely left without explanations of how to understand these systems. One of the central tasks of current computer science therefore lies in the development of methods and tools to build such an understanding. A similar task is addressed by formal verification, which provides various verifiable justifications for the functionality of a system. Since these only establish that a system functions properly, they address just a portion of the task of making systems easier to comprehend. It is widely believed that cause-effect reasoning plays an important role in forming human understanding of complex relations, and there are already many accounts of causality in modern computer science. However, most of them focus on a form of backward-looking actual causality. This variant of causality is concerned with actual events after their occurrence and reasons about causes mostly in a counterfactual manner. In this thesis we address a probabilistic form of causality which is forward-looking by nature. As such, we define and discuss novel notions of probabilistic causes in discrete-time Markov chains and Markov decision processes. For this we rely on two central principles of probabilistic causality: the probability-raising principle states that a cause should raise the probability of its effect, while temporal priority requires that a cause must occur before its effect. We build the mathematical and algorithmic foundations of our so-called probability-raising causes. For this we work in a state-based setting where causes and effects are reachability properties of sets of states. In order to measure the predictive power of states we define quality measures for which we interpret causes as binary classifiers. With these tools we address the algorithmic questions of checking cause-effect relations when both a cause candidate and an effect are given, and of finding quality-optimal causes when only the effect is given. We discuss possible extensions of this basic state-based framework to more general formulations of causes and effects as ω-regular properties.

Table of contents:
1 Introduction
  1.1 Contributions and Outline
  1.2 Related Work
2 Preliminaries
  2.1 Markov Decision Processes
    2.1.1 Graph of an MDP
    2.1.2 Schedulers and Probability Measure
    2.1.3 Maximal and Minimal Probabilities
    2.1.4 Frequencies and Balance Equations
    2.1.5 MR Scheduler in MDPs without ECs
    2.1.6 MEC-Quotient
  2.2 Automata and ω-Regular Languages
  2.3 Probability-Raising Causality
3 Probability-Raising Causality in Markov Decision Processes
  3.1 Setting and Definition
    3.1.1 Related Work revisited
  3.2 Examples
    3.2.1 Conceptual Examples
    3.2.2 Racetrack MDP
  3.3 Quality Measures for Probability-Raising Causes
    3.3.1 PR Causes as Binary Classifiers in MDPs
    3.3.2 Quality Measures for a given PR Cause
4 Algorithmic Considerations
  4.1 Checking the Probability-Raising Conditions
    4.1.1 Canonical MDP for given Cause Set
    4.1.2 Checking the SPR Condition and the Existence of PR Causes
    4.1.3 Checking the GPR Condition
  4.2 Computing the Quality of a PR Cause
    4.2.1 Max/Min Ratios for Disjoint Sets of Terminal States
  4.3 Quality-Optimal Probability-Raising Causes
    4.3.1 Quality-Optimal SPR Causes
    4.3.2 Quality-Optimal GPR Causes
5 Variants of Probability-Raising Causality in MDPs
  5.1 Probability-Raising Scheduler and Potential PR Causes
    5.1.1 Checking for SPR Scheduler
    5.1.2 Checking for GPR Scheduler
  5.2 ω-Regular Effects - Reachability Causes
    5.2.1 Checking Reachability Probability-Raising Causes
    5.2.2 Quality and Optimality of Reachability PR Causes
  5.3 ω-Regular Co-Safety Causes
    5.3.1 Checking Co-Safety PR Causes
    5.3.2 Quality and Optimality of Co-Safety PR Causes
6 Conclusion
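The probability-raising condition at the heart of this work can be checked on a toy chain by simulation. The Python sketch below assumes a small absorbing DTMC of my own construction (not an example from the thesis) and estimates Pr(◇effect) with and without conditioning on the cause; temporal priority holds by construction here, since "cause" can only be visited before "effect".

```python
import random

# Transition function of a small absorbing DTMC (assumed example).
P = {
    "init":  [("cause", 0.3), ("noise", 0.7)],
    "cause": [("effect", 0.8), ("safe", 0.2)],
    "noise": [("effect", 0.2), ("safe", 0.8)],
}
ABSORBING = {"effect", "safe"}

def sample_run(state="init"):
    visited = [state]
    while state not in ABSORBING:
        states, probs = zip(*P[state])
        state = random.choices(states, weights=probs)[0]
        visited.append(state)
    return visited

random.seed(0)
runs = [sample_run() for _ in range(100_000)]
p_effect = sum("effect" in r for r in runs) / len(runs)
cause_runs = [r for r in runs if "cause" in r]
p_effect_given_cause = sum("effect" in r for r in cause_runs) / len(cause_runs)

# Probability-raising: visiting the cause must raise the effect's probability.
print(p_effect_given_cause > p_effect)   # True: 0.8 vs. about 0.38
```

Exact reachability probabilities would replace the simulation in a real checker, which is the kind of algorithmic question the thesis treats.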
53

Dynamic resource allocation in manufacturing and service industries

Yilmaz, Tuba 11 January 2012
In this thesis, we study three applications of dynamic resource allocation: the first two consider dynamic lead-time quotation in make-to-order (MTO) systems with substitutable products and order cancellations, respectively; the third is a manpower allocation problem with job-teaming constraints. Matching supply and demand in manufacturing and service industries has been a fundamental focus of the operations management literature, which has concentrated on optimizing or improving supply-side decisions, since demand has generally been assumed to be exogenously determined. However, recent business trends and advances in consumer behavior modeling have shown that demand for goods and services can clearly be shaped by various decisions that a firm makes, such as price and lead-time. In fact, competition between companies is no longer based mainly on price or product features; lead-time is one of the strategic measures used to evaluate suppliers. In MTO manufacturing or service environments that aim to satisfy customers' unique needs, lead-time quotation impacts the actual demand for the products and the overall profitability of the firm.

In the first two parts of the thesis, we study the dynamic lead-time quotation problem in pure MTO (or service) systems characterized by lead-time-sensitive Poisson demand and exponentially distributed service times. We formulate the problem as an infinite-horizon Markov decision process (MDP) with the objective of maximizing the long-run expected average profit per unit time, where profits are defined to specifically account for delays in the delivery of customer orders. We study the dynamic lead-time quotation problem in two particular settings: one with the possibility of demand substitution, and another with order cancellations. The fundamental trade-off in lead-time quotation is between quoting short lead-times and attaining them. In the case of demand substitution, i.e., in the presence of substitutable products and multiple customer classes with different requirements and margins, this trade-off also involves capacity allocation and order acceptance decisions. In particular, one needs to decide whether to allocate capacity to a low-margin order now, or to reserve capacity for potential future arrivals of high-margin orders, considering customer preferences, the current workload in the system, and future arrivals. In the case of order cancellations, one needs to take into account the probability of cancellation of orders currently in the system and quote lead-times accordingly; otherwise, quoting a longer lead-time may result in the loss of the customer order, lower resource utilization, and, in turn, reduced profits.

In Chapter 2, we study a dynamic lead-time quotation problem in an MTO system with two (partially) substitutable products and two classes of customers. Customers decide whether to place an order for one of the products, or not to order at all, based on the quoted lead-times. We analyze the optimal profit and the structure of the optimal lead-time policy. We also compare the lead-time quotes and profits of different quotation strategies (static vs. dynamic), with and without substitution. Numerical results show that substitution and dynamic quotation have synergistic effects, and that higher benefits are obtained from dynamic quotation and/or substitution when the difference in product revenues or arrival rates, or the total traffic intensity, is higher.
In Chapter 3, we study a dynamic lead-time quotation problem in an MTO system with a single product, considering order cancellations. Cancellations can take place while the order is in the system (waiting or undergoing processing) or after processing is completed, at delivery to the customer. We analyze the behavior of the optimal profit in terms of the cancellation parameters. Through a numerical study, we show that the optimal profit does not necessarily decrease as the cancellation rate increases. When the profit from a cancelled order, the arrival rate of customers, or the lead-time sensitivity of customers is high, there is a higher probability that the optimal profit increases with the cancellation rate. We also compare the cancellation scenarios with the corresponding no-cancellation scenarios, and show that for most parameter settings there exists a cancellation scenario that is at least as good in terms of profit as the corresponding no-cancellation scenario.

In Chapter 4, we study the manpower allocation problem with job-teaming constraints, with the objective of minimizing the total completion time of all tasks. The problem arises in various contexts where tasks require cooperation between workers: a team of individuals with varied expertise required in different locations in a business environment, surgeries requiring different compositions of doctors and nurses in a hospital, or a combination of technicians with individual skills needed in a service company. A set of tasks at random locations requires a set of capabilities to be accomplished, and workers have unique capabilities that are required by several tasks. Tasks require synchronization of workers, hence workers arriving early at a task have to wait for the other required workers to arrive before processing can start. We present a mixed integer programming formulation, strengthen it by adding cuts, and propose heuristic approaches. Experimental results are reported for low and high coordination levels, i.e., the number of workers required to work simultaneously on a given task.
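As a rough illustration of the kind of model behind Chapters 2 and 3, here is a stylized discrete-time value-iteration sketch of dynamic lead-time quotation. The acceptance model, the crude tardiness estimate, and all parameters are assumptions; the thesis itself works with a continuous-time average-profit MDP rather than this discounted approximation.

```python
lam, mu, gamma, N = 0.6, 1.0, 0.95, 20  # arrival rate, service rate, discount, queue cap
quotes = [1.0, 2.0, 4.0, 8.0]           # lead-time menu (time units), assumed
revenue, tard_cost = 10.0, 2.0
p_arr, p_srv = lam / (lam + mu), mu / (lam + mu)

def accept(l):                           # acceptance falls with the quoted lead time
    return max(0.0, 1.0 - 0.1 * l)

def net_reward(n, l):                    # revenue net of an expected tardiness penalty
    return revenue - tard_cost * max(0.0, (n + 1) / mu - l)

V = [0.0] * (N + 1)
for _ in range(1000):                    # value iteration to (near) convergence
    newV = V[:]
    for n in range(N + 1):
        up, down = min(n + 1, N), max(n - 1, 0)
        # On an arrival: quote the best lead time, or reject (value gamma * V[n]).
        best = max([accept(l) * (net_reward(n, l) + gamma * V[up])
                    + (1.0 - accept(l)) * gamma * V[n] for l in quotes]
                   + [gamma * V[n]])
        newV[n] = p_arr * best + p_srv * gamma * V[down]
    V = newV
print(round(V[0], 2))                    # value of an empty system
```

The state-dependent structure the thesis analyzes shows up here as longer (or no) quotes when the backlog n is large.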
54

Stochastic Dynamic Programming Problems: Theory and Applications

Lendel, Gabriel January 2012
Title: Stochastic Dynamic Programming Problems: Theory and Applications
Author: Gabriel Lendel
Department: Department of Probability and Mathematical Statistics
Supervisor: Ing. Karel Sladký CSc.
Supervisor's e-mail address: sladky@utia.cas.cz

Abstract: In the present work we study Markov decision processes, which provide a mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision maker. We study iterative procedures for finding a policy that is optimal or nearly optimal with respect to the selected criteria. Specifically, we mainly examine the task of finding a policy that is optimal with respect to the total expected discounted reward or the average expected reward for discrete or continuous systems. We study policy iteration algorithms and approximate value iteration algorithms, and give numerical analyses of specific problems.

Keywords: stochastic dynamic programming, Markov decision process, policy iteration, value iteration
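The policy iteration scheme studied here can be sketched in a few lines for a discounted MDP. The two-state, two-action instance below is an assumed example; the evaluation step solves the linear system (I - γ P_π)V = R_π exactly, and improvement is the greedy one-step lookahead.

```python
import numpy as np

# P[a][s, s'] are transition matrices; R[a][s] are expected one-step rewards.
P = [np.array([[0.9, 0.1], [0.4, 0.6]]),
     np.array([[0.2, 0.8], [0.1, 0.9]])]
R = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
gamma, n = 0.9, 2

policy = np.zeros(n, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = np.array([P[policy[s]][s] for s in range(n)])
    R_pi = np.array([R[policy[s]][s] for s in range(n)])
    V = np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily with respect to the one-step lookahead.
    Q = np.array([R[a] + gamma * P[a] @ V for a in range(2)])
    new_policy = Q.argmax(axis=0)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print(policy, np.round(V, 2))
```

Value iteration would replace the exact solve with repeated Bellman backups; both converge to the same optimal policy for discounted problems.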
55

Real-time dynamic programming for Markov Decision Processes with Imprecise Probabilities

Dias, Daniel Baptista 28 November 2014
In sequential decision-making problems modeled as Markov Decision Processes (MDPs), an exact measure of the state transition probabilities may not be available. Markov Decision Processes with Imprecise Transition Probabilities (MDP-IPs) were introduced to address this situation. While MDP-IPs are a robust framework for real-world planning applications, their solutions are very time-consuming in practice. Previous work proposed efficient synchronous dynamic programming algorithms to solve MDP-IPs with a factored representation of the probabilistic transition and reward functions, called factored MDP-IPs. However, when the initial state of a Stochastic Shortest Path problem (SSP MDP) is given, these solutions do not exploit this information. In this work we introduce the Stochastic Shortest Path MDP-IP (SSP MDP-IP), in both enumerative and factored form. An asynchronous dynamic programming algorithm for enumerative SSP MDP-IPs with interval-based probabilities was proposed by Buffet and Aberdeen (2005). In general, however, a problem is given in factored form, i.e., in terms of state variables, and in that case, even if interval-based imprecision over the variables is assumed, that solution is no longer applicable, because the joint transition probabilities become multilinear. We show that factored SSP MDP-IPs are more expressive than enumerative ones, and that the move from enumerative SSP MDP-IPs to the general case of factored SSP MDP-IPs switches the objective of the Bellman backup from a linear to a nonlinear function. We also propose enumerative algorithms, called RTDP-IP (Real-time Dynamic Programming with Imprecise Transition Probabilities), LRTDP-IP (Labeled Real-time Dynamic Programming with Imprecise Transition Probabilities), SSiPP-IP (Short-Sighted Probabilistic Planner with Imprecise Transition Probabilities) and LSSiPP-IP (Labeled Short-Sighted Probabilistic Planner with Imprecise Transition Probabilities), as well as factored algorithms called factRTDP-IP (factored RTDP-IP) and factLRTDP-IP (factored LRTDP-IP). These algorithms are evaluated against the synchronous dynamic programming algorithms in terms of solution convergence time and scalability.
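The robust backup such interval-based algorithms rely on can be sketched as follows. The greedy mass-assignment over interval bounds is the standard routine for interval credal sets; the tiny SSP instance and its numbers are illustrative assumptions, and a full RTDP-IP would apply this backup asynchronously along sampled trials rather than at a single hand-picked state.

```python
def worst_case_dist(intervals, values):
    """Pick p in the interval credal set maximizing sum_i p_i * values_i."""
    p = [lo for lo, _ in intervals]
    slack = 1.0 - sum(p)
    # Pour the remaining probability mass onto the most expensive successors.
    for i in sorted(range(len(values)), key=lambda i: -values[i]):
        add = min(intervals[i][1] - p[i], slack)
        p[i] += add
        slack -= add
    return p

# One asynchronous (RTDP-style) backup at a single state of an assumed SSP:
succ_values = [2.0, 5.0, 0.0]        # current cost-to-go of the successors
actions = {
    "a1": {"cost": 1.0, "intervals": [(0.1, 0.5), (0.2, 0.6), (0.1, 0.4)]},
    "a2": {"cost": 2.0, "intervals": [(0.5, 0.9), (0.0, 0.2), (0.1, 0.3)]},
}
# Agent minimizes expected cost; nature maximizes it within the intervals.
backup = min(
    spec["cost"] + sum(p * v for p, v in
                       zip(worst_case_dist(spec["intervals"], succ_values),
                           succ_values))
    for spec in actions.values()
)
print(round(backup, 3))   # 4.4 for this toy instance
```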
56

Factored Markov decision processes with Imprecise Transition Probabilities

Delgado, Karina Valdivia 19 January 2010
When modeling real-world decision-theoretic planning problems with the framework of Markov Decision Processes (MDPs), it is often impossible to obtain a completely accurate estimate of the transition probabilities. For example, uncertainty arises in the specification of transitions due to the elicitation of MDP transition models from an expert or from data, or from non-stationary transition distributions arising from insufficient state knowledge. In the interest of obtaining the most robust policy under transition uncertainty, Markov Decision Processes with Imprecise Transition Probabilities (MDP-IPs) have been introduced. Unfortunately, while various solutions exist for MDP-IPs, they often require external calls to optimization routines and can thus be extremely time-consuming in practice. To address this deficiency, we introduce the factored MDP-IP and propose efficient mathematical programming and dynamic programming methods to exploit its structure. First, we derive efficient approximate solutions for factored MDP-IPs based on mathematical programming, resulting in a multilinear formulation for robust maximin linear-value approximations. By exploiting the factored structure of MDP-IPs we demonstrate orders-of-magnitude reductions in solution time over standard exact non-factored approaches. Second, noting that the key computational bottleneck in the dynamic programming solution of factored MDP-IPs is the need to repeatedly solve nonlinear constrained optimization problems, we show how to represent the value function compactly using a new data structure called Parameterized Algebraic Decision Diagrams, and how to apply approximation techniques to drastically reduce the computational overhead of the nonlinear solver while producing bounded, approximately optimal solutions. Our results show up to two orders of magnitude speedup over traditional flat dynamic programming approaches, and up to an order of magnitude speedup over the extension of factored-MDP approximate value iteration techniques to MDP-IPs, while producing the lowest error among all the approximation algorithms evaluated.
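The multilinearity mentioned above is easy to see in miniature. In the assumed two-variable example below, each binary state variable has an imprecise parameter lying in an interval, so the joint successor probabilities are products of parameters and the adversary's inner problem in the Bellman backup is multilinear over a box. For this tiny case the optimum (which lies at a vertex of the box) can be found by brute force over the vertices; the thesis's methods are what make this tractable at scale.

```python
import itertools

# Successor values indexed by (x1', x2'), an assumed example.
V = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 5.0}
box = [(0.2, 0.6), (0.3, 0.7)]   # intervals for p1 = P(x1'=1), p2 = P(x2'=1)

def expected_value(p1, p2):
    # Joint probabilities are products of per-variable parameters,
    # so this objective is multilinear in (p1, p2).
    return sum(V[(x1, x2)]
               * (p1 if x1 else 1 - p1) * (p2 if x2 else 1 - p2)
               for x1, x2 in V)

# itertools.product over the interval endpoints enumerates the box vertices.
worst = max(expected_value(p1, p2) for p1, p2 in itertools.product(*box))
print(round(worst, 3))   # robust (worst-case) value, attained at a vertex
```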
57

On the abstractions and projections of Large Markov Decision Processes

Tagorti, Manel 03 February 2015
Markov Decision Processes (MDPs) are a mathematical formalism for many domains of artificial intelligence, such as planning, machine learning, and reinforcement learning. Solving an MDP means finding the optimal strategy or policy of an agent interacting with a stochastic environment. When the size of this system becomes very large, it becomes hard to solve the problem with classical methods. This thesis deals with the resolution of MDPs with large state spaces. It studies certain resolution methods, such as abstractions and the so-called projection methods. It shows the limits of some approaches and identifies certain structures, the bisimulations, that may prove useful for an approximate resolution of the problem. The thesis also focuses on a projection method, the least-squares temporal difference algorithm LSTD(λ). A bound on the convergence rate of this algorithm has been derived, with an emphasis on the role played by the parameter λ. This analysis has then been generalized to derive a performance bound for the least-squares non-stationary policy iteration algorithm LS(λ)NSPI, by bounding the error between the value computed at a fixed iteration and the value under the optimal policy that we aim to determine.
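A minimal sketch of the LSTD(λ) estimator whose convergence rate is analyzed here, run on an assumed five-state random walk with tabular features (the environment and parameters are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma, lam = 5, 0.99, 0.5
phi = np.eye(n_states)                    # tabular features, for clarity

A = np.zeros((n_states, n_states))
b = np.zeros(n_states)
for _ in range(200):                      # episodes of the random walk
    s = 2                                 # start in the middle state
    z = np.zeros(n_states)                # eligibility trace
    while True:
        s2 = s + rng.choice([-1, 1])
        r = 1.0 if s2 == n_states - 1 else 0.0
        done = s2 in (0, n_states - 1)
        z = gamma * lam * z + phi[s]
        phi_next = np.zeros(n_states) if done else phi[s2]
        A += np.outer(z, phi[s] - gamma * phi_next)
        b += z * r
        if done:
            break
        s = s2

theta = np.linalg.lstsq(A, b, rcond=None)[0]   # LSTD(lambda) fixed point
print(theta.round(2))       # approximate state values; terminal entries stay 0
```

The role of λ that the thesis quantifies is visible here as the trade-off in the eligibility trace: λ = 0 gives the pure one-step TD solution, λ = 1 the Monte Carlo one.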
58

Efficient solutions to Markov decision processes based on reachability and stochastic bisimulations

Santos, Felipe Martins dos 09 December 2013
Planning in artificial intelligence is the task of finding actions to reach a given goal. In planning under uncertainty, actions can have probabilistic effects. These problems are modeled using Markov Decision Processes (MDPs), models that enable the computation of optimal solutions considering the expected value of each action in each state. However, solving large probabilistic planning problems, i.e., those with a large number of states and actions, is still a challenge. Large MDPs can be reduced by computing stochastic bisimulations, i.e., equivalence relations over the states of the original MDP. From the stochastic bisimulations, which can be exact or approximate, it is possible to obtain an abstract reduced model that can be easier to solve than the original MDP. However, for some problems, computing the stochastic bisimulation over the whole state space is infeasible. The algorithms proposed in this work extend the algorithms used to compute stochastic bisimulations for MDPs so that they are computed over the set of states reachable from a given initial state, which can be much smaller than the complete state space. The empirical results show that it is possible to solve large probabilistic planning problems with better performance than the known stochastic bisimulation techniques.
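The reachability-restricted idea can be sketched as a signature-based partition refinement that only ever looks at states reachable from the initial state. The toy MDP below is an assumption for illustration; note that the unreachable state "u" never enters the computation.

```python
from collections import deque

# MDP: T[s][a] = list of (next_state, prob); R[s][a] = reward (assumed example).
T = {
    "s0": {"a": [("s1", 0.5), ("s2", 0.5)]},
    "s1": {"a": [("s3", 1.0)]},
    "s2": {"a": [("s3", 1.0)]},
    "s3": {"a": [("s3", 1.0)]},
    "u":  {"a": [("u", 1.0)]},           # unreachable: never considered
}
R = {"s0": {"a": 0.0}, "s1": {"a": 1.0}, "s2": {"a": 1.0},
     "s3": {"a": 0.0}, "u": {"a": 9.0}}

# 1) Reachable fragment from the initial state.
reach, frontier = {"s0"}, deque(["s0"])
while frontier:
    s = frontier.popleft()
    for a in T[s]:
        for s2, _ in T[s][a]:
            if s2 not in reach:
                reach.add(s2)
                frontier.append(s2)

# 2) Refine: states are equivalent iff they agree on rewards and on the
#    probability mass sent into each current block, for every action.
def signature(s, blocks):
    return tuple(
        (a, R[s][a],
         tuple(round(sum(p for s2, p in T[s][a] if s2 in B), 6)
               for B in blocks))
        for a in sorted(T[s]))

blocks = [frozenset(reach)]
while True:
    groups = {}
    for s in reach:
        groups.setdefault(signature(s, blocks), set()).add(s)
    new_blocks = [frozenset(g) for g in groups.values()]
    if len(new_blocks) == len(blocks):
        break
    blocks = new_blocks
print(blocks)   # s1 and s2 collapse into one block; "u" is ignored entirely
```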
59

Dynamic Coordination in Manufacturing and Healthcare Systems

Zhongjie Ma 16 January 2019
As manufacturing and healthcare systems become more complex, efficiently managing them requires cooperation and coordination between different parties. This dissertation examines coordination issues in a supply chain problem and in diagnostic decision making in the healthcare system. Below, we provide a brief description of the problems and the results achieved.

With supply chains becoming increasingly extended, uncertainty in the upstream production process can greatly affect the material flow aimed at meeting the uncertain demand at the downstream. In Chapter 2, we analyze a two-location system in which the upstream production facility experiences random capacities and the downstream store faces random demands. Instead of decomposing the profit function, as is widely done to treat multi-echelon systems, our approach builds on notions of stochastic functions, in particular stochastic linearity in midpoint and directional concavity in midpoint, which establish the concavity and submodularity of the profit functions. In general, it is optimal to follow a two-level state-dependent threshold policy such that an order is issued at a location if and only if the inventory position of that location is below the corresponding threshold. When the salvage values of the ending inventories are linear, the profit function becomes decomposable in the inventory positions at the different locations, and the optimal threshold policy reduces to the echelon base-stock policy. The effect of production and demand uncertainty on inventory levels depends critically on whether the production capacity is limited or ample relative to the demand. Only when the capacity is comparable to the demand does the upstream facility hold positive inventory; otherwise, all units produced are immediately shipped downstream. We further extend our analysis to situations with general stochastic production functions and with multiple locations.

In Chapter 3, we examine the two-stage supply chain problem (described in Chapter 2) under decentralized control. We consider two scenarios. In the first scenario, the retail store does not have any supply information, including the inventory level at the manufacturing facility. We show that the upstream and downstream can be dynamically coordinated with a proper transfer payment defined on local inventories and their own value functions in the dynamic recursion. In the second scenario, the demand distribution is unknown to the manufacturing facility, and the retail store again lacks supply information. We characterize the optimal transfer contracts under which coordination can be achieved, and propose an iterative algorithm to compute the optimal transfer contracts in the decentralized setting. The total profit of the decentralized system under our algorithm is guaranteed to converge to the centralized optimal channel profit for any demand and supply distribution functions.

In Chapter 4, we provide a case study for the framework developed in [1], where the authors study the evaluation and integration of new medical research considering the operational impacts. We first describe their two-station queueing control model using the MDP framework and present the structural properties of the MDP model. Since multiple classes of patients are considered, the MDP model becomes challenging to solve as the number of patient classes increases. We describe an efficient heuristic algorithm developed by [1] to overcome the curse of dimensionality, and we test the numerical performance of their heuristic, finding that the largest optimality gap is less than 1.50% across all experiments.
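A stylized simulation of the two-level threshold (echelon base-stock) structure described for Chapter 2 is sketched below. The demand and capacity distributions and the base-stock levels are assumptions; a real analysis would optimize these levels rather than fix them.

```python
import random

random.seed(1)
S_up, S_down = 8, 5            # echelon and store base-stock levels (assumed)
inv_up, inv_down = 0, 5        # on-hand inventory at facility and store

for t in range(10):
    demand = random.randint(0, 3)       # random downstream demand
    capacity = random.randint(0, 4)     # random upstream production capacity
    # Ship downstream up to the store's base-stock level.
    ship = min(inv_up, max(0, S_down - inv_down))
    inv_up -= ship
    inv_down += ship
    # Produce iff the echelon inventory position is below its threshold.
    echelon = inv_up + inv_down
    produce = min(capacity, max(0, S_up - echelon))
    inv_up += produce
    sales = min(demand, inv_down)
    inv_down -= sales
    print(t, inv_up, inv_down, demand, sales)
```

With ample capacity the upstream on-hand inventory stays near zero, mirroring the dissertation's observation that produced units are then shipped downstream immediately.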
60

Hybrid-ARQ mechanisms in a radio-cognitive context

Tajan, Romain 05 December 2013
Automatic Repeat reQuest (ARQ) protocols are widely implemented in current mobile wireless standards such as HSDPA and LTE. In general, ARQ protocols are combined with channel coding to overcome errors caused by the lack of channel knowledge at the transmitter side; such schemes are called Hybrid ARQ (HARQ) protocols. HARQ protocols ensure good reliability of the physical layer (at least a small packet error rate) for the upper OSI layers. The purpose of this thesis is to provide tools for the analysis and optimization of communication systems employing HARQ protocols, with an emphasis on cognitive systems. In the first part, we study a point-to-point system under three different adaptive HARQ protocols, initially in the asymptotic regime (i.e., with optimal Gaussian codes), and consider two optimizations: minimizing the average power under an average-throughput constraint, and maximizing the average throughput under an average-power constraint. We show that Markov Decision Processes (MDPs) are well-suited tools for these optimization problems. Cognitive Radio (CR) is an approach aiming to increase the spectral efficiency of wireless networks: unlicensed (secondary) users are allowed to communicate within the same frequency bands and at the same time as licensed (primary) users, but must limit the interference generated on the primary users' signals. In this context, we study the rates achievable by a secondary user that observes the primary user's HARQ protocol in order to control its interference. When the secondary user knows the state of the primary HARQ protocol, we show that a joint power and rate allocation can be performed to limit the interference.
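As a back-of-the-envelope illustration of the primary HARQ dynamics that the secondary user exploits, the sketch below computes the renewal-reward throughput of a truncated HARQ protocol. The per-round failure probability, round limit, and rate are assumed numbers, and a constant failure probability is itself a simplification.

```python
q, K, R = 0.4, 4, 1.0   # per-round decoding failure prob., max rounds, rate

# A packet succeeds iff it is decoded within K rounds.
p_success = 1 - q ** K
# Expected number of transmission rounds per packet: sum of P(round k reached).
expected_rounds = sum(q ** (k - 1) for k in range(1, K + 1))
# Renewal-reward throughput of the primary link.
throughput = R * p_success / expected_rounds
print(round(throughput, 3))
```

The secondary user's leverage in the thesis comes from the fact that its interference effectively raises q, so tracking the primary's retransmission state tells it how much interference the primary link can absorb without losing packets.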
