21 |
Off the Beaten Path: Modelling Path Uncertainty using Markov Decision Processes
de Graaf, Anaïs. January 2024
Uncertainty has been an important topic in research as well as a social concern. The notion of path uncertainty is introduced as the likelihood of encountering a wide variety of possible trajectories when following a given strategy. The research question is: “How can path uncertainty be modelled?” This thesis proposes the Path Uncertainty Aware Markov Decision Process (PUA-MDP), which builds on MDP variants designed for other types of uncertainty. Its algorithm finds optimal policies that balance maximal reward against minimal cumulative exposure to path uncertainty. Experimental validation demonstrates that the algorithm’s behaviour resembles human behavioural responses to uncertainty. It also demonstrates that a small decrease in reward can result in a drastic decrease in uncertainty. If such a method is applied to any classic MDP, path uncertainty could be reduced greatly.
|
22 |
A review of Q-learning methods for Markov decision processes
Blizzard, Christopher; Wiktorsson, Emil. January 2024
This paper discusses how Q-Learning and Deep Q-Networks (DQN) can be applied to state-action problems described by a Markov decision process (MDP). These are machine learning methods for finding the optimal choice of action at each time step, resulting in the optimal policy. The limitations and advantages of the two methods are discussed, with the main limitation being the fact that Q-learning is unable to be used on problems with infinite state spaces. Q-learning, however, has an advantage in the simplicity of the algorithm, leading to a better understanding of what the algorithm is actually doing. Q-Learning did manage to find the optimal policy for the simple problem studied in this paper, but was unable to do so for the advanced problem. The Deep Q-Network (DQN) approach was able to solve both problems, with a drawback in it being harder to understand what the algorithm actually is doing.
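As an illustration of the tabular method this abstract refers to, the following is a minimal sketch of Q-learning on a small chain MDP. The environment (five states, reward 1 at the rightmost state), the function names `q_learning` and `greedy_policy`, and all hyperparameter values are assumptions made for this example; they are not taken from the paper.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain MDP (illustrative only).

    States are 0..n_states-1. Action 0 moves left (bounded at 0), action 1
    moves right. Reaching the rightmost state yields reward 1 and ends the
    episode; every other transition yields reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection, with random tie-breaking.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # The terminal state has no future value.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Best action per non-terminal state according to the learned table."""
    return [max(range(2), key=lambda a: q_s[a]) for q_s in q[:-1]]
```

Because the table holds one entry per state-action pair, this approach only works when the state space is finite and small enough to enumerate, which is the limitation the abstract points to.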
|
23 |
The Value Of Information In A Manufacturing Facility Taking Production And Lead Time Quotation Decisions
Kaman, Cumhur. 01 June 2011
Advances in information technology have made it possible to track real-time data more accurately and precisely in many manufacturing facilities. However, before more accurate and precise data are obtained, the investment in information technology should be justified. The value of information may be adopted as a criterion for this investment. In this study, we analyze the value of information in a manufacturing facility where production and lead time quotation decisions are taken. To assess the value of information, two settings are analyzed. Under the first setting, the manufacturer takes decisions under perfect information. To find the optimal decisions under perfect information, a stochastic model is introduced. Under the second setting, the manufacturer takes decisions under imperfect information. To obtain a solution for this problem, a partially observable Markov decision process is employed. Under the second setting, we study two approaches. In the first approach, we introduce a nonlinear programming model to find the optimal decisions. In the second approach, a heuristic built on the optimal actions taken under perfect information is presented. We examine the value of information under different parameters by considering the policies obtained from the nonlinear programming model and the heuristic approach. The profit gap between the two policies is investigated. The effect of Make-to-Order (MTO) and Make-to-Stock (MTS) schemes on the value of information is analyzed. Lastly, different lead time quotation schemes (accept-all, accept-reject and precise lead time) are compared to find the quotation scheme under which the value of information is highest.
|
24 |
Goal-seeking Decision Support System to Empower Personal Wellness Management
Chippa, Mukesh K. January 2016
No description available.
|
25 |
Transformação de redes de Petri coloridas em processos de decisão markovianos com probabilidades imprecisas. / Conversion from colored Petri nets into Markov decision processes with imprecise probabilities.
Eboli, Mônica Goes. 01 July 2010
The present work was motivated by the need to consider stochastic behavior when planning the production mix of a manufacturing system, that is, what to produce and in what order. These systems are exposed to stochastic behavior that is usually not considered during production planning. The main goal of this work was to obtain a method to model manufacturing systems and to represent their stochastic behavior when planning production for these systems. Because the methods that were suitable for planning were not adequate for modeling the systems, and vice versa, two methods were combined to achieve the main goal.
It was decided to model the systems as Petri nets and to convert them into Markov decision processes, then to do the planning with the latter. In order to represent the probabilities involved in the processes, a special type of Petri net, named the factored Petri net, was proposed. Using this kind of Petri net, a method for conversion into Markov decision processes was developed. The conversion was successful: tests showed that plans can be produced within seconds using state-of-the-art algorithms for Markov decision processes.
|
27 |
Autonomous UAV Path Planning using RSS signals in Search and Rescue Operations
Anhammer, Axel; Lundeberg, Hugo. January 2022
Unmanned aerial vehicles (UAVs) have emerged as a promising technology in search and rescue (SAR) operations. UAVs can provide more timely localization, thus decreasing the crucial duration of SAR operations. Previous work has demonstrated proof of concept for localizing missing people using received signal strength (RSS) and UAVs. The localization system is based on the assumption that the missing person carries an enabled smartphone whose Wi-Fi signal can be intercepted. This thesis proposes a two-stage path planner for UAVs that uses RSS signals and an initial belief regarding the missing person's location. The objective of the first stage is to locate an RSS signal. By dividing the search area into grids, a hierarchical solution based on several Markov decision processes (MDPs) can be formulated which takes the probabilities of different areas into consideration. The objective of the second stage is to isolate the RSS signal and provide a location estimate. The environment is deemed to be partially observable, and the problem is formulated as a partially observable Markov decision process (POMDP). Two different filters, a point mass filter (PMF) and a particle filter (PF), are evaluated with regard to their ability to correctly estimate the state of the environment. The estimated state of the environment then acts as input to a deep Q-network (DQN) which selects appropriate actions for the UAV. Thus, the DQN becomes a path planner for the UAV, and the trajectory it generates is compared to trajectories generated by, among others, a greedy policy. Results for Stage 1 demonstrate that the path generated by the MDPs prioritizes areas with higher probability, and intuitively seems very reasonable. The results also illustrate potential drawbacks of a hierarchical solution, which can potentially be addressed by incorporating more factors into the problem.
Simulation results for Stage 2 show that both a PMF and a PF can successfully be used to estimate the state of the environment and provide an accurate localization estimate. The PMF generated slightly more accurate estimates than the PF. The DQN is successful in isolating the missing person's probable location with relatively few actions. However, it only performs marginally better than the greedy policy, indicating that it may be a complicated solution to a simpler problem.
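To make the idea of a point mass filter concrete, the sketch below performs a Bayesian update of a grid belief over a stationary emitter's position from RSS readings. The log-distance path loss model, the Gaussian noise assumption, the parameter values, and the names `expected_rss` and `pmf_update` are all assumptions chosen for this illustration; the thesis may use a different measurement model.

```python
import math


def expected_rss(uav_xy, cell_xy, tx_power=-40.0, path_loss_exp=2.0):
    """Assumed log-distance path loss model: RSS in dBm at the UAV."""
    d = max(1.0, math.dist(uav_xy, cell_xy))  # clamp to avoid log10(0)
    return tx_power - 10.0 * path_loss_exp * math.log10(d)


def pmf_update(belief, cells, uav_xy, measured_rss, sigma=4.0):
    """One Bayes update of a grid belief given a single RSS reading.

    belief[i] is the prior probability that the emitter is in cells[i].
    The likelihood assumes Gaussian measurement noise with std `sigma` (dB).
    """
    posterior = []
    for p, cell in zip(belief, cells):
        err = measured_rss - expected_rss(uav_xy, cell)
        likelihood = math.exp(-0.5 * (err / sigma) ** 2)
        posterior.append(p * likelihood)
    total = sum(posterior)
    if total == 0.0:
        return list(belief)  # uninformative reading; keep the prior
    return [p / total for p in posterior]


# Usage: uniform prior on a 10x10 grid, noise-free readings from three poses.
cells = [(x, y) for x in range(10) for y in range(10)]
belief = [1.0 / len(cells)] * len(cells)
true_position = (7, 3)
for uav_xy in [(0, 0), (9, 0), (0, 9)]:
    reading = expected_rss(uav_xy, true_position)
    belief = pmf_update(belief, cells, uav_xy, reading)
estimate = cells[max(range(len(cells)), key=lambda i: belief[i])]
```

A particle filter would replace the fixed grid with weighted random samples; the grid version is shown here only because it is the shorter of the two to write down.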
|
28 |
Data-Driven Policies for Manufacturing Systems and Cyber Vulnerability Maintenance
Roychowdhury, Sayak. 12 October 2017
No description available.
|
29 |
A leader-follower partially observed Markov game
Chang, Yanling. 07 January 2016
The intent of this dissertation is to generate a set of non-dominated finite-memory policies from which one of two agents (the leader) can select a most preferred policy to control a dynamic system that is also affected by the control decisions of the other agent (the follower). The problem is described by an infinite horizon total discounted reward, partially observed Markov game (POMG). Each agent’s policy assumes that the agent knows its current and recent state values, its recent actions, and the current and recent possibly inaccurate observations of the other agent’s state. For each candidate finite-memory leader policy, we assume the follower, fully aware of the leader policy, determines a policy that optimizes the follower’s criterion. The leader-follower assumption allows the POMG to be transformed into a specially structured, partially observed Markov decision process that we use to determine the follower’s best response policy for a given leader policy. We then present a value determination procedure to evaluate the performance of the leader for a given leader policy, based on which a non-dominated set of leader policies can be selected by existing heuristic approaches.
We then analyze how the value of the leader’s criterion changes due to changes in the leader’s quality of observation of the follower. We give conditions that ensure improved observation quality will improve the leader’s value function, assuming that changes in the observation quality do not cause the follower to change its policy. We show that discontinuities in the value of the leader’s criterion, as a function of observation quality, can occur when the change of observation quality is significant enough for the follower to change its policy. We present conditions that determine when a discontinuity may occur and conditions that guarantee a discontinuity will not degrade the leader’s performance. This framework has been used to develop a dynamic risk analysis approach for U.S. food supply chains and to compare and create supply chain designs and sequential control strategies for risk mitigation.
|
30 |
Algoritmos eficientes para o problema do orçamento mínimo em processos de decisão Markovianos sensíveis ao risco / Efficient algorithms for the minimum budget problem in risk-sensitive Markov decision processes
Moreira, Daniel Augusto de Melo. 06 November 2018
The main optimization criterion used in Markov decision processes (MDPs) is to minimize the expected cumulative cost. Although this optimization criterion is useful, in some applications the cost generated by some executions may exceed an acceptable threshold. To deal with this problem, Risk-Sensitive Markov Decision Processes (RS-MDPs) were proposed, whose optimization criterion is to maximize the probability that the cumulative cost does not exceed a user-defined budget, thus ensuring that costly executions of an MDP occur with lower probability. Algorithms for RS-MDPs face scalability issues when handling large cost intervals, since they operate in an augmented state space that enumerates all possible remaining budgets. In this work, we propose a new, challenging problem: finding the minimum budget for which the probability that the cumulative cost does not exceed this budget converges to a maximum. To solve this problem, we propose: (i) an improved version of tvi-dp (a previous solution for RS-MDPs) and (ii) the first symbolic dynamic programming algorithm for RS-MDPs that exploits conditional independence of the transition function in the augmented state space. The proposed algorithms prune invalid states and add a new stopping condition.
Empirical results show that rs-spudd is able to solve problems up to 103 times larger than tvi-dp and is up to 26.2 times faster than tvi-dp (on the instances tvi-dp was able to solve). In fact, we show that rs-spudd is the only algorithm that can solve large instances of the analyzed domains. Another challenging problem for RS-MDPs is handling continuous costs. To solve this problem, we define Hybrid RS-MDPs, which include continuous and discrete variables in addition to the user-defined budget. We show that the Symbolic Dynamic Programming (SDP) algorithm from the literature can be used to solve this kind of MDP. We empirically evaluated the SDP algorithm in two ways: (i) against the other proposed algorithms in a domain that all of them can solve and (ii) in a domain that only SDP can solve. Results show that the SDP algorithm for Hybrid RS-MDPs is capable of solving domains with continuous costs without enumerating states, but at a higher computational cost.
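The augmented state space and the minimum budget problem described in this abstract can be illustrated with a small dynamic program over (state, remaining budget) pairs. This is a plain enumeration sketch, not tvi-dp or rs-spudd; the toy MDP, the restriction to positive integer costs, and the names `rs_mdp_probabilities` and `minimum_budget` are assumptions made for the example.

```python
def rs_mdp_probabilities(transitions, costs, goal, max_budget):
    """prob[b][s]: max probability of reaching `goal` from s with cost <= b.

    transitions[s][a] is a list of (next_state, probability) pairs and
    costs[s][a] is a positive integer, so every lookup uses a smaller budget
    that has already been computed.
    """
    prob = []
    for b in range(max_budget + 1):
        row = {goal: 1.0}  # at the goal, any non-negative budget suffices
        for s in transitions:
            best = 0.0
            for a, outcomes in transitions[s].items():
                remaining = b - costs[s][a]
                if remaining < 0:
                    continue  # action is unaffordable with this budget
                value = sum(p * prob[remaining][s2] for s2, p in outcomes)
                best = max(best, value)
            row[s] = best
        prob.append(row)
    return prob


def minimum_budget(prob, start, tol=1e-9):
    """Smallest budget whose success probability matches the largest computed."""
    best = prob[-1][start]
    for b, row in enumerate(prob):
        if row[start] >= best - tol:
            return b


# Hypothetical MDP: "risky" is cheap but may loop back; "safe" always succeeds.
transitions = {"s0": {"risky": [("goal", 0.5), ("s0", 0.5)],
                      "safe": [("goal", 1.0)]}}
costs = {"s0": {"risky": 1, "safe": 3}}
prob = rs_mdp_probabilities(transitions, costs, "goal", max_budget=6)
```

The table grows with the number of states times the number of budget levels, which is the scalability issue with large cost intervals that the abstract mentions.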
|