31
Stochastic inventory control with partial demand observability
Ortiz, Olga L. 01 April 2008
This dissertation focuses on issues associated with the value of information in models of sequential decision making under uncertainty. All of these issues are motivated by inventory management problems. First, we study the effect of the accuracy of inventory counts on system performance when using a zero-memory controller in an inventory system that is modeled as a partially observed Markov decision process (POMDP). We derive conditions under which improving the accuracy of inventory counts will either (i) improve system performance, (ii) degrade system performance, or (iii) leave system performance unaffected. With a computational study, we determine the range of profitability impacts that result from inaccurate inventory counts when using reasonable zero-memory control policies.
Second, we assess the value of demand observation quality in an inventory system with Markovian demand and lost sales. Again, the POMDP serves as a problem model, and we develop computationally tractable suboptimal algorithms to enable the computation of effective lower bounds on system profitability when demand observations are noise-corrupted. We then extend our results to consider the effects that product substitution has on system performance. We show that systems with low demand variability, high holding cost levels, and high levels of substitution benefit more from demand observability than systems with high demand variability, low holding cost levels, and low levels of substitution.
Third, to enhance our understanding of sequential inventory control with substitutable products, we analyze a two-item inventory problem with known deterministic primary demand but stochastic one-way substitution. We model this problem as an MDP and show that a decision rule that minimizes the single-period cost function, when applied at every decision epoch over the infinite horizon, is an optimal policy for the infinite-horizon problem. A definition of increased substitutability is presented, and it is shown that increased substitutability never increases the optimal expected total discounted cost.
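Both observation-quality studies above rest on the standard discrete POMDP belief update, which propagates a distribution over hidden inventory levels through a transition model and then reweights it by a noisy count. A minimal sketch in Python; the matrices `T` and `Z` and their shapes are illustrative assumptions, not the dissertation's actual inventory model:

```python
import numpy as np

def belief_update(belief, action, obs, T, Z):
    """Bayes-filter update of a belief over hidden inventory levels.

    belief -- prior P(s) over inventory levels, shape (S,)
    T      -- transition model P(s'|s,a), shape (A, S, S)
    Z      -- observation model P(o|s'), shape (S, O), e.g. noisy counts
    """
    predicted = belief @ T[action]      # predict: sum_s b(s) * P(s'|s,a)
    posterior = predicted * Z[:, obs]   # reweight by observation likelihood
    return posterior / posterior.sum()  # renormalize
```

In this picture, perfectly accurate inventory counts correspond to `Z` being the identity matrix; flattening its rows toward uniform models increasingly noisy counts, which is one way to read the accuracy comparisons in the first study.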
32
Countering Murphy's law: the use of anticipation and improvisation via an episodic memory in support of intelligent robot behavior
Endo, Yoichiro 21 October 2008
Recently in robotics, substantial effort has been invested in critical applications such as military, nursing, and search-and-rescue robots. These applications are critical in the sense that the robots may deal directly with human lives in life-or-death situations, and they are therefore required to make highly intelligent decisions as rapidly as possible. The intelligence we are looking for in such situations is proactiveness: the ability to anticipate as well as to improvise.
Anticipation here means that the robot can assess the current situation, predict the future consequences of that situation, and execute an action to achieve a desired outcome based on that assessment and prediction. Improvisation, on the other hand, is performed when the consequences of the situation are not fully known; it is the ability to deal with a novel situation based on previously acquired knowledge or skills.
In this presentation, we introduce a biologically inspired computational model of proactive intelligent behavior for robots. By integrating multiple levels of machine learning techniques, such as temporal difference learning, instance-based learning, and partially observable Markov decision processes, the model processes aggregated episodic memories in order to accomplish anticipation as well as improvisation. We also explain how this model can be implemented within a software architectural framework and integrated into a physically realized robotic system. Experimental results using a real robot and high-fidelity 3D simulators are then presented in order to help us understand how a robot's extended experience influences its ability to behave proactively.
33
Reinforcement Learning with History Lists
Timmer, Stephan 13 March 2009
A very general framework for modeling uncertainty in learning environments is given by Partially Observable Markov Decision Processes (POMDPs). In a POMDP setting, the learning agent infers a policy for acting optimally in all possible states of the environment while receiving only observations of these states. The basic idea for coping with partial observability is to include memory in the representation of the policy. Perfect memory is provided by the belief space, i.e. the space of probability distributions over environmental states. However, computing policies defined on the belief space requires a considerable amount of prior knowledge about the learning problem and is expensive in terms of computation time. In this thesis, we present a reinforcement learning algorithm for solving deterministic POMDPs based on short-term memory. Short-term memory is implemented by sequences of past observations and actions, called history lists. In contrast to belief states, history lists are not capable of representing optimal policies, but they are far more practical and require no prior knowledge about the learning problem. The algorithm learns policies that operate in two separate phases. During the first phase, the learning agent collects information by actively establishing a history list that identifies the current state; this phase is called the efficient identification strategy. After the current state has been determined, the Q-learning algorithm is used to learn a near-optimal policy. We show that such a procedure can also be used to solve large Markov Decision Processes (MDPs). Solving MDPs with continuous, multi-dimensional state spaces requires some form of abstraction over states. One particular way of establishing such an abstraction is to ignore the original state information and consider only features of states. This form of state abstraction is closely related to POMDPs, since features of states can be interpreted as observations of states.
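To make the history-list idea concrete, the sketch below implements a tabular Q-learner whose surrogate state is a fixed-length list of the most recent (action, observation) pairs. It is an illustration of the representation only, not the thesis's two-phase algorithm (which first runs an efficient identification strategy before Q-learning); all names and parameter values are assumptions:

```python
from collections import defaultdict, deque
import random

class HistoryListQLearner:
    """Q-learning over fixed-length history lists of (action, observation)
    pairs, standing in for the hidden state of a deterministic POMDP."""

    def __init__(self, actions, history_len=3, alpha=0.1, gamma=0.95, eps=0.1):
        self.actions = actions
        self.history = deque(maxlen=history_len)  # short-term memory
        self.Q = defaultdict(float)               # Q[(history, action)]
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self):
        if random.random() < self.eps:            # epsilon-greedy exploration
            return random.choice(self.actions)
        h = tuple(self.history)
        return max(self.actions, key=lambda a: self.Q[(h, a)])

    def update(self, action, observation, reward):
        h = tuple(self.history)                   # history before this step
        self.history.append((action, observation))
        h_next = tuple(self.history)              # history after this step
        best_next = max(self.Q[(h_next, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.Q[(h, action)] += self.alpha * (td_target - self.Q[(h, action)])
```

The `history_len` parameter makes the memory/tractability trade-off explicit: longer lists identify more states but blow up the table, which is exactly why the thesis bounds them and adds a dedicated identification phase.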
34
Partially Observable Markov Decision Processes for Faster Object Recognition
Olafsson, Björgvin January 2016
Object recognition in the real world is a big challenge in the field of computer vision. Given the potentially enormous size of the search space, it is essential to be able to make intelligent decisions about where in the visual field to gather information, so as to reduce the computational resources needed. In this report, a POMDP (Partially Observable Markov Decision Process) learning framework, using a policy gradient method and information rewards as a training signal, has been implemented and used to train fixation policies that aim to maximize the information gathered in each fixation. The purpose of such policies is to make object recognition faster by reducing the number of fixations needed. The trained policies are evaluated in simulation and compared with several fixed policies. Finally, it is shown that the framework can be used to train policies that outperform the fixed policies for certain observation models.
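One plausible reading of an information reward is the drop in entropy of the recognizer's belief over object classes caused by a fixation; the exact training signal used in the report may differ. A hedged sketch:

```python
import numpy as np

def information_reward(belief_before, belief_after, eps=1e-12):
    """Reward a fixation by how much it reduces uncertainty (entropy)
    about the object class; positive when the fixation is informative."""
    def entropy(b):
        b = np.asarray(b, dtype=float)
        return -np.sum(b * np.log(b + eps))
    return entropy(belief_before) - entropy(belief_after)
```

A policy-gradient learner would then reinforce fixation choices in proportion to this reward, pushing the policy toward fixations that maximize expected information gain.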
35
[pt] O USO DE VANTS EM AJUDA HUMANITÁRIA: UMA METODOLOGIA BASEADA EM POMDP PARA ENCONTRAR VÍTIMAS / [en] THE USE OF UAVS IN HUMANITARIAN RELIEF: A POMDP BASED METHODOLOGY FOR FINDING VICTIMS
RAISSA ZURLI BITTENCOURT BRAVO 23 June 2017
[pt] The use of Unmanned Aerial Vehicles (UAVs) in humanitarian relief has been proposed by researchers to locate victims in disaster-affected areas. The urgency of this type of operation lies in finding the affected people as quickly as possible, which means that determining the optimal routing for the UAVs is very important for saving lives. Since the UAVs have to sweep the entire affected area to find victims, the routing operation becomes equivalent to a coverage problem. In this work, a methodology for solving the coverage problem is proposed, based on a Partially Observable Markov Decision Process (POMDP) heuristic in which the observations made by the UAVs are taken into account. This heuristic chooses actions based on the available information, namely the previous actions and observations. The UAV routing formulation is based on the idea of assigning higher priorities to the areas more likely to contain victims. To apply this technique to real cases, a four-step methodology was created. First, the problem is modeled in terms of the affected area, the type of drone to be used, the camera resolution, the average flight altitude, the starting or takeoff point, and the size and priority of the states. Next, in order to test the efficiency of the algorithm through simulations, groups of victims are distributed over the area to be overflown. The algorithm is then started, and at each iteration the drone changes state according to the POMDP heuristic until the entire affected area has been covered. Finally, the efficiency of the algorithm is assessed through four statistics: distance traveled, operation time, coverage percentage, and time to find groups of victims. This methodology was applied to two illustrative examples: a tornado in Xanxerê, Brazil, a sudden-onset disaster in April 2015, and a refugee camp in South Sudan, a slow-onset disaster that began in 2013. The simulations demonstrated that the solution covers the entire disaster-affected area in a reasonable time span. The distance traveled by the UAV and the duration of the operation, which depend on the number of states, did not show a significant standard deviation across simulations, which means that even though many paths are possible due to tied priorities, the algorithm produces homogeneous results. The time to find groups of victims, and therefore the success of the rescue operation, depends on the definition of the state priorities, which is established by a specialist. If the priorities are poorly defined, the UAV will start flying over areas without victims, leading the rescue operation to failure, since the algorithm will not be saving lives as quickly as possible. The proposed algorithm was also compared with a greedy method. Initially, the greedy method did not cover 100 percent of the affected area, which made the comparison unfair. To address this, the greedy algorithm was forced to cover 100 percent of the affected area, and the results show that the POMDP performs better with respect to the time needed to find victims. As for distance traveled and operation time, the results are equal to or better for the POMDP. This is because the greedy algorithm is biased toward optimizing the distance traveled and hence the operation time, whereas the POMDP in this dissertation aims to save lives and does so dynamically, updating its probability distribution after every observation. The novelty of this methodology is highlighted in chapter 3, where more than 139 works were read and classified in order to show the applications of drones in humanitarian logistics, how POMDPs are used with drones, and how simulation techniques are used in humanitarian logistics. Only one article proposes the use of POMDPs in rescue operations with drones, and it does not apply the technique to real cases. Future research may apply the methodology to disasters over larger areas, which will require more than one drone, since autonomy will become a constraint in terms of distance traveled and operation time. Another suggestion is to apply the proposed methodology to real cases, since small UAVs are programmable; in that case, the experiment should take place on private land or in military areas in order to meet legal requirements. / [en] The use of Unmanned Aerial Vehicles (UAVs) in humanitarian relief has been proposed by researchers for searching for victims in disaster-affected areas. The urgency of this type of operation is to find the affected people as soon as possible, which means that determining the optimal flight path for UAVs is very important to save lives. Since the UAVs have to search through the entire affected area to find victims, the path planning operation becomes equivalent to an area coverage problem. In this study, a methodology to solve the coverage problem is proposed, based on a Partially Observable Markov Decision Process (POMDP) heuristic, which considers the observations made from UAVs. The formulation of the UAV path planning is based on the idea of assigning higher priorities to the areas which are more likely to contain victims. The methodology was applied in two illustrative examples: a tornado in Xanxerê, Brazil, which was a rapid-onset disaster in April 2015, and a refugee camp in South Sudan, a slow-onset disaster that started in 2013. The simulations demonstrate that this solution achieves full coverage of the disaster-affected areas in a reasonable time span. The traveled distance and the operation durations, which depend on the number of states, did not have a significant standard deviation between the simulations; this means that even though there were many possible paths, due to the tied priorities, the algorithm has homogeneous results. The time to find groups of victims, and thus the success of the search and rescue operation, depends on the specialist's definition of state priorities. A comparison with a greedy algorithm showed that the POMDP is faster at finding victims, while the greedy algorithm's performance focuses on minimizing the traveled distance. Future research indicates a practical application of the proposed methodology.
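The priority-driven move selection at the heart of the heuristic can be sketched as follows. This is a simplified stand-in for the dissertation's method, which also updates a probability distribution after every observation; the tie-breaking mirrors the tied-priority behavior discussed above, and all names are illustrative:

```python
import random

def next_cell(pos, priorities, visited, grid_shape):
    """Choose the UAV's next grid cell: among unvisited neighbours,
    fly to the one with the highest victim priority, breaking ties
    at random (hence several equally good paths are possible)."""
    rows, cols = grid_shape
    r, c = pos
    neighbours = [(r + dr, c + dc)
                  for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]
                  if 0 <= r + dr < rows and 0 <= c + dc < cols]
    unvisited = [n for n in neighbours if n not in visited]
    candidates = unvisited or neighbours   # backtrack if boxed in
    best = max(priorities[n] for n in candidates)
    return random.choice([n for n in candidates if priorities[n] == best])
```

Repeating this step until every cell is visited yields full coverage, and the random tie-breaks explain why repeated simulations can follow different paths while still producing homogeneous distance and duration statistics.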
36
Exploiting imprecise information sources in sequential decision making problems under uncertainty / Tirer profit de sources d'information imprécises pour la décision séquentielle dans l'incertain
Drougard, Nicolas 18 December 2015
Partially Observable Markov Decision Processes (POMDPs) make it easy to model probabilistic sequential decision problems under uncertainty. When the problem is a robotic mission, the characteristics of the robot and of its environment that are needed to define the mission constitute the system. Its state is not directly visible to the agent (the robot). Solving a POMDP therefore amounts to computing a strategy that accomplishes the mission as well as possible on average, i.e. a function prescribing the actions to execute according to the information received by the agent. This work begins by highlighting, in the robotic context, practical limits of the POMDP model: they concern the agent's ignorance, the imprecision of the observation model, and the complexity of solving it. A counterpart of the POMDP model, called pi-POMDP, simplifies the representation of uncertainty: it comes from Qualitative Possibility Theory, which defines the plausibility of events qualitatively, allowing imprecision and ignorance to be modeled. Once the POMDP and pi-POMDP models have been presented, an update of the possibilistic model is proposed. Then, the study of factored pi-POMDPs leads to an algorithm called PPUDD, which uses Algebraic Decision Diagrams to solve structured problems more easily. The strategies computed by PPUDD, which were also tested in the IPPC 2014 competition, can be more efficient than those of probabilistic algorithms in contexts of imprecision or high dimensionality. This thesis proposes using qualitative possibilities with the aim of obtaining improvements in computation time and modeling. / Partially Observable Markov Decision Processes (POMDPs) define a useful formalism to express probabilistic sequential decision problems under uncertainty. When this model is used for a robotic mission, the system is defined as the features of the robot and its environment needed to express the mission. The system state is not directly seen by the agent (the robot). Solving a POMDP thus consists in computing a strategy which, on average, best achieves the mission, i.e. a function mapping the information known by the agent to an action. Some practical issues of the POMDP model are first highlighted in the robotic context: they concern the modeling of the agent's ignorance, the imprecision of the observation model, and the complexity of solving real-world problems. A counterpart of the POMDP model, called pi-POMDP, simplifies uncertainty representation with a qualitative evaluation of event plausibilities. It comes from Qualitative Possibility Theory, which provides the means to model imprecision and ignorance. After a formal presentation of the POMDP and pi-POMDP models, an update of the possibilistic model is proposed. Next, the study of factored pi-POMDPs allows us to set up an algorithm named PPUDD which uses Algebraic Decision Diagrams to solve large structured planning problems. Strategies computed by PPUDD, which have been tested in the context of the IPPC 2014 competition, can be more efficient than those produced by probabilistic solvers when the model is imprecise or for high-dimensional problems. This thesis proposes some ways of using Qualitative Possibility Theory to improve computation time and uncertainty modeling in practice.
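The qualitative flavor of the pi-POMDP can be illustrated with a possibilistic belief update in which the probabilistic sum/product pair becomes max/min and normalization is done by min-based conditioning. This is a textbook construction from qualitative possibility theory, offered as an illustration rather than the exact update proposed in the thesis:

```python
import numpy as np

def possibilistic_update(pi_belief, action, obs, T_pi, O_pi):
    """Qualitative counterpart of the Bayes filter.

    pi_belief -- possibility degree of each state, values in [0, 1]
    T_pi      -- pi(s'|s,a), shape (A, S, S)
    O_pi      -- pi(o|s'), shape (S, O)
    """
    pi_belief = np.asarray(pi_belief, dtype=float)
    # Prediction: max-min composition replaces sum-product.
    predicted = np.max(np.minimum(pi_belief[:, None], T_pi[action]), axis=0)
    # Correction: combine with the observation possibility via min.
    joint = np.minimum(predicted, O_pi[:, obs])
    # Min-based conditioning: the most plausible states are raised to 1.
    m = joint.max()
    return np.where(joint == m, 1.0, joint) if m > 0 else joint
```

Because only the ordering of plausibility degrees matters, such updates tolerate imprecise models and can represent total ignorance (all degrees equal to 1), which is precisely the appeal claimed for the possibilistic framework.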
37
Relevance-based Online Planning in Complex POMDPs
Saborío Morales, Juan Carlos 17 July 2020
Planning under uncertainty is a central topic at the intersection of disciplines such as artificial intelligence, cognitive science and robotics, and its aim is to enable artificial agents to solve challenging problems through a systematic approach to decision-making. Some of these challenges include generating expectations about different outcomes governed by a probability distribution and estimating the utility of actions based only on partial information. In addition, an agent must incorporate observations or information from the environment into its deliberation process and produce the next best action to execute, based on an updated understanding of the world. This process is commonly modeled as a POMDP, a discrete stochastic system that becomes intractable very quickly. Many real-world problems, however, can be simplified following cues derived from contextual information about the relative expected value of actions. Based on an intuitive approach to problem solving, and relying on ideas related to attention and relevance estimation, we propose a new approach to planning supported by our two main contributions: PGS grants an agent the ability to generate internal preferences and biases to guide action selection, and IRE allows the agent to reduce the dimensionality of complex problems while planning online. Unlike existing work that improves the performance of planning on POMDPs, PGS and IRE do not rely on detailed heuristics or domain knowledge, explicit action hierarchies or manually designed dependencies for state factoring. Our results show that this level of autonomy is important to solve increasingly more challenging problems, where manually designed simplifications scale poorly.
38
Autonomous UAV Path Planning using RSS signals in Search and Rescue Operations
Anhammer, Axel; Lundeberg, Hugo January 2022
Unmanned aerial vehicles (UAVs) have emerged as a promising technology for search and rescue (SAR) operations. UAVs have the ability to provide more timely localization, thus decreasing the crucial duration of SAR operations. Previous work has demonstrated proof of concept for localizing missing people by utilizing received signal strength (RSS) and UAVs. The localization system is based on the assumption that the missing person carries a smartphone with Wi-Fi enabled, whose signal can be intercepted. This thesis proposes a two-stage path planner for UAVs, utilizing RSS signals and an initial belief regarding the missing person's location. The objective of the first stage is to locate an RSS signal. By dividing the search area into grids, a hierarchical solution based on several Markov decision processes (MDPs) can be formulated which takes the probabilities of different areas into consideration. The objective of the second stage is to isolate the RSS signal and provide a location estimate. The environment is deemed to be partially observable, and the problem is formulated as a partially observable Markov decision process (POMDP). Two different filters, a point mass filter (PMF) and a particle filter (PF), are evaluated with regard to their ability to correctly estimate the state of the environment. The state of the environment then acts as input to a deep Q-network (DQN) which selects appropriate actions for the UAV. Thus, the DQN becomes a path planner for the UAV, and the trajectory it generates is compared to trajectories generated by, among others, a greedy policy. Results for Stage 1 demonstrate that the path generated by the MDPs prioritizes areas with higher probability and intuitively seems very reasonable. The results also illustrate potential drawbacks of a hierarchical solution, which could potentially be addressed by incorporating more factors into the problem. Simulation results for Stage 2 show that both a PMF and a PF can successfully be used to estimate the state of the environment and provide an accurate localization estimate, with the PMF generating slightly more accurate estimates than the PF. The DQN succeeds in isolating the missing person's probable location with relatively few actions. However, it performs only marginally better than the greedy policy, indicating that it may be a complicated solution to a simpler problem.
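As a sketch of the second stage's state estimation, here is a single particle-filter correction step under a log-distance path-loss model. The RSS model and every parameter value (`tx_power`, `path_loss_exp`, `sigma`) are illustrative assumptions, not the thesis's calibrated values, and a full filter would also propagate and resample the particles:

```python
import numpy as np

def pf_update(particles, weights, uav_pos, rss_meas,
              tx_power=-40.0, path_loss_exp=2.5, sigma=4.0):
    """One PF correction step for RSS-based localization.

    particles -- candidate (x, y) positions of the person, shape (N, 2)
    weights   -- current particle weights, shape (N,)
    rss_meas  -- RSS value (dBm) measured by the UAV at uav_pos
    """
    d = np.linalg.norm(particles - np.asarray(uav_pos), axis=1) + 1e-6
    rss_pred = tx_power - 10.0 * path_loss_exp * np.log10(d)
    lik = np.exp(-0.5 * ((rss_meas - rss_pred) / sigma) ** 2)
    weights = weights * lik             # reweight by measurement likelihood
    weights = weights / weights.sum()   # renormalize
    estimate = weights @ particles      # weighted-mean location estimate
    return weights, estimate
```

The point mass filter evaluated alongside the PF performs the same Bayesian correction on a fixed grid instead of on random samples, which fits its slightly more accurate estimates in the reported simulations.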
39
Leveraging Help Requests in POMDP Intelligent Tutors
Folsom-Kovarik, Jeremiah 01 January 2012
Intelligent tutoring systems (ITSs) are computer programs that model individual learners and adapt instruction to help each learner differently. One way ITSs differ from human tutors is that few ITSs give learners a way to ask questions. When learners can ask for help, their questions have the potential to improve learning directly and also to act as a new source of model data that helps the ITS personalize instruction. Inquiry modeling gives ITSs the ability to answer learner questions and refine their learner models through an inexpensive new input channel. In order to support inquiry modeling, an advanced planning formalism is applied to ITS learner modeling. Partially observable Markov decision processes (POMDPs) differ from more widely used ITS architectures because they can plan complex action sequences in uncertain situations with machine learning. Tractability issues have previously precluded POMDP use in ITS models. This dissertation introduces two improvements, priority queues and observation chains, to make POMDPs scale well and encompass the large problem sizes that real-world ITSs must confront. A new ITS was created to support trainees practicing a military task in a virtual environment. The development of the Inquiry Modeling POMDP Adaptive Trainer (IMP) began with multiple formative studies on human and simulated learners that explored inquiry modeling and POMDPs in intelligent tutoring. The studies suggest the new POMDP representations will be effective in ITS domains having certain common characteristics. Finally, a summative study evaluated IMP's ability to train volunteers in specific practice scenarios. IMP users achieved post-training scores averaging up to 4.5 times higher than users who practiced without support, and up to twice as high as trainees who used an ablated version of IMP with no inquiry modeling. IMP's implementation and evaluation helped explore questions about how inquiry modeling and POMDP ITSs work, while empirically demonstrating their efficacy.
40
Utilisation des communications Device-to-Device pour améliorer l'efficacité des réseaux cellulaires / Use of Device-to-Device communications for efficient cellular networks
Ibrahim, Rita 04 February 2019
This thesis studies direct communications between mobile devices, known as Device-to-Device (D2D) communications, as a promising technique for improving future cellular networks. This technology allows direct communication between two mobile terminals without passing through the base station. Modeling, evaluating, and optimizing the various aspects of D2D communications are the fundamental goals of this thesis and are achieved mainly using the following mathematical tools: queuing theory, Lyapunov optimization, and partially observable Markov decision processes (POMDPs). The findings of this study are presented in three parts. In the first part, we study a selection scheme between cellular mode and D2D mode. We derive the stability regions of the following scenarios: pure cellular networks and cellular networks in which D2D communications are enabled. A comparison of these two scenarios leads to a selection algorithm between cellular mode and D2D mode that improves the capacity of the network. In the second part, we develop a resource allocation algorithm for D2D communications. D2D users are able to estimate their own channel quality, whereas the base station needs to receive signaling messages to acquire this information. Based on this knowledge available at the D2D users, a resource allocation approach is proposed in order to improve the energy efficiency of D2D communications. The distributed version of this algorithm turns out to outperform the centralized one. In the distributed scheme, collisions may occur during the transmission of the D2D channel states, so a collision reduction algorithm is developed. Furthermore, the implementation of the centralized and distributed algorithms in an LTE-type cellular network is described in detail. In the third part, we study a selection policy for mobile D2D relays. Relay mobility is one of the main challenges facing any relay selection strategy. The problem is modeled as a constrained partially observable Markov decision process that captures the dynamics of the relays and aims to find the relay selection policy that optimizes the performance of the cellular network under cost constraints. / This thesis considers Device-to-Device (D2D) communications as a promising technique for enhancing future cellular networks. Modeling, evaluating and optimizing D2D features are the fundamental goals of this thesis and are mainly achieved using the following mathematical tools: queuing theory, Lyapunov optimization and Partially Observable Markov Decision Processes (POMDPs). The findings of this study are presented in three parts. In the first part, we investigate a D2D mode selection scheme. We derive the queuing stability regions of both scenarios: pure cellular networks and D2D-enabled cellular networks. Comparing both scenarios leads us to elaborate a D2D vs cellular mode selection design that improves the capacity of the network. In the second part, we develop a D2D resource allocation algorithm. We observe that D2D users are able to estimate their local Channel State Information (CSI), whereas the base station needs some signaling exchange to acquire this information.
Based on the D2D users' knowledge of their local CSI, we provide an energy-efficient resource allocation framework and show that distributed scheduling outperforms centralized scheduling. In the distributed approach, collisions may occur between the different CSI reports; thus, we propose a collision reduction algorithm. Moreover, we give a detailed description of how both the centralized and distributed algorithms can be implemented in practice. In the third part, we propose a mobile relay selection policy for a D2D relay-aided network. Relay mobility emerges as a crucial challenge in defining a strategy for selecting the optimal D2D relays. The problem is formulated as a constrained POMDP which captures the dynamics of the relays and aims to find the optimal relay selection policy that maximizes the performance of the network under cost constraints.
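To convey the flavor of relay selection under a constrained POMDP, the sketch below scores each candidate relay by its expected rate under the current belief about its hidden channel/position state, penalized by a Lagrange multiplier on cost. This myopic relaxation is purely illustrative; the thesis solves the constrained POMDP itself rather than a one-step heuristic, and all names here are assumptions:

```python
import numpy as np

def select_relay(beliefs, exp_rate, costs, lam):
    """Pick the relay with the best belief-averaged rate minus cost.

    beliefs  -- beliefs[k][s]: P(relay k is in hidden state s)
    exp_rate -- exp_rate[k][s]: expected rate if relay k is used in state s
    costs    -- selection cost of each relay
    lam      -- Lagrange multiplier enforcing the cost budget
    """
    scores = [np.dot(beliefs[k], exp_rate[k]) - lam * costs[k]
              for k in range(len(costs))]
    return int(np.argmax(scores))
```

Tuning `lam` trades network performance against the cost budget, which is the role the cost constraints play in the thesis's POMDP formulation.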