31
Stochastic inventory control with partial demand observability
Ortiz, Olga L., 01 April 2008
This dissertation focuses on issues associated with the value of information in models of sequential decision making under uncertainty. All of these issues are motivated by inventory management problems. First, we study the effect of the accuracy of inventory counts on system performance when using a zero-memory controller in an inventory system that is modeled as a partially observed Markov decision process (POMDP). We derive conditions under which improving the accuracy of inventory counts will (i) improve system performance, (ii) degrade system performance, or (iii) leave system performance unaffected. With a computational study, we determine the range of profitability impacts that result from inaccurate inventory counts when using reasonable zero-memory control policies.
Second, we assess the value of demand observation quality in an inventory system with Markovian demand and lost sales. Again, the POMDP serves as the problem model, and we develop computationally tractable suboptimal algorithms that yield effective lower bounds on system profitability when demand observations are noise-corrupted. We then extend our results to consider the effects that product substitution has on system performance. We show that systems with low demand variability, high holding cost levels, and high levels of substitution benefit more from demand observability than systems with high demand variability, low holding cost levels, and low levels of substitution.
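As a concrete illustration of the machinery behind such noise-corrupted demand observations, the sketch below runs a discrete POMDP belief update for a two-state Markovian demand process. The transition and observation matrices are invented for illustration and are not taken from the dissertation.

```python
import numpy as np

# Two-state Markovian demand (low/high) with noise-corrupted observations.
# All numbers are illustrative, not taken from the dissertation.
P = np.array([[0.8, 0.2],   # P[i, j] = Pr(next demand state j | state i)
              [0.3, 0.7]])
O = np.array([[0.9, 0.1],   # O[i, z] = Pr(observe signal z | true state i)
              [0.2, 0.8]])

def belief_update(b, z):
    """One POMDP belief update: predict through the demand chain,
    then correct with the noisy observation z (0 or 1)."""
    predicted = b @ P                  # prior over the next demand state
    corrected = predicted * O[:, z]    # weight by observation likelihood
    return corrected / corrected.sum()

b = np.array([0.5, 0.5])               # start from complete uncertainty
for z in [0, 0, 1]:                    # a short stream of noisy signals
    b = belief_update(b, z)
print(b)                               # posterior over the demand state
```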
Third, to enhance our understanding of sequential inventory control with substitutable products, we analyze a two-item inventory problem with known deterministic primary demand but stochastic one-way substitution. We model this problem as a Markov decision process (MDP) and show that a decision rule that minimizes the single-period cost function, when applied at every decision epoch over the infinite horizon, is an optimal policy for the infinite-horizon problem. A definition of increased substitutability is presented, and it is shown that increased substitutability never increases the optimal expected total discounted cost.
32
Countering Murphy's Law: the use of anticipation and improvisation via an episodic memory in support of intelligent robot behavior
Endo, Yoichiro, 21 October 2008
Recently in robotics, substantial effort has been invested in critical applications such as military, nursing, and search-and-rescue. These applications are critical in the sense that the robots may deal directly with human lives in life-or-death situations, and they are therefore required to make highly intelligent decisions as rapidly as possible. The intelligence we are looking for in this type of situation is proactiveness: the ability to anticipate as well as improvise.
Anticipation here means that the robot can assess the current situation, predict its future consequences, and execute an action to achieve a desired outcome based on that assessment and prediction. Improvisation, on the other hand, is performed when the consequences of the situation are not fully known: it is the ability to deal with a novel situation based on previously acquired knowledge or skills.
In this presentation, we introduce a biologically inspired computational model of proactive intelligent behavior for robots. Integrating multiple machine learning techniques, such as temporal difference learning, instance-based learning, and partially observable Markov decision processes, the model processes aggregated episodic memories in order to accomplish anticipation as well as improvisation. We also explain how this model can be implemented within a software architectural framework and integrated into a physically realized robotic system. Experimental results using a real robot and high-fidelity 3D simulators are then presented to help us understand how a robot's extended experience influences its ability to behave proactively.
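Among the techniques the abstract names, temporal difference learning is the simplest to make concrete. The sketch below is a generic TD(0) value-update loop on an invented five-state chain, not the thesis's episodic-memory architecture.

```python
import random

# Minimal TD(0) on a toy 5-state chain; state 4 is terminal with reward 1.
# Purely illustrative of temporal difference learning in general.
ALPHA, GAMMA = 0.1, 0.95
V = [0.0] * 5

for episode in range(1000):
    s = 0
    while s != 4:
        s_next = min(s + random.choice([1, 2]), 4)   # random forward moves
        r = 1.0 if s_next == 4 else 0.0
        # TD(0): nudge V(s) toward the bootstrapped target r + gamma*V(s')
        V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])
        s = s_next

print([round(v, 2) for v in V])
```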
33
Reinforcement Learning with History Lists
Timmer, Stephan, 13 March 2009
A very general framework for modeling uncertainty in learning environments is given by Partially Observable Markov Decision Processes (POMDPs). In a POMDP setting, the learning agent infers a policy for acting optimally in all possible states of the environment, while receiving only observations of these states. The basic idea for coping with partial observability is to include memory in the representation of the policy. Perfect memory is provided by the belief space, i.e. the space of probability distributions over environmental states. However, computing policies defined on the belief space requires a considerable amount of prior knowledge about the learning problem and is expensive in terms of computation time. In this thesis, we present a reinforcement learning algorithm for solving deterministic POMDPs based on short-term memory. Short-term memory is implemented by sequences of past observations and actions, which are called history lists. In contrast to belief states, history lists are not capable of representing optimal policies, but they are far more practical and require no prior knowledge about the learning problem. The algorithm presented learns policies consisting of two separate phases. During the first phase, the learning agent collects information by actively establishing a history list that identifies the current state. This phase is called the efficient identification strategy. After the current state has been determined, the Q-learning algorithm is used to learn a near-optimal policy. We show that such a procedure can also be used to solve large Markov Decision Processes (MDPs). Solving MDPs with continuous, multi-dimensional state spaces requires some form of abstraction over states. One particular way of establishing such abstraction is to ignore the original state information and consider only features of states. This form of state abstraction is closely related to POMDPs, since features of states can be interpreted as observations of states.
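The core idea of history lists can be sketched as Q-learning with fixed-length sequences of past observations and actions standing in for states. The toy environment, action set, and constants below are invented for illustration; the thesis's two-phase algorithm first runs the efficient identification strategy, which this sketch omits.

```python
from collections import defaultdict, deque
import random

# Q-learning over fixed-length history lists. Everything here is an
# illustrative assumption, not the thesis's exact algorithm.
H, ALPHA, GAMMA, EPS = 3, 0.1, 0.95, 0.1
ACTIONS = [0, 1]
Q = defaultdict(float)                  # keyed by (history tuple, action)

class ToyEnv:
    """Invented two-observation environment: three consecutive observations
    of 1 end the episode with reward 1."""
    def reset(self):
        self.count = 0
        return 0
    def step(self, a):
        obs = 1 if a == 1 and random.random() < 0.8 else 0
        self.count = self.count + 1 if obs == 1 else 0
        done = self.count >= 3
        return obs, float(done), done

def run_episode(env):
    history = deque(maxlen=H)           # short-term memory of (obs, action)
    history.append((env.reset(), None))
    done = False
    while not done:
        h = tuple(history)
        a = (random.choice(ACTIONS) if random.random() < EPS
             else max(ACTIONS, key=lambda x: Q[(h, x)]))
        obs, r, done = env.step(a)
        history.append((obs, a))
        best_next = max(Q[(tuple(history), x)] for x in ACTIONS)
        # Standard Q-learning backup over history-list "states"
        Q[(h, a)] += ALPHA * (r + GAMMA * (0.0 if done else best_next) - Q[(h, a)])

for _ in range(500):
    run_episode(ToyEnv())
```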
34
Partially Observable Markov Decision Processes for Faster Object Recognition
Olafsson, Björgvin, January 2016
Object recognition in the real world is a major challenge in the field of computer vision. Given the potentially enormous size of the search space, it is essential to make intelligent decisions about where in the visual field to obtain information, in order to reduce the computational resources needed. In this report a POMDP (Partially Observable Markov Decision Process) learning framework, using a policy gradient method and information rewards as a training signal, has been implemented and used to train fixation policies that aim to maximize the information gathered in each fixation. The purpose of such policies is to make object recognition faster by reducing the number of fixations needed. The trained policies are evaluated in simulation and compared with several fixed policies. Finally, it is shown that the framework can be used to train policies that outperform the fixed policies for certain observation models.
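One common way to define an information reward is as the reduction in Shannon entropy of the recognizer's belief over object classes after a fixation. The sketch below assumes that formulation; the report's exact reward definition may differ.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discrete belief vector."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def information_reward(belief_before, belief_after):
    """One plausible 'information reward': the entropy reduction of the
    belief over object classes caused by a single fixation. Illustrative
    stand-in for the report's training signal."""
    return entropy(belief_before) - entropy(belief_after)

# Example: a fixation that sharpens the belief over 4 object hypotheses
b0 = np.array([0.25, 0.25, 0.25, 0.25])
b1 = np.array([0.70, 0.10, 0.10, 0.10])
print(information_reward(b0, b1))   # positive: the fixation was informative
```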
35
Exploiting imprecise information sources in sequential decision making problems under uncertainty
Drougard, Nicolas, 18 December 2015
Partially Observable Markov Decision Processes (POMDPs) provide a useful formalism for expressing probabilistic sequential decision problems under uncertainty. When this model is used for a robotic mission, the system is defined as the features of the robot and its environment that are needed to express the mission. The system state is not directly seen by the agent (the robot). Solving a POMDP thus consists in computing a strategy that, on average, best achieves the mission, i.e. a function mapping the information known by the agent to an action. Some practical issues of the POMDP model are first highlighted in the robotic context: they concern the modeling of the agent's ignorance, the imprecision of the observation model, and the complexity of solving real-world problems. A counterpart of the POMDP model, called the pi-POMDP, simplifies uncertainty representation with a qualitative evaluation of event plausibilities. It comes from Qualitative Possibility Theory, which provides the means to model imprecision and ignorance. After a formal presentation of the POMDP and pi-POMDP models, an update of the possibilistic model is proposed. Next, the study of factored pi-POMDPs allows us to set up an algorithm named PPUDD, which uses Algebraic Decision Diagrams to solve large structured planning problems. Strategies computed by PPUDD, which were tested in the IPPC 2014 competition, can be more efficient than those produced by probabilistic solvers when the model is imprecise or the problem is high-dimensional. This thesis proposes ways of using Qualitative Possibility Theory to improve computation time and uncertainty modeling in practice.
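The qualitative flavor of pi-POMDPs can be illustrated by contrasting a possibilistic belief update with the familiar probabilistic one: sums become max, products become min, and the belief is renormalized so that its largest degree equals 1. The sketch below uses invented plausibility degrees and only illustrates the max-min style; the update actually proposed in the thesis may differ.

```python
import numpy as np

# Toy qualitative possibilistic filter on a two-state system.
T = np.array([[1.0, 0.4],    # T[s, s2] = possibility of moving s -> s2
              [0.3, 1.0]])
O = np.array([[1.0, 0.2],    # O[s2, z] = possibility of observing z in s2
              [0.5, 1.0]])

def pi_update(pi, z):
    # Prediction: pi_pred(s2) = max_s min(T[s, s2], pi[s])
    pred = np.max(np.minimum(T, pi[:, None]), axis=0)
    # Correction: combine with the observation plausibility via min
    new = np.minimum(pred, O[:, z])
    return new / new.max()   # qualitative normalization: max equals 1

pi = np.array([1.0, 1.0])    # total ignorance: every state fully possible
pi = pi_update(pi, z=0)
print(pi)
```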
36
Relevance-based Online Planning in Complex POMDPs
Saborío Morales, Juan Carlos, 17 July 2020
Planning under uncertainty is a central topic at the intersection of disciplines such as artificial intelligence, cognitive science, and robotics, and its aim is to enable artificial agents to solve challenging problems through a systematic approach to decision-making. Some of these challenges include generating expectations about different outcomes governed by a probability distribution and estimating the utility of actions based only on partial information. In addition, an agent must incorporate observations or information from the environment into its deliberation process and produce the next best action to execute, based on an updated understanding of the world. This process is commonly modeled as a POMDP, a discrete stochastic system that becomes intractable very quickly. Many real-world problems, however, can be simplified by following cues derived from contextual information about the relative expected value of actions. Based on an intuitive approach to problem solving, and relying on ideas related to attention and relevance estimation, we propose a new approach to planning supported by our two main contributions: PGS grants an agent the ability to generate internal preferences and biases to guide action selection, and IRE allows the agent to reduce the dimensionality of complex problems while planning online. Unlike existing work that improves the performance of planning on POMDPs, PGS and IRE do not rely on detailed heuristics or domain knowledge, explicit action hierarchies, or manually designed dependencies for state factoring. Our results show that this level of autonomy is important for solving increasingly challenging problems, where manually designed simplifications scale poorly.
37
Autonomous UAV Path Planning using RSS signals in Search and Rescue Operations
Anhammer, Axel; Lundeberg, Hugo, January 2022
Unmanned aerial vehicles (UAVs) have emerged as a promising technology for search and rescue (SAR) operations. UAVs can provide more timely localization, thus decreasing the crucial duration of SAR operations. Previous work has demonstrated proof of concept for localizing missing people using received signal strength (RSS) and UAVs. The localization system is based on the assumption that the missing person carries an enabled smartphone whose Wi-Fi signal can be intercepted. This thesis proposes a two-stage path planner for UAVs that uses RSS signals and an initial belief about the missing person's location. The objective of the first stage is to locate an RSS signal. By dividing the search area into grids, a hierarchical solution based on several Markov decision processes (MDPs) can be formulated that takes the probabilities of different areas into consideration. The objective of the second stage is to isolate the RSS signal and provide a location estimate. The environment is deemed partially observable, and the problem is formulated as a partially observable Markov decision process (POMDP). Two filters, a point mass filter (PMF) and a particle filter (PF), are evaluated with regard to their ability to correctly estimate the state of the environment. The state estimate then acts as input to a deep Q-network (DQN), which selects appropriate actions for the UAV. The DQN thus becomes a path planner for the UAV, and the trajectories it generates are compared with trajectories generated by, among others, a greedy policy. Results for Stage 1 demonstrate that the path generated by the MDPs prioritizes areas with higher probability and intuitively seems very reasonable. The results also illustrate potential drawbacks of a hierarchical solution, which could be addressed by incorporating more factors into the problem. Simulation results for Stage 2 show that both a PMF and a PF can successfully be used to estimate the state of the environment and provide an accurate localization estimate, with the PMF generating slightly more accurate estimates than the PF. The DQN succeeds in isolating the missing person's probable location with relatively few actions. However, it performs only marginally better than the greedy policy, indicating that it may be a complicated solution to a simpler problem.
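A particle filter for Stage 2 might look like the following sketch, assuming a log-distance path-loss measurement model. All constants (reference RSS, path-loss exponent, noise level, search-area size) are invented, and the thesis's measurement model and filter details may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Particle filter for RSS-based source localization under an assumed
# log-distance path-loss model RSS(d) = RSS0 - 10*n*log10(d) + noise.
RSS0, N_EXP, SIGMA, N_PART = -40.0, 2.5, 4.0, 2000
particles = rng.uniform(0, 500, size=(N_PART, 2))   # candidate positions (m)
weights = np.full(N_PART, 1.0 / N_PART)

def expected_rss(uav_pos, src_pos):
    d = np.maximum(np.linalg.norm(src_pos - uav_pos, axis=-1), 1.0)
    return RSS0 - 10.0 * N_EXP * np.log10(d)

def pf_update(uav_pos, rss_meas):
    global particles, weights
    # Weight each particle by the likelihood of the measured RSS
    lik = np.exp(-0.5 * ((rss_meas - expected_rss(uav_pos, particles)) / SIGMA) ** 2)
    weights = weights * lik
    weights /= weights.sum()
    # Resample (with jitter) when the effective sample size collapses
    if 1.0 / np.sum(weights ** 2) < N_PART / 2:
        idx = rng.choice(N_PART, size=N_PART, p=weights)
        particles = particles[idx] + rng.normal(0, 2.0, size=(N_PART, 2))
        weights = np.full(N_PART, 1.0 / N_PART)
    return weights @ particles   # posterior-mean location estimate

print(pf_update(np.array([100.0, 100.0]), -75.0))
```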
38
Leveraging Help Requests in POMDP Intelligent Tutors
Folsom-Kovarik, Jeremiah, 01 January 2012
Intelligent tutoring systems (ITSs) are computer programs that model individual learners and adapt instruction to help each learner differently. One way ITSs differ from human tutors is that few ITSs give learners a way to ask questions. When learners can ask for help, their questions have the potential to improve learning directly and also to act as a new source of model data that helps the ITS personalize instruction. Inquiry modeling gives ITSs the ability to answer learner questions and refine their learner models with an inexpensive new input channel. In order to support inquiry modeling, an advanced planning formalism is applied to ITS learner modeling. Partially observable Markov decision processes (POMDPs) differ from more widely used ITS architectures in that they can plan complex action sequences in uncertain situations with machine learning. Tractability issues have previously precluded POMDP use in ITS models. This dissertation introduces two improvements, priority queues and observation chains, to make POMDPs scale well and encompass the large problem sizes that real-world ITSs must confront. A new ITS was created to support trainees practicing a military task in a virtual environment. The development of the Inquiry Modeling POMDP Adaptive Trainer (IMP) began with multiple formative studies on human and simulated learners that explored inquiry modeling and POMDPs in intelligent tutoring. The studies suggest the new POMDP representations will be effective in ITS domains having certain common characteristics. Finally, a summative study evaluated IMP's ability to train volunteers in specific practice scenarios. IMP users achieved post-training scores averaging up to 4.5 times higher than users who practiced without support and up to twice as high as trainees who used an ablated version of IMP with no inquiry modeling. IMP's implementation and evaluation helped explore questions about how inquiry modeling and POMDP ITSs work, while empirically demonstrating their efficacy.
39
Use of Device-to-Device communications for efficient cellular networks
Ibrahim, Rita, 04 February 2019
This thesis considers Device-to-Device (D2D) communications as a promising technique for enhancing future cellular networks. D2D technology allows direct communication between two mobile terminals without passing through the base station. Modeling, evaluating, and optimizing D2D features are the fundamental goals of this thesis, and they are mainly achieved using the following mathematical tools: queuing theory, Lyapunov optimization, and partially observable Markov decision processes (POMDPs). The findings of this study are presented in three parts. In the first part, we investigate a D2D mode selection scheme. We derive the queuing stability regions of two scenarios: pure cellular networks and D2D-enabled cellular networks. Comparing the two scenarios leads us to elaborate a D2D vs. cellular mode selection design that improves the capacity of the network. In the second part, we develop a D2D resource allocation algorithm. We observe that D2D users are able to estimate their local Channel State Information (CSI), whereas the base station needs some signaling exchange to acquire this information. Based on the D2D users' knowledge of their local CSI, we provide an energy-efficient resource allocation framework that shows how distributed scheduling outperforms centralized scheduling. In the distributed approach, collisions may occur between the different CSI reports; thus, we propose a collision reduction algorithm. Moreover, we give a detailed description of how both the centralized and distributed algorithms can be implemented in practice, in an LTE-type cellular network. In the third part, we propose a mobile relay selection policy in a D2D relay-aided network. Relay mobility appears as a crucial challenge for defining the strategy of selecting the optimal D2D relays. The problem is formulated as a constrained POMDP which captures the dynamism of the relays and aims to find the optimal relay selection policy that maximizes the performance of the network under cost constraints.
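Lyapunov optimization of the drift-plus-penalty kind can be sketched for a single link choosing between cellular and D2D mode each slot: the controller greedily minimizes a weighted sum of power cost and negative queue service. The rates, powers, arrival statistics, and tradeoff parameter V below are invented; the thesis's mode-selection design is more elaborate.

```python
import random

# Toy drift-plus-penalty mode selection for one link, in the spirit of
# Lyapunov optimization. All constants are illustrative assumptions.
V = 10.0                                   # power/backlog tradeoff parameter
RATE = {"cellular": 0.5, "d2d": 2.0}       # packets served per slot
POWER = {"cellular": 0.2, "d2d": 1.0}      # power cost per slot

Q = 0.0                                    # queue backlog of the link
for t in range(10000):
    arrivals = random.choice([0, 1, 2])    # random packet arrivals
    # Pick the mode minimizing V*power - Q*service_rate: low backlog favors
    # the low-power mode, large backlog favors the fast D2D mode.
    mode = min(RATE, key=lambda m: V * POWER[m] - Q * RATE[m])
    Q = max(Q + arrivals - RATE[mode], 0.0)

print(mode, round(Q, 2))                   # backlog hovers near the switch point
```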
40
Deep Reinforcement Learning for Autonomous Highway Driving Scenario
Pradhan, Neil, January 2021
We present an autonomous driving agent in a simulated highway driving scenario with vehicles, such as cars and trucks, moving with stochastically variable velocity profiles. The focus of the simulated environment is to test tactical decision making in highway driving scenarios. When an agent (vehicle) maintains an optimal range of velocity, it is beneficial both in terms of energy efficiency and a greener environment. To encourage the agent to maintain an optimal range of velocity, this thesis proposes two novel reward structures: (a) a Gaussian reward structure and (b) an exponential rise and fall reward structure. I trained one deep reinforcement learning agent with each reward structure to study their differences and evaluate their performance on a set of parameters that are most relevant in highway driving scenarios. The algorithm implemented in this thesis is a double dueling deep Q-network with a prioritized experience replay buffer. Experiments were performed by adding noise to the inputs, simulating a partially observable Markov decision process, in order to compare the reliability of the different reward structures. A velocity occupancy grid was found to be a better input for the algorithm than a binary occupancy grid. Furthermore, a methodology for generating fuel-efficient policies is discussed and demonstrated with an example.
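Plausible shapes for the two reward structures are sketched below: a symmetric Gaussian bump centered on a target velocity, and an asymmetric curve that rises exponentially up to the target and decays exponentially past it. The target velocity, width, and rate constants are invented, and the thesis's exact formulas may differ.

```python
import numpy as np

# Illustrative velocity-tracking reward shapes; constants are assumptions.
V_TARGET, SIGMA, K_RISE, K_FALL = 25.0, 3.0, 0.25, 0.5   # m/s and 1/(m/s)

def gaussian_reward(v):
    """Symmetric bump: maximal at the target velocity, decaying smoothly."""
    return np.exp(-((v - V_TARGET) ** 2) / (2 * SIGMA ** 2))

def exp_rise_fall_reward(v):
    """Asymmetric: rises exponentially toward the target, falls beyond it,
    so overspeeding can be penalized more sharply than underspeeding."""
    if v <= V_TARGET:
        return np.exp(K_RISE * (v - V_TARGET))
    return np.exp(-K_FALL * (v - V_TARGET))

for v in [15.0, 25.0, 35.0]:
    print(v, round(gaussian_reward(v), 3), round(exp_rise_fall_reward(v), 3))
```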