Spelling suggestions: "subject:"partially observable"" "subject:"partially observables""
1 |
Paying Attention to What Matters: Observation Abstraction in Partially Observable EnvironmentsWolfe, Alicia Peregrin 01 February 2010 (has links)
Autonomous agents may not have access to complete information about the state of the environment. For example, a robot soccer player may only be able to estimate the locations of other players not in the scope of its sensors. However, even though all the information needed for ideal decision making cannot be sensed, all that is sensed is usually not needed. The noise and motion of spectators, for example, can be ignored in order to focus on the game field. Standard formulations do not consider this situation, assuming that all the can be sensed must be included in any useful abstraction. This dissertation extends the Markov Decision Process Homomorphism framework (Ravindran, 2004) to partially observable domains, focusing specically on reducing Partially Observable Markov Decision Processes (POMDPs) when the model is known. This involves ignoring aspects of the observation function which are irrelevant to a particular task. Abstraction is particularly important in partially observable domains, as it enables the formation of a smaller domain model and thus more efficient use of the observed features.
|
2 |
Evolutionarily Stable Learning and Foraging StrategiesCOWNDEN, DANIEL 01 February 2012 (has links)
This thesis examines a series of problems with the goal of better understanding the fundamental dilemma of whether to invest effort in obtaining information that may lead to better opportunities in the future versus exploiting immediately available opportunities. In particular this work investigates how this dilemma is affected by competition in an evolutionary setting. To achieve this requires both the use of evolutionary game theory, and Markov decision procesess or stochastic dynamic programming. This thesis grows directly out of earlier work on the Social Learning Strategies Tournament. Although I cast the problem in the biological setting of optimal foraging theory, where it fills an obvious gap, this fundamental dilemma should also be of some interest to economists, operations researchers, as well as those working in ecology, evolution and behaviour. / Thesis (Ph.D, Mathematics & Statistics) -- Queen's University, 2012-01-31 19:55:25.11
|
3 |
Learning in a state of confusion : employing active perception and reinforcement learning in partially observable worldsCrook, Paul A. January 2007 (has links)
In applying reinforcement learning to agents acting in the real world we are often faced with tasks that are non-Markovian in nature. Much work has been done using state estimation algorithms to try to uncover Markovian models of tasks in order to allow the learning of optimal solutions using reinforcement learning. Unfortunately these algorithms which attempt to simultaneously learn a Markov model of the world and how to act have proved very brittle. Our focus differs. In considering embodied, embedded and situated agents we have a preference for simple learning algorithms which reliably learn satisficing policies. The learning algorithms we consider do not try to uncover the underlying Markovian states, instead they aim to learn successful deterministic reactive policies such that agents actions are based directly upon the observations provided by their sensors. Existing results have shown that such reactive policies can be arbitrarily worse than a policy that has access to the underlying Markov process and in some cases no satisficing reactive policy can exist. Our first contribution is to show that providing agents with alternative actions and viewpoints on the task through the addition of active perception can provide a practical solution in such circumstances. We demonstrate empirically that: (i) adding arbitrary active perception actions to agents which can only learn deterministic reactive policies can allow the learning of satisficing policies where none were originally possible; (ii) active perception actions allow the learning of better satisficing policies than those that existed previously and (iii) our approach converges more reliably to satisficing solutions than existing state estimation algorithms such as U-Tree and the Lion Algorithm. Our other contributions focus on issues which affect the reliability with which deterministic reactive satisficing policies can be learnt in non-Markovian environments. We show that that greedy action selection may be a necessary condition for the existence of stable deterministic reactive policies on partially observable Markov decision processes (POMDPs). We also set out the concept of Consistent Exploration. This is the idea of estimating state-action values by acting as though the policy has been changed to incorporate the action being explored. We demonstrate that this concept can be used to develop better algorithms for learning reactive policies to POMDPs by presenting a new reinforcement learning algorithm; the Consistent Exploration Q(l) algorithm (CEQ(l)). We demonstrate on a significant number of problems that CEQ(l) is more reliable at learning satisficing solutions than the algorithm currently regarded as the best for learning deterministic reactive policies, that of SARSA(l).
|
4 |
Finite Memory Policies for Partially Observable Markov Decision ProessesLusena, Christopher 01 January 2001 (has links)
This dissertation makes contributions to areas of research on planning with POMDPs: complexity theoretic results and heuristic techniques. The most important contributions are probably the complexity of approximating the optimal history-dependent finite-horizon policy for a POMDP, and the idea of heuristic search over the space of FFTs.
|
5 |
Managing populations in the face of uncertainty: adaptive management, partial observability and the dynamic value of information.Moore, Alana L. January 2008 (has links)
The work presented in this thesis falls naturally into two parts. The first part (Chapter 2), is concerned with the benefit of perturbing a population into an immediately undesirable state, in order to improve estimates of a static probability which may improve long-term management. We consider finding the optimal harvest policy for a theoretical harvested population when a key parameter is unknown. We employ an adaptive management framework to study when it is worth sacrificing short term rewards in order to increase long term profits. / Active adaptive management has been increasingly advocated in natural resource management and conservation biology as a methodology for resolving key uncertainties about population dynamics and responses to management. However, when comparing management policies it is traditional to weigh future rewards geometrically (at a constant discount rate) which results in far-distant rewards making a negligible contribution to the total benefit. Under such a discounting scheme active adaptive management is rarely of much benefit, especially if learning is slow. In Chapter 2, we consider two proposed alternative forms of discounting for evaluating optimal policies for long term decisions which have a social component. / We demonstrate that discount functions which weigh future rewards more heavily result in more conservative harvesting strategies, but do not necessarily encourage active learning. Furthermore, the optimal management strategy is not equivalent to employing geometric discounting at a lower rate. If alternative discount functions are made mandatory in calculating optimal management policies for environmental management, then this will affect the structure of optimal management regimes and change when and how much we are willing to invest in learning. / The second part of this thesis is concerned with how to account for partial observability when calculating optimal management policies. We consider the problem of controlling an invasive pest species when only partial observations are available at each time step. In the model considered, the monitoring data available are binomial observations of a probability which is an index of the population size. We are again concerned with estimating a probability, however, in this model the probability is changing over time. / Before including partial observability explicitly, we consider a model in which perfect observations of the population are available at each time step (Chapter 3). It is intuitive that monitoring will be beneficial only if the management decision depends on the outcome. Hence, a necessary condition for monitoring to be worthwhile is that control polices which are specified in terms of the system state, out-perform simpler time-based control policies. Consequently, in addition to providing a benchmark against which we can compare the optimal management policy in the case of partial observations, analysing the perfect observation case also provides insight into when monitoring is likely to be most valuable. / In Chapters 4 and 5 we include partial observability by modelling the control problem as a partially observable Markov decision process (POMDP). We outline several tests which stem from a property of conservation of expected utility under monitoring, which aid in validating the model. We discuss the optimal management policy prescribed by the POMDP for a range of model scenarios, and use simulation to compare the POMDP management policy to several alternative policies, including controlling with perfect observations and no observations. / In Chapter 6 we propose an alternative model, developed in the spirit of a POMDP, that does not strictly satisfy the definition of a POMDP. We find that although the second model has some conceptually appealing attributes, it makes an undesirable implicit assumption about the underlying population dynamics.
|
6 |
Innovative Simulation and Tree Models and Reinforcement Learning Methods with Applications in CybersecurityLiu, Enhao January 2021 (has links)
No description available.
|
7 |
Localização multirrobo cooperativa com planejamento / Planning for multi-robot localizationPinheiro, Paulo Gurgel, 1983- 11 September 2018 (has links)
Orientador: Jacques Wainer / Dissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-09-11T21:14:07Z (GMT). No. of bitstreams: 1
Pinheiro_PauloGurgel_M.pdf: 1259816 bytes, checksum: a4783df9aa3755becb68ee233ad43e3c (MD5)
Previous issue date: 2009 / Resumo: Em um problema de localização multirrobô cooperativa, um grupo de robôs encontra-se em um determinado ambiente, cuja localização exata de cada um dos robôs é desconhecida. Neste cenário, uma distribuição de probabilidades aponta as chances de um robô estar em um determinado estado. É necessário então, que os robôs se movimentem pelo ambiente e gerem novas observações que serão compartilhadas, para calcular novas estimativas. Nos últimos anos, muitos trabalhos têm focado no estudo de técnicas probabilísticas, modelos de comunicação e modelos de detecções, para resolver o problema de localização. No entanto, a movimentação dos robôs é, em geral, definida por ações aleatórias. Ações aleatórias geram observações que podem ser inúteis para a melhoria da estimativa. Este trabalho apresenta uma proposta de localização com suporte a planejamento de ações. O objetivo é apresentar um modelo cujas ações realizadas pelos robôs são definidas por políticas. Escolhendo a melhor ação a ser realizada, é possível receber informações mais úteis dos sensores internos e externos e estimar as posturas mais rapidamente. O modelo proposto, denominado Modelo de Localização Planejada - MLP, utiliza POMDPs para modelar os problemas de localização e algoritmos específicos de geração de políticas. Foi utilizada a localização de Markov como técnica probabilística de localização e implementadas versões de modelos de detecção e propagação de informação. Neste trabalho, um simulador de problemas de localização multirrobô foi desenvolvido, no qual foram realizados experimentos em que o modelo proposto foi comparado a um modelo que não faz uso de planejamento de ações. Os resultados obtidos apontam que o modelo proposto é capaz de estimar as posturas dos robôs com uma menor quantidade de passos, sendo significativamente mais e ciente do que o modelo comparado sem planejamento. / Abstract: In a cooperative multi-robot localization problem, a group of robots is in a certain environment, where the exact location of each robot is unknown. In this scenario, there is only a distribution of probabilities indicating the chance of a robot to be in a particular state. It is necessary for the robots to move in the environment generating new observations, which will be shared to calculate new estimates. Currently, many studies have
focused on the study of probabilistic techniques, models of communication and models of detection to solve the localization problem. However, the movement of robots is generally defined by random actions. Random actions generate observations that can be useless for improving the estimate. This work describes a proposal for multi-robot localization with support planning of actions. The objective is to describe a model whose actions performed by robots are defined by policies. Choosing the best action to be performed, the robot gets more useful information from internal and external sensors and estimates the posture more quickly. The proposed model, called Model of Planned Localization - MPL, uses POMDPs to model the problems of location and specific algorithms to generate policies. The Markov localization was used as probabilistic technique of localization and implemented versions of detection models and information propagation model. In this work, a simulator to multi-robot localization problems was developed, in which experiments were performed. The proposed model was compared to a model that does not make use of planning actions. The results showed that the proposed model is able to estimate the positions of robots with lower number of steps, being more e-cient than model compared. / Mestrado / Inteligencia Artificial / Mestre em Ciência da Computação
|
8 |
The Value Of Information In A Manufacturing Facility Taking Production And Lead Time Quotation DecisionsKaman, Cumhur 01 June 2011 (has links) (PDF)
Advancements in information technology enabled to track real time data in a more accurate and precise way in many manufacturing facilities. However, before obtaining the more accurate and precise data, the investment in information technology should be validated. Value of information may be adopted as a criterion in this investment. In this study, we analyze the value of information in a manufacturing facility where production and lead time quotation decisions are taken. In order to assess the value of information, two settings are analyzed. Under the first setting, the manufacturer takes decisions under perfect information. To find the optimal decisions under perfect information, a stochastic model is introduced. Under the second setting, the manufacturer takes decisions under imperfect information. To obtain a solution for this problem, Partially Observable Markov Decision Process is employed. Under the second setting, we study two approaches. In the first approach, we introduce a nonlinear programming model to find the optimal decisions. In the second approach, a heuristic approach, constructed on optimal actions taken under perfect information is presented. We examine the value of information under different parameters by considering the policies under nonlinear programming model and heuristic approach. The profit gap between the two policies is investigated. The effect of Make-to-Order (MTO) and Make-to-Stock (MTS) schemes on the value of information is analyzed. Lastly, different lead time quotation schemes / accept-all, accept-reject and precise lead time / are compared to find under which quotation scheme value of information is highest.
|
9 |
A MULTI-FUNCTIONAL PROVENANCE ARCHITECTURE: CHALLENGES AND SOLUTIONS2013 December 1900 (has links)
In service-oriented environments, services are put together in the form of a workflow with the aim of distributed problem solving.
Capturing the execution details of the services' transformations is a significant advantage of using workflows. These execution details, referred to as provenance information, are usually traced automatically and stored in provenance stores. Provenance data contains the data recorded by a workflow engine during a workflow execution. It identifies what data is passed between services, which services are involved, and how results are eventually generated for particular sets of input values.
Provenance information is of great importance and has found its way through areas in computer science such as: Bioinformatics, database, social, sensor networks, etc.
Current exploitation and application of provenance data is very limited as provenance systems started being developed for specific applications. Thus, applying learning and knowledge discovery methods to provenance data can provide rich and useful information on workflows and services.
Therefore, in this work, the challenges with workflows and services are studied to discover the possibilities and benefits of providing solutions by using provenance data.
A multifunctional architecture is presented which addresses the workflow and service issues by exploiting provenance data. These challenges include workflow composition, abstract workflow selection, refinement, evaluation, and graph model extraction. The specific contribution of the proposed architecture is its novelty in providing a basis for taking advantage of the previous execution details of services and workflows along with artificial intelligence and knowledge management techniques to resolve the major challenges regarding workflows. The presented architecture is application-independent and could be deployed in any area.
The requirements for such an architecture along with its building components are discussed. Furthermore, the responsibility of the components, related works and the implementation details of the architecture along with each component are presented.
|
10 |
Spoken Dialogue System for Information Navigation based on Statistical Learning of Semantic and Dialogue Structure / 意味・対話構造の統計的学習に基づく情報案内のための音声対話システムYoshino, Koichiro 24 September 2014 (has links)
京都大学 / 0048 / 新制・課程博士 / 博士(情報学) / 甲第18614号 / 情博第538号 / 新制||情||95(附属図書館) / 31514 / 京都大学大学院情報学研究科知能情報学専攻 / (主査)教授 河原 達也, 教授 黒橋 禎夫, 教授 鹿島 久嗣 / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM
|
Page generated in 0.0515 seconds