Global ETD Search

51	Improved Heuristic Search Algorithms for Decision-Theoretic Planning Abdoulahi, Ibrahim 08 December 2017 A large class of practical planning problems that require reasoning about uncertain outcomes, as well as tradeoffs among competing goals, can be modeled as Markov decision processes (MDPs). This model has been studied for over 60 years and has many applications, ranging from stochastic inventory control and supply-chain planning to probabilistic model checking and robotic control. Standard dynamic programming algorithms solve these problems for the entire state space. A more efficient heuristic search approach focuses computation on only the part of the state space that is relevant given a start state, using heuristics to identify irrelevant parts of the state space that can be safely ignored. This dissertation considers the heuristic search approach to this class of problems and makes three contributions that advance it. The first contribution is a novel algorithm for solving MDPs that integrates the standard value iteration algorithm with branch-and-bound search. Called branch-and-bound value iteration, the new algorithm has several advantages over existing algorithms. The second contribution is the integration of recently developed suboptimality bounds into heuristic search algorithms for MDPs, making it possible for iterative algorithms for solving these planning problems to detect convergence to a bounded-suboptimal solution. The third contribution is the evaluation and analysis of some techniques that are widely used by state-of-the-art planning algorithms, the identification of some weaknesses of these techniques, and the development of a more efficient implementation of one of these techniques: a solved-labeling procedure that speeds convergence by leveraging a decomposition of the state-space graph of a planning problem into strongly connected components. 
The new algorithms and techniques introduced in this dissertation are experimentally evaluated on a range of widely used planning benchmarks. Planning under Uncertainty Value Iteration Heuristic Search Suboptimality Bounds Action Elimination Markov Decision Process
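The third contribution relies on decomposing the state-space graph into strongly connected components. The sketch below illustrates why such a decomposition helps. It assumes a cost-minimizing stochastic shortest-path MDP stored as nested lists and implements topological value iteration. Tarjan's algorithm finds the SCCs in reverse topological order. Each component is then solved to convergence only after its successors' values are final. This is a related, well-known technique, not the dissertation's solved-labeling procedure. The names `tarjan_scc` and `topological_vi` and the data layout are hypothetical.

```python
def tarjan_scc(n, succ):
    """Tarjan's algorithm: return SCCs in reverse topological order (sink components first)."""
    index = [None] * n          # DFS discovery index of each node
    low = [0] * n               # smallest index reachable from the node's DFS subtree
    on_stack = [False] * n
    stack, sccs, counter = [], [], [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack[v] = True
        for w in succ[v]:
            if index[w] is None:
                visit(w)
                low[v] = min(low[v], low[w])
            elif on_stack[w]:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC: pop the whole component
            comp = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in range(n):
        if index[v] is None:
            visit(v)
    return sccs


def topological_vi(P, cost, goals, eps=1e-9):
    """
    Value iteration solved one SCC at a time (cost minimization).
    P[s][a]   : list of (probability, next_state) pairs
    cost[s][a]: immediate cost of action a in state s
    goals     : set of absorbing zero-cost goal states (P[g] may be empty)
    Assumes every non-goal state has at least one action.
    """
    n = len(P)
    succ = [sorted({s2 for outcomes in P[s] for _, s2 in outcomes}) for s in range(n)]
    V = [0.0] * n
    for comp in tarjan_scc(n, succ):  # every successor component is already solved
        while True:
            residual = 0.0
            for s in comp:
                if s in goals:
                    continue
                best = min(
                    cost[s][a] + sum(p * V[s2] for p, s2 in outcomes)
                    for a, outcomes in enumerate(P[s])
                )
                residual = max(residual, abs(best - V[s]))
                V[s] = best  # in-place (Gauss-Seidel) update
            if residual < eps:
                break
    return V


# State 0 -> 1 -> goal 2. In state 1, "try" (cost 1) reaches the goal with prob 0.5
# and otherwise stays put; "sure" (cost 3) always reaches the goal.
P = [[[(1.0, 1)]], [[(0.5, 2), (0.5, 1)], [(1.0, 2)]], []]
cost = [[1.0], [1.0, 3.0], []]
V = topological_vi(P, cost, goals={2})  # V is approximately [3.0, 2.0, 0.0]
```

Because each component is iterated only after its successors have converged, states downstream of a component never need to be revisited. That is the saving SCC-based methods exploit.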
52	Principals' Perceptions and Self-efficacy in Relation to School Security Jones, Julian 01 January 2015 Principals in the nation's schools have been tasked with managing crisis incidents that may occur with students and others on their campuses on a daily basis. The purposes of this study were to determine the differences, if any, that existed in Central Florida public school principals' perceptions regarding school security, their perceived confidence to address critical crisis incidents on their campuses, their perceptions of the likelihood critical incidents would occur, their perceptions of interaction with law enforcement, the critical incidents they fear the most, and their perceptions of factors impacting the incidents they fear the most. Principal subgroup mean responses to the Principal Safety and Security Perceptions Survey in the three areas of Bandura's (1997) triadic reciprocal causation were examined in the context of principals' gender, longevity, student enrollment, grade configuration, free and reduced lunch rate, presence of a law enforcement officer, and presence of a security plan. Findings revealed significant differences between categorical groups of principals in multiple areas. It was determined that significant differences in principals' perceptions warrant further study. Recommendations for practice include security policy development and practical application of noted trends. Natural energy markov decision process mdp load shedding energy storage intermittency expected cost Education Educational Leadership
53	An Operating System Architecture and Hybrid Scheduling Methodology for Real-Time Systems with Uncertainty Apte, Manoj Shriganesh 11 December 2004 Personal computer desktops and other standardized computer architectures are optimized to provide the best performance for frequently occurring conditions. Real-time systems designed using worst-case analysis for such architectures underutilize the hardware. This shortcoming motivates scheduling algorithms that can improve overall utilization by accounting for the inherent uncertainty in task execution duration. A real-time task dispatcher must perform its function with constant scheduling overhead. Given the NP-hard nature of the problem of scheduling non-preemptible tasks, dispatch decisions for such systems cannot be made in real time. This argues for a hybrid architecture that includes an offline policy generator and an online dispatcher. This dissertation proposes and demonstrates a hybrid operating system architecture that enables cost-optimal task dispatch on Commercial-Off-The-Shelf (COTS) systems. This is achieved by explicitly accounting for the stochastic nature of each task's execution time and by dynamically learning the system behavior. Decision Theoretic Scheduling (DTS) provides the framework for scheduling under uncertainty. The real-time scheduling problem is cast as a Markov Decision Process (MDP). An offline policy generator discovers an epsilon-optimal policy using value iteration with model learning. For the selected representation of state, action, model, and rewards, the policy discovered using value iteration is proved to have a probability of failure that is less than any arbitrarily small, user-specified value. The PromisQoS operating system architecture demonstrates a practical implementation of the proposed approach. 
PromisQoS is a Linux-based platform that supports concurrent execution of time-based (preemptible and non-preemptible) real-time tasks and best-effort processes on an interactive workstation. Several examples demonstrate that model learning and scheduling under uncertainty enable PromisQoS to achieve better CPU utilization than other scheduling methods. Real-time task sets that solve practical problems, such as a Laplace solver, matrix multiplication, and transpose, demonstrate the robustness and correctness of the PromisQoS design and implementation. This pioneering application demonstrates the feasibility of MDP-based scheduling for real-time tasks in practical systems. It also opens avenues for further research into the use of such DTS techniques in real-time system design. operating system Linux model improvement uncertainty markov decision process scheduling real-time
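The offline policy generator is described as running value iteration with model learning to find an epsilon-optimal policy. The sketch below shows that general recipe. It assumes a discounted, reward-maximizing MDP with finite states and actions. Transition probabilities are estimated by maximum likelihood from observed (state, action, next state) triples. Value iteration then stops once successive value functions differ by less than ε(1-γ)/(2γ), the classical sufficient condition for the greedy policy to be ε-optimal. The function names, the toy "wait"/"run" log, and the discounted formulation are assumptions for illustration, not the PromisQoS implementation.

```python
from collections import defaultdict


def estimate_model(transitions):
    """Maximum-likelihood estimate of P(s2 | s, a) from observed (s, a, s2) triples."""
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, s2 in transitions:
        counts[(s, a)][s2] += 1
    return {
        sa: {s2: c / sum(nxt.values()) for s2, c in nxt.items()}
        for sa, nxt in counts.items()
    }


def value_iteration(states, actions, model, reward, gamma, eps):
    """Discounted value iteration with the standard epsilon-optimality stopping rule.
    Assumes every (state, action) pair appears in `model`."""

    def backup(s, a, V):
        # One-step lookahead: immediate reward plus discounted expected value.
        return reward[(s, a)] + gamma * sum(p * V[s2] for s2, p in model[(s, a)].items())

    V = {s: 0.0 for s in states}
    threshold = eps * (1 - gamma) / (2 * gamma)
    while True:
        new_V = {s: max(backup(s, a, V) for a in actions) for s in states}
        diff = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if diff < threshold:
            break
    # Greedy policy with respect to the final value estimate.
    policy = {s: max(actions, key=lambda a: backup(s, a, V)) for s in states}
    return V, policy


# Hypothetical logged behaviour: "run" in state 0 usually reaches state 1,
# and "run" in state 1 earns reward 1 per step.
log = ([(0, "run", 1)] * 3 + [(0, "run", 0)] + [(0, "wait", 0)] * 4
       + [(1, "run", 1)] * 2 + [(1, "wait", 0)] * 2)
model = estimate_model(log)
reward = {(0, "run"): 0.0, (0, "wait"): 0.0, (1, "run"): 1.0, (1, "wait"): 0.0}
V, policy = value_iteration([0, 1], ["wait", "run"], model, reward, gamma=0.9, eps=1e-3)
```

Separating model estimation from planning mirrors the offline/online split described above. The learned model can be refreshed and the policy recomputed offline while the dispatcher only looks up actions.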
54	Resource Allocation to Improve Equity in Service Operations Yang, Muer 23 September 2011 No description available. Operations Research Equity Resource Allocation Simulation Optimization Markov decision process Intensive care unit Public Service
55	Analysis of Attacks on Controlled Stochastic Systems Russo, Alessio January 2022 In this thesis, we investigate attack vectors against Markov decision processes and dynamical systems. This work is motivated by the recent interest in the research community in making machine learning models safer against malicious attacks. We focus on different attack vectors: (I) attacks that alter the input/output signal of a Markov decision process; (II) eavesdropping attacks whose aim is to detect a change in a dynamical system; (III) poisoning attacks against data-driven control methods. (I) For attacks on Markov decision processes, we focus on two types of attacks: (1) attacks that alter the observations of the victim, and (2) attacks that alter the control signal of the victim. Regarding (1), we investigate the problem of devising optimal attacks that minimize the reward collected by the victim. We show that when the policy and the system are known to the attacker, designing optimal attacks amounts to solving a Markov decision process. We also show that, for the victim, the system uncertainties induced by the attack can be modeled using a Partially Observable Markov Decision Process (POMDP) framework. We demonstrate that using reinforcement learning methods tailored to POMDPs leads to more resilient policies. Regarding (2), we instead investigate the problem of designing optimal stealthy poisoning attacks on the control channel of Markov decision processes. Previous work constrained the amplitude of the adversarial perturbation, in the hope that this constraint would make the attack imperceptible. 
However, such constraints do not grant any level of undetectability and do not take into account the dynamic nature of the underlying Markov process. To design an optimal stealthy attack, we investigate a new attack formulation, based on information-theoretic quantities, that considers the objective of minimizing the detectability of the attack as well as the performance of the controlled process. (II) In the second part of this thesis, we analyse the problem in which an eavesdropper tries to detect a change in a Markov decision process. These processes may be affected by changes that need to remain private. We study the problem using theoretical tools from optimal detection theory to motivate a definition of online privacy based on the average amount of information per observation of the underlying stochastic system. We provide ways to derive privacy upper bounds and compute policies that attain a higher privacy level, concluding with examples and numerical simulations. (III) Lastly, we investigate poisoning attacks against data-driven control methods. Specifically, we analyse how a malicious adversary can slightly poison the data so as to minimize the performance of a controller trained using this data. We show that identifying the most impactful attack boils down to solving a bi-level non-convex optimization problem, and we provide theoretical insights on the attack. We present a generic algorithm that finds a local optimum of this problem and illustrate our analysis for various techniques. Numerical experiments reveal that minimal but well-crafted changes in the dataset are sufficient to significantly deteriorate the performance of data-driven control methods, and even to make the closed-loop system unstable. 
/ QC 20220510. Topic: Alessio Russo - Licentiate. Time: May 31, 2022 04:00 PM Madrid. Zoom Meeting link: https://kth-se.zoom.us/j/69452765598 reinforcement learning markov decision process attack detection data poisoning online learning Control Engineering
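The claim that an optimal observation attack is itself an MDP, when the victim policy and dynamics are known, can be made concrete. The sketch below assumes a finite, fully known discounted MDP and a deterministic victim policy. The attacker's state is the true state, and its action is the fake observation fed to the victim. Its reward is the negative of the victim's reward, so ordinary value iteration yields the optimal attack. The function and toy system are hypothetical and ignore any stealthiness constraint; the thesis's formulation may differ.

```python
def optimal_observation_attack(P, R, victim_policy, allowed_obs, gamma, tol=1e-10):
    """
    Attacker MDP for observation perturbations against a known, fixed victim policy.
    P[s][a]         : list of (probability, next_state) pairs of the true system
    R[s][a]         : the victim's reward
    victim_policy[o]: action the victim takes when it *observes* state o
    allowed_obs[s]  : observations the attacker may show the victim in true state s
    Attacker action = fake observation o; attacker reward = -R[s][victim_policy[o]].
    """
    n = len(P)

    def q(s, o, V):
        a = victim_policy[o]  # the victim reacts to the fake observation
        return -R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    V = [0.0] * n  # attacker value = minus the victim's discounted return under attack
    while True:
        new_V = [max(q(s, o, V) for o in allowed_obs[s]) for s in range(n)]
        diff = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if diff < tol:
            break
    attack = [max(allowed_obs[s], key=lambda o: q(s, o, V)) for s in range(n)]
    return V, attack


# Toy system: action 0 = "stay", action 1 = "move" (toggles between states 0 and 1).
# The victim earns 1 per step for staying in state 1.
P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)], [(1.0, 0)]]]
R = [[0.0, 0.0], [1.0, 0.0]]
victim_policy = [1, 0]  # move when it sees state 0, stay when it sees state 1

# Without perturbation the victim sees the truth; with it, any state can be shown.
V_clean, _ = optimal_observation_attack(P, R, victim_policy, [[0], [1]], gamma=0.9)
V_attacked, attack = optimal_observation_attack(P, R, victim_policy, [[0, 1], [0, 1]], gamma=0.9)
```

In this toy, the victim's return from state 0 drops from 9 (V_clean is about [-9, -10]) to 0 under attack. The attacker does this by showing state 0 whenever the victim is actually in state 1, so the victim moves away from the rewarding state.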
56	Cognitive Radar Applied To Target Tracking Using Markov Decision Processes Selvi, Ersin Suleyman 30 January 2018 The radio-frequency spectrum is a precious resource with many applications and users, especially in light of the recent spectrum auction in the United States. Future platforms and devices, such as radars and radios, need to adapt to their spectral environment in order to continue serving the needs of their users. This thesis considers an environment with one tracking radar, a single target, and a communications system. The radar-communications coexistence problem is modeled as a Markov decision process (MDP), and reinforcement learning is applied to drive the radar to optimal behavior. / Master of Science Cognitive radar target tracking Markov decision process interference mitigation spectrum coexistence
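The abstract states only that reinforcement learning drives the radar toward optimal behavior. As an illustration built on stated assumptions, the sketch below uses tabular Q-learning, one common choice, on a hypothetical three-band toy. The communications system hops between sub-bands in a predictable pattern, and the radar is rewarded for avoiding it. The state, action, hopping pattern, and reward values are invented for illustration and are not the thesis's MDP.

```python
import random


def train_radar(n_bands=3, steps=5000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """
    Tabular Q-learning on a hypothetical radar/communications coexistence toy.
    - state : the sub-band the communications system currently occupies
    - action: the sub-band the radar transmits in next
    - the communications system hops deterministically to (band + 1) % n_bands
    - reward: -1 if the radar lands on the band the comms system hops to, else +1
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_bands for _ in range(n_bands)]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_bands)                          # explore
        else:
            action = max(range(n_bands), key=lambda a: Q[state][a])  # exploit
        next_state = (state + 1) % n_bands
        reward = -1.0 if action == next_state else 1.0
        td_target = reward + gamma * max(Q[next_state])
        Q[state][action] += alpha * (td_target - Q[state][action])
        state = next_state
    return Q


Q = train_radar()
greedy = [max(range(3), key=lambda a: Q[s][a]) for s in range(3)]
# The learned greedy band should differ from the band the comms system hops to next.
```

A real cognitive radar would also trade interference against tracking quality, for example bandwidth or revisit rate. Those terms would enter through the reward, which this toy deliberately omits.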
57	Designförslag på belöningsfunktioner för självkörande bilar i TORCS som inte krockar / Design suggestion on reward functions for self-driving cars in TORCS that do not crash Andersson, Björn, Eriksson, Felix January 2018 This study uses TORCS (The Open Racing Car Simulator), which is an interesting game in which to create self-driving cars, since it offers nineteen different types of sensors that describe the environment to the agent. The problem addressed by this study has been to identify which of all these sensors can be used in a reward function and how that function should then be implemented. The study adopted a quantitative experimental design, with the research question: How can a reward function be designed so that the agent can maneuver in the game TORCS without crashing and with a consistent result? A quantitative experimental study was chosen because the authors needed to design, implement, run experiments on, and evaluate the results of each reward function. In total, fifteen experiments were carried out over twelve different reward functions in TORCS on two different tracks, E-Track 5 (E-5) and Aalborg. Each of the twelve reward functions was run in one experiment on E-5, and the three with the best results (Charlie, Foxtrot, and Juliette) were then run in an experiment on Aalborg, since it is a more difficult track. This was done to establish whether a reward function can drive on more than one track and is therefore general. Juliette was the only reward function that completed both E-5 and Aalborg without crashing. From the experiments, it was concluded that Juliette satisfies the research question, since it completes both tracks without crashing and, when it succeeds, achieves a consistent result. The study has therefore succeeded in designing and implementing a reward function that satisfies the research question. / This study uses TORCS (The Open Racing Car Simulator) because it is an interesting game in which to create self-driving cars. 
This is because nineteen different sensors are available that describe the environment for the agent. The problem for this study has been to identify which sensors can be used in a reward function and how that reward function should be implemented. The study uses a quantitative experimental method, with the research question: How can a reward function be designed so that an agent can maneuver in TORCS without crashing while achieving a consistent result? The quantitative experimental method was chosen because the authors had to design, implement, run experiments on, and evaluate the result of each reward function. Fifteen experiments were conducted with twelve reward functions on two different tracks: E-Track 5 (E-5) and Aalborg. Each of the twelve reward functions was tested on E-5, and the three with the best results (Charlie, Foxtrot, and Juliette) were tested again on Aalborg. The test on Aalborg was conducted to determine whether a reward function can maneuver on more than one track. Juliette was the only reward function that managed to complete a lap on both E-5 and Aalborg without crashing. Based on the experiments, we concluded that Juliette answers the research question, because it completes both tracks without crashing and, when it succeeds, achieves a consistent result. Therefore, this study has succeeded in answering the research question. Machine learning neural networks self-driving cars self-driving agent reward function Markov Decision Process TORCS Information Systems
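The abstract does not give the formulas for Charlie, Foxtrot, or Juliette. The sketch below shows how such functions typically combine sensor readings. It uses a reward shape widely seen in TORCS reinforcement-learning experiments: reward speed along the track axis, penalize sideways motion and distance from the track centre, and return a large penalty on a crash. The parameter names, the normalisation of the track position, and the penalty value are assumptions. This is not the study's Juliette function.

```python
import math


def progress_reward(speed_x, angle, track_pos, crashed, crash_penalty=-200.0):
    """
    Hypothetical TORCS-style reward (not one of the study's twelve functions).
    speed_x  : longitudinal speed of the car
    angle    : angle between the car heading and the track axis (radians)
    track_pos: lateral offset from the track centre, assumed normalised so that
               |track_pos| > 1 means the car is off the track
    crashed  : True if the episode ended in a collision
    """
    if crashed or abs(track_pos) > 1.0:
        return crash_penalty
    along_track = speed_x * math.cos(angle)      # reward progress along the track
    sideways = abs(speed_x * math.sin(angle))    # penalise driving at an angle
    off_centre = speed_x * abs(track_pos)        # penalise drifting from the centre
    return along_track - sideways - off_centre


# Fast and well aligned beats fast but skewed, and both beat leaving the track.
aligned = progress_reward(speed_x=50.0, angle=0.0, track_pos=0.0, crashed=False)
skewed = progress_reward(speed_x=50.0, angle=0.3, track_pos=0.5, crashed=False)
off_track = progress_reward(speed_x=50.0, angle=0.0, track_pos=1.2, crashed=False)
```

Scaling the penalties by speed keeps the agent from simply driving slowly to avoid them. The fixed crash penalty is what "without crashing" translates to in reward terms.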
58	Optimal mobility patterns in epidemic networks Nirkhiwale, Supriya January 1900 Master of Science / Department of Electrical and Computer Engineering / Caterina M. Scoglio / Disruption Tolerant Networks, or opportunistic networks, represent a class of networks in which there is no contemporaneous path from source to destination. In other words, these are networks with intermittent connections. These networks are generally sparse or highly mobile wireless networks. Each node has a limited radio range, and the connections between nodes may be disrupted by node movement, hostile environments, power sleep schedules, and so on. Common examples of such networks are sensor networks monitoring nature or a military field, or tracking a herd of animals under study. Epidemic routing is a widely proposed routing mechanism for data propagation in this type of network. Under this mechanism, the source copies its packets to all the nodes it meets within its radio range. These nodes in turn copy the received packets to the other nodes they meet, and so on. The data to be transmitted travels in a way analogous to the spread of an infection in a biological network. The destination finally receives the packet, and measures are taken to eradicate the packet from the network. Routing in epidemic networks is difficult because it must minimize delivery delay while limiting resource consumption. Every node has severe power constraints, and the network is also susceptible to temporary but random node failures. In previous work, mobility has been treated as a constant parameter for a given setting. In our setting, we consider a varying mobility parameter. In this framework, we determine the optimal mobility pattern and forwarding policy that a network should follow to balance the trade-off between delivery delay and power consumption. In addition, the mobility pattern should be one that can be incorporated in practice. 
In our work, we formulate an optimization problem that is solved using the principles of dynamic programming. We tested the optimal algorithm through extensive simulations, which show that this optimization problem has a global solution. Disruption Tolerant Networks Epidemic routing Dynamic programming Markov Decision Process Computer Science (0984) Operations Research (0796)
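The abstract states only that the optimization problem is solved with dynamic programming. The sketch below shows what such a formulation can look like, using backward induction on a deliberately simple, hypothetical model. The state is the number of nodes carrying a copy, and the control is a network-wide mobility level. Each slot trades a unit delay cost against an energy cost that grows with mobility. The spreading and delivery probabilities, the cost weights, and all names are assumptions, not the thesis's model.

```python
def optimal_mobility(n_nodes, horizon, levels, beta, delta, energy_cost, miss_penalty):
    """
    Finite-horizon dynamic programming for a hypothetical epidemic-routing model.
    State i (1..n_nodes): number of nodes holding a copy; the packet is undelivered.
    Each slot the network chooses a mobility level u from `levels` (values in [0, 1]):
      - some copy reaches the destination w.p. 1 - (1 - delta*u)**i
      - otherwise one more node gets a copy w.p. min(1, beta*u*i*(n_nodes - i)/n_nodes)
    Stage cost: 1 (one slot of delay) + energy_cost*u*i (movement energy of carriers).
    Terminal cost miss_penalty if the packet is still undelivered after `horizon` slots.
    Returns the cost-to-go J[t][i] and the optimal mobility policy[t][i].
    """
    J = [[0.0] * (n_nodes + 1) for _ in range(horizon + 1)]
    for i in range(1, n_nodes + 1):
        J[horizon][i] = miss_penalty
    policy = [[None] * (n_nodes + 1) for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):          # backward induction over time slots
        for i in range(1, n_nodes + 1):
            best_cost, best_u = None, None
            for u in levels:
                p_deliver = 1 - (1 - delta * u) ** i
                p_spread = min(1.0, beta * u * i * (n_nodes - i) / n_nodes)
                next_cost = (p_spread * J[t + 1][min(i + 1, n_nodes)]
                             + (1 - p_spread) * J[t + 1][i])
                cost = 1 + energy_cost * u * i + (1 - p_deliver) * next_cost
                if best_cost is None or cost < best_cost:
                    best_cost, best_u = cost, u
            J[t][i], policy[t][i] = best_cost, best_u
    return J, policy


params = dict(n_nodes=10, horizon=20, levels=[0.0, 0.5, 1.0],
              beta=0.4, delta=0.1, miss_penalty=50.0)
_, cheap = optimal_mobility(energy_cost=0.0, **params)    # movement is free
_, costly = optimal_mobility(energy_cost=1e6, **params)   # movement is prohibitive
```

The two extremes bracket the trade-off. With free movement the policy always chooses maximum mobility, and with prohibitive energy cost it never moves. Intermediate `energy_cost` values yield the state- and time-dependent patterns a mobility schedule would follow.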
59	Prefetching control for on-demand contents distribution : a Markov decision process study / Contrôle du préchargement pour la distribution de contenus à la demande : une approche par les processus de décision markoviens Morad, Olivia 17 September 2014 The thesis is concerned with the control of on-demand content distribution networks. The performance of interactive distributed systems depends essentially on predicting user behavior and on bandwidth as a critical network resource. Prefetching is a well-known predictive approach on the World Wide Web that avoids response delays by exploiting downtime to anticipate the user's future requests, taking advantage of the available network resources. Prefetching control is a vital operation for interactive on-demand systems, where instantaneous response is the crucial factor in the system's success. The controller in this type of interactive system operates in an uncertain environment and makes sequences of decisions with short- and long-term stochastic effects. The difficulty, then, is to determine, in each system state, which contents to prefetch into the cache. The prefetching plan during an interactive streaming session can be modeled as a sequential decision problem using Markov decision processes (MDPs). We focus on the prefetching control problem, in which the controller seeks to reach the zero-cost system state as quickly as possible. We model this control problem as a negative stochastic dynamic programming problem in which we minimize the total expected cost. 
In this context, we addressed the following research questions: 1) How can we provide an optimal or approximately optimal prefetching policy that maximizes bandwidth utilization while minimizing the user's blocking and latency costs incurred along the way? 2) How can we exploit the structure of the prefetching control model to compute the prefetching control policy efficiently, with reduced computational effort and storage memory? 3) How can we conduct an evaluation study of different heuristic prefetching algorithms based on the optimization context rather than an empirical/simulation framework? To study our research problem, we developed our MDP prefetching model, PREF-CT, established its theoretical properties, and solved it with the Value Iteration algorithm as the MDP algorithm for computing the optimal prefetching policy. To compute the optimal prefetching policy efficiently, we identified a special structure that yields a more compact control model. This special structure makes it possible to develop two strategically different algorithms that improve the complexity of computing the optimal prefetching policy: the first is "ONE-PASS" and the second is "TREE-DEC". To overcome the problem of dimensionality arising from the computation of the optimal prefetching policy, we proposed a heuristic prefetching algorithm, "Relevant Blocks Prefetching" (RBP). To evaluate and compare prefetching policies computed by different heuristic prefetching algorithms, we presented a framework based on several performance measures. 
We applied the proposed framework under different cost configurations and different user behaviors to evaluate the prefetching policies computed by our proposed prefetching algorithm, RBP. Compared with the optimal prefetching policies, the experimental analysis showed significant performance for the prefetching policies of the RBP heuristic algorithm. In addition, the RBP heuristic prefetching algorithm is distinguished by a clustering property that is important for considerably reducing the memory needed to store the prefetching policy. / The thesis is concerned with the control of on-demand content distribution networks. The performance of such interactive distributed systems basically depends on predicting user behavior and on bandwidth as a critical network resource. Prefetching is a well-known predictive approach on the World Wide Web that avoids response delays by exploiting downtime to anticipate the user's future requests, taking advantage of the available network resources. Prefetching control is a vital operation for on-demand interactive systems, where instantaneous response is the crucial factor in the system's success. The controller in such an interactive system operates in an uncertain environment and makes sequences of decisions with long- and short-term stochastic effects. The difficulty, then, is to determine, in every system state, which contents to prefetch into the cache. The prefetching plan during an interactive streaming session can be modeled as a sequential decision-making problem by a Markov Decision Process (MDP). We focus on the prefetching control problem in which the controller seeks to reach a zero-cost system state as quickly as possible. We model this control problem as a negative stochastic dynamic programming problem in which we minimize the undiscounted total expected cost. 
Within this context, we addressed the following research questions: 1) How can we provide an optimal or approximately optimal prefetching policy that maximizes bandwidth utilization while minimizing the user's blocking and latency costs incurred along the way? 2) How can we exploit structure in the prefetching control model to compute such a prefetching control policy efficiently, reducing both computational effort and storage memory? 3) How can we conduct a performance evaluation study of different prefetching heuristic algorithms in the context of control optimization rather than of empirical simulation? To study our research problem, we developed our MDP prefetching control model, PREF-CT, established its theoretical properties, and solved it with the Value Iteration algorithm to compute the optimal prefetching policy. To compute the optimal prefetching policy efficiently, we identified a special structure that yields a more compact control model. This special structure permits two strategically different algorithms that improve the complexity of computing the optimal prefetching policy. The first, ONE-PASS, is based mainly on solving a system of linear equations simultaneously in only one iteration, whereas the second, TREE-DEC, is based on a Markov decision tree decomposition in which sequential sets of systems of equations are solved. To overcome the curse of dimensionality resulting from the computation of the optimal prefetching policy, we proposed a prefetching heuristic algorithm: the Relevant Blocks Prefetching algorithm (RBP). To evaluate and compare prefetching policies computed by different prefetching heuristic algorithms, we presented a framework based on different performance measures. 
We applied the suggested framework under different cost configurations and different user behaviors to evaluate the prefetching policies computed by our proposed prefetching heuristic algorithm, RBP. Compared with the optimal prefetching policies, the experimental analysis showed significant performance of the prefetching policies computed by the RBP heuristic algorithm. In addition, the RBP prefetching heuristic algorithm is distinguished by a clustering property that significantly reduces the memory the controller needs to store the prefetching policy. Prefetching Optimal Control Markov Decision Process On-Demand contents Distribution
60	Estratégias para otimização do algoritmo de Iteração de Valor Sensível a Risco / Strategies for optimization of Risk Sensitive Value Iteration algorithm Igor Oliveira Borges 11 October 2018 Risk-sensitive Markov decision processes (Risk Sensitive Markov Decision Process, RS-MDP) make it possible to model risk-averse and risk-prone attitudes in the decision-making process, using a risk factor to represent the risk attitude. For this model, there are operators based on piecewise linear transformation functions that include a risk factor and a discount factor. In this dissertation, two Risk-Sensitive Value Iteration algorithms are formulated based on one of these operators; they are called Synchronous Risk-Sensitive Value Iteration (Risk Sensitive Value Iteration, RSVI) and Asynchronous Risk-Sensitive Value Iteration (Asynchronous Risk Sensitive Value Iteration, A-RSVI). Two heuristics are also proposed that can be used to initialize the values of the algorithms so as to make them more efficient. The results of the experiments in the River Crossing domain under two distinct reward scenarios show that: (i) the processing cost of risk-extreme policies, whether risk-averse or risk-prone, is high; (ii) a high discount increases the convergence time of the algorithm and reinforces the adopted risk sensitivity; (iii) policies with intermediate risk factor values have a low computational cost and already exhibit some risk sensitivity, depending on the discount factor used; and (iv) the A-RSVI algorithm with the heuristic based on the risk factor can reduce the time the algorithm needs to converge, especially for extreme values of the risk factor / Risk-sensitive Markov decision processes (RS-MDPs) allow modeling risk-averse and risk-prone attitudes in the decision-making process, using a risk factor to represent the risk attitude. 
For this model, there are operators based on a piecewise linear transformation function that includes a risk factor and a discount factor. In this dissertation, we formulate two Risk Sensitive Value Iteration algorithms based on one of these operators: Synchronous Risk Sensitive Value Iteration (RSVI) and Asynchronous Risk Sensitive Value Iteration (A-RSVI). We also propose two heuristics that can be used to initialize the values of the RSVI or A-RSVI algorithms in order to make them more efficient. The results of experiments in the River domain under two distinct reward scenarios show that: (i) the processing cost of extreme-risk policies, both risk-averse and risk-prone, is high; (ii) a high discount value increases the convergence time and reinforces the chosen risk attitude; (iii) policies with intermediate risk factor values have a low computational cost and show a certain sensitivity to risk depending on the discount factor; and (iv) the A-RSVI algorithm with the heuristic based on the risk factor can decrease the convergence time of the algorithm, especially when a solution is needed for extreme values of the risk factor. Risk Sensitive Markov Decision Process Risk Sensitive Policy Stochastic Planning
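The operator behind RSVI is described only as a piecewise linear transformation involving a risk factor and a discount factor. The sketch below assumes the well-known transformation of Mihatsch and Neuneier: X(x) = (1 - κ)x for x > 0 and (1 + κ)x otherwise. It applies this transformation to temporal differences inside a synchronous value-iteration sweep, which shows how κ shifts preferences on a one-step gamble. The operator in the dissertation may be defined differently, for example with a different step rule. The function names and toy problem are hypothetical.

```python
def transform(x, kappa):
    """Piecewise-linear transformation: (1 - kappa) * x if x > 0, else (1 + kappa) * x."""
    return (1 - kappa) * x if x > 0 else (1 + kappa) * x


def rs_value_iteration(P, gamma, kappa, alpha=0.5, tol=1e-10, max_iter=100_000):
    """
    Synchronous sketch of risk-sensitive value iteration.
    P[s][a]: list of (probability, next_state, reward) outcomes.
    kappa in (-1, 1): kappa > 0 is risk-averse, kappa < 0 risk-prone, 0 risk-neutral.
    Each sweep moves Q(s, a) by alpha * E[transform(r + gamma * V(s') - Q(s, a))];
    the fixed point is the risk-sensitive value of (s, a).
    """
    Q = [[0.0] * len(actions) for actions in P]
    for _ in range(max_iter):
        V = [max(q) for q in Q]  # values from the previous sweep (synchronous update)
        new_Q, delta = [], 0.0
        for s, actions in enumerate(P):
            row = []
            for a, outcomes in enumerate(actions):
                q = Q[s][a]
                step = alpha * sum(p * transform(r + gamma * V[s2] - q, kappa)
                                   for p, s2, r in outcomes)
                row.append(q + step)
                delta = max(delta, abs(step))
            new_Q.append(row)
        Q = new_Q
        if delta < tol:
            break
    return Q


# State 0: action 0 is a gamble (reward 0 or 2.2, mean 1.1); action 1 pays 1 for sure.
# Both lead to the absorbing, zero-reward state 1.
P = [[[(0.5, 1, 0.0), (0.5, 1, 2.2)], [(1.0, 1, 1.0)]],
     [[(1.0, 1, 0.0)]]]
for kappa in (-0.5, 0.0, 0.5):
    Q = rs_value_iteration(P, gamma=0.9, kappa=kappa)
    print(kappa, [round(v, 3) for v in Q[0]])
# prints: -0.5 [1.65, 1.0] / 0.0 [1.1, 1.0] / 0.5 [0.55, 1.0]
```

With κ = 0 the update reduces to ordinary (damped) value iteration and the gamble's mean of 1.1 wins. With κ = 0.5, losses are weighted three times as heavily as gains, the gamble is worth only 0.55, and the sure reward is preferred. This matches the abstract's point that the risk factor changes which policy is chosen.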
