51 |
Selectively decentralized reinforcement learning / Nguyen, Thanh Minh, 05 1900
Indiana University-Purdue University Indianapolis (IUPUI) / The main contributions of this thesis are the selectively decentralized method for solving multi-agent reinforcement learning problems and the discretized Markov-decision-process (MDP) algorithm for computing the sub-optimal learning policy in completely unknown learning and control problems. These contributions tackle several challenges in multi-agent reinforcement learning: the unknown and dynamic nature of the learning environment, the difficulty of computing a closed-form solution to the learning problem, the slow learning performance in large-scale systems, and the questions of how, when, and with whom the learning agents should communicate. In this thesis, the selectively decentralized method, which evaluates all of the possible communicative strategies, not only increases the learning speed and achieves better learning goals but can also learn the communicative policy for each learning agent. Compared to other state-of-the-art approaches, this thesis's contributions offer two advantages. First, the selectively decentralized method can incorporate a wide range of well-known single-agent reinforcement learning algorithms, including the discretized MDP, whereas state-of-the-art approaches can usually be applied to only one class of algorithms. Second, the discretized MDP algorithm can compute the sub-optimal learning policy when the environment is described in a general nonlinear form, whereas other state-of-the-art approaches often assume that the environment has a restricted form, particularly the feedback-linearization form. This thesis also discusses several alternative approaches to multi-agent learning, including Multidisciplinary Optimization. In addition, this thesis shows how the selectively decentralized method can successfully solve several real-world problems, particularly in mechanical and biological systems.
|
52 |
Improved Heuristic Search Algorithms for Decision-Theoretic Planning / Abdoulahi, Ibrahim, 08 December 2017
A large class of practical planning problems that require reasoning about uncertain outcomes, as well as tradeoffs among competing goals, can be modeled as Markov decision processes (MDPs). This model has been studied for over 60 years and has many applications, ranging from stochastic inventory control and supply-chain planning to probabilistic model checking and robotic control. Standard dynamic programming algorithms solve these problems for the entire state space. A more efficient heuristic search approach focuses computation on the relevant part of the state space only, given a start state, and uses heuristics to identify irrelevant parts of the state space that can be safely ignored. This dissertation considers the heuristic search approach to this class of problems and makes three contributions that advance it. The first contribution is a novel algorithm for solving MDPs that integrates the standard value iteration algorithm with branch-and-bound search. Called branch-and-bound value iteration, the new algorithm has several advantages over existing algorithms. The second contribution is the integration of recently developed suboptimality bounds into heuristic search algorithms for MDPs, making it possible for iterative algorithms for solving these planning problems to detect convergence to a bounded-suboptimal solution. The third contribution is the evaluation and analysis of some techniques that are widely used by state-of-the-art planning algorithms, the identification of some weaknesses of these techniques, and the development of a more efficient implementation of one of them, namely a solved-labeling procedure that speeds convergence by leveraging a decomposition of the state-space graph of a planning problem into strongly connected components. The new algorithms and techniques introduced in this dissertation are experimentally evaluated on a range of widely used planning benchmarks.
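For orientation, the standard value iteration algorithm that the abstract builds on can be sketched as follows. This is a minimal illustration of plain value iteration over the whole state space, not the dissertation's branch-and-bound variant; the function name, the dictionary layout of `transitions`, and the two-state example are assumptions made for this sketch.

```python
def value_iteration(transitions, gamma=0.95, tol=1e-8):
    """Plain value iteration (illustrative sketch, hypothetical data layout).

    transitions[state][action] is a list of (probability, next_state, reward)
    triples; a state with no actions is treated as terminal with value 0.
    """
    values = {s: 0.0 for s in transitions}

    def q_value(outcomes):
        # Expected one-step return of an action under the current estimates.
        return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)

    while True:
        residual = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state, nothing to back up
                continue
            best = max(q_value(outcomes) for outcomes in actions.values())
            residual = max(residual, abs(best - values[s]))
            values[s] = best
        if residual < tol:  # stop once the largest update is below tol
            break

    # Greedy policy with respect to the converged value estimates.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a]))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy


# Hypothetical two-state problem: a sure small reward versus a risky large one.
example = {
    "start": {
        "safe": [(1.0, "goal", 1.0)],
        "risky": [(0.5, "goal", 4.0), (0.5, "start", 0.0)],
    },
    "goal": {},
}
example_values, example_policy = value_iteration(example, gamma=0.9)
```

The stopping test on the largest update is the simplest convergence check; the suboptimality bounds mentioned in the abstract serve the related purpose of certifying how far the current solution can be from optimal.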
|
53 |
Principals' Perceptions and Self-efficacy in Relation to School Security / Jones, Julian, 01 January 2015
Principals in the nation's schools have been tasked with managing crisis incidents that may occur with students and others on their campuses on a daily basis. The purposes of this study were to determine the differences, if any, that existed in Central Florida public school principals' perceptions regarding school security, their perceived confidence to address critical crisis incidents on their campuses, their perceptions of the likelihood critical incidents would occur, their perceptions of interaction with law enforcement, the critical incidents they fear the most, and their perceptions of factors impacting the incidents they fear the most. Principal subgroup mean responses to the Principal Safety and Security Perceptions Survey in the three areas of Bandura's (1997) triadic reciprocal causation were examined in the context of principals' gender, longevity, student enrollment, grade configuration, free and reduced lunch rate, presence of a law enforcement officer, and presence of a security plan. Findings revealed significant differences between categorical groups of principals in multiple areas. It was determined that significant differences in principals' perceptions warrant further study. Recommendations for practice include security policy development and practical application of noted trends.
|
54 |
An Operating System Architecture and Hybrid Scheduling Methodology for Real-Time Systems with Uncertainty / Apte, Manoj Shriganesh, 11 December 2004
Desktop personal computers and other standardized computer architectures are optimized to provide the best performance for frequently occurring conditions. Real-time systems designed using worst-case analysis for such architectures under-utilize the hardware. This shortcoming motivates scheduling algorithms that can improve overall utilization by accounting for the inherent uncertainty in task execution duration. A real-time task dispatcher must perform its function with constant scheduling overhead. Given the NP-hard nature of the problem of scheduling non-preemptible tasks, dispatch decisions for such systems cannot be made in real time. This argues for a hybrid architecture that includes an offline policy generator and an online dispatcher. This dissertation proposes and demonstrates a hybrid operating system architecture that enables cost-optimal task dispatch on Commercial-Off-The-Shelf (COTS) systems. This is achieved by explicitly accounting for the stochastic nature of each task's execution time and dynamically learning the system behavior. Decision Theoretic Scheduling (DTS) provides the framework for scheduling under uncertainty. The real-time scheduling problem is cast as a Markov Decision Process (MDP). An offline policy generator discovers an epsilon-optimal policy using value iteration with model learning. For the selected representation of state, action, model, and rewards, the policy discovered using value iteration is proved to have a probability of failure that is less than any arbitrarily small user-specified value. The PromisQoS operating system architecture demonstrates a practical implementation of the proposed approach. PromisQoS is a Linux-based platform that supports concurrent execution of time-based (preemptible and non-preemptible) real-time tasks and best-effort processes on an interactive workstation. Several examples demonstrate that model learning and scheduling under uncertainty enable PromisQoS to achieve better CPU utilization than other scheduling methods. Real-time task sets that solve practical problems, such as a Laplace solver, matrix multiplication, and transpose, demonstrate the robustness and correctness of the PromisQoS design and implementation. This pioneering application demonstrates the feasibility of MDP-based scheduling for real-time tasks in practical systems. It also opens avenues for further research into the use of such DTS techniques in real-time system design.
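The "model learning" step mentioned above can be pictured as estimating transition probabilities from observed behavior before a planner such as value iteration is run. The sketch below shows one common way to do this, by counting observed transitions; it is an assumption-laden illustration, not the PromisQoS implementation, and the state and action labels are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transition_model(observations):
    """Estimate P(next_state | state, action) from observed triples.

    observations is an iterable of (state, action, next_state) tuples.
    Returns {(state, action): {next_state: probability}}.
    Illustrative sketch only; names and layout are hypothetical.
    """
    counts = defaultdict(Counter)
    for state, action, next_state in observations:
        counts[(state, action)][next_state] += 1

    model = {}
    for key, next_counts in counts.items():
        total = sum(next_counts.values())
        # Relative frequency is the maximum-likelihood estimate.
        model[key] = {s2: n / total for s2, n in next_counts.items()}
    return model


# Hypothetical log: a dispatched task met its deadline three times out of four.
log = [
    ("ready", "dispatch", "met_deadline"),
    ("ready", "dispatch", "met_deadline"),
    ("ready", "dispatch", "missed_deadline"),
    ("ready", "dispatch", "met_deadline"),
]
learned = estimate_transition_model(log)
```

Counting keeps the estimate cheap to update as new executions are observed, which fits the split between an offline policy generator and an online dispatcher described in the abstract.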
|
55 |
Resource Allocation to Improve Equity in Service Operations / Yang, Muer, 23 September 2011
No description available.
|
56 |
Analysis of Attacks on Controlled Stochastic Systems / Russo, Alessio, January 2022
In this thesis, we investigate attack vectors against Markov decision processes and dynamical systems. This work is motivated by the recent interest in the research community in making Machine Learning models more robust to malicious attacks. We focus on three attack vectors: (I) attacks that alter the input/output signal of a Markov decision process; (II) eavesdropping attacks whose aim is to detect a change in a dynamical system; (III) poisoning attacks against data-driven control methods.
(I) For attacks on Markov decision processes, we focus on two types of attacks: (1) attacks that alter the observations of the victim, and (2) attacks that alter the control signal of the victim. Regarding (1), we investigate the problem of devising optimal attacks that minimize the reward collected by the victim. We show that when the policy and the system are known to the attacker, designing optimal attacks amounts to solving a Markov decision process. We also show that, for the victim, the system uncertainties induced by the attack can be modeled using a Partially Observable Markov decision process (POMDP) framework. We demonstrate that using Reinforcement Learning methods tailored to POMDPs leads to more resilient policies. Regarding (2), we instead investigate the problem of designing optimal stealthy poisoning attacks on the control channel of Markov decision processes. Previous work constrained the amplitude of the adversarial perturbation, in the hope that this constraint would make the attack imperceptible. However, such constraints do not grant any level of undetectability and do not take into account the dynamic nature of the underlying Markov process. To design an optimal stealthy attack, we investigate a new attack formulation, based on information-theoretic quantities, that considers the objective of minimizing the detectability of the attack as well as the performance of the controlled process.
(II) In the second part of this thesis, we analyse the problem where an eavesdropper tries to detect a change in a Markov decision process. These processes may be affected by changes that need to remain private. We study the problem using theoretical tools from optimal detection theory to motivate a definition of online privacy based on the average amount of information per observation of the underlying stochastic system. We provide ways to derive privacy upper bounds and compute policies that attain a higher privacy level, concluding with examples and numerical simulations.
(III) Lastly, we investigate poisoning attacks against data-driven control methods. Specifically, we analyse how a malicious adversary can slightly poison the data so as to minimize the performance of a controller trained using this data. We show that identifying the most impactful attack boils down to solving a bi-level non-convex optimization problem, and we provide theoretical insights on the attack. We present a generic algorithm that finds a local optimum of this problem and illustrate our analysis for various techniques. Numerical experiments reveal that minimal but well-crafted changes in the data set are sufficient to deteriorate the performance of data-driven control methods significantly, and even to make the closed-loop system unstable. / QC 20220510. Topic: Alessio Russo - Licentiate. Time: May 31, 2022 04:00 PM Madrid. Zoom meeting link: https://kth-se.zoom.us/j/69452765598
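To make the idea of an information-based detectability measure concrete, the sketch below computes the Kullback-Leibler divergence between a nominal next-state distribution and one perturbed by an attacker. Using KL divergence here is an assumption chosen for illustration, since the abstract only says the formulation is based on information-theoretic quantities; the function name and the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q) in nats between two discrete distributions.

    p and q are sequences of probabilities over the same outcomes.
    Assumes q[i] > 0 wherever p[i] > 0. Illustrative sketch only.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-state distributions for one state-action pair.
attacked = [0.5, 0.5]  # distribution induced by the attack
nominal = [0.9, 0.1]   # distribution of the unattacked system
information_per_observation = kl_divergence(attacked, nominal)
```

A small divergence means each observation carries little evidence that the system has changed, which is the intuition behind both a stealthy attack and a privacy level defined per observation.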
|
57 |
Performance analysis of access control and resource management methods in heterogeneous networks / Pacheco Páramo, Diego Felipe, 07 January 2014
The current mobile network landscape is characterized by users' growing demand for data services, a trend reinforced by the popularity of smartphones and the rise of applications that need a permanent internet connection, such as those that use "cloud" resources or video streaming services. Data consumption is growing exponentially in both developed and developing countries, and this has led operators to consider solutions that can provide such access conditions.
Heterogeneous networks are characterized by using different technologies in a coherent and organized way to provide users with the quality of service their communications require, so that the communication is "transparent" to them. This heterogeneity can occur at the access level, with the coexistence of technologies such as 802.11, WiMAX, and the different generations of mobile networks, or even at the layer level within mobile networks, with the coexistence of macro, micro, pico, and femto cells. By using these multiple resources in an organized way, it is possible to optimize network performance and provide users with a better quality of service.
But the possibility of improving network performance does not come only from the simultaneous use of these access technologies. Cognitive radio technology was proposed to improve efficiency in the use of the electromagnetic spectrum, a limited resource that is underutilized according to several studies. With this technology, a user can detect the moments at which a part of the electromagnetic spectrum is not being used and send information then, always avoiding interference with the communications of the users who regularly use that spectrum.
This work provides different solutions, within the context of heterogeneous networks, that seek to optimize the use of the resources available in the network in order to provide users with the expected quality of service, whether through access control or resource management.
On the one hand, it studies the effect that reserving channels for spectral handoff has on the performance experienced by secondary users in a cognitive radio system. On the other hand, it studies access policies for a network in which two access technologies are available, TDMA and WCDMA, and users have access to voice and data services. On the other hand / Performance requirements on mobile networks are tighter than ever as a result of the adoption of mobile devices such as smartphones and tablets, and of the QoS levels that mobile applications demand for their correct operation. The data traffic volume carried in mobile networks for 2012 is the same as the total internet traffic in 2000, and this exponential growth tendency will continue in years to come. In order to fulfill users' expectations, it is imperative for mobile networks to make the best use of the available resources.
Heterogeneous networks (HetNets) have the ability to integrate several technologies in a coherent and efficient manner in order to enhance the user experience. The first challenge of heterogeneous networks is to integrate several radio access technologies (RATs), which exist as a result of simultaneous technology developments and a paced replacement of legacy technology. Joint management of several RATs enhances the network's efficiency, which in turn influences the user's experience. Another challenge of heterogeneous networks is the improvement of current macrocells through an efficient use of the electromagnetic spectrum. Some approaches aim to optimize the antennas or use higher-order modulation techniques, but a more disruptive approach is the use of dynamic spectrum techniques through a technology known as cognitive radio. Finally, heterogeneous networks should be able to integrate several layers. In addition to the well-studied micro and pico cells, a new generation of cheaper and easily configurable small cell networks has been proposed. However, its success depends on its ability to adapt to the current context of mobile networks. / Pacheco Páramo, DF. (2013). Performance analysis of access control and resource management methods in heterogeneous networks [Tesis doctoral]. Editorial Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/34782
|
58 |
Cognitive Radar Applied To Target Tracking Using Markov Decision Processes / Selvi, Ersin Suleyman, 30 January 2018
The radio-frequency spectrum is a precious resource with many applications and users, especially following the recent spectrum auction in the United States. Future platforms and devices, such as radars and radios, need to adapt to their spectral environment in order to continue serving the needs of their users. This thesis considers an environment with one tracking radar, a single target, and a communications system. The radar-communications coexistence problem is modeled as a Markov decision process (MDP), and reinforcement learning is applied to drive the radar to optimal behavior. / Master of Science / The radio-frequency electromagnetic spectrum is a precious resource in which users and operators are assigned frequency slots for their operation. The federal spectrum auction in the United States freed up some of the spectrum for shared use. As a result, the spectrum will become denser: there will be more devices and users in the same amount of spectrum. The devices and platforms that use this spectrum need to be more adaptive and agile in order to (1) avoid being interfered with by other systems, (2) avoid causing interference to other systems, and (3) continue to meet the needs of users (e.g., cell phone users) and operators (e.g., military radar). The work presented in this thesis applies a Markov decision process and reinforcement learning to solve this problem.
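As a rough picture of how reinforcement learning can be applied to an MDP of this kind, the sketch below shows a single tabular Q-learning update. Q-learning is chosen here as a familiar example and is an assumption, since the abstract does not name the learning algorithm; the state and action labels (interference observations and sub-band choices) are hypothetical.

```python
def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q_table maps (state, action) pairs to estimated returns; missing entries
    count as 0. Illustrative sketch with hypothetical names.
    """
    # Value of the best action available in the next state.
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    # Move the old estimate toward the bootstrapped one-step target.
    new = old + alpha * (reward + gamma * best_next - old)
    q_table[(state, action)] = new
    return new


# Hypothetical radar example: choose a sub-band given observed interference.
bands = ["low_band", "high_band"]
q = {("interference_low", "high_band"): 2.0}
updated = q_learning_update(
    q, "interference_high", "low_band", reward=1.0,
    next_state="interference_low", actions=bands, alpha=0.5, gamma=0.9,
)
```

Repeating this update while the radar interacts with its environment lets the value estimates, and hence the greedy choice of band, adapt to the observed behavior of the communications system.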
|
59 |
Designförslag på belöningsfunktioner för självkörande bilar i TORCS som inte krockar / Design suggestion on reward functions for self-driving cars in TORCS that do not crash / Andersson, Björn; Eriksson, Felix, January 2018
This study uses TORCS (The Open Racing Car Simulator), an interesting game in which to create self-driving cars because nineteen different types of sensors describe the environment to the agent. The problem addressed was to identify which of these sensors can be used in a reward function and how that reward function should be implemented. The study adopted a quantitative experimental method, and the research question was: how can a reward function be designed so that the agent can maneuver in TORCS without crashing and with a consistent result? The quantitative experimental method was chosen because the authors needed to design, implement, run experiments on, and evaluate the result of each reward function. In total, fifteen experiments were conducted over twelve reward functions on two tracks: E-Track 5 (E-5) and Aalborg. Each of the twelve reward functions ran one experiment on E-5, and the three with the best results (Charlie, Foxtrot, and Juliette) ran an additional experiment on Aalborg, which is a more difficult track. The Aalborg test was conducted to establish whether a reward function can drive on more than one track and is therefore general. Juliette was the only reward function that completed both E-5 and Aalborg without crashing. From the experiments, the conclusion was drawn that Juliette fulfills the research question, since it completes both tracks without crashing and, when it succeeds, achieves a consistent result. The study has therefore succeeded in designing and implementing a reward function that answers the research question.
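To illustrate what a sensor-based reward function for this setting might look like, the sketch below combines forward speed, heading angle, track position, and damage into one scalar. It is not any of the twelve reward functions from the study (their definitions are not given in the abstract); the formula is a commonly used shaping idea, and the parameter names, the crash penalty, and the assumption that track position is normalized so that values beyond 1 mean the car has left the track are all hypothetical.

```python
import math


def driving_reward(speed_x, angle, track_pos, damage_delta, crash_penalty=100.0):
    """Hypothetical reward for a driving agent (illustrative sketch only).

    speed_x      -- speed along the car's longitudinal axis
    angle        -- angle between the car heading and the track axis, radians
    track_pos    -- lateral offset from the track center, assumed in [-1, 1]
                    while the car is on the track
    damage_delta -- increase in the damage reading since the previous step
    """
    # Any new damage or leaving the track is treated as a crash.
    if damage_delta > 0 or abs(track_pos) > 1.0:
        return -crash_penalty
    forward = speed_x * math.cos(angle)       # progress along the track
    drift = speed_x * abs(math.sin(angle))    # sideways motion is penalized
    off_center = speed_x * abs(track_pos)     # driving near the edge is penalized
    return forward - drift - off_center


centered = driving_reward(10.0, 0.0, 0.0, 0.0)
```

Scaling every term by speed means a stationary car earns nothing, so the agent is pushed to make progress, while the crash penalty dominates whenever damage is registered.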
|
60 |
Optimal mobility patterns in epidemic networks / Nirkhiwale, Supriya, January 1900
Master of Science / Department of Electrical and Computer Engineering / Caterina M. Scoglio / Disruption Tolerant Networks, or opportunistic networks, represent a class of networks in which there is no contemporaneous path from source to destination. In other words, these are networks with intermittent connections. They are generally sparse or highly mobile wireless networks. Each node has a limited radio range, and the connections between nodes may be disrupted by node movement, hostile environments, power sleep schedules, and so on. A common example is a sensor network monitoring nature, a military field, or a herd of animals under study. Epidemic routing is a widely proposed routing mechanism for data propagation in this type of network. Under this mechanism, the source copies its packets to all the nodes it meets within its radio range. These nodes in turn copy the received packets to the other nodes they meet, and so on. The data to be transmitted travels in a way analogous to the spread of an infection in a biological network. The destination finally receives the packet, and measures are taken to eradicate the packet from the network. Routing in epidemic networks is difficult because the delivery delay must be minimized while the consumption of resources is kept low. Every node has severe power constraints, and the network is also susceptible to temporary but random node failures. In previous work, the mobility parameter has been considered constant for a given setting. In our setting, we consider a varying mobility parameter. In this framework, we determine the optimal mobility pattern and forwarding policy that a network should follow in order to balance the trade-off between delivery delay and power consumption. In addition, the mobility pattern should be such that it can be incorporated in practice. In our work, we formulate an optimization problem that is solved using the principles of dynamic programming. We have tested the optimal algorithm through extensive simulations, which show that this optimization problem has a global solution.
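The epidemic routing mechanism described above, in which every node holding the packet copies it to each node it meets, can be sketched over a list of timed contacts. This is a simplified illustration of the basic mechanism only, not the thesis's optimal mobility or forwarding policy; the contact-list format and node names are assumptions.

```python
def epidemic_delivery(contacts, source, destination):
    """Simulate basic epidemic routing over a sequence of pairwise contacts.

    contacts is a list of (time, node_a, node_b) tuples sorted by time.
    Returns (delivery_time, copies), where delivery_time is None if the
    destination is never reached and copies is the number of nodes holding
    the packet when the simulation stops. Illustrative sketch only.
    """
    carriers = {source}
    for time, a, b in contacts:
        # A meeting spreads the packet if either node already carries it.
        if a in carriers or b in carriers:
            carriers.update((a, b))
        if destination in carriers:
            return time, len(carriers)
    return None, len(carriers)


# Hypothetical contact trace between four nodes.
trace = [(1, "A", "B"), (2, "C", "D"), (3, "B", "C"), (4, "C", "D")]
delay, copies = epidemic_delivery(trace, source="A", destination="D")
```

The two returned quantities correspond to the two sides of the trade-off in the abstract: the delivery delay, and the number of copies made, which is a rough proxy for the power spent on transmissions.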
|