Global ETD Search

1	Adaptive Fuzzy Reinforcement Learning for Flock Motion Control Qu, Shuzheng 06 January 2022 (has links) The flock-guidance problem enjoys a challenging structure where multiple optimization objectives are solved simultaneously. This usually necessitates different control approaches to tackle various objectives, such as guidance, collision avoidance, and cohesion. The guidance schemes, in particular, have long suffered from complex tracking-error dynamics. Furthermore, techniques that are based on linear feedback or output feedback strategies obtained at equilibrium conditions either may not hold or degrade when applied to uncertain dynamic environments. Relying on potential functions, embedded within pre-tuned fuzzy inference architectures, lacks robustness under dynamic disturbances. This thesis introduces two adaptive distributed approaches for the autonomous control of multi-agent systems. The first proposed technique has its structure based on an online fuzzy reinforcement learning Value Iteration scheme which is precise and flexible. This distributed adaptive control system simultaneously targets a number of flocking objectives; namely: 1) tracking the leader, 2) keeping a safe distance from the neighboring agents, and 3) reaching a velocity consensus among the agents. In addition to its resilience in the face of dynamic disturbances, the algorithm does not require more than the agent’s position as a feedback signal. The effectiveness of the proposed method is validated with two simulation scenarios and benchmarked against a similar technique from the literature. The second technique is in the form of an online fuzzy recursive least squares-based Policy Iteration control scheme, which employs a recursive least squares algorithm to estimate the weights in the leader tracking subsystem, as a substitute for the original reinforcement learning actor-critic scheme adopted in the first technique. 
The recursive least squares algorithm demonstrates a faster approximation weight convergence. The time-invariant communication graph utilized in the fuzzy reinforcement learning method is also improved with time-varying graphs, which can smoothly guide the agents to reach a speed consensus. The fuzzy recursive least squares-based technique is simulated with a few scenarios and benchmarked against the fuzzy reinforcement learning method. The scenarios are simulated in CoppeliaSim for a better visualization and more realistic results. reinforcement multi-agent value iteration policy iteration
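For readers unfamiliar with the recursive least squares (RLS) estimator mentioned in the abstract above, the following is a minimal generic sketch of the standard RLS weight update for a linear model. It is not the thesis's controller: the function names `rls_update` and `fit_rls`, the forgetting factor `lam`, and the initialization constant `delta` are assumptions chosen for illustration.

```python
import numpy as np

def rls_update(w, P, x, y, lam=1.0):
    """One standard RLS step for the linear model y ~ w @ x.

    w   : current weight estimate, shape (n,)
    P   : current inverse-covariance estimate, shape (n, n)
    lam : forgetting factor (1.0 means no forgetting); illustrative default
    """
    Px = P @ x
    k = Px / (lam + x @ Px)              # gain vector
    err = y - w @ x                      # a priori prediction error
    w_new = w + k * err
    P_new = (P - np.outer(k, Px)) / lam  # rank-one downdate of P
    return w_new, P_new

def fit_rls(X, Y, delta=1e6):
    """Run RLS over all samples, starting from w = 0 and P = delta * I."""
    n = X.shape[1]
    w = np.zeros(n)
    P = delta * np.eye(n)
    for x, y in zip(X, Y):
        w, P = rls_update(w, P, x, y)
    return w
```

Each sample updates the weights in O(n^2) time without refitting from scratch, which is what makes the estimator usable online.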
2	Verifying Value Iteration and Policy Iteration in Coq Masters, David M. 01 June 2021 (has links) No description available. Computer Science Reinforcement Learning Software Verification Coq Value Iteration Policy Iteration
3	Improved Heuristic Search Algorithms for Decision-Theoretic Planning Abdoulahi, Ibrahim 08 December 2017 (has links) A large class of practical planning problems that require reasoning about uncertain outcomes, as well as tradeoffs among competing goals, can be modeled as Markov decision processes (MDPs). This model has been studied for over 60 years, and has many applications that range from stochastic inventory control and supply-chain planning, to probabilistic model checking and robotic control. Standard dynamic programming algorithms solve these problems for the entire state space. A more efficient heuristic search approach focuses computation on solving these problems for the relevant part of the state space only, given a start state, and using heuristics to identify irrelevant parts of the state space that can be safely ignored. This dissertation considers the heuristic search approach to this class of problems, and makes three contributions that advance this approach. The first contribution is a novel algorithm for solving MDPs that integrates the standard value iteration algorithm with branch-and-bound search. Called branch-and-bound value iteration, the new algorithm has several advantages over existing algorithms. The second contribution is the integration of recently developed suboptimality bounds in heuristic search algorithms for MDPs, making it possible for iterative algorithms for solving these planning problems to detect convergence to a bounded-suboptimal solution. The third contribution is the evaluation and analysis of some techniques that are widely used by state-of-the-art planning algorithms, the identification of some weaknesses of these techniques, and the development of a more efficient implementation of one of these techniques -- a solved-labeling procedure that speeds convergence by leveraging a decomposition of the state-space graph of a planning problem into strongly connected components. 
The new algorithms and techniques introduced in this dissertation are experimentally evaluated on a range of widely-used planning benchmarks. Planning under Uncertainty Value Iteration Heuristic Search Suboptimality Bounds Action Elimination Markov Decision Process
4	Experimental Evaluation of Error bounds for the Stochastic Shortest Path Problem Abdoulahi, Ibrahim 14 December 2013 (has links) A stochastic shortest path (SSP) problem is an undiscounted Markov decision process with an absorbing and zero-cost target state, where the objective is to reach the target state with minimum expected cost. This problem provides a foundation for algorithms for decision-theoretic planning and probabilistic model checking, among other applications. This thesis describes an implementation and evaluation of recently developed error bounds for SSP problems. The bounds can be used in a test for convergence of iterative dynamic programming algorithms for solving SSP problems, as well as in action elimination procedures that can accelerate convergence by excluding provably suboptimal actions that do not need to be re-evaluated each iteration. The techniques are shown to be effective for both decision-theoretic planning and probabilistic model checking. Stochastic shortest path problem Error bounds Value iteration Convergence Sub-optimality test
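To make the stochastic shortest path definition in the abstract above concrete, here is a minimal sketch of value iteration on a tiny hand-made SSP. The dictionary layout, the function name `ssp_value_iteration`, and the example problem are assumptions for illustration. The stopping rule is the naive Bellman-residual test, which does not by itself bound the distance to the optimal cost in undiscounted problems; it is not the error bounds evaluated in the thesis.

```python
def ssp_value_iteration(transitions, goal, tol=1e-9, max_iters=100_000):
    """Gauss-Seidel value iteration for a stochastic shortest path problem.

    transitions[s][a] = (cost, [(prob, next_state), ...])
    The goal state is absorbing and zero-cost, so its value stays at 0.
    """
    V = {s: 0.0 for s in transitions}
    V[goal] = 0.0
    for _ in range(max_iters):
        residual = 0.0
        for s, actions in transitions.items():
            if s == goal:
                continue
            # Bellman backup: minimum expected cost-to-go over actions.
            best = min(
                cost + sum(p * V[t] for p, t in outcomes)
                for cost, outcomes in actions.values()
            )
            residual = max(residual, abs(best - V[s]))
            V[s] = best
        if residual < tol:  # naive residual test, see the note above
            break
    return V

# Hypothetical example: from "a", a cheap action reaches the goal half the time.
example = {
    "a": {"try": (1.0, [(0.5, "goal"), (0.5, "a")]),
          "sure": (3.0, [(1.0, "goal")])},
    "b": {"to_a": (1.0, [(1.0, "a")]),
          "jump": (2.5, [(1.0, "goal")])},
}
values = ssp_value_iteration(example, goal="goal")
```

In this example the expected cost of repeating "try" from "a" is 2, which beats the certain cost of 3, and "b" prefers "jump" (2.5) over going through "a" (1 + 2 = 3).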
5	Solving Large MDPs Quickly with Partitioned Value Iteration Wingate, David 14 June 2004 (has links) (PDF) Value iteration is not typically considered a viable algorithm for solving large-scale MDPs because it converges too slowly. However, its performance can be dramatically improved by eliminating redundant or useless backups, and by backing up states in the right order. We present several methods designed to help structure value dependency, and present a systematic study of companion prioritization techniques which focus computation in useful regions of the state space. In order to scale to solve ever larger problems, we evaluate all enhancements and methods in the context of parallelizability. Using the enhancements, we discover that in many instances the limiting factor of the algorithms is no longer time, but space. We thus evaluate all metrics and decisions with respect to cache performance. We generate a family of algorithms by combining several of the methods discussed, and present empirical evidence demonstrating that performance can improve by several orders of magnitude for real-world problems, while preserving accuracy and convergence guarantees. Machine learning reinforcement learning value iteration Markov Decision Processes Computer Sciences
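The idea of skipping useless backups and backing up states in a good order can be illustrated with a generic priority-queue scheme in the style of prioritized sweeping. This is a sketch under assumptions, not the partitioned algorithms of the thesis: the MDP layout, the names `backup` and `prioritized_value_iteration`, and the rule of prioritizing by Bellman error are all illustrative choices. Every state must have at least one action, so terminal states are modeled as zero-reward self-loops.

```python
import heapq

def backup(V, s, mdp, gamma):
    """Bellman optimality backup for state s (reward maximization)."""
    return max(
        sum(p * (r + gamma * V[t]) for p, t, r in outcomes)
        for outcomes in mdp[s].values()
    )

def prioritized_value_iteration(mdp, gamma=0.95, tol=1e-8, max_backups=1_000_000):
    """mdp[s][a] = [(prob, next_state, reward), ...]; returns (V, backups)."""
    V = {s: 0.0 for s in mdp}
    # Predecessor sets: which states can be affected when V[t] changes.
    preds = {s: set() for s in mdp}
    for s, actions in mdp.items():
        for outcomes in actions.values():
            for p, t, _ in outcomes:
                if p > 0:
                    preds[t].add(s)
    # Max-heap on Bellman error, stored as negated priorities.
    heap = [(-abs(backup(V, s, mdp, gamma) - V[s]), s) for s in mdp]
    heapq.heapify(heap)
    backups = 0
    while heap and backups < max_backups:
        _, s = heapq.heappop(heap)
        new = backup(V, s, mdp, gamma)
        if abs(new - V[s]) < tol:
            continue  # stale or already-converged entry: no backup needed
        V[s] = new
        backups += 1
        for q in preds[s]:  # only predecessors can have changed error
            err = abs(backup(V, q, mdp, gamma) - V[q])
            if err >= tol:
                heapq.heappush(heap, (-err, q))
    return V, backups

# Hypothetical three-state chain: s0 -> s1 -> goal, reward 1 on reaching goal.
chain = {
    "s0": {"go": [(1.0, "s1", 0.0)]},
    "s1": {"go": [(1.0, "goal", 1.0)]},
    "goal": {"stay": [(1.0, "goal", 0.0)]},
}
V, n_backups = prioritized_value_iteration(chain, gamma=0.9)
```

On the chain, value information flows backward from the goal, so the queue backs up "s1" before "s0" and never spends a backup on a state whose value cannot yet change.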
6	Úlohy stochastického dynamického programování: teorie a aplikace / Stochastic Dynamic Programming Problems: Theory and Applications. Lendel, Gabriel January 2012 (has links) Title: Stochastic Dynamic Programming Problems: Theory and Applications Author: Gabriel Lendel Department: Department of Probability and Mathematical Statistics Supervisor: Ing. Karel Sladký CSc. Supervisor's e-mail address: sladky@utia.cas.cz Abstract: In the present work we study Markov decision processes, which provide a mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision maker. We study iterative procedures for finding a policy that is optimal or nearly optimal with respect to the selected criteria. Specifically, we mainly examine the task of finding a policy that is optimal with respect to the total expected discounted reward or the average expected reward for discrete or continuous systems. In the work we study policy iteration algorithms and approximate value iteration algorithms. We give a numerical analysis of specific problems. Keywords: Stochastic dynamic programming, Markov decision process, policy iteration, value iteration
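As a small illustration of policy iteration under the total expected discounted reward criterion named in the abstract above, the sketch below alternates exact policy evaluation (solving a linear system) with greedy improvement. The array layout, the function name `policy_iteration`, and the two-state example are assumptions for illustration and are not taken from the thesis.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Policy iteration for a finite discounted MDP.

    P : transition probabilities, shape (A, S, S), P[a, s, t] = Pr(t | s, a)
    R : expected immediate rewards, shape (S, A)
    Returns the final policy (one action index per state) and its values.
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, states]   # row s is P[policy[s], s, :]
        R_pi = R[states, policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to Q.
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

# Hypothetical example: action 0 stays, action 1 switches state;
# only staying in state 1 pays a reward of 1.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],    # action 0: stay
              [[0.0, 1.0], [1.0, 0.0]]])   # action 1: switch
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])
policy, V = policy_iteration(P, R, gamma=0.9)
```

Here the optimal behavior is to switch out of state 0 and then stay in state 1, giving values 9 and 10 respectively with a discount factor of 0.9.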
7	Stability analysis of new paradigms in wireless networks Kangas, M. (Maria) 02 June 2017 (has links) Abstract Fading in wireless channels, the limited battery energy available in wireless handsets, the changing user demands and the increasing demand for high data rate and low delay pose serious design challenges in the future generations of mobile communication systems. It is necessary to develop efficient transmission policies that adapt to changes in network conditions and achieve the target delay and rate with minimum power consumption. In this thesis, a number of new paradigms in wireless networks are presented. Dynamic programming tools are used to provide dynamic network stabilizing resource allocation solutions for virtualized data centers with clouds, cooperative networks and heterogeneous networks. Exact dynamic programming is used to develop optimal resource allocation and topology control policies for these networks with queues and time varying channels. In addition, approximate dynamic programming is also considered to provide new sub-optimal solutions. Unified system models and unified control problems are also provided for both secondary service provider and primary service provider cognitive networks and for conventional wireless networks. The results show that by adapting to the changes in queue lengths and channel states, the dynamic policy mitigates the effects of primary service provider and secondary service provider cognitive networks on each other. We investigate the network stability and provide new unified stability regions for primary service provider and secondary service provider cognitive networks as well as for conventional wireless networks. 
The K-step Lyapunov drift is used to analyse the performance and stability of the proposed dynamic control policies, and new unified stability analysis and queuing bound are provided for both primary service provider and secondary service provider cognitive networks and for conventional wireless networks. By adapting to the changes in network conditions, the dynamic control policies are shown to stabilize the network and to minimize the bound for the average queue length. In addition, we prove that the previously proposed frame-based algorithm does not minimize the bound for the average delay when there are shared resources between the terminals with queues. / Abstract (Finnish version) Fading in wireless channels, the limited battery capacity of wireless devices, changes in users' needs, and the demands for more data transfer and shorter delay create major challenges for the design of future wireless networks. It is necessary to develop efficient resource allocation algorithms that adapt to changes in the networks and achieve both the target delay and the target data rate with the lowest possible power consumption. This dissertation presents new paradigms for wireless communication networks. Dynamic programming tools are used to create dynamic, network-stabilizing resource allocation solutions for virtualized cloud data centers, cooperative user networks and heterogeneous networks. Exact dynamic programming tools are used to develop optimal resource allocation and topology control algorithms for these networks with queues and fading channels. In addition, approximate dynamic programming tools are used to create new sub-optimal solutions. Unified system models and unified control problems are formulated both for secondary and primary service provider cognitive networks and for conventional wireless networks. 
The results show that, by adapting to changes in queue lengths and channels, the dynamic technique dampens the effect of the primary and secondary service providers' cognitive networks on each other. We also study network stability and derive new stability regions both for primary and secondary service provider cognitive networks and for conventional wireless networks. The K-step Lyapunov drift is used to analyse the performance and stability of the dynamic control technique. In addition, a new unified stability analysis and an upper bound on the queue are derived for primary and secondary service provider cognitive networks and for conventional wireless networks. The dynamic algorithm is shown to stabilize the network and to minimize the upper bound on the average queue length by adapting to changes in network conditions. Furthermore, we prove that the previously proposed frame algorithm does not minimize the upper bound on the average delay when users share resources with one another. access point ad hoc network cooperative communication dynamic programming lyapunov drift network stability topology control value iteration algorithm
