Global ETD Search

141	Reinforcement Learning for Hydrobatic AUVs / Reinforcement learning för Hydrobatiska AUV Woźniak, Grzegorz January 2022 (has links) This master thesis focuses on developing a Reinforcement Learning (RL) controller to perform hydrobatic maneuvers on an Autonomous Underwater Vehicle (AUV) successfully. This work also aims to analyze the robustness of the RL controller, as well as provide a comparison between RL algorithms and Proportional Integral Derivative (PID) control. Training of the algorithms is initially conducted in a Numpy simulation in Python. We show how to model the Equations of Motion (EOM) of the AUV and how to use them to train the RL controllers. We use the Stable-Baselines3 RL framework and create a training environment with OpenAI Gym. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm offers good performance in the simulation. The following maneuvers are studied: trim control, waypoint following, and an inverted pendulum. We test the maneuvers both in the Numpy simulation and in the Stonefish simulator. We also test the robustness of the RL trim controller by simulating noise in the state feedback. Lastly, we run the RL trim controller on real AUV hardware called SAM. We show that the RL algorithm trained in the Numpy simulator can achieve similar performance to the PID controller in the Stonefish simulator. We generate a policy that can perform the trim control and the inverted pendulum maneuver in the Numpy simulation. We show that we can generate a robust policy that executes other types of maneuvers by providing a parameterized cost function to the RL algorithm. We discuss the results of every maneuver we perform with the SAM AUV and discuss the advantages and disadvantages of this control method applied to underwater robotics. We conclude that RL can be used to create policies that perform hydrobatic maneuvers. 
This data-driven approach can be applied in the future to more complex problems in underwater robotics.
Deep Reinforcement learning Deep learning Optimal control Hydrobatics Computer and Information Sciences
142	A study of stochastic differential equations and Fokker-Planck equations with applications Li, Wuchen 27 May 2016 (has links) Fokker-Planck equations, along with stochastic differential equations, play vital roles in physics, population modeling, game theory and optimization (finite or infinite dimensional). In this thesis, we study three topics, both theoretically and computationally, centered around them. In part one, we consider the optimal transport for finite discrete states, which are on a finite but arbitrary graph. By defining a discrete 2-Wasserstein metric, we derive Fokker-Planck equations on finite graphs as gradient flows of free energies. By using a dynamical viewpoint, we obtain an exponential convergence result to equilibrium. This derivation provides tools for many applications, including numerics for nonlinear partial differential equations and evolutionary game theory. In part two, we introduce a new stochastic differential equation based framework for optimal control with constraints. The framework can efficiently solve several real-world problems in differential games and robotics, including the path-planning problem. In part three, we introduce a new noise model for stochastic oscillators. With this model, we prove global boundedness of trajectories. In addition, we derive a pair of associated Fokker-Planck equations. Stochastic differential equations Fokker-Planck equations Gradient flow Optimal control Optimal transport
143	Optimal control based method for design and analysis of continuous descent arrivals Park, Sang Gyun 12 January 2015 (has links) Continuous Descent Arrival (CDA) is a procedure where aircraft descend, at or near idle thrust, from their cruise altitude to their Final Approach Fix without leveling off. By eliminating inefficient leveling off at low altitude, CDA provides benefits such as fuel savings, flight time savings, and significant noise reduction near airports, but the usage of CDAs has been limited to low-traffic conditions due to the difficulty of separation management. For a successful CDA without degradation of the runway throughput, air traffic controllers should know the performance bound of the CDA trajectory and control the time of arrival for each aircraft, which is interpreted as Required Time of Arrival (RTA) from the aircraft standpoint. This thesis proposes a novel trajectory optimization methodology to meet the RTA constraint. The CDA trajectory optimization problem in the flight management system is modeled as a path-constrained optimal control problem of a switched dynamical system. A sequential method that performs mode sequence estimation and then parameter optimization is proposed to solve this problem. By analyzing the relaxed optimal solution with simplified dynamics, a computationally efficient algorithm to find the optimal switching structure is proposed and applied for the mode sequence estimation. This thesis also proposes a performance-bound analysis methodology using optimal control techniques to help controllers make a feasible schedule for CDA operations at a meter fix. The feasible time range analysis for a wide variety of aircraft is performed by using the proposed methodology. Based on the analysis result, a single flight time strategy is proposed for the application of CDA in high traffic conditions. 
Simulation with real traffic data shows that the single flight time strategy, combined with the proposed fixed RTA trajectory optimization, guarantees conflict-free CDA operation. Continuous descent arrival Optimal control Hybrid system Trajectory optimization Flight management system
144	Human Postures and Movements analysed through Constrained Optimization Pettersson, Robert January 2009 (has links) Constrained optimization is used to derive human postures and movements. In the first study a static 3D model with 30 muscle groups is used to analyse postures. The activation levels of these muscles are minimized in order to represent the individual's choice of posture. Subject-specific data in terms of anthropometry, strength and orthopedic aids serve as input. The aim is to study effects from orthopedic treatment and altered abilities of the subject. Initial validation shows qualitative agreement of posture strategies but further details about passive stiffness and anthropometry are needed, especially to predict pelvis orientation. In the second application, the athletic long jump, a problem formulation is developed to find optimal movements of a multibody system when subjected to contact. The model was based on rigid links, joint actuators and a wobbling mass. The contact to the ground was modelled as a spring-damper system with tuned properties. The movement in the degrees of freedom representing physical joints was described over contact time through two fifth-order polynomials, with a variable transition time, while the motion in the degrees of freedom of contact and wobbling mass was integrated forwards in time, as a consequence. Muscle activation variables were then optimized in order to maximize ballistic flight distance. The optimization determined contact time, end configuration, activation and interaction with the ground from an initial configuration. The results from optimization show a reasonable agreement with experimentally recorded jumps, but individual recordings and measurements are needed for more precise conclusions. multibody system optimal control trajectory optimization long jump posture Other engineering mechanics
145	DECISION MAKING UNDER UNCERTAINTY IN DYNAMIC MULTI-STAGE ATTACKER-DEFENDER GAMES Luo, Yi January 2011 (has links) This dissertation presents efficient, on-line, convergent methods to find defense strategies against attacks in dynamic multi-stage attacker-defender games including adaptive learning. This effort culminated in four papers submitted to high quality journals and a book and they are partially published. The first paper presents a novel fictitious play approach to describe the interactions between the attackers and network administrator along a dynamic game. Multi-objective optimization methodology is used to predict the attacker's best actions at each decision node. The administrator also keeps track of the attacker's actions and updates his knowledge on the attacker's behavior and objectives after each detected attack, and uses this information to update the prediction of the attacker's future actions to find its best response strategies. The second paper proposes a Dynamic game tree based Fictitious Play (DFP) approach to describe the repeated interactive decision processes of the players. Each player considers all possibilities in future interactions with their uncertainties, which are based on learning the opponent's decision process (including risk attitude, objectives). Instead of searching the entire game tree, appropriate future time horizons are dynamically selected for both players. The administrator keeps tracking the opponent's actions, predicts the probabilities of future possible attacks, and then chooses its best moves. The third paper introduces an optimization model to maximize the deterministic equivalent of the random payoff function of a computer network administrator in defending the system against random attacks. By introducing new variables the transformed objective function becomes concave. 
A special optimization algorithm is developed which requires the computation of the unique solution of a single-variable monotonic equation. The fourth paper, which is an invited book chapter, proposes a discrete-time stochastic control model to capture the process of finding the best current move of the defender. The defender's payoffs at each stage of the game depend on the attacker's and the defender's accumulative efforts and are considered random variables due to their uncertainty. Their certain equivalents can be approximated based on their first and second moments, and these approximations are chosen as the cost functions of the dynamic system. An on-line, convergent, Scenarios based Proactive Defense (SPD) algorithm is developed based on Differential Dynamic Programming (DDP) to solve the associated optimal control problem. Decision making under uncertainty Dynamic programming Forecasting Game theory Intrusion defense system Optimal control
146	Constructing and solving variational image registration problems Cahill, Nathan D. January 2009 (has links) Nonrigid image registration has received much attention in the medical imaging and computer vision research communities, because it enables a wide variety of applications. Feature tracking, segmentation, classification, temporal image differencing, tumour growth estimation, and pharmacokinetic modeling are examples of the many tasks that are enhanced by the use of aligned imagery. Over the years, the medical imaging and computer vision communities have developed and refined image registration techniques in parallel, often based on similar assumptions or underlying paradigms. This thesis focuses on variational registration, which comprises a subset of nonrigid image registration. It is divided into chapters that are based on fundamental aspects of the variational registration problem: image dissimilarity measures, changing overlap regions, regularizers, and computational solution strategies. Key contributions include the development of local versions of standard dissimilarity measures, the handling of changing overlap regions in a manner that is insensitive to the amount of non-interesting background information, the combination of two standard taxonomies of regularizers, and the generalization of solution techniques based on Fourier methods and the Demons algorithm for use with many regularizers. To illustrate and validate the various contributions, two sets of example imagery are used: 3D CT, MR, and PET images of the brain as well as 3D CT images of lung cancer patients. 615.84
147	Kinks in a model for two-phase lipid bilayer membranes Helmers, Michael January 2011 (has links) In the spontaneous curvature model for two-phase lipid bilayer membranes the shape of vesicles is governed by a combination of an elastic bending energy and an interface energy that penalises the size of phase boundaries. Each lipid phase induces a preferred curvature to the membrane surface, and these curvatures as well as phase boundaries may lead to the development of kinks. In a rotationally symmetric setting we introduce a family of energies for smooth surfaces and phase fields for the lipid components and study convergence to a sharp-interface limit, which depends on the choice of the bending parameters of the phase field model. We prove that, if kinks are excluded, our energies $\Gamma$-converge to the commonly used sharp-interface spontaneous curvature energy with the additional assumption of $C^1$-regularity across interfaces. For a choice of parameters such that kinks may appear, we obtain a limit that coincides with the $\Gamma$-limit on all reasonable membranes and extends the classical model by assigning a bending energy also to kinks. We illustrate the theoretical result by some numerical examples. 512.577
148	On probabilistic inference approaches to stochastic optimal control Rawlik, Konrad Cyrus January 2013 (has links) While stochastic optimal control, together with associated formulations like Reinforcement Learning, provides a formal approach to, amongst others, motor control, it remains computationally challenging for most practical problems. This thesis is concerned with the study of relations between stochastic optimal control and probabilistic inference. Such dualities, exemplified by the classical Kalman Duality between the Linear-Quadratic-Gaussian control problem and the filtering problem in Linear-Gaussian dynamical systems, make it possible to exploit advances made within the separate fields. In this context, the emphasis in this work lies with utilisation of approximate inference methods for the control problem. Rather than concentrating on special cases which yield analytical inference problems, we propose a novel interpretation of stochastic optimal control in the general case in terms of minimisation of certain Kullback-Leibler divergences. Although these minimisations remain analytically intractable, we show that natural relaxations of the exact dual lead to new practical approaches. We introduce two particular general iterative methods: ψ-Learning, which has global convergence guarantees and provides a unifying perspective on several previously proposed algorithms, and Posterior Policy Iteration, which allows direct application of inference methods. From these, practical algorithms for Reinforcement Learning, based on a Monte Carlo approximation to ψ-Learning, and model-based stochastic optimal control, using a variational approximation of Posterior Policy Iteration, are derived. In order to overcome the inherent limitations of parametric variational approximations, we furthermore introduce a new approach for nonparametric approximate stochastic optimal control based on a reproducing kernel Hilbert space embedding of the control problem. 
Finally, we address the general problem of temporal optimisation, i.e., joint optimisation of controls and temporal aspects, e.g., duration, of the task. Specifically, we introduce a formulation of temporal optimisation based on a generalised form of the finite horizon problem. Importantly, we show that the generalised problem has a dual finite horizon problem of the standard form, thus bringing temporal optimisation within the reach of most commonly used algorithms. Throughout, problems from the area of motor control of robotic systems are used to evaluate the proposed methods and demonstrate their practical utility.
149	Motion Planning for a Reversing Full-Scale Truck and Trailer System Holmer, Olov January 2016 (has links) This thesis improves, implements and evaluates a motion planning algorithm for a full-sized reversing truck and trailer system. The motion planner is based on a motion planning algorithm called Closed-Loop Rapidly-exploring Random Tree (CL-RRT). An important property for a certain class of systems, stating that by selecting the input signals in a certain way the same result as reversing the time can be achieved, is also presented. For motion planning this means that the problem of reversing from position A to position B can also be solved by driving forward from B to A and then reversing the solution. The use of this result in the motion planner has been evaluated and has been shown to be very useful. The main improvements made on the CL-RRT algorithm are a faster collision detection method, a more efficient way to draw samples and a more accurate heuristic cost-to-go function. A post-optimization, or smoothing, method that brings the system to the exact desired configuration, based on numerical optimal control, has also been developed and implemented with successful results. The motion planner has been implemented and evaluated on a full-scale truck with a dolly-steered trailer prepared for autonomous operation, with promising results. truck and trailer general 2-trailer motion planning RRT CL-RRT numerical optimal control
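The time-reversal property described in entry 149 can be illustrated with a minimal sketch. It assumes a simple kinematic unicycle (speed `v`, curvature `kappa`) rather than the truck and trailer model of the thesis, and every name in it (`step`, `rollout`, `reverse_plan`, `pose_a`, `pose_b`) is hypothetical. For this toy model, negating the speed and replaying the inputs in reverse order retraces the forward path backwards, so a plan driven forward from B to A yields a reversing plan from A to B.

```python
import math

def step(state, v, kappa, dt):
    """Advance a unicycle (x, y, heading) along a constant-curvature arc.

    The arc is integrated exactly, so undoing a step with speed -v
    returns to the previous state up to floating-point rounding.
    """
    x, y, th = state
    if abs(kappa) < 1e-12:
        # Straight segment: heading is unchanged.
        return (x + v * dt * math.cos(th), y + v * dt * math.sin(th), th)
    th_new = th + v * kappa * dt
    return (x + (math.sin(th_new) - math.sin(th)) / kappa,
            y - (math.cos(th_new) - math.cos(th)) / kappa,
            th_new)

def rollout(state, controls, dt):
    """Apply a sequence of (v, kappa) inputs and return every visited state."""
    traj = [state]
    for v, kappa in controls:
        state = step(state, v, kappa, dt)
        traj.append(state)
    return traj

def reverse_plan(controls):
    """Time-reverse a plan: replay inputs last-to-first with negated speed."""
    return [(-v, kappa) for v, kappa in reversed(controls)]

# Illustrative values only (assumed, not taken from the thesis).
dt = 0.5
pose_b = (0.0, 0.0, 0.0)
forward_controls = [(1.0, 0.0), (1.0, 0.2), (1.0, 0.2), (1.0, -0.1)]

# Plan by driving forward from B to A ...
forward_traj = rollout(pose_b, forward_controls, dt)
pose_a = forward_traj[-1]

# ... then reverse the plan to back up from A to B.
reversing_traj = rollout(pose_a, reverse_plan(forward_controls), dt)
print("start of reversing maneuver:", pose_a)
print("end of reversing maneuver:  ", reversing_traj[-1])
```

The exact arc update is a deliberate choice: with a plain forward-Euler step the reversed rollout would only approximately retrace the forward one, which would obscure the property being illustrated.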
150	Dynamical system decomposition and analysis using convex optimization Anderson, James David January 2012 (has links) This thesis is concerned with investigating new methods for the analysis of large-scale dynamical systems using convex optimization. The proposed methodology is based on composite Lyapunov theory and is computationally implemented using polynomial programming techniques. The main result of this work is the development of a system decomposition framework that makes it possible to analyze systems that are of such a scale that traditional methods cannot cope with. We begin by addressing the problem of model invalidation. A barrier certificate method for invalidating models in the presence of uncertain data is presented for both continuous and discrete time models. It is shown how a re-parameterization of the time dependent variables can improve the numerical conditioning of the underlying optimization problem. The main contribution of this thesis is the development of an automated dynamical system decomposition framework that permits us to verify the stability of systems that typically have a state dimension large enough to render traditional computational methods intractable. The underlying idea is to decompose a system into a set of lower order subsystems connected in feedback in such a manner that composite methods for stability verification may be employed. What is unique about the algorithm presented is that it takes into account both dynamics and the topology of the interconnection graph. In the first instance we illustrate the methodology with an ecological network and primal Internet congestion control scheme. The versatility of the decomposition framework is also highlighted when it is shown that when applied to a model of the EGF-MAPK signaling pathway it is capable of identifying biologically relevant subsystems in addition to stability verification. 
Finally we introduce stability metrics for interconnected dynamical systems based on the theory of dissipativity. We conclude by outlining a clustering based decomposition algorithm that explicitly takes into account the input and output dynamics when determining the system decomposition. 519.6
