Global ETD Search

111	EXPANDING THE AUTONOMOUS SURFACE VEHICLE NAVIGATION PARADIGM THROUGH INLAND WATERWAY ROBOTIC DEPLOYMENT Reeve David Lambert (13113279) 19 July 2022 (has links) <p>This thesis presents solutions to some of the problems facing Autonomous Surface Vehicle (ASV) deployments in inland waterways through the development of navigational and control systems. Fluvial systems are one of the hardest inland waterways to navigate and are thus used as a use-case for system development. The systems are built to reduce the reliance on a-prioris during ASV operation. This is crucial for exceptionally dynamic environments such as fluvial bodies of water that have poorly defined routes and edges, can change course in short time spans, carry away and deposit obstacles, and expose or cover shoals and man-made structures as their water level changes. While navigation of fluvial systems is exceptionally difficult potential autonomous data collection can aid in important scientific missions in under studied environments.</p> <p><br></p> <p>The work has four contributions targeting solutions to four fundamental problems present in fluvial system navigation and control. To sense the course of fluvial systems for navigable path determination a fluvial segmentation study is done and a novel dataset detailed. To enable rapid path computations and augmentations in a fast moving environment a Dubins path generator and augmentation algorithm is presented ans is used in conjunction with an Integral Line-Of-Sight (ILOS) path following method. To rapidly avoid unseen/undetected obstacles present in fluvial environments a Deep Reinforcement Learning (DRL) agent is built and tested across domains to create dynamic local paths that can be rapidly affixed to for collision avoidance. Finally, a custom low-cost and deployable ASV, BREAM (Boat for Robotic Engineering and Applied Machine-Learning), capable of operating in fluvial environments is presented along with an autonomy package used in providing base level sensing and autonomy processing capability to varying platforms.</p> <p><br></p> <p>Each of these contributions form a part of a larger documented Fluvial Navigation Control Architecture (FNCA) that is proposed as a way to aid in a-priori free navigation of fluvial waterways. The architecture relates the navigational structures into high, mid, and low-level controller Guidance and Navigational Control (GNC) layers that are designed to increase cross vehicle and domain deployments. Each component of the architecture is documented, tested, and its application to the control architecture as a whole is reported.</p> Autonomous vehicle systems Field robotics Marine engineering Path Planning Autonomous surface vehicle (ASV) Marine Engineering Path Following river navigation Image Segmentation Deep Reinforcement Learning (DRL) Control Systems, Robotics and Automation Autonomous Vehicles Obstacle avoidance
112	Deep Reinforcement Learning for Autonomous Highway Driving Scenario Pradhan, Neil January 2021 (has links) We present an autonomous driving agent on a simulated highway driving scenario with vehicles such as cars and trucks moving with stochastically variable velocity profiles. The focus of the simulated environment is to test tactical decision making in highway driving scenarios. When an agent (vehicle) maintains an optimal range of velocity it is beneficial both in terms of energy efficiency and greener environment. In order to maintain an optimal range of velocity, in this thesis work I proposed two novel reward structures: (a) gaussian reward structure and (b) exponential rise and fall reward structure. I trained respectively two deep reinforcement learning agents to study their differences and evaluate their performance based on a set of parameters that are most relevant in highway driving scenarios. The algorithm implemented in this thesis work is double-dueling deep-Q-network with prioritized experience replay buffer. Experiments were performed by adding noise to the inputs, simulating Partially Observable Markov Decision Process in order to obtain reliability comparison between different reward structures. Velocity occupancy grid was found to be better than binary occupancy grid as input for the algorithm. Furthermore, methodology for generating fuel efficient policies has been discussed and demonstrated with an example. / Vi presenterar ett autonomt körföretag på ett simulerat motorvägsscenario med fordon som bilar och lastbilar som rör sig med stokastiskt variabla hastighetsprofiler. Fokus för den simulerade miljön är att testa taktiskt beslutsfattande i motorvägsscenarier. När en agent (fordon) upprätthåller ett optimalt hastighetsområde är det fördelaktigt både när det gäller energieffektivitet och grönare miljö. För att upprätthålla ett optimalt hastighetsområde föreslog jag i detta avhandlingsarbete två nya belöningsstrukturer: (a) gaussisk belöningsstruktur och (b) exponentiell uppgång och nedgång belöningsstruktur. Jag utbildade respektive två djupförstärkande inlärningsagenter för att studera deras skillnader och utvärdera deras prestanda baserat på en uppsättning parametrar som är mest relevanta i motorvägsscenarier. Algoritmen som implementeras i detta avhandlingsarbete är dubbel-duell djupt Q- nätverk med prioriterad återuppspelningsbuffert. Experiment utfördes genom att lägga till brus i ingångarna, simulera delvis observerbar Markov-beslutsprocess för att erhålla tillförlitlighetsjämförelse mellan olika belöningsstrukturer. Hastighetsbeläggningsgaller visade sig vara bättre än binärt beläggningsgaller som inmatning för algoritmen. Dessutom har metodik för att generera bränsleeffektiv politik diskuterats och demonstrerats med ett exempel. Deep reinforcement learning Highway driving scenario Tactical decision making fuel reduction high-level decision making autonomous driving Lärande om djupförstärkning motorvägsscenario taktiskt beslutsfattande bränslereduktion beslut på hög nivå autonom körning Computer and Information Sciences Data- och informationsvetenskap
113	Apprentissage de stratégies de calcul adaptatives pour les réseaux neuronaux profonds Kamanda, Aton 07 1900 (has links) La théorie du processus dual stipule que la cognition humaine fonctionne selon deux modes distincts : l’un pour le traitement rapide, habituel et associatif, appelé communément "système 1" et le second, ayant un traitement plus lent, délibéré et contrôlé, que l’on nomme "système 2". Cette distinction indique une caractéristique sous-jacente importante de la cognition humaine : la possibilité de passer de manière adaptative à différentes stratégies de calcul selon la situation. Cette capacité est étudiée depuis longtemps dans différents domaines et de nombreux bénéfices hypothétiques semblent y être liés. Cependant, les réseaux neuronaux profonds sont souvent construits sans cette capacité à gérer leurs ressources calculatoires de manière optimale. Cette limitation des modèles actuels est d’autant plus préoccupante que de plus en plus de travaux récents semblent montrer une relation linéaire entre la capacité de calcul utilisé et les performances du modèle lors de la phase d’évaluation. Pour résoudre ce problème, ce mémoire propose différentes approches et étudie leurs impacts sur les modèles, tout d’abord, nous étudions un agent d’apprentissage par renforcement profond qui est capable d’allouer plus de calcul aux situations plus difficiles. Notre approche permet à l’agent d’adapter ses ressources computationnelles en fonction des exigences de la situation dans laquelle il se trouve, ce qui permet en plus d’améliorer le temps de calcul, améliore le transfert entre des tâches connexes et la capacité de généralisation. L’idée centrale commune à toutes nos approches est basée sur les théories du coût de l’effort venant de la littérature sur le contrôle cognitif qui stipule qu’en rendant l’utilisation de ressource cognitive couteuse pour l’agent et en lui laissant la possibilité de les allouer lors de ses décisions il va lui-même apprendre à déployer sa capacité de calcul de façon optimale. Ensuite, nous étudions des variations de la méthode sur une tâche référence d’apprentissage profond afin d’analyser précisément le comportement du modèle et quels sont précisément les bénéfices d’adopter une telle approche. Nous créons aussi notre propre tâche "Stroop MNIST" inspiré par le test de Stroop utilisé en psychologie afin de valider certaines hypothèses sur le comportement des réseaux neuronaux employant notre méthode. Nous finissons par mettre en lumière les liens forts qui existent entre apprentissage dual et les méthodes de distillation des connaissances. Notre approche a la particularité d’économiser des ressources computationnelles lors de la phase d’inférence. Enfin, dans la partie finale, nous concluons en mettant en lumière les contributions du mémoire, nous détaillons aussi des travaux futurs, nous approchons le problème avec les modèles basés sur l’énergie, en apprenant un paysage d’énergie lors de l’entrainement, le modèle peut ensuite lors de l’inférence employer une capacité de calcul dépendant de la difficulté de l’exemple auquel il fait face plutôt qu’une simple propagation avant fixe ayant systématiquement le même coût calculatoire. Bien qu’ayant eu des résultats expérimentaux infructueux, nous analysons les promesses que peuvent tenir une telle approche et nous émettons des hypothèses sur les améliorations potentielles à effectuer. Nous espérons, avec nos contributions, ouvrir la voie vers des algorithmes faisant un meilleur usage de leurs ressources computationnelles et devenant par conséquent plus efficace en termes de coût et de performance, ainsi que permettre une compréhension plus intime des liens qui existent entre certaines méthodes en apprentissage machine et la théorie du processus dual. / The dual-process theory states that human cognition operates in two distinct modes: one for rapid, habitual and associative processing, commonly referred to as "system 1", and the second, with slower, deliberate and controlled processing, which we call "system 2". This distinction points to an important underlying feature of human cognition: the ability to switch adaptively to different computational strategies depending on the situation. This ability has long been studied in various fields, and many hypothetical benefits seem to be linked to it. However, deep neural networks are often built without this ability to optimally manage their computational resources. This limitation of current models is all the more worrying as more and more recent work seems to show a linear relationship between the computational capacity used and model performance during the evaluation phase. To solve this problem, this thesis proposes different approaches and studies their impact on models. First, we study a deep reinforcement learning agent that is able to allocate more computation to more difficult situations. Our approach allows the agent to adapt its computational resources according to the demands of the situation in which it finds itself, which in addition to improving computation time, enhances transfer between related tasks and generalization capacity. The central idea common to all our approaches is based on cost-of-effort theories from the cognitive control literature, which stipulate that by making the use of cognitive resources costly for the agent, and allowing it to allocate them when making decisions, it will itself learn to deploy its computational capacity optimally. We then study variations of the method on a reference deep learning task, to analyze precisely how the model behaves and what the benefits of adopting such an approach are. We also create our own task "Stroop MNIST" inspired by the Stroop test used in psychology to validate certain hypotheses about the behavior of neural networks employing our method. We end by highlighting the strong links between dual learning and knowledge distillation methods. Finally, we approach the problem with energy-based models, by learning an energy landscape during training, the model can then during inference employ a computational capacity dependent on the difficulty of the example it is dealing with rather than a simple fixed forward propagation having systematically the same computational cost. Despite unsuccessful experimental results, we analyze the promise of such an approach and speculate on potential improvements. With our contributions, we hope to pave the way for algorithms that make better use of their computational resources, and thus become more efficient in terms of cost and performance, as well as providing a more intimate understanding of the links that exist between certain machine learning methods and dual process theory. Apprentissage par renforcement profond Théorie du processus dual Efficacité computationnelle Apprentissage profond Efficacité computationnelle Distillation des connaissances Modèles basés sur l’énergie Contrôle cognitif Deep learning Deep reinforcement learning Dual process theory Computational efficiency Knowledge distillation Energy-based models Cognitive control
114	Deep Reinforcement Learning Adaptive Traffic Signal Control / Reinforcement Learning Traffic Signal Control Genders, Wade 22 November 2018 (has links) Sub-optimal automated transportation control systems incur high mobility, human health and environmental costs. With society reliant on its transportation systems for the movement of individuals, goods and services, minimizing these costs benefits many. Intersection traffic signal controllers are an important element of modern transportation systems that govern how vehicles traverse road infrastructure. Many types of traffic signal controllers exist; fixed time, actuated and adaptive. Adaptive traffic signal controllers seek to minimize transportation costs through dynamic control of the intersection. However, many existing adaptive traffic signal controllers rely on heuristic or expert knowledge and were not originally designed for scalability or for transportation’s big data future. This research addresses the aforementioned challenges by developing a scalable system for adaptive traffic signal control model development using deep reinforcement learning in traffic simulation. Traffic signal control can be modelled as a sequential decision-making problem; reinforcement learning can solve sequential decision-making problems by learning an optimal policy. Deep reinforcement learning makes use of deep neural networks, powerful function approximators which benefit from large amounts of data. Distributed, parallel computing techniques are used to provide scalability, with the proposed methods validated on a simulation of the City of Luxembourg, Luxembourg, consisting of 196 intersections. This research contributes to the body of knowledge by successfully developing a scalable system for adaptive traffic signal control model development and validating it on the largest traffic microsimulator in the literature. The proposed system reduces delay, queues, vehicle stopped time and travel time compared to conventional traffic signal controllers. Findings from this research include that using reinforcement learning methods which explicitly develop the policy offers improved performance over purely value-based methods. The developed methods are expected to mitigate the problems caused by sub-optimal automated transportation signal controls systems, improving mobility and human health and reducing environmental costs. / Thesis / Doctor of Philosophy (PhD) / Inefficient transportation systems negatively impact mobility, human health and the environment. The goal of this research is to mitigate these negative impacts by improving automated transportation control systems, specifically intersection traffic signal controllers. This research presents a system for developing adaptive traffic signal controllers that can efficiently scale to the size of cities by using machine learning and parallel computation techniques. The proposed system is validated by developing adaptive traffic signal controllers for 196 intersections in a simulation of the City of Luxembourg, Luxembourg, successfully reducing delay, queues, vehicle stopped time and travel time. intelligent transportation systems machine learning machine learning transportation machine learning traffic signal control artificial intelligence artificial intelligence transportation deep learning deep neural networks traffic optimization adaptive traffic signal control machine learning engineering
115	Reinforcement Learning for Market Making / Förstärkningsinlärningsbaserad likviditetsgarantering Carlsson, Simon, Regnell, August January 2022 (has links) Market making – the process of simultaneously and continuously providing buy and sell prices in a financial asset – is rather complicated to optimize. Applying reinforcement learning (RL) to infer optimal market making strategies is a relatively uncharted and novel research area. Most published articles in the field are notably opaque concerning most aspects, including precise methods, parameters, and results. This thesis attempts to explore and shed some light on the techniques, problem formulations, algorithms, and hyperparameters used to construct RL-derived strategies for market making. First, a simple probabilistic model of a limit order book is used to compare analytical and RL-derived strategies. Second, a market making agent is trained on a more complex Markov chain model of a limit order book using tabular Q-learning and deep reinforcement learning with double deep Q-learning. Results and strategies are analyzed, compared, and discussed. Finally, we propose some exciting extensions and directions for future work in this research field. / Likviditetsgarantering (eng. ”market making”) – processen att simultant och kontinuerligt kvotera köp- och säljpriser i en finansiell tillgång – är förhållandevis komplicerat att optimera. Att använda förstärkningsinlärning (eng. ”reinforcement learning”) för att härleda optimala strategier för likviditetsgarantering är ett relativt outrett och nytt forskningsområde. De flesta publicerade artiklarna inom området är anmärkningsvärt återhållsamma gällande detaljer om de tekniker, problemformuleringar, algoritmer och hyperparametrar som används för att framställa förstärkningsinlärningsbaserade strategier. I detta examensarbete så gör vi ett försök på att utforska och bringa klarhet över dessa punkter. Först används en rudimentär probabilistisk modell av en limitorderbok som underlag för att jämföra analytiska och förstärkningsinlärda strategier. Därefter brukas en mer sofistikerad Markovkedjemodell av en limitorderbok för att jämföra tabulära och djupa inlärningsmetoder. Till sist presenteras även spännande utökningar och direktiv för framtida arbeten inom området. Reinforcement learning Market making Deep reinforcement learning Limit order book Algorithmic trading High-frequency trading Machine learning Artificial intelligence Q-learning DDQN Förstärkningsinlärning Market making Djup förstärkningsinlärning Limitorderbok Algoritmisk handel Högfrekvenshandel Maskininlärning Artificiell intelligens Q-learning DDQN Other Mathematics Annan matematik
116	Deep Reinforcement Learning for Multi-Agent Path Planning in 2D Cost Map Environments : using Unity Machine Learning Agents toolkit Persson, Hannes January 2024 (has links) Multi-agent path planning is applied in a wide range of applications in robotics and autonomous vehicles, including aerial vehicles such as drones and other unmanned aerial vehicles (UAVs), to solve tasks in areas like surveillance, search and rescue, and transportation. In today's rapidly evolving technology in the fields of automation and artificial intelligence, multi-agent path planning is growing increasingly more relevant. The main problems encountered in multi-agent path planning are collision avoidance with other agents, obstacle evasion, and pathfinding from a starting point to an endpoint. In this project, the objectives were to create intelligent agents capable of navigating through two-dimensional eight-agent cost map environments to a static target, while avoiding collisions with other agents and simultaneously minimizing the path cost. The method of reinforcement learning was used by utilizing the development platform Unity and the open-source ML-Agents toolkit that enables the development of intelligent agents with reinforcement learning inside Unity. Perlin Noise was used to generate the cost maps. The reinforcement learning algorithm Proximal Policy Optimization was used to train the agents. The training was structured as a curriculum with two lessons, the first lesson was designed to teach the agents to reach the target, without colliding with other agents or moving out of bounds. The second lesson was designed to teach the agents to minimize the path cost. The project successfully achieved its objectives, which could be determined from visual inspection and by comparing the final model with a baseline model. The baseline model was trained only to reach the target while avoiding collisions, without minimizing the path cost. A comparison of the models showed that the final model outperformed the baseline model, reaching an average of $27.6\%$ lower path cost. / Multi-agent-vägsökning används inom en rad olika tillämpningar inom robotik och autonoma fordon, inklusive flygfarkoster såsom drönare och andra obemannade flygfarkoster (UAV), för att lösa uppgifter inom områden som övervakning, sök- och räddningsinsatser samt transport. I dagens snabbt utvecklande teknik inom automation och artificiell intelligens blir multi-agent-vägsökning allt mer relevant. De huvudsakliga problemen som stöts på inom multi-agent-vägsökning är kollisioner med andra agenter, undvikande av hinder och vägsökning från en startpunkt till en slutpunkt. I detta projekt var målen att skapa intelligenta agenter som kan navigera genom tvådimensionella åtta-agents kostnadskartmiljöer till ett statiskt mål, samtidigt som de undviker kollisioner med andra agenter och minimerar vägkostnaden. Metoden förstärkningsinlärning användes genom att utnyttja utvecklingsplattformen Unity och Unitys open-source ML-Agents toolkit, som möjliggör utveckling av intelligenta agenter med förstärkningsinlärning inuti Unity. Perlin Brus användes för att generera kostnadskartorna. Förstärkningsinlärningsalgoritmen Proximal Policy Optimization användes för att träna agenterna. Träningen strukturerades som en läroplan med två lektioner, den första lektionen var utformad för att lära agenterna att nå målet, utan att kollidera med andra agenter eller röra sig utanför gränserna. Den andra lektionen var utformad för att lära agenterna att minimera vägkostnaden. Projektet uppnådde framgångsrikt sina mål, vilket kunde fastställas genom visuell inspektion och genom att jämföra den slutliga modellen med en basmodell. Basmodellen tränades endast för att nå målet och undvika kollisioner, utan att minimera vägen kostnaden. En jämförelse av modellerna visade att den slutliga modellen överträffade baslinjemodellen, och uppnådde en genomsnittlig $27,6\%$ lägre vägkostnad. deep reinforcement learning reinforcement learning machine learning path planning cost map ML-agents unity artificial neural networks collision avoidance PPO multi agent multi-agent multi-agent system förstärkningsinlärning djup förstärkningsinlärning fleragentssystem kostnadkarta kostnadskartor artificiella neurala nätverk maskininlärning proximal policy optimization PPO svärmintelligens

Page generated in 0.092 seconds