  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Autonomous Cyber Defense for Resilient Cyber-Physical Systems

Zhang, Qisheng 09 January 2024
In this dissertation research, we design and analyze resilient cyber-physical systems (CPSs) under high network dynamics, adversarial attacks, and various uncertainties. We focus on three key system attributes to build resilient CPSs by developing a suite of autonomous cyber defense mechanisms. First, we consider network adaptability to achieve the resilience of a CPS. Network adaptability is the network's ability to maintain its security and connectivity level when faced with incoming attacks. We address this through network topology adaptation, which can help by quickly identifying and updating the network topology to confuse attackers by changing attack paths. We leverage deep reinforcement learning (DRL) to develop CPSs using network topology adaptation. Second, we consider the fault tolerance of a CPS as another attribute to ensure system resilience. We aim to build a resilient CPS under severe resource constraints, adversarial attacks, and various uncertainties. We choose a solar sensor-based smart farm as one example of a CPS application and develop a resource-aware monitoring system for smart farms. We leverage DRL and uncertainty quantification using a belief theory called Subjective Logic to optimize critical tradeoffs between system performance and security in contested CPS environments. Lastly, we study system resilience in terms of system recoverability, which refers to the system's ability to recover from performance degradation or failure. In this task, we mainly focus on developing an automated intrusion response system (IRS) for CPSs. We aim to design the IRS with effective and efficient responses by reducing the false alarm rate and the defense cost, respectively. Specifically, we build a lightweight IRS for an in-vehicle controller area network (CAN) bus system operating with DRL-based autonomous driving.
/ Doctor of Philosophy
72

Deep Reinforcement Learning for Temperature Control in Buildings and Adversarial Attacks

Ammouri, Kevin January 2021
Heating, Ventilation and Air Conditioning (HVAC) systems in buildings are energy consuming, and the traditional methods used for building control result in energy losses. These methods cannot account for non-linear dependencies in the thermal behaviour. Deep Reinforcement Learning (DRL) is a powerful method for reaching optimal control in many different control environments. DRL utilizes neural networks to approximate the optimal actions to take given that the system is in a given state. DRL is therefore a promising method for building control, a fact highlighted by several studies. However, neural network policies are known to be vulnerable to adversarial attacks: small, indistinguishable changes to the input that make the network choose a sub-optimal action. Two of the main approaches to attacking DRL policies are (1) the Fast Gradient Sign Method (FGSM), which uses the gradients of the control agent’s network to conduct the attack, and (2) training a DRL agent with the goal of minimizing the performance of the control agents. The aim of this thesis is to investigate different strategies for solving the building control problem with DRL using the building simulator IDA ICE. The thesis also applies the concept of adversarial machine learning by attacking the agents controlling the temperature inside the building. We first built a DRL architecture to learn how to efficiently control temperature in a building. Experiments demonstrate that the agent's exploration plays a crucial role in training the building control agent, and that the exploration strategy must be fine-tuned to achieve satisfactory performance. Finally, we tested the susceptibility of the trained DRL controllers to adversarial attacks. These tests showed that, on average, attacks trained using DRL methods have a larger impact on building control than those using FGSM, while random perturbations have almost no impact.
/ Ventilation systems in buildings consume a lot of energy, and the traditional methods used for building control result in lost energy savings. These methods cannot account for non-linear dependencies in thermal behaviour. Deep reinforcement learning (DRL) is a powerful method for achieving optimal control in many control environments. DRL uses neural networks to approximate the optimal choices available given that the system is in a particular state. DRL is therefore a promising method for building control, a fact highlighted by several studies. Nevertheless, neural networks in general are known to be vulnerable to adversarial attacks, which are small changes to the input that make the network choose a sub-optimal action. The aim of this thesis is to investigate different strategies for solving the building control problem with DRL using the building simulator IDA ICE. The thesis also uses the concept of adversarial machine learning to attack the agents that control the temperature in the building. There are two different ways to attack neural networks: (1) the Fast Gradient Sign Method, which uses the gradients of the control agent's network to carry out its attack; (2) training a learning agent with DRL with the goal of minimizing the control agents' performance. First, we built a DRL architecture that learned to control the temperature in a building. The experiments show that the agent's exploration is a fundamental factor in training the control agent, and that the exploration must be fine-tuned to reach satisfactory performance. Finally, we tested the sensitivity of the trained DRL agents to adversarial attacks. These tests showed that, on average, using DRL methods has a larger impact on the control agents than using FGSM, while attacking completely at random has almost no impact.
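The gradient-based attack named in the abstract can be illustrated on a toy policy. The sketch below is not the thesis's controller: it assumes a hypothetical two-action linear softmax policy (weights `W`, state `s`, and budget `eps` are made-up values) and perturbs the state by `eps` times the sign of the gradient of the cross-entropy loss taken against the policy's own greedy action.

```python
import math


def logits(weights, state):
    """Linear policy: one score per action, score_k = weights[k] . state."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(weights, state):
    scores = logits(weights, state)
    return max(range(len(scores)), key=scores.__getitem__)


def fgsm_perturb(weights, state, eps):
    """Return state + eps * sign(grad_state loss), where loss is the
    cross-entropy between the policy output and its own greedy action."""
    target = greedy_action(weights, state)
    probs = softmax(logits(weights, state))
    # d(loss)/d(score_k) = p_k - 1[k == target]
    dscores = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    # Chain rule through the linear layer: grad_state = W^T dscores
    grad = [
        sum(weights[k][j] * dscores[k] for k in range(len(weights)))
        for j in range(len(state))
    ]
    signs = [(g > 0) - (g < 0) for g in grad]
    return [x + eps * sgn for x, sgn in zip(state, signs)]


# Toy example: two state features, two actions (all values are made up).
W = [[1.0, 0.0], [0.0, 1.0]]
s = [0.55, 0.45]
s_adv = fgsm_perturb(W, s, eps=0.1)
```

In this toy case each feature moves by at most `eps`, yet the greedy action flips from action 0 to action 1. A deep policy would need automatic differentiation in place of the hand-written gradient.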
73

Understanding and Combating Online Social Deception

Guo, Zhen 02 May 2023
In today's world, online communication through social networking services (SNSs) has become an essential aspect of people's daily lives. As SNSs have become more sophisticated, cyber attackers have found ways to exploit them for harmful activities such as financial fraud, privacy violations, and sexual or labor exploitation. Thus, it is imperative to understand these activities and develop effective countermeasures to build SNSs that can be trusted. Existing approaches have focused on detection mechanisms for a particular type of online social deception (OSD) using various artificial intelligence (AI) techniques, including machine/deep learning (ML/DL) or text mining. However, fewer studies exist on prevention and response (or mitigation) mechanisms for effective defense against OSD attacks. Further, there have been insufficient efforts to investigate the underlying intents and tactics of OSD attackers in depth. This dissertation takes defense approaches to combating OSD attacks that build on an in-depth understanding of the psychological-social behaviors of attackers and potential victims. This understanding can guide more proactive action against OSD attacks, which can minimize damage to potential victims and reduce recovery costs. In this dissertation, we examine OSD attacks through two main tasks: understanding their causes and combating them in terms of prevention, detection, and mitigation. In the OSD understanding task, we investigate the intent and tactics of false informers (e.g., fake news spreaders) in propagating fake news or false information. We infer false informers' intent more accurately based on intent-related phrases from fake news contexts to decide on effective and efficient defenses (or interventions) against them.
In the OSD combating task, we develop defense systems in two sub-tasks: (1) a social capital-based friending recommendation system that guides online social network (OSN) users to choose trustworthy users, proactively defending against phishing attackers; and (2) a defensive opinion update framework with which OSN users process their opinions by filtering out false information. The schemes proposed for combating OSD attacks contribute to the prevention, detection, and mitigation of OSD attacks. / Doctor of Philosophy / This Ph.D. dissertation explores the issue of online social deception (OSD) in the context of social networking services (SNSs). With the increasing sophistication of SNSs, cyber attackers have found ways to exploit them for harmful activities, such as financial fraud and privacy violations. While previous studies have focused on detection mechanisms using artificial intelligence (AI) techniques, this dissertation takes a defense approach by investigating the underlying psychological-social behaviors of attackers and potential victims. Through two tasks, understanding the causes of OSD and combating it with various AI approaches, this dissertation proposes a social capital-based friending recommendation system, a defensive opinion update framework, and a framework for analyzing the intent of fake news spreaders, which guide SNS users in choosing trustworthy users and filtering out phishing attackers or false information. The proposed schemes contribute to the prevention, detection, and mitigation of OSD attacks, potentially minimizing damage to victims and saving recovery costs.
74

Optimization Methods for Distribution Systems: Market Design and Resiliency Enhancement

Bedoya Ceballos, Juan Carlos 05 August 2020
The increasing penetration of proactive agents in distribution systems (DS) has opened new possibilities to make the grid more resilient and to increase participation of responsive loads (RL) and non-conventional generation resources. On the resiliency side, plug-in hybrid electric vehicles (PHEV), energy storage systems (ESS), microgrids (MG), and distributed energy resources (DER) can be leveraged to restore critical load in the system when the utility system is not available for extended periods of time. Critical load restoration is a key factor in achieving a resilient distribution system. On the other hand, existing DERs and responsive loads can be coordinated in a market environment to contribute to the efficiency of electricity consumption and to fair electricity tariffs, incentivizing proactive agents' participation in the distribution system. Resiliency and market applications for distribution systems are highly complex decision-making problems that can be addressed using modern optimization techniques. The complexities of these problems arise from non-linear relations, integer decision variables, scalability, and asynchronous information. On the resiliency side, existing models include optimization approaches that consider the system's available information and neglect the asynchrony of data arrival. As a consequence, these models can lead to underutilization of critical resources during system restoration. They can also become computationally intractable for large-scale systems. In the market design problem, existing approaches are based on centralized or computationally distributed approaches that are not only limited by hardware requirements but also restrictive for active participation of the market agents. In this context, the work of this dissertation results in major contributions regarding new optimization algorithms for market design and resiliency improvement in distribution systems.
On the DS market side, two novel contributions are presented: 1) a computationally distributed coordination framework based on bilateral transactions in which social welfare is maximized, and 2) a fully decentralized transactive framework in which power suppliers, in a simultaneous auction environment, bid strategically using a Markowitz portfolio optimization approach. On the resiliency side, this research proposes a system restoration approach that takes into account uncertain devices and the associated asynchronous information by means of a two-module optimization model based on binary programming and three-phase unbalanced optimal power flow. Furthermore, a Reinforcement Learning (RL) method combined with a Monte Carlo tree search algorithm is proposed to solve the scalability problem for resiliency enhancement. / Doctor of Philosophy / Distribution systems (DS) are evolving from traditional centralized, fossil-fuel generation resources to networks with large-scale deployment of responsive loads and distributed energy resources. Optimization-based decision-making methods are required to improve resiliency and coordinate DS participants. The prohibitive costs of extended power outages call for efficient mechanisms to avoid interrupting service to critical load during catastrophic outages. Coordination mechanisms for various generation resources and proactive loads are greatly needed. Existing optimization-based approaches either neglect the asynchronous nature of information arrival or are computationally intractable for large-scale systems. The work of this dissertation results in major contributions regarding new optimization methods for market design, coordination of DS participants, and improvement of DS resiliency.
Four contributions toward the application of optimization approaches for DS are made: 1) A distributed optimization algorithm based on decomposition and best approximation techniques to maximize social welfare in a market environment, 2) A simultaneous auction mechanism and portfolio optimization method in a fully decentralized market framework, 3) Binary programming and nonlinear unbalanced power flow, considering asynchronous information, to enhance resiliency in a DS, and 4) A reinforcement learning method together with an efficient search algorithm to support large scale resiliency improvement models incorporating asynchronous information.
75

Autonomous Driving with Deep Reinforcement Learning

Zhu, Yuhua 17 May 2023
The researcher developed an autonomous driving simulation by training an end-to-end policy model using deep reinforcement learning algorithms in the Gym-duckietown virtual environment. The control strategy of the model was designed for the lane-following task. Several reinforcement learning algorithms were implemented, and the SAC algorithm was chosen to train two models: a non-end-to-end model that takes information provided by the environment, such as speed, as input values, and an end-to-end model that takes images captured by the agent's front camera as input. In this paper, the researcher compared the advantages and disadvantages of the two models using kinetic parameters in the environment and conducted a series of experiments on the control strategy of the end-to-end model to explore the effects of different environmental parameters or reward functions on the models.
Contents:
CHAPTER 1 INTRODUCTION: 1.1 AUTONOMOUS DRIVING OVERVIEW; 1.2 RESEARCH QUESTIONS AND METHODS; 1.2.1 Research Questions; 1.2.2 Research Methods; 1.3 PAPER STRUCTURE
CHAPTER 2 RESEARCH BACKGROUND: 2.1 RESEARCH STATUS; 2.2 THEORETICAL BASIS; 2.2.1 Machine Learning; 2.2.2 Deep Learning; 2.2.3 Reinforcement Learning; 2.2.4 Deep Reinforcement Learning
CHAPTER 3 METHOD: 3.1 SIMULATION PLATFORM; 3.2 CONTROL TASK; 3.3 OBSERVATION SPACE; 3.3.1 Information as Observation (Non-end-to-end); 3.3.2 Images as Observation (End-to-end); 3.4 ACTION SPACE; 3.5 ALGORITHM; 3.5.1 Mathematical Foundations; 3.5.2 Policy Iteration; 3.6 POLICY ARCHITECTURE; 3.6.1 Network Architecture for Non-end-to-end Model; 3.6.2 Network Architecture for End-to-end Model; 3.7 REWARD SHAPING; 3.7.1 Calculation of Speed-based Reward Function; 3.7.2 Calculation of the reward function based on the position of the agent relative to the right lane
CHAPTER 4 TRAINING PROCESS: 4.1 TRAINING PROCESS OF NON-END-TO-END MODEL; 4.2 TRAINING PROCESS OF END-TO-END MODEL
CHAPTER 5 RESULT
CHAPTER 6 TEST AND EVALUATION: 6.1 EVALUATION OF END-TO-END MODEL; 6.1.1 Speed Tests in Two Scenarios; 6.1.2 Lateral Deviation between the Agent and the Right Lane’s Centerline; 6.1.3 Orientation Deviation between the Agent and the Right Lane’s Centerline; 6.2 COMPARISON OF THE END-TO-END MODEL TO TWO BASELINES IN SIMULATION; 6.2.1 Comparison with Non-end-to-end Baseline; 6.2.2 Comparison with PD Baseline; 6.3 TEST THE EFFECT OF DIFFERENT WEIGHTS ASSIGNMENTS ON THE END-TO-END MODEL
CHAPTER 7 CONCLUSION
/ The researcher developed an autonomous driving simulation by training an end-to-end control model with deep reinforcement learning algorithms in the Gym-duckietown virtual environment. The model's control strategy was developed for the lane-keeping task. Several reinforcement learning algorithms were implemented, and the SAC algorithm was selected to train a non-end-to-end model that takes information provided by the environment, such as speed, as input values, as well as an end-to-end model that takes images captured by the agent's front camera as input. In this work, the researcher compared the advantages and disadvantages of the two models using kinetic parameters in the environment and carried out a series of experiments on the control strategy of the end-to-end model to investigate the effects of different environment parameters or reward functions on the models.
76

深度增強學習在動態資產配置上之應用— 以美國ETF為例 / The Application of Deep Reinforcement Learning on Dynamic Asset Allocation : A Case Study of U.S. ETFs

劉上瑋 Unknown Date
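Reinforcement learning learns through continual interaction with the environment in order to maximize the sum of the rewards over all periods, and it is widely used in multi-period decision processes. Given these characteristics, reinforcement learning can be applied to build dynamic asset allocation strategies that require the portfolio allocation ratios to be adjusted continually. This study applies the Deep Q-Learning algorithm to build a dynamic asset allocation strategy and studies how to find the optimal allocation weights under the different environment states of each period. The portfolio is built from U.S. large- and mid-cap equity ETFs and investment-grade bond ETFs over the period from July 2, 2007 to June 30, 2017. The model is trained on their daily return data, and its performance is compared with a buy-and-hold strategy and a constant-mix strategy to examine the applicability of deep reinforcement learning to dynamic asset allocation. / Reinforcement learning learns by interacting with the environment continuously, with the goal of maximizing the sum of the returns in each period. It has been widely used to solve multi-period decision-making problems. Because of these characteristics, reinforcement learning can be applied to build dynamic asset allocation strategies that continually rebalance the portfolio mix. In this study, we apply the deep Q-Learning algorithm to build dynamic asset allocation strategies and study how to find the optimal weights in different environment states. We use large-cap and mid-cap equity ETFs and investment-grade bond ETFs in the U.S. to build the portfolio. We train the model on daily return data and then measure its performance against buy-and-hold and constant-mix strategies to check the suitability of deep Q-Learning.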
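The two baselines named in the abstract can be written down compactly. This is a minimal sketch under assumptions, not the study's code: the function names, the two-asset example, and the return figures are hypothetical, a portfolio starts at value 1.0, and transaction costs are ignored. Buy-and-hold sets the weights once and lets them drift; constant-mix resets to the target weights every period.

```python
def buy_and_hold(returns, weights):
    """Final portfolio value when the initial weights are never rebalanced.

    returns: list of per-period tuples, one simple return per asset.
    weights: initial fraction of wealth in each asset (sums to 1).
    """
    holdings = list(weights)  # value held in each asset; total starts at 1.0
    for period in returns:
        holdings = [h * (1.0 + r) for h, r in zip(holdings, period)]
    return sum(holdings)


def constant_mix(returns, weights):
    """Final portfolio value when the portfolio is reset to the target
    weights at the start of every period (no transaction costs)."""
    value = 1.0
    for period in returns:
        value *= sum(w * (1.0 + r) for w, r in zip(weights, period))
    return value


# Hypothetical two-period example: an equity ETF and a bond ETF.
example_returns = [(0.10, 0.0), (-0.10, 0.0)]
target_weights = (0.5, 0.5)
bh_value = buy_and_hold(example_returns, target_weights)
cm_value = constant_mix(example_returns, target_weights)
```

A learned policy such as the deep Q-Learning agent described above would choose the weights anew each period, and its final value would be compared against these two numbers.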
77

Leveraging deep reinforcement learning in the smart grid environment

Desage, Ysaël 05 1900
Modern statistical learning is producing impressive results, with computers matching or even exceeding human standards in certain applications such as computer vision or strategy games. Yet despite these advances, reliable deployed applications are clearly still at an embryonic stage compared with the opportunities they could bring. It is in this perspective, with an emphasis on sequential decision theory and on recent machine learning research, that we demonstrate the effective application of these methods to cases involving the electrical grid and the optimization of its actors. We thus consider instances ranging from energy storage units and electric cars to the thermal control of smart buildings. Finally, we conclude by introducing a new hybrid approach that combines the modern performance of deep learning and reinforcement learning with the proven application framework of classical operations research, with the aim of facilitating the integration of new statistical learning methods into various concrete applications. / While modern statistical learning is achieving impressive results, as computers start exceeding human baselines in some applications like computer vision, or even beating professional human players at strategy games without any prior knowledge, reliable deployed applications are still in their infancy compared to what these new opportunities could offer. In this perspective, with a keen focus on sequential decision theory and recent statistical learning research, we demonstrate efficient application of such methods on instances involving the energy grid and the optimization of its actors, from energy storage and electric cars to smart buildings and thermal controls.
We conclude by introducing a new hybrid approach combining the modern performance of deep learning and reinforcement learning with the proven application framework of operations research, with the aim of seamlessly integrating new statistical learning-oriented methodologies into concrete applications.
78

Deep Reinforcement Learning on Social Environment Aware Navigation based on Maps

Sanchez, Victor January 2023
Reinforcement learning (RL) has expanded rapidly in recent years, with successful applications to a range of decision-making and complex control tasks. Moreover, deep learning offers RL the opportunity to extend its reach to complex fields. Social Robotics is a domain that involves challenges such as Human-Robot Interaction, which inspires development in deep RL. Autonomous systems demand fast and efficient environment perception to guarantee safety. However, while being attentive to its surroundings, a robot needs to make decisions to navigate optimally and avoid potential obstacles. In this thesis, we investigate a deep RL method for end-to-end mobile robot navigation in a social environment. Using observations collected in a simulation environment, a convolutional neural network is trained to predict an appropriate set of discrete angular and linear velocities for a robot based on its egocentric local occupancy grid map. We compare a random learning approach with a curriculum learning approach to speed up convergence during training. We divide the main problem by analysing end-to-end navigation and obstacle avoidance separately, in static and dynamic environments. For each problem, we propose an adaptation that aims to improve the agent's awareness of its surroundings. The qualitative and quantitative evaluations of the investigated approach were performed in simulation. The results show that the end-to-end map-based navigation model is easy to set up and performs similarly to a Model Predictive Control approach. However, we find that obstacle avoidance is harder to translate to a deep RL framework. Despite this difficulty, using different RL methods and configurations will help and bring ideas for improvement in future work.
/ Reinforcement learning (RL) has expanded rapidly in recent years thanks to its fruitful application to a range of decision-making and complex control tasks. In addition, deep learning offers RL the opportunity to extend its spectrum to complex domains. Social Robotics is a domain that involves challenges such as human-robot interaction, which inspires development in deep RL. Autonomous systems require fast and efficient perception of the environment to guarantee safety. But while staying attentive to its surroundings, a robot must make decisions to navigate optimally and avoid potential obstacles. In this thesis, we investigate a deep RL method for end-to-end mobile robot navigation in a social environment. Using observations collected in a simulation environment, a convolutional neural network is trained to predict a suitable set of discrete angular and linear velocities for a robot based on its egocentric local occupancy grid map. We compare a random learning approach with a curriculum learning method to improve convergence speed. We divide the main problem by separately analysing end-to-end navigation and obstacle avoidance in static and dynamic environments. For each problem, we propose an adaptation intended to help the agent better understand its surroundings. The qualitative and quantitative evaluations of the investigated approach were performed only in simulation. The results show that the end-to-end map-based navigation model is easy to deploy and performs similarly to a Model Predictive Control approach. However, we see that obstacle avoidance is harder to translate into a deep RL framework. Despite this difficulty, using different RL methods and configurations will definitely help and provide ideas for improvement in future work.
/ Reinforcement learning (RL) has expanded rapidly in recent years through its applications to a range of decision-making and complex control tasks. Deep learning offers RL the possibility of widening its spectrum to complex domains. Social robotics is a field that involves challenges such as human-robot interaction, a source of inspiration for development in deep RL. Autonomous systems require fast and efficient perception of the environment in order to guarantee safety. However, while remaining attentive to its environment, a robot must make decisions to navigate optimally and avoid potential obstacles. In this thesis, we study a deep RL method for end-to-end navigation of mobile robots in a social environment. Using observations gathered in a simulation environment, a convolutional neural network predicts a suitable set of discrete angular and linear velocities for a robot based on its egocentric local occupancy grid map. We compare a random learning method with a curriculum learning approach to accelerate convergence during training. We divide the main problem by separately analysing end-to-end navigation and obstacle avoidance in static and dynamic environments. For each problem, we propose an adaptation intended to help the agent better understand its environment. The qualitative and quantitative evaluations of the studied approach were carried out only in simulation. The results show that the end-to-end map-based navigation model is easy to deploy and performs similarly to a model predictive control approach. However, we find that obstacle avoidance is harder to translate into a deep RL framework. Despite this difficulty, using different RL methods and configurations will certainly help and provide ideas for improvement in future work.
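The discrete angular and linear velocities mentioned in the abstract suggest an action set built as a grid of velocity pairs. The sketch below only illustrates that idea under stated assumptions: the velocity values, the names `ACTIONS` and `select_velocity`, and the score list are hypothetical, and the scores stand in for the per-action outputs that a network would produce from the occupancy grid map.

```python
from itertools import product

# Hypothetical discretisation; the thesis's actual values are not given here.
LINEAR_VELOCITIES = (0.0, 0.25, 0.5)    # m/s
ANGULAR_VELOCITIES = (-0.5, 0.0, 0.5)   # rad/s

# Every (linear, angular) pair is one discrete action.
ACTIONS = list(product(LINEAR_VELOCITIES, ANGULAR_VELOCITIES))


def select_velocity(action_scores):
    """Map per-action scores (one per entry of ACTIONS) to a velocity command
    by picking the highest-scoring action."""
    if len(action_scores) != len(ACTIONS):
        raise ValueError("expected one score per discrete action")
    best = max(range(len(ACTIONS)), key=action_scores.__getitem__)
    return ACTIONS[best]


# Made-up scores standing in for a network's output on one observation.
scores = [0.1, 0.0, 0.2, 0.3, 0.9, 0.4, 0.0, 0.5, 0.1]
command = select_velocity(scores)  # (linear m/s, angular rad/s)
```

With three values per axis the agent chooses among nine commands per step. A finer grid gives smoother motion at the cost of a larger output layer.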
79

Remembering how to walk - Using Active Dendrite Networks to Drive Physical Animations / Att minnas att gå - användning av Active Dendrite Nätverk för att driva fysiska animeringar

Henriksson, Klas January 2023
Creating embodied agents capable of performing a wide range of tasks in different types of environments has been a longstanding challenge in deep reinforcement learning. A novel network architecture introduced in 2021 called the Active Dendrite Network [A. Iyer et al., “Avoiding Catastrophe: Active Dendrites Enable Multi-Task Learning in Dynamic Environments”] designed to create sparse subnetworks for different tasks showed promising multi-tasking performance on the Meta-World [T. Yu et al., “Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning”] multi-tasking benchmark. This thesis further explores the performance of this novel architecture in a multi-tasking environment focused on physical animations and locomotion. Specifically we implement and compare the architecture to the commonly used Multi-Layer Perceptron (MLP) architecture on a multi-task reinforcement learning problem in a video-game setting consisting of training a hexapedal agent on a set of locomotion tasks involving moving at different speeds, turning and standing still. The evaluation focused on two areas: (1) Assessing the average overall performance of the Active Dendrite Network relative to the MLP on a set of locomotive scenarios featuring our behaviour sets and environments. (2) Assessing the relative impact Active Dendrite networks have on transfer learning between related tasks by comparing their performance on novel behaviours shortly after training a related behaviour. Our findings suggest that the novel Active Dendrite Network can make better use of limited network capacity compared to the MLP - the Active Dendrite Network outperformed the MLP by ∼18% on our benchmark using limited network capacity. When both networks have sufficient capacity however, there is not much difference between the two. We further find that Active Dendrite Networks have very similar transfer-learning capabilities compared to the MLP in our benchmarks.
80

REAL-TIME UPDATING AND NEAR-OPTIMAL ENERGY MANAGEMENT SYSTEM FOR MULTI-MODE ELECTRIFIED POWERTRAIN WITH REINFORCEMENT LEARNING CONTROL

Biswas, Atriya January 2021 (has links)
An energy management system (EMS) implemented in the electronic control unit (ECU) of an actual vehicle with an electrified powertrain is a much simpler version of the theoretically developed EMS. Such simplification is done to accommodate the EMS within the given memory constraint and computational capacity of the ECU. The simplification should ensure reasonable performance compared to the theoretical EMS under real-life driving scenarios. The process of simplification must be effective to create a versatile and utilitarian EMS. Reinforcement learning-based controllers feature profitable characteristics in optimizing the performance of controllable physical systems, as they do not necessarily require a mathematical model of system dynamics (i.e., they are model-free). Quite naturally, one can aspire to test such prowess of reinforcement learning-based controllers in achieving near-global optimal performance for the supervisory energy management system of electrified powertrains. Before any supervisory controller is deployed as a mainstream controller, it should be scrutinized through various levels of virtual simulation platforms in ascending order of physical-system emulation capability. The controller evolves from a mathematical concept to a utilitarian embedded system through a series of these levels, where it undergoes gradual transformation to finally become apposite for a real physical system. Implementation of the control strategy in a Simulink-based forward simulation model could be the first stage of the aforementioned evolution process. This brief will delineate all the steps required for implementing a reinforcement learning-based supervisory controller in a forward simulation model of a hybrid electric vehicle. A novel framework of loss-minimization-based instantaneous optimal strategy is introduced for the energy management system of a multi-mode hybrid electric powertrain in this brief.
The loss-minimization strategy is flexible enough to be implemented in any architecture of electrified powertrains. It is mathematically proven that the overall system loss minimization is equivalent to the minimization of fuel consumption. An online simulation framework is developed in this article to evaluate the performance of a multi-mode electrified powertrain equipped with more than one power source. An electrically variable transmission with two planetary gear-sets has been chosen as the centerpiece of the powertrain, considering the versatility and future prospects of such transmissions. Notably, the novel architecture topology selected for this dissertation emerged from a rigorous screening process whose workflow is presented here briefly. One legitimate concern with multi-mode transmissions is their proclivity to introduce discontinuity of power flow downstream in the powertrain. Mode-shift events are predominantly responsible for such discontinuity. Dynamic coordinated control as a technique for ameliorating such discontinuity has been substantiated by many scholars in the literature. Hence, a system-level coordinated control is employed within the energy management system, which governs the mode schedule of the multi-mode powertrain in real-time simulation. / Thesis / Doctor of Philosophy (PhD)
