• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 65
  • 4
  • 3
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 95
  • 95
  • 95
  • 28
  • 17
  • 17
  • 17
  • 16
  • 16
  • 16
  • 16
  • 16
  • 15
  • 14
  • 14
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Hluboké posilovaná učení a řešení pohybu robotu typu had / Deep reinforcement learning and snake-like robot locomotion design

Kočí, Jakub January 2020 (has links)
This master thesis is discussing application of reinforcement learning in deep learning tasks. In theoretical part, basics about artificial neural networks and reinforcement learning. The thesis describes theoretical model of reinforcement learning process - Markov processes. Some interesting techniques are shown on conventional reinforcement learning algorithms. Some of widely used deep reinforcement learning algorithms are described here as well. Practical part consist of implementing model of robot and it's environment and of the deep reinforcement learning system itself.
52

Optimizing Power Consumption, Resource Utilization, and Performance for Manycore Architectures using Reinforcement Learning

Fettes, Quintin 23 May 2022 (has links)
No description available.
53

Increasing Policy Network Size Does Not Guarantee Better Performance in Deep Reinforcement Learning

Zachery Peter Berg (12455928) 25 April 2022 (has links)
<p>The capacity of deep reinforcement learning policy networks has been found to affect the performance of trained agents. It has been observed that policy networks with more parameters have better training performance and generalization ability than smaller networks. In this work, we find cases where this does not hold true. We observe unimodal variance in the zero-shot test return of varying width policies, which accompanies a drop in both train and test return. Empirically, we demonstrate mostly monotonically increasing performance or mostly optimal performance as the width of deep policy networks increase, except near the variance mode. Finally, we find a scenario where larger networks have increasing performance up to a point, then decreasing performance. We hypothesize that these observations align with the theory of double descent in supervised learning, although with specific differences.</p>
54

A comparison of genetic algorithm and reinforcement learning for autonomous driving / En jämförelse mellan genetisk algoritm och förstärkningslärande för självkörande bilar

Xiang, Ziyi January 2019 (has links)
This paper compares two different methods, reinforcement learning and genetic algorithm for designing autonomous cars’ control system in a dynamic environment. The research problem could be formulated as such: How is the learning efficiency compared between reinforcement learning and genetic algorithm on autonomous navigation through a dynamic environment? In conclusion, the genetic algorithm outperforms the reinforcement learning on mean learning time, despite the fact that the prior shows a large variance, i.e. genetic algorithm provide a better learning efficiency. / I det här papperet jämförs två olika metoder, förstärkningsinlärning och genetisk algoritm för att designa autonoma bilar styrsystem i en dynamisk miljö. Forskningsproblemet kan formuleras som: Hur är inlärningseffektiviteten jämför mellan förstärkningsinlärning och genetisk algoritm på autonom navigering i en dynamisk miljö? Sammanfattningsvis, den genetisk algoritm överträffar förstärkningsinlärning på genomsnittlig inlärningstid, trots att den tidigare visar en stor varians, dvs genetisk algoritm, ger en bättre inlärningseffektivitet.
55

Building the Intelligent IoT-Edge: Balancing Security and Functionality using Deep Reinforcement Learning

Anand A Mudgerikar (11791094) 19 December 2021 (has links)
<div>The exponential growth of Internet of Things (IoT) and cyber-physical systems is resulting in complex environments comprising of various devices interacting with each other and with users. In addition, the rapid advances in Artificial Intelligence are making those devices able to autonomously modify their behaviors through the use of techniques such as reinforcement learning (RL). There is thus the need for an intelligent monitoring system on the network edge with a global view of the environment to autonomously predict optimal device actions. However, it is clear however that ensuring safety and security in such environments is critical. To this effect, we develop a constrained RL framework for IoT environments that determines optimal devices actions with respect to user-defined goals or required functionalities using deep Q learning. We use anomaly based intrusion detection on the network edge to dynamically generate security and safety policies to constrain the RL agent in the framework. We analyze the balance required between ‘safety/security’ and ‘functionality’ in IoT environments by manipulating the exploration of safe and unsafe benefit state spaces in the RL framework. We instantiate the framework for testing on application layer control in smart home environments, and network layer control including network functionalities like rate control and routing, for SDN based environments.</div>
56

Reinforcement Learning for Hydrobatic AUVs / Reinforcement learning för Hydrobatiska AUV

Woźniak, Grzegorz January 2022 (has links)
This master thesis focuses on developing a Reinforcement Learning (RL) controller to perform hydrobatic maneuvers on an Autonomous Underwater Vehicle (AUV) successfully. This work also aims to analyze the robustness of the RL controller, as well as provide a comparison between RL algorithms and Proportional Integral Derivative (PID) control. Training of the algorithms is initially conducted in a Numpy simulation in Python. We show how to model the Equations of Motion (EOM) of the AUV and how to use it to train the RL controllers. We use the stablebaselines3 RL framework and create a training environment with the OpenAI gym. The Twin-Delay Deep Deterministic Policy Gradient (TD3) algorithm offers good performance in the simulation. The following maneuvers are studied: trim control, waypoint following, and an inverted pendulum. We test the maneuvers both in the Numpy simulation and Stonefish simulator. Also, we test the robustness of the RL trim controller by simulating noise in the state feedback. Lastly, we run the RL trim controller on a real AUV hardware called SAM. We show that the RL algorithm trained in the Numpy simulator can achieve similar performance to the PID controller in the Stonefish simulator. We generate a policy that can perform the trim control and the Inverted Pendulum maneuver in the Numpy simulation. We show that we can generate a robust policy that executes other types of maneuvers by providing a parameterized cost function to the RL algorithm. We discuss the results of every maneuver we perform with the SAM AUV and provide a discussion about the advantages and disadvantages of this control method applied to underwater robotics. We conclude that RL can be used to create policies that perform hydrobatic maneuvers. This data-driven approach can be applied in the future to more complex problems in underwater robotics. / Denna masteruppsats fokuserar på att utveckla en Reinforcement Learning (RL) kontroller för att framgångsrikt utföra hydrobatiska manövrar på ett autonomt undervattensfordon (AUV). Detta arbete syftar också till att analysera robustheten hos RL-kontrollern, samt tillhandahålla en jämförelse mellan RL-algoritmer och Proportional Integral Derivative (PID) kontroll. Träning av algoritmerna utförs initialt i Numpy-simuleringen i Python. Vi visar hur man modellerar rörelseekvationerna (EOM) för AUV, och hur man använder den för att träna RL-kontrollerna. Vi använder ramverket stablebaselines3 RL och skapar en träningsmiljö med gymmet OpenAI. Algoritmen Twin-Delay Deep Deterministic Policy Gradient (TD3) erbjuder bra prestanda i simuleringen. Följande manövrar studeras: trimkontroll, waypointföljning och en inverterad pendel. Vi testar manövrarna både i Numpy-simulering och Stonefish-simulator. Vi testar också robustheten hos RL-trimkontrollern genom att simulera bruset i tillståndsåterkopplingen. Slutligen kör vi RL-trimkontrollern på den riktiga SAM AUV-hårdvaran. Vi visar att RL-algoritmen tränad i Numpy-simulatorn kan uppnå liknande prestanda som PID-regulatorn i Stonefish-simulatorn. Vi genererar en policy som kan utföra trimkontrollen och manövern med inverterad pendel i Numpy-simuleringen. Vi visar att vi kan generera en robust policy som utför andra typer av manövrar genom att tillhandahålla en parameteriserad kostnadsfunktion till RL-algoritmen. Vi diskuterar resultaten av varje manöver vi utför med SAM AUV och ger en diskussion om fördelarna och nackdelarna med denna kontrollmetod som tillämpas på undervattensrobotik. Vi drar slutsatsen att RL kan användas för att skapa policyer som utför hydrobatiska manövrar. Detta datadrivna tillvägagångssätt kan tillämpas i framtiden på mer komplexa problem inom undervattensrobotik.
57

AI for an Imperfect-Information Wargame with Self-Play Reinforcement Learning / AI med självspelande förstärkningsinlärning för ett krigsspel med imperfekt information

Ryblad, Filip January 2021 (has links)
The task of training AIs for imperfect-information games has long been difficult. However, recently the algorithm ReBeL, a general framework for self-play reinforcement learning, has been shown to excel at heads-up no-limit Texas hold 'em, among other imperfect-information games. In this report the ability to adapt ReBeL to a downscaled version of the strategy wargame \say{Game of the Generals} is explored. It is shown that an implementation of ReBeL that uses no domain-specific knowledge is able to beat all benchmark bots, which indicates that ReBeL can be a useful framework when training AIs for imperfect-information wargames. / Det har länge varit en utmaning att träna AI:n för spel med imperfekt information. Nyligen har dock algoritmen ReBeL, ett generellt ramverk för självspelande förstärkningsinlärning, visat lovande prestanda i heads-up no-limit Texas hold 'em och andra spel med imperfekt information. I denna rapport undersöks ReBeLs förmåga att anpassas till en nedskalad version av spelet \say{Game of the Generals}, vilket är ett strategiskt krigsspel. Det visas att en implementation av ReBeL som inte använder någon domänspecifik kunskap klarar av att besegra alla bottar som användes vid jämförelse, vilket indikerar att ReBeL kan vara ett användbart ramverk för att träna AI:n för krigsspel med imperfekt information.
58

Smart Tracking for Edge-assisted Object Detection : Deep Reinforcement Learning for Multi-objective Optimization of Tracking-based Detection Process / Smart Spårning för Edge-assisterad Objektdetektering : Djup Förstärkningsinlärning för Flermålsoptimering av Spårningsbaserad Detekteringsprocess

Zhou, Shihang January 2023 (has links)
Detecting generic objects is one important sensing task for applications that need to understand the environment, for example eXtended Reality (XR), drone navigation etc. However, Object Detection algorithms are particularly computationally heavy for real-time video analysis on resource-constrained mobile devices. Thus Object Tracking, which is a much lighter process, is introduced under the Tracking-By-Detection (TBD) paradigm to alleviate the computational overhead. Still, it is common that the configurations of the TBD remain unchanged, which would result in unnecessary computation and/or performance loss in many cases.\\ This Master's Thesis presents a novel approach for multi-objective optimization of the TBD process on precision and latency, with the platform being power-constrained devices. We propose a Deep Reinforcement Learning based scheduling architecture that selects appropriate TBD actions in video sequences to achieve the desired goals. Specifically, we develop a simulation environment providing Markovian state information as input for the scheduler neural network, justified options of TBD actions, and a scalarized reward function to combine the multiple objectives. Our results demonstrate that the trained policies can learn to utilize content information from the current and previous frames, thus optimally controlling the TBD process at each frame. The proposed approach outperforms the baselines that have fixed TBD configurations and recent research works, achieving the precision close to pure detection while keeping the latency much lower. Both tuneable configurations show positive and synergistic contribution to the optimization objectives. We also show that our policies are generalizable, with inference and action time of the scheduler having minimal latency overhead. This makes our scheduling design highly practical in real XR or similar applications on power-constrained devices. / Att upptäcka generiska objekt är en viktig uppgift inom avkänning för tillämpningar som behöver förstå omgivningen, såsom eXtended Reality (XR) och navigering med drönare, bland annat. Algoritmer för objektdetektering är dock särskilt beräkningstunga när det gäller videoanalyser i realtid på resursbegränsade mobila enheter. Objektspårning, å andra sidan, är en lättare process som vanligtvis implementeras under Tracking-By-Detection (TBD)-paradigmet för att minska beräkningskostnaden. Det är dock vanligt att TBD-konfigurationerna förblir oförändrade, vilket leder till onödig beräkning och/eller prestandaförlust i många fall.\\ I detta examensarbete presenteras en ny metod för multiobjektiv optimering av TBD-processen med avseende på precision och latens på plattformar med begränsad prestanda. Vi föreslår en djup förstärkningsinlärningsbaserad schemaläggningsarkitektur som väljer lämpliga TBD-åtgärder för videosekvenser för att uppnå de önskade målen. Vi utvecklar specifikt en simulering som tillhandahåller Markovian state-information som indata för schemaläggaren, samt neurala nätverk, motiverade alternativ för TBD-åtgärder och en skalariserad belöningsfunktion för att kombinera de olika målen. Våra resultat visar att de tränade strategierna kan lära sig att använda innehållsinformation från aktuella och tidigare ramar för att optimalt styra TBD-processen för varje bild. Det föreslagna tillvägagångssättet är bättre än både de grundläggande metoderna med en fast TBD-konfiguration och nyare forskningsarbeten. Det uppnår en precision som ligger nära den rena detektionen samtidigt som latensen hålls mycket låg. Båda justerbara konfigurationerna bidrar positivt och synergistiskt till optimeringsmålen. Vi visar också att våra strategier är generaliserbara genom att dela upp träning och testning med en 50 %-ig uppdelning, vilket resulterar i minimal inferenslatens och schemaläggarens handlingslatens. Detta gör vår schemaläggningsdesign mycket praktisk i verkliga XR- eller liknande tillämpningar på enheter med begränsad strömförsörjning.
59

Autonomous Cyber Defense for Resilient Cyber-Physical Systems

Zhang, Qisheng 09 January 2024 (has links)
In this dissertation research, we design and analyze resilient cyber-physical systems (CPSs) under high network dynamics, adversarial attacks, and various uncertainties. We focus on three key system attributes to build resilient CPSs by developing a suite of the autonomous cyber defense mechanisms. First, we consider network adaptability to achieve the resilience of a CPS. Network adaptability represents the network ability to maintain its security and connectivity level when faced with incoming attacks. We address this by network topology adaptation. Network topology adaptation can contribute to quickly identifying and updating the network topology to confuse attacks by changing attack paths. We leverage deep reinforcement learning (DRL) to develop CPSs using network topology adaptation. Second, we consider the fault-tolerance of a CPS as another attribute to ensure system resilience. We aim to build a resilient CPS under severe resource constraints, adversarial attacks, and various uncertainties. We chose a solar sensor-based smart farm as one example of the CPS applications and develop a resource-aware monitoring system for the smart farms. We leverage DRL and uncertainty quantification using a belief theory, called Subjective Logic, to optimize critical tradeoffs between system performance and security under the contested CPS environments. Lastly, we study system resilience in terms of system recoverability. The system recoverability refers to the system's ability to recover from performance degradation or failure. In this task, we mainly focus on developing an automated intrusion response system (IRS) for CPSs. We aim to design the IRS with effective and efficient responses by reducing a false alarm rate and defense cost, respectively. Specifically, We build a lightweight IRS for an in-vehicle controller area network (CAN) bus system operating with DRL-based autonomous driving. / Doctor of Philosophy / In this dissertation research, we design and analyze resilient cyber-physical systems (CPSs) under high network dynamics, adversarial attacks, and various uncertainties. We focus on three key system attributes to build resilient CPSs by developing a suite of the autonomous cyber defense mechanisms. First, we consider network adaptability to achieve the resilience of a CPS. Network adaptability represents the network ability to maintain its security and connectivity level when faced with incoming attacks. We address this by network topology adaptation. Network topology adaptation can contribute to quickly identifying and updating the network topology to confuse attacks by changing attack paths. We leverage deep reinforcement learning (DRL) to develop CPSs using network topology adaptation. Second, we consider the fault-tolerance of a CPS as another attribute to ensure system resilience. We aim to build a resilient CPS under severe resource constraints, adversarial attacks, and various uncertainties. We chose a solar sensor-based smart farm as one example of the CPS applications and develop a resource-aware monitoring system for the smart farms. We leverage DRL and uncertainty quantification using a belief theory, called Subjective Logic, to optimize critical tradeoffs between system performance and security under the contested CPS environments. Lastly, we study system resilience in terms of system recoverability. The system recoverability refers to the system's ability to recover from performance degradation or failure. In this task, we mainly focus on developing an automated intrusion response system (IRS) for CPSs. We aim to design the IRS with effective and efficient responses by reducing a false alarm rate and defense cost, respectively. Specifically, We build a lightweight IRS for an in-vehicle controller area network (CAN) bus system operating with DRL-based autonomous driving.
60

Deep reinforcement learning approach to portfolio management / Deep reinforcement learning metod för portföljförvaltning

Jama, Fuaad January 2023 (has links)
This thesis evaluates the use of a Deep Reinforcement Learning (DRL) approach to portfolio management on the Swedish stock market. The idea is to construct a portfolio that is adjusted daily using the DRL algorithm Proximal policy optimization (PPO) with a multi perceptron neural network. The input to the neural network was historical data in the form of open, high, and low price data. The portfolio is evaluated by its performance against the OMX Stockholm 30 index (OMXS30). Furthermore, three different approaches for optimization are going to be studied, in that three different reward functions are going to be used. These functions are Sharp ratio, cumulative reward (Daily return) and Value at risk reward (which is a daily return with a value at risk penalty). The historival data that is going to be used is from the period 2010-01-01 to 2015-12-31 and the DRL approach is then tested on two different time periods which represents different marked conditions, 2016-01-01 to 2018-12-31 and 2019-01-01 to 2021-12-31. The results show that in the first test period all three methods (corresponding to the three different reward functions) outperform the OMXS30 benchmark in returns and sharp ratio, while in the second test period none of the methods outperform the OMXS30 index. / Målet med det här arbetet var att utvärdera användningen av "Deep reinforcement learning" (DRL) metod för portföljförvaltning på den svenska aktiemarknaden. Idén är att konstruera en portfölj som justeras dagligen med hjälp av DRL algoritmen "Proximal policy optimization" (PPO) med ett neuralt nätverk med flera perceptroner. Inmatningen till det neurala nätverket var historiska data i form av öppnings, lägsta och högsta priser. Portföljen utvärderades utifrån dess prestation mot OMX Stockholm 30 index (OMXS30). Dessutom studerades tre olika tillvägagångssätt för optimering, genom att använda tre olika belöningsfunktioner. Dessa funktioner var Sharp ratio, kumulativ belöning (Daglig avkastning) och Value at risk-belöning (som är en daglig avkastning minus Value at risk-belöning). Den historiska data som användes var från perioden 2010-01-01 till 2015-12-31 och DRL-metoden testades sedan på två olika tidsperioder som representerar olika marknadsförhållanden, 2016-01-01 till 2018-12-31 och 2019-01-01 till 2021-12-31. Resultatet visar att i den första testperioden så överträffade alla tre metoder (vilket motsvarar de tre olika belöningsfunktionerna) OMXS30 indexet i avkastning och sharp ratio, medan i den andra testperioden så överträffade ingen av metoderna OMXS30 indexet.

Page generated in 0.0552 seconds