11

Distributed Deep Reinforcement Learning for a Multi-Robot Warehouse System

Stenberg, Holger, Wahréus, Johan January 2021 (has links)
This project concerns optimizing the behavior of multiple dispatching robots in a virtual warehouse environment. Q-learning and deep Q-learning algorithms, two established methods in reinforcement learning, were used for this purpose. Simulations were run during the project, implementing and comparing different algorithms on environments with up to four robots. The efficiency of a given algorithm was assessed primarily by the number of packages it enabled the robots to deliver and by how fast the solution converged. The simulation results revealed that a Q-learning algorithm could efficiently solve problems in environments with up to two active robots. To solve more complex problems in environments with more than two robots, deep Q-learning had to be implemented to avoid prolonged computations and excessive memory usage. / This project is about optimizing the movements of several robots in a virtual environment. Q-learning and deep Q-learning algorithms, two well-established methods in machine learning, were used for this. Simulations were carried out during the project in which the different algorithms were compared in environments with up to four robots. The performance of a given algorithm was judged by how many packages the robots could deliver in the environment and by how quickly a solution converged. The results showed that Q-learning could efficiently solve problems in environments with up to two robots. For larger problems, deep Q-learning was used to avoid prolonged computations and high memory consumption. / Bachelor's degree project in electrical engineering 2021, KTH, Stockholm
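
The abstract contrasts tabular Q-learning with deep Q-learning but contains no code; as a rough illustration of the tabular update it refers to, the sketch below runs Q-learning on a toy one-dimensional corridor. The corridor length, rewards, and hyperparameters are assumptions made for the example, not the thesis' warehouse environment.

```python
import random
from collections import defaultdict

# Illustrative tabular Q-learning on a 1-D corridor: the agent starts in cell 0
# and is rewarded for reaching the last cell. All environment details here are
# assumptions for the example, not the warehouse setup from the thesis.
N_CELLS = 10
ACTIONS = (-1, +1)            # move left / move right
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

Q = defaultdict(float)        # Q[(state, action)] -> estimated return

def step(state, action):
    nxt = min(max(state + action, 0), N_CELLS - 1)
    reward = 1.0 if nxt == N_CELLS - 1 else -0.01    # small step penalty
    return nxt, reward, nxt == N_CELLS - 1

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next * (not done) - Q[(s, a)])
        s = s2

print(max(Q[(0, act)] for act in ACTIONS))    # learned value of the start state
```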
12

Deep Reinforcement Learning in Cart Pole and Pong

Kuurne Uussilta, Dennis, Olsson, Viktor January 2020 (has links)
In this project, we aim to reproduce previous results achieved with Deep Reinforcement Learning. We present the Markov Decision Process model as well as the algorithms Q-learning and Deep Q-learning Network (DQN). We implement a DQN agent, first in an environment called CartPole, and later in the game Pong. Our agent was able to solve the CartPole environment in less than 300 episodes. We assess the impact some of the parameters had on the agent's performance. The performance of the agent is particularly sensitive to the learning rate and seemingly proportional to the dimension of the neural network. The DQN agent implemented in Pong was unable to learn, performing at the same level as an agent picking actions at random, despite the introduction of various modifications to the algorithm. We discuss possible sources of error, including the possibility that the RAM used as input does not contain sufficient information. Furthermore, we discuss the possibility that additional modifications to the algorithm are needed in order to achieve convergence, as convergence is not guaranteed for DQN. / The goal of this project is to reproduce previous results achieved with Deep Reinforcement Learning. We present the Markov Decision Process model as well as the algorithms Q-learning and Deep Q-learning Network (DQN). We implement a DQN agent, first in the CartPole environment and then in the game Pong. Our agent managed to solve CartPole in fewer than 300 episodes. We assess the influence of certain parameters on the agent's performance. The agent's performance is particularly sensitive to the value of the learning rate and appears to be proportional to the dimension of the neural network. The DQN agent implemented in Pong was unable to learn and played at the same level as a randomly acting agent, despite the various modifications we introduced. We discuss possible sources of error, among them that the RAM used as input to the agent may lack sufficient information. We also discuss that further modifications may be necessary to achieve convergence, since this is not guaranteed for DQN. / Bachelor's degree project in electrical engineering 2020, KTH, Stockholm
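
For readers unfamiliar with the DQN update the abstract builds on, the following PyTorch sketch shows a single training step on a synthetic batch. The network sizes, learning rate, and batch contents are illustrative assumptions; only the state and action dimensions are chosen to match CartPole.

```python
import torch
import torch.nn as nn

# Minimal sketch of one DQN update step (assumed hyperparameters, synthetic batch).
# state_dim=4 matches CartPole's observation, n_actions=2 its action space.
state_dim, n_actions, gamma, lr = 4, 2, 0.99, 1e-3

q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())      # start with identical weights
optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)

# A synthetic batch of transitions (s, a, r, s', done) standing in for replay samples.
batch = 32
s = torch.randn(batch, state_dim)
a = torch.randint(0, n_actions, (batch, 1))
r = torch.randn(batch, 1)
s2 = torch.randn(batch, state_dim)
done = torch.zeros(batch, 1)

# Bellman target: r + gamma * max_a' Q_target(s', a'), cut off at terminal states.
with torch.no_grad():
    target = r + gamma * (1 - done) * target_net(s2).max(dim=1, keepdim=True).values

q_sa = q_net(s).gather(1, a)                         # Q(s, a) for the taken actions
loss = nn.functional.mse_loss(q_sa, target)

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```

In a full agent this step would be repeated on minibatches drawn from a replay buffer, with the target network periodically synced to the online network.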
13

Distributed Optimization Through Deep Reinforcement Learning

Funkquist, Mikaela, Lu, Minghua January 2020 (has links)
Reinforcement learning methods allow self-learning agents to play video games and board games autonomously. This project aims to study the efficiency of the reinforcement learning algorithms Q-learning and deep Q-learning for dynamical multi-agent problems. The goal is to train robots to optimally navigate through a warehouse without colliding. A virtual environment was created, in which the learning algorithms were tested by simulating moving agents. The algorithms' efficiency was evaluated by how fast the agents learned to perform predetermined tasks. The results show that Q-learning excels in simple problems with few agents, quickly solving systems with two active agents. Deep Q-learning proved to be better suited for complex systems containing several agents, though cases of sub-optimal movement were still possible. Both algorithms showed great potential for their respective areas; however, improvements still need to be made for any real-world use. / Reinforcement learning methods allow self-learning agents to play video games and board games autonomously. The project aims to study the efficiency of the reinforcement learning methods Q-learning and deep Q-learning in dynamic problems. The goal is to train robots so that they can move through a warehouse in the best possible way without colliding. A virtual environment was created in which the algorithms were tested by simulating moving agents. The efficiency of the algorithms was evaluated by how quickly the agents learned to perform predetermined tasks. The results show that Q-learning works well for simple problems with few agents, where systems with two active agents were solved quickly. Deep Q-learning works better for more complex systems containing more agents, but cases of sub-optimal movement occurred. Both algorithms showed good potential within their respective areas; however, improvements must be made before they can be used in real-world settings. / Bachelor's degree project in electrical engineering 2020, KTH, Stockholm
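
The claim that tabular Q-learning only scales to a couple of agents follows from the size of the joint state-action table; the short sketch below makes that growth concrete under assumed numbers (a 10x10 grid and five actions per agent), which are not taken from the thesis.

```python
# Back-of-the-envelope illustration (assumed numbers, not the thesis environment):
# with a tabular method the joint state and action spaces grow exponentially in
# the number of agents, which is why Q-learning stops being practical beyond a
# few robots and a function approximator (deep Q-learning) becomes necessary.
grid_cells = 10 * 10           # assumed 10x10 warehouse grid
actions_per_agent = 5          # up / down / left / right / wait

for n_agents in range(1, 5):
    joint_states = grid_cells ** n_agents
    joint_actions = actions_per_agent ** n_agents
    table_entries = joint_states * joint_actions
    print(f"{n_agents} agent(s): ~{table_entries:,} Q-table entries")
```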
14

Deep Q-Learning for Lane Localization : Exploring Reinforcement Learning for Accurate Lane Detection / Djupinlärning med Q-lärande för fillokalisation : Utforskning av förstärkningsinlärning för noggrann filavkänning

Ganesan, Aishwarya January 2024 (has links)
In autonomous driving, achieving fast and reliable lane detection is essential. This project explores a two-step lane detection and localization approach, diverging from relying solely on end-to-end deep learning methods, which often struggle with curved or occluded lanes. Specifically, we investigate the feasibility of training a deep reinforcement learning-based agent to adjust the detected lane, manipulating either the lane points or the parameters of a Bézier curve. However, the study found that reinforcement learning-based localization, particularly on datasets like TuSimple, did not perform as well as anticipated, despite efforts to enhance performance using various metrics. Introducing curves to expand the localizer's scope did not surpass the point-based approach, indicating the need for further refinement for Deep Q-learning localization to be feasible. Although optimization techniques like Double Deep Q-Network showed improvements, the study did not support the hypothesis that curves with Deep Q-learning offer superior performance, highlighting the necessity for additional research into alternative methods to achieve more accurate lane detection and localization in autonomous driving systems using reinforcement learning. / In autonomous driving, achieving fast and reliable lane detection is of crucial importance. This project explores a two-step approach to lane detection and localization that differs from relying solely on end-to-end deep learning methods, which often struggle with curved or occluded lanes. More specifically, we investigate the feasibility of training a deep reinforcement learning-based agent to adjust the detected lane by manipulating either the lane points or the parameters of a Bézier curve. However, the study found that localization based on reinforcement learning, particularly on datasets such as TuSimple, did not perform as well as expected, despite efforts to improve performance using various metrics. Introducing curves to extend the scope of the localizer did not outperform the point-based approach, which indicates that further refinement is needed to make Deep Q-learning localization practically feasible. Although optimization techniques such as Double Deep Q-Network showed improvements, the study did not support the hypothesis that curves with Deep Q-learning offer superior performance, underscoring the need for further research into alternative methods to achieve more accurate lane detection and localization in autonomous driving systems using reinforcement learning.
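
As background to the curve-based action space the abstract describes, the sketch below evaluates a cubic Bézier lane and applies one hypothetical "adjust a control point" action of the kind a DQN localizer could pick. The control points, image coordinates, and adjustment are invented for illustration and are not taken from the thesis.

```python
import numpy as np

def cubic_bezier(control_points, ts):
    """Evaluate a cubic Bezier curve at parameter values ts in [0, 1].

    control_points: array of shape (4, 2) -- the curve parameters an agent could
    adjust. Returns curve points of shape (len(ts), 2).
    """
    p = np.asarray(control_points, dtype=float)
    t = np.asarray(ts, dtype=float)[:, None]
    return ((1 - t) ** 3 * p[0]
            + 3 * (1 - t) ** 2 * t * p[1]
            + 3 * (1 - t) * t ** 2 * p[2]
            + t ** 3 * p[3])

# Hypothetical lane in image coordinates: starts near the bottom of the frame
# and curves gently to the right toward the top.
lane = np.array([[100.0, 700.0], [110.0, 500.0], [130.0, 300.0], [160.0, 100.0]])
points = cubic_bezier(lane, np.linspace(0.0, 1.0, 20))

# One discrete "adjustment" action: nudge a control point a few pixels to the
# left and re-evaluate the curve.
lane_adjusted = lane.copy()
lane_adjusted[2, 0] -= 5.0
points_adjusted = cubic_bezier(lane_adjusted, np.linspace(0.0, 1.0, 20))
print(points[:3], points_adjusted[:3], sep="\n")
```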
15

[pt] ESTUDO DE TÉCNICAS DE APRENDIZADO POR REFORÇO APLICADAS AO CONTROLE DE PROCESSOS QUÍMICOS / [en] STUDY OF REINFORCEMENT LEARNING TECHNIQUES APPLIED TO THE CONTROL OF CHEMICAL PROCESSES

30 December 2021 (has links)
[pt] Industry 4.0 has driven the development of new technologies to meet current market demands. One of these new technologies is the incorporation of computational intelligence techniques into the daily routine of the chemical industry. In this context, this work evaluated the performance of controllers based on reinforcement learning in industrial chemical processes. The control strategy directly affects the safety and cost of the process. The better the performance of this strategy, the lower the production of effluents and the consumption of inputs and energy. The reinforcement learning algorithms showed excellent results for the first case study, the CSTR reactor with Van de Vusse kinetics. However, for the implementation of these algorithms in the Tennessee Eastman Process chemical plant, it was shown that more studies are needed. The weak or non-existent Markov property, the high dimensionality, and the peculiarities of the plant were factors that made it difficult for the developed controllers to obtain satisfactory results. For case study 1, the algorithms Q-Learning, Actor Critic TD, DQL, DDPG, SAC, and TD3 were evaluated, and for case study 2, the algorithms CMA-ES, TRPO, PPO, DDPG, SAC, and TD3 were evaluated. / [en] Industry 4.0 boosted the development of new technologies to meet current market demands. One of these new technologies was the incorporation of computational intelligence techniques into the daily life of the chemical industry. In this context, the present work evaluated the performance of controllers based on reinforcement learning in industrial chemical processes. The control strategy directly affects the safety and cost of the process. The better the performance of this strategy, the lower the production of effluents and the consumption of inputs and energy. The reinforcement learning algorithms showed excellent results for the first case study, the Van de Vusse reactor. However, for the implementation of these algorithms in the Tennessee Eastman Process chemical plant, it was shown that more studies are needed. The weak Markov property, the high dimensionality, and the peculiarities of the plant were factors that made it difficult for the developed controllers to obtain satisfactory results. For case study 1, the algorithms Q-Learning, Actor Critic TD, DQL, DDPG, SAC, and TD3 were evaluated, and for case study 2, the algorithms CMA-ES, TRPO, PPO, DDPG, SAC, and TD3 were evaluated.
16

Evaluation of Deep Q-Learning Applied to City Environment Autonomous Driving

Wedén, Jonas January 2024 (has links)
This project's goal was to assess both the challenges of implementing the Deep Q-Learning algorithm to create an autonomous car in the CARLA simulator and the driving performance of the resulting model. An agent was trained to follow waypoints using two main approaches. First, a camera-based approach, in which the agent gathered information about the environment from a camera sensor; the image, along with other driving features, was fed to a convolutional neural network. Second, an approach focused purely on following the waypoints without the camera sensor; here the camera sensor was replaced by an array containing the agent's angle with respect to the upcoming waypoints along with other driving features. Even though the camera-based approach performed best during evaluation, neither approach succeeded in consistently following the waypoints of a straight route. To increase the performance of the camera-based approach, more training episodes need to be provided. Furthermore, both approaches would greatly benefit from experimentation with and optimization of the model's neural network configuration and its hyperparameters.
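
A common way to combine a camera image with scalar driving features, as in the camera-based approach described above, is a two-branch Q-network whose image features are concatenated with the scalar inputs before the Q-value head. The sketch below is one such layout with assumed layer sizes, input resolution, and action count; it is not the architecture used in the project.

```python
import torch
import torch.nn as nn

class TwoInputQNet(nn.Module):
    """Illustrative two-branch Q-network: a small CNN encodes the camera image
    and its features are concatenated with scalar driving features (speed,
    heading error, ...) before the Q-value head. All sizes are assumptions."""

    def __init__(self, n_scalar_features: int = 4, n_actions: int = 9):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():                       # infer the flattened CNN size
            cnn_out = self.cnn(torch.zeros(1, 3, 84, 84)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(cnn_out + n_scalar_features, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, image, scalars):
        return self.head(torch.cat([self.cnn(image), scalars], dim=1))

net = TwoInputQNet()
q_values = net(torch.zeros(1, 3, 84, 84), torch.zeros(1, 4))
print(q_values.shape)    # -> torch.Size([1, 9])
```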
17

Generation and Detection of Adversarial Attacks for Reinforcement Learning Policies

Drotz, Axel, Hector, Markus January 2021 (has links)
In this project we investigate the susceptibility of reinforcement learning (RL) algorithms to adversarial attacks. Adversarial attacks have been proven to be very effective at reducing the performance of deep learning classifiers and, recently, have also been shown to reduce the performance of RL agents. The goal of this project is to evaluate adversarial attacks on agents trained using deep reinforcement learning (DRL), as well as to investigate how to detect these types of attacks. We first use DRL to solve two environments from OpenAI's gym module, namely CartPole and LunarLander, by using DQN and DDPG (DRL techniques). We then evaluate the performance of attacks, and finally we also train neural networks to detect attacks. The attacks were successful at reducing performance in the LunarLander environment and the CartPole environment. The attack detector was very successful at detecting attacks in the CartPole environment, but did not perform quite as well on LunarLander. We hypothesize that continuous action space environments may pose a greater difficulty for attack detectors to identify potential adversarial attacks. / In this project we investigate the susceptibility of reinforcement learning (RL) algorithms to attacks on RL agents. Attacks on RL agents have proven to be very effective at reducing the performance of deep-learning classifiers and have recently also been shown to reduce the performance of RL agents. The goal of this project is to evaluate attacks on deep reinforcement learning agents and to attempt to carry out and detect attacks. We first use RL to solve two environments from OpenAI's gym module, CartPole-v0 and ContiniousLunarLander-v0, with DQN and DDPG. We then evaluate the execution of attacks and finally conclude with a possible way to detect attacks. The attacks were very successful at reducing performance in both the CartPole environment and the LunarLander environment. The attack detector was very successful at detecting attacks in the CartPole environment but did not perform as well in the LunarLander environment. We hypothesize that environments with continuous action spaces may pose a greater difficulty for an attack detector to detect attacks on deep reinforcement learning agents. / Bachelor's degree project in electrical engineering 2021, KTH, Stockholm
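
The abstract does not name the attack that was used; as one common choice in this line of work, the sketch below applies an FGSM-style perturbation to the observation fed to a Q-network, pushing the input in the direction that degrades the agent's original action choice. The network, observation, and epsilon are placeholders, not the trained agents from the project.

```python
import torch
import torch.nn as nn

# Illustrative FGSM-style attack on a Q-network's input observation.
state_dim, n_actions, epsilon = 4, 2, 0.01

q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
obs = torch.randn(1, state_dim, requires_grad=True)

q_values = q_net(obs)
chosen = q_values.argmax(dim=1)            # action the clean policy would take

# Increase the loss of the chosen action so that the perturbed observation makes
# the agent less confident in (or switch away from) its original choice.
loss = nn.functional.cross_entropy(q_values, chosen)
loss.backward()

adversarial_obs = obs + epsilon * obs.grad.sign()
print(q_net(adversarial_obs).argmax(dim=1), chosen)
```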
18

An empirical study of stability and variance reduction in Deep Reinforcement Learning

Lindström, Alexander January 2024 (has links)
Reinforcement Learning (RL) is a branch of AI that deals with solving complex sequential decision-making problems such as training robots, trading while following patterns and trends, optimal control of industrial processes, and more. These applications span various fields, including data science, factories, finance, and others [1]. The most popular RL algorithm today is Deep Q-Learning (DQL), developed by a team at DeepMind, which successfully combines RL with neural networks (NN). However, combining RL and NN introduces challenges such as numerical instability and unstable learning due to high variance. Among others, these issues are due to the “moving target problem”. To mitigate this problem, the target network was introduced as a solution. However, using a target network slows down learning, vastly increases memory requirements, and adds overhead to running the code. In this thesis, we conduct an empirical study to investigate the importance of target networks. We conduct this empirical study for three scenarios. In the first scenario, we train agents in online learning. The aim here is to demonstrate that the target network can be removed after some point in time without negatively affecting performance. To evaluate this scenario, we introduce the concept of the stabilization point. In the second scenario, we pre-train agents before continuing to train them in online learning. For this scenario, we demonstrate the redundancy of the target network by showing that it can be completely omitted. In the third scenario, we evaluate a newly developed activation function called Truncated Gaussian Error Linear Unit (TGeLU). For this scenario, we train an agent in online learning and show that by using TGeLU as an activation function, we can completely remove the target network. Through the empirical study of these scenarios, we conjecture and verify that a target network has only transient benefits concerning stability. We show that it has no influence on the quality of the policy found. We also observed that variance was generally higher when using a target network in the later stages of training compared to cases where the target network had been removed. Additionally, during the investigation of the second scenario, we observed that the number of training iterations during pre-training affected the agent's performance in the online learning phase. This thesis provides a deeper understanding of how the target network affects the training process of DQL; some of the findings, concerning variance reduction, are contrary to popular belief. Additionally, the results have provided insights into potential future work, including further exploring the benefits of the lower variance observed when removing the target network and conducting more efficient convergence analyses for the pre-training part of the second scenario.
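
The design choice under study, computing the bootstrap target from a separate target network versus from the online network once training has stabilized, can be sketched as below. The layer sizes, sync period, and stabilization step are assumptions for illustration, not values from the thesis.

```python
import copy
import torch
import torch.nn as nn

# Simplified illustration: the Bellman target is computed from a separate target
# network (hard-synced every C steps) until an assumed "stabilization point",
# after which the online network itself is bootstrapped from directly.
state_dim, n_actions, gamma, C, stabilization_step = 4, 2, 0.99, 100, 5_000

q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = copy.deepcopy(q_net)

def bootstrap_target(r, s2, done, step):
    use_target_net = step < stabilization_step
    net = target_net if use_target_net else q_net
    with torch.no_grad():
        next_q = net(s2).max(dim=1, keepdim=True).values
    return r + gamma * (1 - done) * next_q

def maybe_sync(step):
    # Hard update: copy the online weights into the target network every C steps,
    # only while the target network is still in use.
    if step < stabilization_step and step % C == 0:
        target_net.load_state_dict(q_net.state_dict())

# Example targets before and after the assumed switch-over point.
r, s2, done = torch.zeros(8, 1), torch.randn(8, state_dim), torch.zeros(8, 1)
print(bootstrap_target(r, s2, done, step=0).shape)        # target net in use
print(bootstrap_target(r, s2, done, step=10_000).shape)   # online net bootstraps itself
```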
19

Theseus : a 3D virtual reality orientation game with real-time guidance system for cognitive training

Jha, Manish Kumar 10 1900 (has links)
Studies support cognitive training as an effective method to slow cognitive decline in older adults. Serious games based on virtual reality (VR) have found an application in this field because of the high level of immersion and interactivity offered by virtual environments (VE). This project implements a fully immersive 3D virtual reality orientation game with a system for guiding the user in real time. The 3D orientation game is used as an exercise to train users' cognitive abilities. The immediate effects of the orientation game on memory and attention abilities were studied in fifteen older adults with subjective cognitive decline (SCD). It was observed that although there was no significant improvement in the results of the attention exercises, the participants performed better on specific memory exercises after playing the orientation game. A lack of success in achieving the required objective can sometimes increase negative emotions in human beings, and especially in people who suffer from cognitive decline. The game was therefore equipped with a guidance system with real-time location hints to control negative emotions and help participants complete their tasks. The guidance system is based on logical rules; each hint is delivered if a specific condition is met. The change in participants' emotions showed that hints are effective in reducing frustration, given that they are easily understandable and designed to give positive feedback. The final part of the project focuses on the guidance system and implements a way to activate it entirely according to a person's emotions. The problem consists of identifying the emotional state that should trigger the activation of the guidance system. This problem takes the form of a Markov decision process (MDP), which can be solved via reinforcement learning (RL). The Deep Q-Network (DQN) with experience replay (ER), one of the most advanced reinforcement learning algorithms for predicting actions in a discrete action space, was used in this context. The algorithm was trained on simulated emotion data and tested on the data of fifteen older adults acquired during experiments conducted in the first part of the project. It is observed that the RL-based method performs better than the rule-based method in identifying a person's mental state in order to provide hints. / Studies support cognitive training as an efficient method to slow the cognitive decline in older adults. Virtual reality (VR) based serious games have found application in this field due to the high level of immersion and interactivity offered by virtual environments (VE). This project implements a fully immersive 3D virtual reality orientation game with a real-time guidance system to be used as an exercise for cognitive training. The immediate aftereffects of playing the orientation game on memory and attention abilities were studied on fifteen older adults with subjective cognitive decline (SCD). It was observed that while there was no significant improvement in attention exercises, the participants performed better in specific memory exercises after playing the orientation game.
Sometimes a lack of success in achieving the required objective may increase negative emotions in humans, and more so in people who suffer from cognitive decline. Hence, the game was equipped with a real-time guidance system with location hints to control negative emotions and help participants complete the tasks. The guidance system is based on logical rules; each hint is delivered if a specific condition is met. The change in participants' emotions showed that hints are effective in reducing frustration, given that the hints are easily comprehensible and designed to give positive feedback. The final part of the project focuses on the guidance system and implements a way to activate it entirely based on a person's emotions. The problem calls for identifying the state of the emotions that should trigger the guidance system's activation. This problem takes the form of a Markov decision process (MDP), which can be solved by setting it in a reinforcement learning framework. A Deep Q-Learning network (DQN) with experience replay (ER), which is one of the state-of-the-art reinforcement learning algorithms for predicting actions in discrete action spaces, was used in this context. The algorithm was trained on simulated data of emotions and tested on the data of fifteen older adults acquired in experiments conducted in the first part of the project. It is observed that the RL-based method performs better than the rule-based method in identifying the mental state of a person in order to provide hints.
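
As background to the DQN-with-experience-replay setup mentioned above, the sketch below shows a minimal replay buffer of the kind such an agent samples its training batches from. The emotion-vector state and hint/no-hint action in the usage example are assumptions for illustration, not the encoding used in the thesis.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience-replay buffer of (state, action, reward, next_state,
    done) tuples, as used by DQN with ER; capacity and usage are illustrative."""

    def __init__(self, capacity: int = 10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        batch = random.sample(list(self.buffer), batch_size)
        return list(zip(*batch))       # tuple of states, actions, rewards, ...

    def __len__(self):
        return len(self.buffer)

# Hypothetical usage: the state could be a vector of emotion estimates and the
# binary action whether to deliver a hint.
buf = ReplayBuffer()
buf.push(state=[0.2, 0.7], action=1, reward=-0.1, next_state=[0.1, 0.5], done=False)
if len(buf) >= 1:
    states, actions, rewards, next_states, dones = buf.sample(1)
    print(actions, rewards)
```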
