Robust Reinforcement Learning in Continuous Action/State Space

Grönland, Axel; Eriksson Möllerstedt, Viktor. January 2020.
In this project we apply Robust Reinforcement Learning (RRL) algorithms, presented by Doya and Morimoto [1], [2], to control problems. Specifically, we train an agent to balance a pendulum in its unstable equilibrium, the inverted state.

We investigate the performance of controllers based on two different value function approximators: one is quadratic, and the other makes use of a Radial Basis Function neural network. To achieve robustness we use an approach similar to H∞ control, which amounts to introducing an adversary into the control system.

By changing the mass of the pendulum after training, we aimed to show, as in [2], that the supposedly robust controllers could handle this disruption better than their non-robust counterparts. This was not the case. We also added a random disturbance signal after training and performed similar tests, but were again unable to demonstrate robustness. / Kandidatexjobb i elektroteknik (bachelor's thesis in electrical engineering) 2020, KTH, Stockholm
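To make the setup concrete, below is a minimal sketch of an actor-disturber-critic loop in the spirit of Doya and Morimoto's Robust RL, using Radial Basis Function features and an H∞-style effort term for the adversary. All constants (pendulum parameters, learning rates, the gain bound gamma_h, torque limits) and function names are illustrative assumptions, not the thesis code or the papers' exact formulation.

import numpy as np

rng = np.random.default_rng(0)

# Pendulum dynamics, state x = [theta, theta_dot]; theta = 0 is upright.
m, l, g, dt = 1.0, 1.0, 9.81, 0.02

def step(x, u, w, mass=m):
    # One Euler step; u is the control torque, w the adversary's torque.
    theta, omega = x
    domega = (g / l) * np.sin(theta) + (u + w) / (mass * l ** 2)
    theta = theta + dt * omega
    theta = np.arctan2(np.sin(theta), np.cos(theta))  # wrap to [-pi, pi]
    return np.array([theta, omega + dt * domega])

# Radial Basis Function features on a fixed grid over the state space.
centers = np.array([[t, o] for t in np.linspace(-np.pi, np.pi, 7)
                    for o in np.linspace(-8.0, 8.0, 7)])

def phi(x):
    d = centers - x
    return np.exp(-np.sum(d * d, axis=1) / 2.0)

n = len(centers)
v_w = np.zeros(n)   # critic weights:    V(x) ~= v_w @ phi(x)
a_w = np.zeros(n)   # actor weights:     u(x) ~= a_w @ phi(x)
d_w = np.zeros(n)   # disturber weights: w(x) ~= d_w @ phi(x)
alpha, gamma_td, gamma_h = 0.05, 0.98, 2.0  # gamma_h: assumed H-inf gain bound

for episode in range(200):
    x = np.array([np.pi, 0.0])  # start hanging straight down
    for _ in range(500):
        f = phi(x)
        eu = 0.5 * rng.standard_normal()   # exploration noise, control
        ew = 0.5 * rng.standard_normal()   # exploration noise, adversary
        u = float(np.clip(a_w @ f + eu, -5.0, 5.0))
        w = float(np.clip(d_w @ f + ew, -1.0, 1.0))
        x_next = step(x, u, w)
        # Task reward plus gamma_h^2 |w|^2: the adversary, which minimizes
        # the return, pays for its effort, mirroring the H-infinity game.
        r = np.cos(x_next[0]) - 0.01 * u ** 2 + gamma_h ** 2 * w ** 2
        delta = r + gamma_td * float(v_w @ phi(x_next)) - float(v_w @ f)
        v_w += alpha * delta * f         # critic: TD(0) on the value
        a_w += alpha * delta * eu * f    # actor climbs the TD error
        d_w -= alpha * delta * ew * f    # disturber descends it
        x = x_next

# Robustness probe as in the abstract: change the mass after training and
# roll out the learned controller (without the adversary) from near upright.
x = np.array([0.1, 0.0])
for _ in range(500):
    x = step(x, float(np.clip(a_w @ phi(x), -5.0, 5.0)), 0.0, mass=1.5)
print("final angle with heavier pendulum:", x[0])

The actor and disturber share the same TD error with opposite signs, which is the core of the actor-disturber-critic idea; the quadratic approximator mentioned above would replace phi(x) with the second-order monomials of the state.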
