  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Asynchronous Advantage Actor-Critic and Flappy Bird

Wibrink, Marcus, Fredriksson, Markus January 2021
Games provide ideal environments for assessing reinforcement learning algorithms because of their simple dynamics and their inexpensive testing compared to real-world environments. Asynchronous Advantage Actor-Critic (A3C), developed by DeepMind, has shown significant improvements in performance over other state-of-the-art algorithms on Atari games. Additionally, the algorithm A3C(lambda), a generalization of A3C, has previously been shown to improve further upon A3C in these environments. In this work, we implement A3C and A3C(lambda) on the Cart-Pole and Flappy Bird environments and evaluate their performance via simulation. The simulations show that A3C effectively masters the Cart-Pole environment, as expected. In Flappy Bird, rewards are sparse and the reward has a local optimum in which the algorithm risks getting stuck. Despite this, the simulations show that A3C gets past the local optimum in the majority of runs and improves its reward linearly thereafter. Further simulations on Flappy Bird, with an added entropy term and with A3C(lambda), show no sign of improved performance compared to regular A3C. / Bachelor's thesis project in electrical engineering 2021, KTH, Stockholm
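The abstract above names three ingredients without defining them: the advantage estimate, the lambda generalization, and the entropy term. The sketch below is one plausible reading, not the thesis's implementation. It assumes A3C(lambda) uses lambda-returns in the TD(lambda) sense, so that `lam=1.0` recovers the bootstrapped multi-step return of regular A3C. The function names, the `next_values` convention, and the default `gamma` are all illustrative assumptions.

```python
import math


def lambda_returns(rewards, next_values, gamma=0.99, lam=1.0):
    """Compute lambda-returns backwards over one rollout.

    next_values[t] is the critic's estimate V(s_{t+1}); the last entry is the
    bootstrap value (0.0 if the episode ended). This convention is assumed here.
    lam=1.0 gives the multi-step return; lam=0.0 gives the one-step TD target.
    """
    g = next_values[-1]
    returns = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns


def advantages(returns, values):
    """Advantage estimate: return minus the critic's value V(s_t)."""
    return [g - v for g, v in zip(returns, values)]


def policy_entropy(probs):
    """Shannon entropy of a discrete action distribution (zero entries skipped)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

In an actor-critic loss, the entropy would typically be added as a weighted bonus so that near-deterministic policies are penalized early in training. The weight is a hyperparameter that the abstract does not report.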
2

MARS: Multi-Scalable Actor-Critic Reinforcement Learning Scheduler

Baheri, Betis 24 July 2020
No description available.
3

Single Image Super Resolution with Infrared Imagery and Multi-Step Reinforcement Learning

Vassilo, Kyle January 2020
No description available.
4

Strojové učení ve strategických hrách / Machine Learning in Strategic Games

Vlček, Michael January 2018
Machine learning is driving progress in artificial intelligence that can compete with human opponents in strategy games such as chess, Go, and poker. The branch of machine learning that shows the most promising results in strategy games is reinforcement learning. The next milestone for current research is the computer game StarCraft II, which exceeds the earlier games in complexity and represents a potential new breakthrough in the field. The paper analyzes the problem and proposes a solution combining the reinforcement learning algorithm A2C with the hyperparameter optimization method PBT, which could advance the current state of progress.
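The abstract does not describe how PBT is used. Assuming PBT refers to population based training, its core is a periodic exploit/explore step: the weakest members of a population copy a stronger member's weights and hyperparameters, then perturb the hyperparameters. The sketch below is a minimal illustration under that assumption. The dictionary layout, the `fraction` cutoff, and the perturbation factors are hypothetical choices, not values from the thesis.

```python
import random


def pbt_step(population, rng, fraction=0.25, factors=(0.8, 1.2)):
    """One exploit/explore step over a list of member dicts.

    Each member is assumed to look like
    {"score": float, "weights": list, "hyperparams": dict of floats}.
    """
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top, bottom = ranked[:k], ranked[-k:]
    for member in bottom:
        if any(member is t for t in top):
            continue  # tiny populations: never overwrite a top performer
        donor = rng.choice(top)
        # Exploit: copy the donor's weights.
        member["weights"] = list(donor["weights"])
        # Explore: copy the donor's hyperparameters, each randomly perturbed.
        member["hyperparams"] = {
            name: value * rng.choice(factors)
            for name, value in donor["hyperparams"].items()
        }
    return population
```

In a training loop, this step would run every fixed number of updates, with each member's `score` refreshed from recent episode returns beforehand.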
