1. An Evaluation of the Unity Machine Learning Agents Toolkit in Dense and Sparse Reward Video Game Environments

Hanski, Jari; Biçak, Kaan Baris. January 2021.
In computer games, one use case for artificial intelligence is to create interesting problems for the player. Techniques such as reinforcement learning allow game developers to create artificial intelligence agents with human-like or superhuman abilities. The Unity ML-Agents toolkit is a plugin that gives game developers access to reinforcement learning algorithms without requiring expertise in machine learning. In this paper, we compare reinforcement learning methods and provide empirical training data from two different environments. First, we describe the chosen reinforcement learning methods and then explain the design of both training environments. We compared their benefits in both dense and sparse reward environments. The reinforcement learning methods were evaluated by comparing the training speed and the cumulative rewards of the agents. The goal was to evaluate how much the combination of extrinsic and intrinsic rewards accelerated the training process in the sparse reward environment. We hope this study helps game developers utilize reinforcement learning more effectively, saving time during the training process by choosing the most fitting training method for their video game environment. The results show that, in the sparse reward environment, agents trained faster with the combination of extrinsic and intrinsic rewards, whereas an agent trained with only extrinsic rewards failed to learn to complete the task.
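The acceleration described above comes from reward shaping: in a sparse environment the extrinsic reward is zero on most steps, so an intrinsic bonus (for example a curiosity-style signal based on prediction error) supplies a continuous learning signal until the agent reaches the task reward. The sketch below illustrates this combination in plain Python; the linear forward model, the coefficient beta, and all identifiers are illustrative assumptions and are not taken from the thesis or from the Unity ML-Agents toolkit.

    import numpy as np

    # Illustrative sketch only: a toy curiosity bonus from the prediction error
    # of a linear forward model, added to a sparse extrinsic reward. This is an
    # assumption-laden example, not the thesis's or Unity ML-Agents' code.

    class PredictionErrorCuriosity:
        def __init__(self, obs_dim, lr=0.01):
            self.W = np.zeros((obs_dim, obs_dim))  # linear model: next_obs ~ W @ obs
            self.lr = lr

        def bonus(self, obs, next_obs):
            pred = self.W @ obs
            error = next_obs - pred
            # One online gradient step on the squared prediction error.
            self.W += self.lr * np.outer(error, obs)
            # States the model predicts poorly (novel states) yield larger bonuses.
            return float(np.mean(error ** 2))

    def shaped_reward(extrinsic, obs, next_obs, curiosity, beta=0.02):
        """Total reward = sparse extrinsic reward + scaled intrinsic bonus."""
        return extrinsic + beta * curiosity.bonus(obs, next_obs)

In a dense reward environment the extrinsic term already gives feedback on nearly every step, which is why the intrinsic bonus mainly pays off in the sparse case; beta trades exploration pressure against fidelity to the task reward.
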
2. Asynchronous Advantage Actor-Critic and Flappy Bird

Wibrink, Marcus; Fredriksson, Markus. January 2021.
Games provide ideal environments for assessing reinforcement learning algorithms because of their simple dynamics and inexpensive testing compared to real-world environments. Asynchronous Advantage Actor-Critic (A3C), developed by DeepMind, has shown significant improvements in performance over other state-of-the-art algorithms on Atari games. Additionally, the algorithm A3C(lambda), a generalization of A3C, has previously been shown to further improve upon A3C in these environments. In this work, we implement A3C and A3C(lambda) on the Cart-Pole and Flappy Bird environments and evaluate their performance via simulation. The simulations show that A3C effectively masters the Cart-Pole environment, as expected. Flappy Bird presents sparse rewards, and the simulations reveal that A3C nevertheless manages to overcome this challenge the majority of the time, achieving a linear increase in learning. Further simulations were made on Flappy Bird with the inclusion of an entropy term and with A3C(lambda), which display no signs of improved performance compared to regular A3C. / Games are ideal environments for evaluating reinforcement learning algorithms because of their simple dynamics and cheap testing compared to real-world environments. Asynchronous Advantage Actor-Critic (A3C), developed by DeepMind, has shown significant improvements on Atari games compared to other established RL algorithms. Furthermore, the algorithm A3C(lambda), a generalization of A3C, has previously been shown to give even better results on these games. In this study, we implement A3C and A3C(lambda) on the Cart-Pole and Flappy Bird environments and evaluate the algorithms via simulation. The simulations show that A3C quickly masters Cart-Pole, as expected. In Flappy Bird, useful information is sparsely distributed and the reward has a local optimum, which means the algorithm risks getting stuck. Despite this, the simulations show that A3C manages to get past the local optimum in the majority of attempts and improves its reward linearly thereafter. Further simulations were run on Flappy Bird with the inclusion of an entropy term and with A3C(lambda); these methods showed no noticeable improvement compared to regular A3C. / Bachelor's thesis in electrical engineering 2021, KTH, Stockholm
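For context, the difference between A3C and A3C(lambda) lies in the return target used for the actor and critic updates: A3C bootstraps an n-step return, while A3C(lambda) mixes all n-step returns with exponentially decaying weights (the lambda-return). The following sketch, in plain Python and under stated assumptions, shows how such lambda-return targets can be computed from one rollout; it is not the authors' implementation, and the hyperparameter values are illustrative.

    import numpy as np

    # Sketch of the lambda-return target used by A3C(lambda)-style updates.
    # lambda = 0 recovers one-step TD targets; lambda = 1 recovers the
    # Monte Carlo return. Illustrative only, not the authors' code.

    def lambda_returns(rewards, values, bootstrap_value, gamma=0.99, lam=0.9):
        """Backward recursion: G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})."""
        T = len(rewards)
        returns = np.zeros(T)
        next_return = bootstrap_value   # G_T is approximated by V(s_T)
        next_value = bootstrap_value    # V(s_T) from the critic
        for t in reversed(range(T)):
            returns[t] = rewards[t] + gamma * ((1 - lam) * next_value + lam * next_return)
            next_return = returns[t]
            next_value = values[t]
        return returns

    # The actor is trained on the advantages (returns - values), and the critic
    # regresses its value estimates toward these lambda-return targets.

The entropy term mentioned above is orthogonal to this choice: it adds a regularization bonus to the policy loss to discourage premature convergence, rather than changing the return target.
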
