  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Simulated Fixed-Wing Aircraft Attitude Control using Reinforcement Learning Methods

David Jona Richter (11820452) 20 December 2021
Autonomous transportation is a research field that has gained huge interest in recent years, with autonomous electric or hydrogen cars coming ever closer to everyday use. Cars are not the only subject of autonomy research, though: aviation is also being explored for fully autonomous flight. One very important aspect of making autonomous flight a reality is attitude control, the control of roll, pitch, and sometimes yaw. Traditional approaches to automated attitude control use PID (proportional-integral-derivative) controllers, which rely on hand-tuned parameters to fulfill the task. In this work, however, the use of Reinforcement Learning algorithms for attitude control will be explored. With the surge of ever more powerful artificial neural networks, which have proven to be universally usable function approximators, Deep Reinforcement Learning also becomes an intriguing option. A software toolkit will be developed and used to allow multiple flight simulators to be used for training agents with Reinforcement Learning as well as Deep Reinforcement Learning. Experiments will be run using different hyperparameters, algorithms, state representations, and reward functions to explore possible options for autonomous attitude control using Reinforcement Learning.
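
For readers unfamiliar with the PID baseline that the abstract above contrasts with Reinforcement Learning, the following is a minimal sketch of a textbook discrete PID controller. It is not taken from the thesis: the class name `PID`, the helper `simulate_roll`, the gain values, and the toy plant (where the command is applied directly as a roll rate) are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook discrete PID controller; kp, ki, kd are the hand-tuned gains."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement
        # Integral term accumulates the error over time.
        self.integral += error * dt
        # Derivative term is zero on the first call (no previous error yet).
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_roll(pid: PID, target_deg: float, steps: int = 2000, dt: float = 0.05) -> float:
    """Toy plant (assumption): the controller output is used directly as a roll rate."""
    roll = 0.0
    for _ in range(steps):
        roll += pid.update(target_deg, roll, dt) * dt
    return roll


# Hypothetical gains, chosen by hand for this toy plant only.
final_roll = simulate_roll(PID(kp=2.0, ki=0.5, kd=0.1), target_deg=10.0)
```

With these assumed gains the simulated roll angle settles at the 10 degree target. A real aircraft has far richer dynamics, which is why the gains normally have to be tuned by hand for each airframe.
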
72

Hluboké posilovaná učení a řešení pohybu robotu typu had / Deep reinforcement learning and snake-like robot locomotion design

Kočí, Jakub January 2020
This master's thesis discusses the application of reinforcement learning to deep learning tasks. The theoretical part covers the basics of artificial neural networks and reinforcement learning. The thesis describes the theoretical model of the reinforcement learning process, Markov processes. Some interesting techniques are demonstrated on conventional reinforcement learning algorithms, and some widely used deep reinforcement learning algorithms are described as well. The practical part consists of implementing a model of the robot and its environment, as well as the deep reinforcement learning system itself.
73

Optimizing Power Consumption, Resource Utilization, and Performance for Manycore Architectures using Reinforcement Learning

Fettes, Quintin 23 May 2022
No description available.
74

Research on Dynamic Offloading Strategy of Satellite Edge Computing Based on Deep Reinforcement Learning

Geng, Rui January 2021
Nowadays more and more data is generated at the edge of the network, and people are beginning to consider decentralizing computing tasks to the edge of the network. The network architecture of edge computing is different from the traditional network architecture. Its distributed configuration can make up for some shortcomings of traditional networks, such as data congestion, increased delay, and limited capacity. With the continuous development of 5G technology, satellite communication networks are also facing many new business challenges. Using idle computing power and storage space on satellites and integrating edge computing technology into satellite communication networks will greatly improve satellite communication service quality and enhance satellite task processing capabilities, thereby improving the performance of the satellite edge computing system. The primary problem that limits the computing performance of satellite edge networks is how to obtain a more effective dynamic service offloading strategy. To study this problem, this thesis monitors the status information of satellite nodes in different periods, such as service load and distance to the ground, uses the Markov decision process to model the dynamic offloading problem of the satellite edge computing system, and finally obtains the service offloading strategies. The deployment plan is based on deep reinforcement learning algorithms. We mainly study the performance of the Deep Q-Network (DQN) algorithm and two improved DQN algorithms, Double DQN (DDQN) and Dueling DQN (DuDQN), for different service request types and different system scenarios. Compared with existing service deployment algorithms, deep reinforcement learning algorithms take into account the long-term service quality of the system and form more reasonable offloading strategies.
/ With the rapid development of mobile communication technology, more and more data is generated at the edge of the network, and people are beginning to consider decentralizing computing tasks to the edge of the network, and a complete mobile edge computing architecture has been built. The network architecture of edge computing differs from the traditional network architecture. Its distributed configuration can compensate for shortcomings of traditional networks, such as data congestion, increased delay, and limited capacity. With the continuous development of 5G technology, satellite communication networks are also facing many new business challenges. Using idle computing power and storage space on satellites and integrating edge computing technology into satellite communication networks will greatly shorten the service time of traditional mobile satellites, improve satellite communication service quality, and enhance satellite task processing capabilities, thereby improving the performance of the satellite edge computing system. The primary problem that limits the computing performance of satellite edge networks is how to obtain a more effective dynamic service offloading strategy. This thesis monitors the service load of satellite nodes in different periods, ground position information, and other status information, uses the Markov decision process to model the dynamic deployment of satellite edge services, and finally obtains a set of dynamic service deployments based on the model and design. The deployment plan is based on a deep reinforcement learning algorithm for dynamic service deployment. This thesis mainly studies the performance of the DQN algorithm and two improved DQN algorithms, Double DQN and Dueling DQN, for different service request types and different system scenarios. Compared with existing service deployment algorithms, the performance of deep reinforcement learning algorithms is somewhat better.
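
The abstract above names DQN, Double DQN, and Dueling DQN without saying how they differ. The sketch below shows the standard textbook forms of the two bootstrap targets and of the dueling aggregation, using plain Python lists. The function names and the example numbers are illustrative assumptions and do not come from the thesis.

```python
from typing import List


def dqn_target(reward: float, next_q_target: List[float], gamma: float, done: bool) -> float:
    """DQN: bootstrap from the maximum target-network value of the next state."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, next_q_online: List[float],
                      next_q_target: List[float], gamma: float, done: bool) -> float:
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def dueling_q(value: float, advantages: List[float]) -> List[float]:
    """Dueling DQN: Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, .)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


# Made-up Q-values for a next state with three actions.
q_online = [1.0, 3.0, 2.0]
q_target = [4.0, 0.5, 1.0]
y_dqn = dqn_target(1.0, q_target, gamma=0.9, done=False)
y_ddqn = double_dqn_target(1.0, q_online, q_target, gamma=0.9, done=False)
q_dueling = dueling_q(2.0, [1.0, 0.0, -1.0])
```

In this made-up case the plain DQN target follows the largest target-network value, while Double DQN evaluates the action preferred by the online network, which is the mechanism usually credited with reducing overestimation.
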
75

Automatic game-testing with personality : Multi-task reinforcement learning for automatic game-testing / Automatisk speltestning med personlighet : Multi-task förstärkning lärande för automatisk speltestning

Canal Anton, Oleguer January 2021
This work presents a scalable solution to automate game-testing. Traditionally, game-testing has been performed by either human players or scripted Artificial Intelligence (AI) agents. While the former produces the most reliable results, organizing testing sessions is time-consuming. Scripted AI, on the other hand, dramatically speeds up the process, but the insights it provides are far less useful because these agents’ behaviors are highly predictable. The presented solution takes the best of both worlds, the automation of scripted AI and the richness of human testing, by framing the problem within the Deep Reinforcement Learning (DRL) paradigm. Reinforcement Learning (RL) agents are trained to adapt to any unseen level and to present customizable human personality traits such as aggressiveness, greed, and fear. This is achieved by exploring the problem in a multi-task RL setting. Each personality trait is understood as a different task, and the tasks can be linearly combined by the proposed algorithm. Furthermore, since Artificial Neural Networks (ANNs) are used to model the agent’s policies, the solution is highly adaptable and scalable. This thesis reviews the state of the art in both automatic game-testing and RL, and proposes a solution to the above-mentioned problem. Finally, promising results are obtained by evaluating the solution on two different environments: a simple environment used to quantify the quality of the designed algorithm, and a generic game environment used to showcase its applicability. In particular, results show that the designed agent is able to perform well on game levels never seen before. In addition, the agent can display any convex combination of the trained behaviors. Furthermore, its performance is as good as if it had been specifically trained on that particular combination. / This work presents a scalable solution for automating game-testing.
Traditionally, game-testing has been performed by either human players or pre-programmed agents. Although the former gives the most reliable results, the process is time-consuming. Pre-programmed agents, on the other hand, speed up the process dramatically, but the insights they provide are much less useful because their behaviors are highly predictable. The presented solution uses the best of both worlds: the automation offered by pre-programmed agents and the ability to simulate the depth of human testing, by framing the problem within the Deep Reinforcement Learning paradigm. An agent based on reinforcement learning is trained to adapt to previously unseen game environments and to present customizable human personality traits such as aggressiveness, greed, and fear. Since Artificial Neural Networks (ANNs) have been used to model the agent's policies, the solution is potentially highly adaptable and scalable. This report first reviews the latest research in both automatic game-testing and reinforcement learning. A solution to the above-mentioned problem is then presented. Finally, the solution is evaluated in two different environments with promising results. The first environment is used to quantify the quality of the designed algorithm. The second is a generic game environment that is useful for demonstrating the applicability of the solution.
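
The abstract above says that each personality trait is treated as a separate task and that the trained behaviors can be mixed by convex combination. As a rough illustration only, the sketch below mixes hypothetical per-trait reward signals with convex weights. Combining at the reward level, the function name `combined_reward`, and the trait values are all assumptions, and the thesis's actual algorithm may combine tasks differently.

```python
from typing import Dict


def combined_reward(trait_rewards: Dict[str, float], weights: Dict[str, float]) -> float:
    """Mix per-trait rewards with convex weights (non-negative, summing to 1)."""
    if set(trait_rewards) != set(weights):
        raise ValueError("rewards and weights must cover the same traits")
    if any(w < 0.0 for w in weights.values()):
        raise ValueError("convex weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("convex weights must sum to 1")
    return sum(weights[trait] * trait_rewards[trait] for trait in trait_rewards)


# Hypothetical per-trait rewards for one game step.
step_rewards = {"aggressiveness": 1.0, "greed": 0.5, "fear": -2.0}
# A mostly aggressive personality with some greed and fear.
personality = {"aggressiveness": 0.5, "greed": 0.25, "fear": 0.25}
mixed = combined_reward(step_rewards, personality)
```

Restricting the weights to a convex combination keeps the mixed signal within the range spanned by the individual traits, which matches the abstract's claim about interpolating between trained behaviors.
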
76

Increasing Policy Network Size Does Not Guarantee Better Performance in Deep Reinforcement Learning

Zachery Peter Berg (12455928) 25 April 2022
The capacity of deep reinforcement learning policy networks has been found to affect the performance of trained agents. It has been observed that policy networks with more parameters have better training performance and generalization ability than smaller networks. In this work, we find cases where this does not hold true. We observe unimodal variance in the zero-shot test return of varying-width policies, which accompanies a drop in both train and test return. Empirically, we demonstrate mostly monotonically increasing performance or mostly optimal performance as the width of deep policy networks increases, except near the variance mode. Finally, we find a scenario where larger networks have increasing performance up to a point, then decreasing performance. We hypothesize that these observations align with the theory of double descent in supervised learning, although with specific differences.
77

A comparison of genetic algorithm and reinforcement learning for autonomous driving / En jämförelse mellan genetisk algoritm och förstärkningslärande för självkörande bilar

Xiang, Ziyi January 2019
This paper compares two methods, reinforcement learning and a genetic algorithm, for designing the control system of autonomous cars in a dynamic environment. The research problem can be formulated as follows: How does learning efficiency compare between reinforcement learning and a genetic algorithm for autonomous navigation through a dynamic environment? In conclusion, the genetic algorithm outperforms reinforcement learning on mean learning time, despite the fact that the former shows a large variance; that is, the genetic algorithm provides better learning efficiency. / This paper compares two methods, reinforcement learning and a genetic algorithm, for designing the control system of autonomous cars in a dynamic environment. The research problem can be formulated as: How does learning efficiency compare between reinforcement learning and a genetic algorithm for autonomous navigation in a dynamic environment? In summary, the genetic algorithm outperforms reinforcement learning on mean learning time, even though the former shows a large variance; that is, the genetic algorithm gives better learning efficiency.
78

Building the Intelligent IoT-Edge: Balancing Security and Functionality using Deep Reinforcement Learning

Anand A Mudgerikar (11791094) 19 December 2021
The exponential growth of the Internet of Things (IoT) and cyber-physical systems is resulting in complex environments comprising various devices interacting with each other and with users. In addition, rapid advances in Artificial Intelligence are making those devices able to autonomously modify their behaviors through the use of techniques such as reinforcement learning (RL). There is thus a need for an intelligent monitoring system on the network edge with a global view of the environment to autonomously predict optimal device actions. However, ensuring safety and security in such environments is clearly critical. To this end, we develop a constrained RL framework for IoT environments that determines optimal device actions with respect to user-defined goals or required functionalities using deep Q-learning. We use anomaly-based intrusion detection on the network edge to dynamically generate security and safety policies that constrain the RL agent in the framework. We analyze the balance required between ‘safety/security’ and ‘functionality’ in IoT environments by manipulating the exploration of safe and unsafe benefit state spaces in the RL framework. We instantiate the framework for testing on application-layer control in smart home environments, and on network-layer control, including network functionalities like rate control and routing, for SDN-based environments.
79

Learning and planning with noise in optimization and reinforcement learning

Thomas, Valentin 06 1900
Most modern machine learning algorithms incorporate a certain degree of randomness in their processes, which we will call noise, and which can ultimately affect the model's predictions. In this thesis, we take a closer look at learning and planning in the presence of noise for reinforcement learning and optimization algorithms. The first two articles presented in this document focus on reinforcement learning in an unknown environment, and more precisely on how we can design algorithms that use the stochasticity of their policy and of the environment to their advantage. Our first contribution presented in this document focuses on the unsupervised reinforcement learning setting. We show how an agent left alone in an unknown world with no specific goal can learn which aspects of the environment it can control independently of one another, while jointly learning a disentangled latent representation of these aspects, which we will call factors of variation. The second contribution focuses on planning in continuous control tasks. By presenting reinforcement learning as an inference problem, we borrow tools from the literature on sequential Monte Carlo methods to design an efficient and theoretically motivated algorithm for probabilistic planning using a learned model of the world. We show how the agent can take advantage of our probabilistic objective to imagine diverse sets of solutions. The next two contributions analyze the impact of gradient noise due to sampling in optimization algorithms.
The third contribution examines the role of gradient estimator noise in maximum likelihood estimation with stochastic gradient descent, exploring the relationship between the structure of the gradient noise and the local curvature on the generalization and convergence speed of the model. Our fourth contribution returns to the topic of reinforcement learning to analyze the impact of sampling noise on the policy optimization algorithm by gradient ascent. We find that sampling noise can have a significant impact on the optimization dynamics and on the policies discovered in reinforcement learning. / Most modern machine learning algorithms incorporate a degree of randomness in their processes, which we will refer to as noise, which can ultimately impact the model's predictions. In this thesis, we take a closer look at learning and planning in the presence of noise for reinforcement learning and optimization algorithms. The first two articles presented in this document focus on reinforcement learning in an unknown environment, specifically how we can design algorithms that use the stochasticity of their policy and of the environment to their advantage. Our first contribution presented in this document focuses on the unsupervised reinforcement learning setting. We show how an agent left alone in an unknown world without any specified goal can learn which aspects of the environment it can control independently from each other as well as jointly learning a disentangled latent representation of these aspects, or factors of variation. The second contribution focuses on planning in continuous control tasks. By framing reinforcement learning as an inference problem, we borrow tools from Sequential Monte Carlo literature to design a theoretically grounded and efficient algorithm for probabilistic planning using a learned model of the world.
We show how the agent can leverage the uncertainty of the model to imagine a diverse set of solutions. The following two contributions analyze the impact of gradient noise due to sampling in optimization algorithms. The third contribution examines the role of gradient noise in maximum likelihood estimation with stochastic gradient descent, exploring the relationship between the structure of the gradient noise and local curvature on the generalization and convergence speed of the model. Our fourth contribution returns to the topic of reinforcement learning to analyze the impact of sampling noise on the policy gradient algorithm. We find that sampling noise can significantly impact the optimization dynamics and policies discovered in on-policy reinforcement learning.
80

AI for an Imperfect-Information Wargame with Self-Play Reinforcement Learning / AI med självspelande förstärkningsinlärning för ett krigsspel med imperfekt information

Ryblad, Filip January 2021
The task of training AIs for imperfect-information games has long been difficult. Recently, however, the algorithm ReBeL, a general framework for self-play reinforcement learning, has been shown to excel at heads-up no-limit Texas hold 'em, among other imperfect-information games. This report explores the ability to adapt ReBeL to a downscaled version of the strategy wargame "Game of the Generals". It is shown that an implementation of ReBeL that uses no domain-specific knowledge is able to beat all benchmark bots, which indicates that ReBeL can be a useful framework when training AIs for imperfect-information wargames. / Training AIs for games with imperfect information has long been a challenge. Recently, however, the algorithm ReBeL, a general framework for self-play reinforcement learning, has shown promising performance in heads-up no-limit Texas hold 'em and other imperfect-information games. This report investigates whether ReBeL can be adapted to a downscaled version of "Game of the Generals", a strategy wargame. It is shown that an implementation of ReBeL that uses no domain-specific knowledge is able to defeat all bots used for comparison, which indicates that ReBeL can be a useful framework for training AIs for imperfect-information wargames.
