1

Random Access Control In Massive Cellular Internet of Things: A Multi-Agent Reinforcement Learning Approach

Bai, Jianan 14 January 2021 (has links)
Internet of things (IoT) is envisioned as a promising paradigm to interconnect an enormous number of wireless devices. However, the success of IoT is challenged by the difficulty of managing access for the massive volume of sporadic and unpredictable user traffic. This thesis focuses on contention-based random access in massive cellular IoT systems and introduces two novel frameworks to provide enhanced scalability, real-time quality of service management, and resource efficiency. First, a local-communication-based congestion control framework is introduced to distribute random access attempts evenly over time under bursty traffic. Second, a multi-agent reinforcement learning based preamble selection framework is designed to increase the access capacity under a fixed number of preambles. Combining the two mechanisms provides superior performance under various 3GPP-specified machine-type communication evaluation scenarios, achieving much lower access latency and fewer access failures. / Master of Science / In the age of the internet of things (IoT), a massive number of devices are expected to connect to wireless networks in a sporadic and unpredictable manner. The wireless connection is usually established by contention-based random access, a four-step handshaking process initiated by a device sending a randomly selected preamble sequence to the base station. Although different preambles are orthogonal, a preamble collision happens when two or more devices send the same preamble to a base station simultaneously, and a device experiences access failure if the transmitted preamble cannot be successfully received and decoded. A failed device needs to wait for another random access opportunity to restart the process, so access delay and resource consumption increase. Random access control in massive IoT systems is challenged by increased access intensity, which results in a higher collision probability. In this work, we aim to provide better scalability, real-time quality of service management, and resource efficiency in random access control for such systems. Towards this end, we introduce 1) a local-communication-based congestion control framework that enables a device to cooperate with neighboring devices and 2) a multi-agent reinforcement learning (MARL) based preamble selection framework that leverages the ability of MARL to form decision-making policies from collected experience. The introduced frameworks are evaluated under 3GPP-specified scenarios and shown to outperform existing standard solutions, achieving lower access delays with fewer access failures.
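To make the collision mechanism described in this abstract concrete, the following is a minimal Monte Carlo sketch (not taken from the thesis) of contention-based random access: every active device picks one of a fixed set of preambles uniformly at random, and only uncontended preambles succeed. The choice of 54 contention preambles and all function names are illustrative assumptions.

```python
import random

def simulate_random_access(num_devices, num_preambles, trials=10_000):
    """Estimate the per-device access success probability when every active
    device picks one of `num_preambles` preambles uniformly at random.
    A device succeeds only if no other device picked the same preamble."""
    successes = 0
    for _ in range(trials):
        counts = [0] * num_preambles
        for _ in range(num_devices):
            counts[random.randrange(num_preambles)] += 1
        successes += sum(1 for c in counts if c == 1)  # uncontended preambles
    return successes / (trials * num_devices)

if __name__ == "__main__":
    for n in (10, 30, 54, 100):
        p = simulate_random_access(num_devices=n, num_preambles=54)
        print(f"{n:3d} devices, 54 preambles -> success probability ~ {p:.3f}")
```

As access intensity grows, the estimated success probability drops sharply, which is the congestion problem the two proposed frameworks target.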
2

Towards a Deep Reinforcement Learning based approach for real-time decision making and resource allocation for Prognostics and Health Management applications

Ludeke, Ricardo Pedro João January 2020 (has links)
Industrial operational environments are stochastic and can have complex system dynamics which introduce multiple levels of uncertainty. This uncertainty leads to sub-optimal decision making and resource allocation. Digitalisation and automation of production equipment and the maintenance environment enable predictive maintenance, meaning that equipment can be stopped for maintenance at the optimal time. Resource constraints in maintenance capacity could however result in further undesired downtime if maintenance cannot be performed when scheduled. In this dissertation the applicability of a Multi-Agent Deep Reinforcement Learning-based approach for decision making is investigated to determine the optimal maintenance scheduling policy in a fleet of assets where there are maintenance resource constraints. By considering the underlying system dynamics of maintenance capacity, as well as the health state of individual assets, a near-optimal decision-making policy is found that increases equipment availability while also maximising maintenance capacity. The implemented solution is compared to a run-to-failure corrective maintenance strategy, a constant-interval preventive maintenance strategy and a condition-based predictive maintenance strategy. The proposed approach outperformed traditional maintenance strategies across several asset and operational maintenance performance metrics. It is concluded that Deep Reinforcement Learning-based decision making for asset health management and resource allocation is more effective than human-based decision making. / Dissertation (MEng (Mechanical Engineering))--University of Pretoria, 2020. / Mechanical and Aeronautical Engineering / MEng (Mechanical Engineering) / Unrestricted
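As a rough illustration of the resource-constrained maintenance problem the dissertation studies, here is a toy fleet simulation (not from the dissertation): assets degrade stochastically, only a limited number can be maintained per step, and a policy decides which ones. The degradation model, the capacity of two maintenance slots, and the baseline policies are all assumptions for illustration; an RL agent would replace the hand-written policies.

```python
import random

def simulate_fleet(policy, n_assets=10, capacity=2, horizon=200, seed=0):
    """Toy fleet simulation: each asset degrades by a random amount per step and
    fails at health <= 0. At most `capacity` assets can be maintained per step
    (the resource constraint). `policy(healths)` returns indices of assets to
    maintain. Returns total uptime (asset-steps with health > 0)."""
    rng = random.Random(seed)
    healths = [1.0] * n_assets
    uptime = 0
    for _ in range(horizon):
        for i in list(policy(healths))[:capacity]:
            healths[i] = 1.0                           # maintenance restores the asset
        for i in range(n_assets):
            healths[i] = max(0.0, healths[i] - rng.uniform(0.0, 0.1))
        uptime += sum(1 for h in healths if h > 0.0)
    return uptime

run_to_failure = lambda h: [i for i, x in enumerate(h) if x <= 0.0]    # corrective
condition_based = lambda h: sorted(range(len(h)), key=lambda i: h[i])  # worst-health first

print("run-to-failure :", simulate_fleet(run_to_failure))
print("condition-based:", simulate_fleet(condition_based))
```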
3

Non-Reciprocating Sharing Methods in Cooperative Q-Learning Environments

Cunningham, Bryan 28 August 2012 (has links)
Past research on multi-agent simulation with cooperative reinforcement learning (RL) for homogeneous agents focuses on developing sharing strategies that are adopted and used by all agents in the environment. These sharing strategies are considered reciprocating because all participating agents have a predefined agreement regarding what type of information is shared, when it is shared, and how the participating agents' policies are subsequently updated. The sharing strategies are specifically designed around manipulating this shared information to improve learning performance. This thesis targets situations where the assumption of a single sharing strategy employed by all agents is not valid. This work seeks to address how agents with no predetermined sharing partners can exploit groups of cooperatively learning agents to improve learning performance compared to independent learning. Specifically, several intra-agent methods are proposed that do not assume a reciprocating sharing relationship and leverage the pre-existing agent interface associated with Q-Learning to expedite learning. The other agents' functions and their sharing strategies are unknown and inaccessible from the point of view of the agent(s) using the proposed methods. The proposed methods are evaluated in simulation on physically embodied agents learning a navigation task in the multi-agent cooperative robotics domain. The experiments conducted focus on the effects of the following factors on the performance of the proposed non-reciprocating methods: scaling the number of agents in the environment, limiting the communication range of the agents, and scaling the size of the environment. / Master of Science
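The following sketch (an assumption-laden illustration, not the thesis's implementation) shows the kind of non-reciprocating sharing the abstract describes: a tabular Q-learning agent reads a neighbour's Q-table through the ordinary lookup interface and seeds its own estimates for unseen states, while the neighbour runs plain Q-learning with no sharing strategy of its own. The class names and the `trust` blending factor are hypothetical.

```python
import random
from collections import defaultdict

class QAgent:
    """Minimal tabular Q-learning agent. `lookup(state)` is the ordinary
    Q-table interface that other agents may read without any agreed protocol."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions, self.alpha, self.gamma, self.eps = n_actions, alpha, gamma, eps

    def lookup(self, state):
        return list(self.q.get(state, [0.0] * self.n_actions))  # read-only view

    def act(self, state):
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        values = self.q[state]
        return values.index(max(values))

    def update(self, s, a, r, s_next):
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

class NonReciprocatingAgent(QAgent):
    """Seeds its estimates for unseen states from a neighbour's Q-values. The
    neighbour runs plain Q-learning and needs no sharing strategy of its own."""
    def __init__(self, n_actions, neighbours, trust=0.5, **kw):
        super().__init__(n_actions, **kw)
        self.neighbours, self.trust = neighbours, trust

    def act(self, state):
        if state not in self.q and self.neighbours:
            borrowed = self.neighbours[0].lookup(state)
            self.q[state] = [self.trust * v for v in borrowed]  # one-way borrowing
        return super().act(state)
```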
4

Multi-Task Reinforcement Learning: From Single-Agent to Multi-Agent Systems

Trang, Matthew Luu 06 January 2023 (has links)
Generalized collaborative drones are a technology that has many potential benefits. General-purpose drones that can handle exploration, navigation, manipulation, and more without having to be reprogrammed would be an immense breakthrough for the usability and adoption of the technology. The ability to develop these multi-task, multi-agent drone systems is limited by the lack of available training environments, as well as by deficiencies of multi-task learning due to a phenomenon known as catastrophic forgetting. In this thesis, we present a set of simulation environments for exploring the abilities of multi-task drone systems and provide a platform for testing agents in incremental single-agent and multi-agent learning scenarios. The multi-task platform extends an existing drone simulation environment written in Python using the PyBullet Physics Simulation Engine, into which these environments are incorporated. Using this platform, we present an analysis of Incremental Learning and detail its beneficial impact on multi-task learning, with respect to multi-task learning speed and catastrophic forgetting. Finally, we introduce a novel algorithm, Incremental Learning with Second-Order Approximation Regularization (IL-SOAR), to mitigate some of the effects of catastrophic forgetting in multi-task learning. We show the impact of this method and contrast its performance with a multi-agent multi-task approach that uses a centralized policy-sharing algorithm. / Master of Science / Machine Learning techniques allow drones to be trained to achieve tasks which are otherwise time-consuming or difficult. The goal of this thesis is to facilitate the work of creating these complex drone machine learning systems by exploring Reinforcement Learning (RL), a field of machine learning which involves learning the correct actions to take through experience. Currently, RL methods are effective in the design of drones which are able to solve one particular task. The next step in this technology is to develop RL systems which are able to handle generalization and perform well across multiple tasks. In this thesis, simulation environments for drones to learn complex tasks are created, and algorithms which are able to train drones in multiple hard tasks are developed and tested. We explore the benefits of using a specific multi-task training technique known as Incremental Learning. Additionally, we consider one of the prohibitive factors of multi-task machine learning solutions: the degradation of agent performance on previously learned tasks, known as catastrophic forgetting. We create an algorithm that aims to prevent the impact of forgetting when training drones sequentially on new tasks. We contrast this approach with a multi-agent solution, where multiple drones learn simultaneously across the tasks.
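The abstract names IL-SOAR but does not spell out its formulation, so the sketch below shows a generic second-order regularizer of the kind the name suggests, in the style of Elastic Weight Consolidation (Kirkpatrick et al., 2017): a diagonal Fisher estimate weights a quadratic penalty that keeps parameters near the values that solved earlier tasks. Treat this as an illustration of the idea, not as the thesis's algorithm; the use of PyTorch and all hyperparameters are assumptions.

```python
import torch

def diagonal_fisher(model, data_loader, loss_fn):
    """Diagonal Fisher estimate: average squared gradient of the task loss,
    used as a per-parameter importance weight for the old task."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}

def second_order_penalty(model, anchor_params, fisher, strength=1.0):
    """Quadratic penalty keeping parameters close to the values that solved
    earlier tasks, weighted by their estimated importance."""
    penalty = torch.zeros(())
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - anchor_params[n]) ** 2).sum()
    return strength * penalty

# When training on a new task (sketch):
#   loss = task_loss + second_order_penalty(model, old_params, fisher, strength=100.0)
```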
5

Adaptive manufacturing: dynamic resource allocation using multi-agent reinforcement learning

Heik, David, Bahrpeyma, Fouad, Reichelt, Dirk 13 February 2024 (has links)
Global value creation networks have experienced increased volatility and more dynamic behavior in recent years, accelerating a trend already evident in the shortening of product and technology cycles. In addition, the manufacturing industry increasingly allows customers to make specific adjustments to their products at the time of ordering. These changes demand a high level of flexibility and adaptability not only from cyber-physical systems but also from employees and supervisory production planning. As a result, the development of control and monitoring mechanisms becomes more complex. The production process must also be adjusted dynamically in response to unforeseen events (disrupted supply chains, machine breakdowns, or staff absences) in order to make the most effective and efficient use of the available production resources. In recent years, reinforcement learning (RL) research has gained increasing popularity in strategic planning as a result of its ability to handle uncertainty in dynamic environments in real time. RL has also been extended to multiple agents cooperating on complex tasks. Despite its potential, the real-world application of multi-agent reinforcement learning (MARL) to manufacturing problems, such as flexible job-shop scheduling, has been approached less frequently. The main reason is that most applications in this field are subject to specific requirements as well as confidentiality obligations, which makes them difficult for the research community to access and presents substantial challenges for the implementation of these tools. ...
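To ground the scheduling decision this work targets, here is a heavily simplified dispatching toy (not from the article, and far simpler than a flexible job shop): single-operation jobs are assigned to the earliest-free machine, and a rule chooses which queued job runs next; that per-decision choice is what a MARL agent would learn. Job counts, processing times, and the baseline rules are illustrative assumptions.

```python
import random

def dispatch_episode(choose_job, n_machines=3, seed=0):
    """Heavily simplified dispatching loop: single-operation jobs with random
    processing times are assigned to the earliest-free machine, and
    `choose_job(queue)` picks which queued job runs next. Returns the makespan."""
    rng = random.Random(seed)
    queue = [rng.randint(1, 9) for _ in range(12)]   # processing times
    busy_until = [0] * n_machines
    while queue:
        m = min(range(n_machines), key=lambda i: busy_until[i])  # earliest-free machine
        j = choose_job(queue)          # the per-decision choice a MARL agent would learn
        busy_until[m] += queue.pop(j)
    return max(busy_until)

random_rule = lambda q: random.randrange(len(q))
lpt_rule = lambda q: max(range(len(q)), key=lambda i: q[i])  # longest processing time first

print("random dispatch:", dispatch_episode(random_rule))
print("LPT dispatch   :", dispatch_episode(lpt_rule))
```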
6

Multi Agent Reinforcement Learning for Game Theory : Financial Graphs / Multi-agent förstärkning lärande för spelteori : Ekonomiska grafer

Yu, Bryan January 2021 (has links)
We present the rich research potential at the intersection of multi-agent reinforcement learning (MARL), game theory, and financial graphs. We demonstrate how multiple game-theoretic scenarios arise in three-node financial graphs with minor modifications and highlight the six scenarios used in this study. We discuss how to set up an environment for MARL training and evaluation. We first investigate individual games and demonstrate that MARL agents consistently learn Nash equilibrium strategies. We next investigate mixed games and find again that MARL agents learn Nash equilibrium strategies given sufficient information and incentive (e.g. prosociality). We find that (1) introducing an embedding layer in the agents' deep networks improves learned representations and, as such, learned strategies, (2) MARL agents can learn a variety of complex strategies, and (3) selfishness improves strategies' fairness and efficiency. Next we introduce populations and find that (1) prosocial members in a population influence the action profile and that (2) complex strategies present in individual scenarios no longer emerge as the populations' portfolios of strategies converge to a main diagonal. We identify two challenges that arise in populations: (1) identifying a partner's prosociality and (2) identifying a partner's identity. We study three information settings that supplement the agents' observation sets and find that knowledge of a partner's prosociality or identity has a negligible impact on how the portfolio of strategies converges.
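Two of the ingredients this abstract relies on, pure Nash equilibria and prosocial reward mixing, can be illustrated with a small stand-alone example (not the thesis's financial-graph games; the payoff matrix and the mixing formula are assumptions):

```python
import itertools

def pure_nash(payoffs):
    """Pure-strategy Nash equilibria of a two-player game given
    `payoffs[(a1, a2)] = (r1, r2)`."""
    actions1 = {a for a, _ in payoffs}
    actions2 = {b for _, b in payoffs}
    equilibria = []
    for a, b in itertools.product(actions1, actions2):
        r1, r2 = payoffs[(a, b)]
        best1 = all(payoffs[(x, b)][0] <= r1 for x in actions1)
        best2 = all(payoffs[(a, y)][1] <= r2 for y in actions2)
        if best1 and best2:
            equilibria.append((a, b))
    return equilibria

def prosocial(payoffs, p):
    """Mix each player's reward with the mean reward: r_i' = (1-p)*r_i + p*mean(r)."""
    return {k: tuple((1 - p) * r + p * sum(v) / len(v) for r in v)
            for k, v in payoffs.items()}

# Prisoner's-dilemma-like payoffs: defection dominates without prosociality.
pd = {("C", "C"): (3, 3), ("C", "D"): (0, 4),
      ("D", "C"): (4, 0), ("D", "D"): (1, 1)}
print(pure_nash(pd))                  # [('D', 'D')]
print(pure_nash(prosocial(pd, 1.0)))  # cooperation becomes the equilibrium
```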
7

Measuring and Influencing Sequential Joint Agent Behaviours

Raffensperger, Peter Abraham January 2013 (has links)
Algorithmically designed reward functions can influence groups of learning agents toward measurable desired sequential joint behaviours. Influencing learning agents toward desirable behaviours is non-trivial due to the difficulties of assigning credit for global success to the deserving agents and of inducing coordination. Quantifying joint behaviours lets us identify global success by ranking some behaviours as more desirable than others. We propose a real-valued metric for turn-taking, demonstrating how to measure one sequential joint behaviour. We describe how to identify the presence of turn-taking in simulation results and we calculate the quantity of turn-taking that could be observed between independent random agents. We demonstrate our turn-taking metric by reinterpreting previous work on turn-taking in emergent communication and by analysing a recorded human conversation. Given a metric, we can explore the space of reward functions and identify those reward functions that result in global success in groups of learning agents. We describe 'medium access games' as a model for human and machine communication and we present simulation results for an extensive range of reward functions for pairs of Q-learning agents. We use the Nash equilibria of medium access games to develop predictors for determining which reward functions result in turn-taking. Having demonstrated the predictive power of Nash equilibria for turn-taking in medium access games, we focus on synthesis of reward functions for stochastic games that result in arbitrary desirable Nash equilibria. Our method constructs a reward function such that a particular joint behaviour is the unique Nash equilibrium of a stochastic game, provided that such a reward function exists. This method builds on techniques for designing rewards for Markov decision processes and for normal form games. We explain our reward design methods in detail and formally prove that they are correct.
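As one concrete way to quantify a sequential joint behaviour such as turn-taking, the sketch below scores a sequence of medium-access slots by how often exactly one agent transmits and how evenly those successful slots are shared. It is an illustrative stand-in, not the metric developed in the thesis.

```python
def turn_taking_score(slots, n_agents):
    """Simple turn-taking measure over a sequence of medium-access slots.
    `slots[t]` is the set of agents transmitting in slot t. The score is the
    fraction of slots used by exactly one agent (usage), scaled by how evenly
    that usage is spread across agents (fairness)."""
    solo = [next(iter(s)) for s in slots if len(s) == 1]
    if not slots or not solo:
        return 0.0
    usage = len(solo) / len(slots)
    counts = [solo.count(a) for a in range(n_agents)]
    fairness = min(counts) / max(counts) if max(counts) else 0.0
    return usage * fairness

perfect = [{0}, {1}, {0}, {1}, {0}, {1}]   # clean alternation
greedy = [{0}, {0}, {0}, {0}, {0}, {0}]    # one agent hogs the medium
collide = [{0, 1}] * 6                     # everyone always collides
print(turn_taking_score(perfect, 2))  # 1.0
print(turn_taking_score(greedy, 2))   # 0.0 (unfair)
print(turn_taking_score(collide, 2))  # 0.0 (no successful slots)
```

Given such a metric, a designed reward function can be judged by the amount of turn-taking the resulting learned joint behaviour exhibits.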
8

Emerging communication between competitive agents

Noukhovitch, Mikhail 12 1900 (has links)
We investigate the fundamental question of how agents in competition learn communication protocols in order to share information and coordinate with each other. This work aims to overturn the current machine learning literature, which holds that unaligned, self-interested agents do not learn to communicate effectively. To study emergent communication for the spectrum of cooperative-competitive games, we introduce a carefully constructed sender-receiver game and put special care into evaluation. We find that communication can indeed emerge in partially-competitive scenarios, and we discover three things that are tied to improving it. First, that selfish communication is proportional to cooperation, and it naturally occurs for situations that are more cooperative than competitive. Second, that stability and performance are improved by using LOLA (Foerster et al., 2018), a higher-order "theory-of-mind" learning algorithm, especially in more competitive scenarios. And third, that discrete protocols lend themselves better to learning fair, cooperative communication than continuous ones. Chapter 1 provides an introduction to the underlying learning techniques of the agents, Machine Learning and Reinforcement Learning, and provides an overview of approaches to Multi-Agent Reinforcement Learning for different types of games. It then gives a background on language emergence by motivating this study and examining the history of techniques and results across Biology, Evolutionary Game Theory, and Economics. Chapter 2 delves into the work on language emergence between selfish, competitive agents. Chapter 3 draws conclusions from the work and points out the intrigue and challenge of learning communication in a competitive setting, setting the stage for future work.
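A minimal sender-receiver game with a tunable degree of conflict, in the spirit of the cooperative-competitive spectrum studied here, can be sketched as follows. The circular payoff structure, the `bias` parameter, and all names are assumptions; the thesis's actual game construction is not reproduced.

```python
import random

class SenderReceiverGame:
    """Toy circular sender-receiver game. The sender observes a target position
    on a circle of `n` points and emits a discrete message; the receiver sees
    only the message and picks a position. The receiver is rewarded for hitting
    the target, the sender for hitting target + bias, so `bias` controls where
    the game sits on the cooperative-competitive spectrum (bias=0: fully
    cooperative; larger bias: more conflict of interest)."""
    def __init__(self, n=12, bias=0):
        self.n, self.bias = n, bias

    def play(self, sender_policy, receiver_policy, rng=random):
        target = rng.randrange(self.n)
        message = sender_policy(target)
        action = receiver_policy(message)
        def closeness(goal):               # 1.0 at the goal, 0.0 on the opposite side
            d = min((action - goal) % self.n, (goal - action) % self.n)
            return 1.0 - 2.0 * d / self.n
        return closeness((target + self.bias) % self.n), closeness(target)

# Honest sender + trusting receiver: both get full reward only when bias == 0.
game = SenderReceiverGame(bias=3)
print(game.play(sender_policy=lambda t: t, receiver_policy=lambda m: m))  # (0.5, 1.0)
```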
9

Continuous coordination as a realistic scenario for lifelong learning

Badrinaaraayanan, Akilesh 04 1900 (has links)
Current deep reinforcement learning (RL) algorithms are still highly task-specific and lack the ability to generalize to new environments. Lifelong learning (LLL), however, aims at solving multiple tasks sequentially by efficiently transferring and using knowledge between tasks. Despite a surge of interest in lifelong RL in recent years, the lack of a realistic testbed makes robust evaluation of lifelong learning algorithms difficult. Multi-agent RL (MARL), on the other hand, can be seen as a natural scenario for lifelong RL due to its inherent non-stationarity, since the agents' policies change over time. In this thesis, we introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings. Our setup is based on Hanabi, a partially-observable, fully cooperative multi-agent game that has been shown to be challenging for zero-shot coordination. Its large strategy space makes it a desirable environment for lifelong RL tasks. We evaluate several recent MARL methods, and benchmark state-of-the-art lifelong learning algorithms in limited memory and computation regimes to shed light on their strengths and weaknesses. This continual learning paradigm also provides us with a pragmatic way of going beyond centralized training, which is the most commonly used training protocol in MARL. We empirically show that the agents trained in our setup are able to coordinate well with unknown agents, without any additional assumptions made by previous works. Key words: multi-agent reinforcement learning, lifelong learning.
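The zero-shot coordination evaluation described above amounts to cross-play: pairing trained agents with partners they never trained with and averaging the cooperative score. Below is a generic scaffold for that loop (not the thesis's code; the stub agents and the fake episode function only demonstrate the interface).

```python
import itertools
import statistics

def cross_play(agents, partners, play_episode, episodes=10):
    """Zero-shot coordination check: pair every trained agent with every unseen
    partner and average the cooperative score. `play_episode(a, b)` should run
    one episode of a two-player cooperative game (e.g. Hanabi) and return its score."""
    results = {}
    for a, b in itertools.product(agents, partners):
        scores = [play_episode(a, b) for _ in range(episodes)]
        results[(a.name, b.name)] = statistics.mean(scores)
    return results

# Stand-in agents and a fake episode, only to show the interface.
class Stub:
    def __init__(self, name, skill):
        self.name, self.skill = name, skill

fake_episode = lambda a, b: min(a.skill, b.skill)  # score limited by the weaker partner
trained = [Stub("ours-seed0", 18), Stub("ours-seed1", 20)]
held_out = [Stub("unseen-A", 15), Stub("unseen-B", 22)]
print(cross_play(trained, held_out, fake_episode))
```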
10

Agent abstraction in multi-agent reinforcement learning

Memarian, Amin 06 1900 (has links)
This thesis is organized into two chapters. The first chapter serves as an introduction to the concepts and ideas used in the second chapter (the article). The first chapter is divided into three sections. In the first section, we introduce Reinforcement Learning as a Machine Learning paradigm and show how its problems are formalized using Markov Decision Processes. We formalize goals as expected returns and show how the Bellman equations use the recursive formulation of return to establish a relation between the values of two successive states under the agent's policy. After that, we argue that solving the Bellman optimality equations is intractable and introduce value-based algorithms such as Dynamic Programming, Monte Carlo methods, and Temporal Difference methods that approximate the optimal solution using Generalized Policy Iteration. Function approximation is then proposed as a way of dealing with large state spaces. We also discuss how policy-based methods optimize the policy directly without optimizing the value function. In the second section, we introduce Markov Games as an extension of Markov Decision Processes for multiple agents. We cover the different settings formed by the different reward structures and give Sequential Social Dilemmas as an example of the mixed-incentive setting. In the end, we introduce different information structures such as centralized learning that can help deal with the opponent-induced non-stationarity. Finally, in the third section, we give a brief overview of state abstraction types and introduce bisimulation metrics as a concept inspired by model-irrelevance abstraction that measures the similarity between states. In the second chapter (the article), we ultimately delve into agent abstraction as a bisimulation metric and derive a compression factor that we can apply to Diplomacy to reveal the higher agency over the player units.
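For reference, the two objects this chapter overview leans on can be written out in standard textbook notation (general background, not the thesis's own notation; $c_R$ and $c_T$ are the usual reward and transition weights):

```latex
% Bellman expectation equation for a policy \pi:
V^{\pi}(s) = \sum_{a} \pi(a \mid s)\Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^{\pi}(s') \Big]

% Bisimulation metric in the style of Ferns et al.: two states are close when their
% immediate rewards are close and their transition distributions are close under the
% Wasserstein-1 distance W_1 induced by d itself (a fixed-point definition):
d(s_1, s_2) = \max_{a}\Big( c_R \,\big|R(s_1,a) - R(s_2,a)\big|
              + c_T \, W_1(d)\big(P(\cdot \mid s_1,a),\, P(\cdot \mid s_2,a)\big) \Big)
```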
