1 |
Cooperative Perception for Connected Vehicles
Mehr, Goodarz. 31 May 2024
Doctor of Philosophy / Self-driving cars promise a future with safer roads and reduced traffic incidents and fatalities. This future hinges on the car's accurate understanding of its surrounding environment; however, the reliability of the algorithms that form this perception is not always guaranteed and adverse traffic and environmental conditions can significantly diminish the performance of these algorithms. To solve this problem, this research builds on the idea that enabling cars to share and exchange information via communication allows them to extend the range and quality of their perception beyond their capability. To that end, this research formulates a robust and flexible framework for cooperative perception, explores how connected vehicles can learn to collaborate to improve their perception, and introduces an affordable, experimental vehicle platform for connected autonomy research.
|
2 |
Random Access Control In Massive Cellular Internet of Things: A Multi-Agent Reinforcement Learning Approach
Bai, Jianan. 14 January 2021
The Internet of Things (IoT) is envisioned as a promising paradigm to interconnect an enormous number of
wireless devices. However, the success of IoT is challenged by the difficulty of managing access
for a massive volume of sporadic and unpredictable user traffic. This thesis focuses
on the contention-based random access in massive cellular IoT systems and introduces two
novel frameworks to provide enhanced scalability, real-time quality of service management,
and resource efficiency. First, a local communication based congestion control framework
is introduced to distribute the random access attempts evenly over time under bursty traffic.
Second, a multi-agent reinforcement learning based preamble selection framework is
designed to increase the access capacity under a fixed number of preambles. Combining the
two mechanisms provides superior performance under various 3GPP-specified machine type
communication evaluation scenarios in terms of achieving much lower access latency and
fewer access failures. / Master of Science / In the age of the Internet of Things (IoT), a massive number of devices are expected to be connected
to the wireless networks in a sporadic and unpredictable manner. The wireless connection
is usually established by contention-based random access, a four-step handshaking process
initiated by a device through sending a randomly selected preamble sequence to the base
station. While different preambles are orthogonal, preamble collision happens when two
or more devices send the same preamble to a base station simultaneously, and a device
experiences access failure if the transmitted preamble cannot be successfully received and
decoded. A failed device needs to wait for another random access opportunity to restart the
aforementioned process and hence the access delay and resource consumption are increased.
The random access control in massive IoT systems is challenged by the increased access
intensity, which results in higher collision probability. In this work, we aim to provide better
scalability, real-time quality of service management, and resource efficiency in random access
control for such systems. Towards this end, we introduce 1) a local communication based
congestion control framework by enabling a device to cooperate with neighboring devices
and 2) a multi-agent reinforcement learning (MARL) based preamble selection framework by
leveraging the ability of MARL in forming the decision-making policy through the collected
experience. The introduced frameworks are evaluated under the 3GPP-specified scenarios
and shown to outperform the existing standard solutions in terms of achieving lower access
delays with fewer access failures.
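
The preamble collision mechanism described in this abstract can be illustrated with a minimal sketch. It assumes every device picks a preamble uniformly at random and that a device succeeds only if no other device picked the same preamble in that opportunity; decoding failures, backoff, and the thesis's own congestion control and MARL frameworks are not modelled. The function names and the device and preamble counts are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def simulate_access(num_devices, num_preambles, rng):
    """Count devices whose preamble was chosen by nobody else in one opportunity."""
    # Assumption: each device selects a preamble uniformly at random.
    choices = [rng.randrange(num_preambles) for _ in range(num_devices)]
    counts = Counter(choices)
    # A preamble picked by two or more devices collides, so only singletons succeed.
    return sum(1 for choice in choices if counts[choice] == 1)


def analytic_success_probability(num_devices, num_preambles):
    """Probability that a given device is alone on its preamble."""
    # Each of the other (num_devices - 1) devices must pick a different preamble.
    return (1 - 1 / num_preambles) ** (num_devices - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    for devices in (10, 50, 200):  # illustrative access intensities
        trials = [simulate_access(devices, 50, rng) for _ in range(2000)]
        empirical = sum(trials) / (len(trials) * devices)
        print(devices, round(empirical, 3),
              round(analytic_success_probability(devices, 50), 3))
```

Under these assumptions the per-device success probability falls as the number of contending devices grows with a fixed preamble pool, which is the scalability problem the abstract refers to.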
|
3 |
Towards a Deep Reinforcement Learning based approach for real-time decision making and resource allocation for Prognostics and Health Management applications
Ludeke, Ricardo Pedro João. January 2020
Industrial operational environments are stochastic and can have complex system dynamics which introduce multiple levels of uncertainty. This uncertainty leads to sub-optimal decision making and resource allocation. Digitalisation and automation of production equipment and the maintenance environment enable predictive maintenance, meaning that equipment can be stopped for maintenance at the optimal time. Resource constraints in maintenance capacity could however result in further undesired downtime if maintenance cannot be performed when scheduled.
In this dissertation the applicability of using a Multi-Agent Deep Reinforcement Learning based approach for decision making is investigated to determine the optimal maintenance scheduling policy in a fleet of assets where there are maintenance resource constraints. By considering the underlying system dynamics of maintenance capacity, as well as the health state of individual assets, a near-optimal decision making policy is found that increases equipment availability while also maximising maintenance capacity.
The implemented solution is compared to a run-to-failure corrective maintenance strategy, a constant interval preventive maintenance strategy and a condition based predictive maintenance strategy. The proposed approach outperformed traditional maintenance strategies across several asset and operational maintenance performance metrics. It is concluded that Deep Reinforcement Learning based decision making for asset health management and resource allocation is more effective than human based decision making. / Dissertation (MEng (Mechanical Engineering))--University of Pretoria, 2020. / Mechanical and Aeronautical Engineering / MEng (Mechanical Engineering) / Unrestricted
|
4 |
Adaptive manufacturing: dynamic resource allocation using multi-agent reinforcement learning
Heik, David; Bahrpeyma, Fouad; Reichelt, Dirk. 13 February 2024
The global value creation networks have experienced increased volatility and dynamic behavior in
recent years, resulting in an acceleration of a trend already evident in the shortening of product and
technology cycles. In addition, the manufacturing industry is demonstrating a trend of allowing customers
to make specific adjustments to their products at the time of ordering. Not only do these changes
require a high level of flexibility and adaptability from the cyber-physical systems, but also from the
employees and the supervisory production planning. As a result, the development of control and monitoring
mechanisms becomes more complex. It is also necessary to adjust the production process dynamically
if there are unforeseen events (disrupted supply chains, machine breakdowns, or absences
of staff) in order to make the most effective and efficient use of the available production resources.
In recent years, reinforcement learning (RL) research has gained increasing popularity in strategic
planning as a result of its ability to handle uncertainty in dynamic environments in real time. RL has
been extended to include multiple agents cooperating on complex tasks as a solution to complex problems.
Despite its potential, the real-world application of multi-agent reinforcement learning (MARL) to
manufacturing problems, such as flexible job-shop scheduling, has been less frequently approached.
The main reason for this is that applications in this field are frequently subject to specific requirements
as well as confidentiality obligations. Due to this, it is difficult for the research community
to obtain access to them, which presents substantial challenges for the implementation of these tools.
...
|
5 |
Decentralized Integration of Distributed Energy Resources into Energy Markets with Physical Constraints
Chen Feng (18556528). 29 May 2024
With the growing installation of distributed energy resources (DERs) at homes, more residential households are able to reduce the overall energy cost by storing unused energy in the storage battery when there is abundant renewable energy generation, and using the stored energy when there is insufficient renewable energy generation and high demand. It could be even more economical for the household if energy can be traded and shared among neighboring households. Despite the great economic benefit of DERs, they could also make it more challenging to ensure the stability of the grid due to the decentralization of agents' activities.
This thesis presents two approaches that combine market and control mechanisms to address these challenges. In the first work, we focus on the integration of DERs into local energy markets. We introduce a peer-to-peer (P2P) local energy market and propose a consensus multi-agent reinforcement learning (MARL) framework, which allows agents to develop strategies for trading and decentralized voltage control within the P2P market. It is compared to both the fully decentralized and centralized training & decentralized execution (CTDE) framework. Numerical results reveal that under each framework, the system is able to converge to a dynamic balance with the guarantee of system stability as each agent gradually learns the approximately optimal strategy. Theoretical results also prove the convergence of the consensus MARL algorithm under certain conditions.
In the second work, we introduce a mean-field game framework for the integration of DERs into wholesale energy markets. This framework helps DER owners automatically learn optimal decision policies in response to market price fluctuations and their own variable renewable energy outputs. We prove the existence of a mean-field equilibrium (MFE) for the wholesale energy market, and we develop a heuristic decentralized mean-field learning algorithm to converge to an MFE, taking into consideration the demand/supply shock and flexible demand. Our numerical experiments point to convergence to an MFE and show that our framework effectively reduces peak load and price fluctuations, especially during exogenous demand or supply shocks.
|
6 |
Non-Reciprocating Sharing Methods in Cooperative Q-Learning Environments
Cunningham, Bryan. 28 August 2012
Past research on multi-agent simulation with cooperative reinforcement learning (RL) for homogeneous agents focuses on developing sharing strategies that are adopted and used by all agents in the environment. These sharing strategies are considered to be reciprocating because all participating agents have a predefined agreement regarding what type of information is shared, when it is shared, and how the participating agents' policies are subsequently updated. The sharing strategies are specifically designed around manipulating this shared information to improve learning performance. This thesis targets situations where the assumption of a single sharing strategy employed by all agents is not valid. This work seeks to address how agents with no predetermined sharing partners can exploit groups of cooperatively learning agents to improve learning performance compared to independent learning. Specifically, several intra-agent methods are proposed that do not assume a reciprocating sharing relationship and leverage the pre-existing agent interface associated with Q-Learning to expedite learning. The other agents' functions and their sharing strategies are unknown and inaccessible from the point of view of the agent(s) using the proposed methods. The proposed methods are evaluated on physically embodied agents in the multi-agent cooperative robotics field learning a navigation task via simulation. The experiments conducted focus on the effects of the following factors on the performance of the proposed non-reciprocating methods: scaling the number of agents in the environment, limiting the communication range of the agents, and scaling the size of the environment. / Master of Science
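
For readers unfamiliar with the Q-Learning interface this abstract builds on, the following is a minimal sketch of the standard tabular Q-Learning update for a single independent learner. It is not the thesis's sharing method; the function name `q_update`, the state and action labels, and the values of `alpha` and `gamma` are illustrative assumptions.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-Learning update and return the new estimate."""
    # Greedy value of the next state under the current table.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    moves = ["left", "right"]     # hypothetical action set
    print(q_update(q_table, 0, "right", 1.0, 1, moves, alpha=0.5))
```

Because the table is an ordinary mapping from (state, action) pairs to values, it is the kind of interface through which Q-value information could in principle be read or exchanged between learners.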
|
7 |
Multi-Task Reinforcement Learning: From Single-Agent to Multi-Agent Systems
Trang, Matthew Luu. 06 January 2023
Generalized collaborative drones are a technology that has many potential benefits. General purpose drones that can handle exploration, navigation, manipulation, and more without having to be reprogrammed would be an immense breakthrough for usability and adoption of the technology. The ability to develop these multi-task, multi-agent drone systems is limited by the lack of available training environments, as well as deficiencies of multi-task learning due to a phenomenon known as catastrophic forgetting. In this thesis, we present a set of simulation environments for exploring the abilities of multi-task drone systems and provide a platform for testing agents in incremental single-agent and multi-agent learning scenarios. The multi-task platform is an extension of an existing drone simulation environment written in Python using the PyBullet Physics Simulation Engine, with these environments incorporated. Using this platform, we present an analysis of Incremental Learning and detail the beneficial impacts of using the technique for multi-task learning, with respect to multi-task learning speed and catastrophic forgetting. Finally, we introduce a novel algorithm, Incremental Learning with Second-Order Approximation Regularization (IL-SOAR), to mitigate some of the effects of catastrophic forgetting in multi-task learning. We show the impact of this method and contrast the performance relative to a multi-agent multi-task approach using a centralized policy sharing algorithm. / Master of Science / Machine Learning techniques allow drones to be trained to achieve tasks which are otherwise time-consuming or difficult. The goal of this thesis is to facilitate the work of creating these complex drone machine learning systems by exploring Reinforcement Learning (RL), a field of machine learning which involves learning the correct actions to take through experience. Currently, RL methods are effective in the design of drones which are able to solve one particular task. 
The next step in this technology is to develop RL systems which are able to handle generalization and perform well across multiple tasks. In this thesis, simulation environments for drones to learn complex tasks are created, and algorithms which are able to train drones in multiple hard tasks are developed and tested. We explore the benefits of using a specific multi-task training technique known as Incremental Learning. Additionally, we consider one of the prohibitive factors of multi-task machine learning-based solutions, the degradation problem of agent performance on previously learned tasks, known as catastrophic forgetting. We create an algorithm that aims to prevent the impact of forgetting when training drones sequentially on new tasks. We contrast this approach with a multi-agent solution, where multiple drones learn simultaneously across the tasks.
|
8 |
Communication approaches in Multi-Agent Reinforcement Learning
Nechai, Vladyslav. 22 October 2024
In decentralised multi-agent reinforcement learning, communication can be used as a measure to increase coordination among the agents. At the same time, the essence of message exchange and its contribution to successful goal achievement can only be established with domain knowledge of a given environment. This thesis focuses on understanding the impact of communication on a decentralised multi-agent system. To achieve this, communication is employed and studied in the context of Urban Air Mobility, in particular the vertiport terminal area control problem. The experimental framework proposed in this work, which supports different information exchange protocols, makes it possible to investigate whether and how the agents leverage their communication capabilities. Simulation results show that, in the terminal area of a vertiport, aircraft controlled in a decentralised way are capable of proper self-organisation, similar to the structured technique formulated in [Bertram and Wei (2020)]. A study of their communication mechanisms indicates that, through different protocols, the agents learn to signal their intent to enter a vertiport regardless of environment settings.
|
9 |
Multi Agent Reinforcement Learning for Game Theory : Financial Graphs / Multi-agent förstärkning lärande för spelteori : Ekonomiska grafer
Yu, Bryan. January 2021
We present the rich research potential at the union of multi-agent reinforcement learning (MARL), game theory, and financial graphs. We demonstrate how multiple game-theoretic scenarios arise in three-node financial graphs with minor modifications. We highlight six scenarios used in this study. We discuss how to set up an environment for MARL training and evaluation. We first investigate individual games and demonstrate that MARL agents consistently learn Nash equilibrium strategies. We next investigate mixed games and find again that MARL agents learn Nash equilibrium strategies given sufficient information and incentive (e.g. prosociality). We find that (1) introducing an embedding layer in the agents' deep network improves learned representations and, as such, learned strategies, (2) MARL agents can learn a variety of complex strategies, and (3) selfishness improves strategies' fairness and efficiency. Next we introduce populations and find that (1) prosocial members in a population influence the action profile and that (2) complex strategies present in individual scenarios no longer emerge as the populations' portfolio of strategies converges to a main diagonal. We identify two challenges that arise in populations, namely (1) identifying a partner's prosociality and (2) identifying a partner's identity. We study three information settings which supplement the agents' observation set and find that knowledge of a partner's prosociality or identity has negligible impact on how the portfolio of strategies converges.
|
10 |
Agent Contribution in Multi-Agent Reinforcement Learning : A Case Study in Remote Electrical Tilt
Emanuelsson, William. January 2024
As multi-agent reinforcement learning (MARL) continues to evolve and find applications in complex real-world systems, the imperative for explainability in these systems becomes increasingly critical. Central to enhancing this explainability is tackling the credit assignment problem, a key challenge in MARL that involves quantifying the individual contributions of agents toward a common goal. In addressing this challenge, this thesis introduces and explores the application of Local and Global Shapley Values (LSV and GSV) within MARL contexts. These novel adaptations of the traditional Shapley value from cooperative game theory are investigated particularly in the context of optimizing remote electrical tilt in telecommunications antennas. Using both predator-prey and remote electrical tilt environments, the study delves into local and global explanations, examining how the Shapley value can illuminate changes in agent contributions over time and across different states, as well as aggregate these insights over multiple episodes. The research findings demonstrate that the use of Shapley values enhances the understanding of individual agent behaviors, offers insights into policy suboptimalities and environmental nuances, and aids in identifying agent redundancies, a feature with potential applications in energy savings in real-world systems. Altogether, this thesis highlights the considerable potential of employing the Shapley value as a tool in explainable MARL.
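
As a small illustration of the traditional Shapley value that this abstract adapts, the sketch below computes exact Shapley values for a toy three-agent cooperative game by averaging marginal contributions over all join orders. It does not implement the thesis's Local or Global Shapley Values; the reward function `toy_team_reward` and the agent names are hypothetical and were chosen so that two agents are redundant with each other.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_team_reward(coalition):
    """Hypothetical team reward: agents "a" and "b" are redundant, "c" is independent."""
    reward = 0.0
    if "a" in coalition or "b" in coalition:
        reward += 6.0  # either redundant agent alone secures this part
    if "c" in coalition:
        reward += 4.0
    return reward


if __name__ == "__main__":
    print(shapley_values(["a", "b", "c"], toy_team_reward))
```

In this toy game the two redundant agents split the credit for the reward that either could secure alone, while the independent agent keeps its full contribution, and the values sum to the reward of the full team. Exact enumeration grows factorially with the number of agents, so larger systems generally need sampling-based approximations.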
|