1 |
Mobilized ad-hoc networks: A reinforcement learning approach. Chang, Yu-Han; Ho, Tracey; Kaelbling, Leslie Pack. 04 December 2003
Research in mobile ad-hoc networks has focused on situations in which nodes have no control over their movements. We investigate an important but overlooked domain in which nodes do have control over their movements. Reinforcement learning methods can be used to control both packet routing decisions and node mobility, dramatically improving the connectivity of the network. We first motivate the problem by presenting theoretical bounds for the connectivity improvement of partially mobile networks and then present superior empirical results under a variety of different scenarios in which the mobile nodes in our ad-hoc network are embedded with adaptive routing policies and learned movement policies.
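The routing half of this idea can be illustrated with a classic tabular Q-routing update, in which each node learns, for every destination, an estimate of the delivery time achievable through each neighbor. The sketch below is a minimal illustration under assumed state, action, and feedback definitions (node identifiers, learning rate, and the reported neighbor estimate are placeholders); it is not the thesis's actual formulation, which additionally learns movement policies for the mobile nodes.

```python
import random
from collections import defaultdict

class QRoutingNode:
    """Minimal tabular Q-routing sketch (Boyan-Littman style).

    Q[dest][neighbor] estimates the remaining delivery time for a packet
    bound for `dest` if it is forwarded to `neighbor`. Illustrative only;
    the thesis's state/action formulation may differ.
    """
    def __init__(self, node_id, neighbors, alpha=0.1, epsilon=0.05):
        self.node_id = node_id
        self.neighbors = list(neighbors)
        self.alpha = alpha          # learning rate
        self.epsilon = epsilon      # exploration rate
        self.q = defaultdict(lambda: defaultdict(float))  # Q[dest][neighbor]

    def choose_next_hop(self, dest):
        # epsilon-greedy over estimated delivery times (lower is better)
        if random.random() < self.epsilon:
            return random.choice(self.neighbors)
        return min(self.neighbors, key=lambda n: self.q[dest][n])

    def update(self, dest, next_hop, transit_delay, neighbor_best_estimate):
        # neighbor_best_estimate: the chosen neighbor's own best Q[dest][.],
        # reported back when the packet is handed over
        target = transit_delay + neighbor_best_estimate
        self.q[dest][next_hop] += self.alpha * (target - self.q[dest][next_hop])
```

Because each node updates only its own table from locally observed delays and a single value reported by the chosen neighbor, routing decisions remain fully distributed.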
|
3 |
Learning Successful Strategies in Repeated General-sum Games. Crandall, Jacob W. 21 December 2005
Many environments in which an agent can use reinforcement learning techniques to learn profitable strategies are affected by other learning agents. These situations can be modeled as general-sum games. When playing repeated general-sum games with other learning agents, the goal of a self-interested learning agent is to maximize its own payoffs over time. Traditional reinforcement learning algorithms learn myopic strategies in these games. As a result, they learn strategies that produce undesirable results in many games. In this dissertation, we develop and analyze algorithms that learn non-myopic strategies when playing many important infinitely repeated general-sum games. We show that, in many of these games, these algorithms outperform existing multiagent learning algorithms. We derive performance guarantees for these algorithms (for certain learning parameters) and show that these guarantees become stronger and apply to larger classes of games as more information is observed and used by the agents. We establish these results through empirical studies and mathematical proofs.
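To see why myopic strategies produce undesirable results in repeated games, consider a repeated prisoner's dilemma against a grim-trigger opponent. The payoff numbers and the opponent strategy below are illustrative assumptions used only to make this point; the dissertation's algorithms and its set of games are much broader.

```python
# Prisoner's dilemma payoffs for the row player: (row_action, col_action) -> payoff.
# Illustrative numbers only; the dissertation studies many general-sum games.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def myopic_best_response(opponent_action):
    """One-shot best response: the kind of strategy a myopic learner converges to."""
    return max("CD", key=lambda a: PAYOFF[(a, opponent_action)])

def play_repeated(my_strategy, rounds=100):
    """Play against a grim-trigger opponent: it cooperates until we defect once."""
    opp_action, triggered, total = "C", False, 0
    for _ in range(rounds):
        my_action = my_strategy(opp_action)
        total += PAYOFF[(my_action, opp_action)]
        if my_action == "D":
            triggered = True
        opp_action = "D" if triggered else "C"
    return total / rounds

# Myopic play defects, earns one exploitation payoff, then mutual defection forever,
# while unconditional cooperation earns 3 per round against this opponent.
print(play_repeated(myopic_best_response))   # about 1.04 per round
print(play_repeated(lambda _: "C"))          # 3.0 per round
```

A non-myopic learner that accounts for the opponent's future reactions can sustain the cooperative outcome and the higher long-run payoff.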
|
4 |
Deep Reinforcement Learning For Distributed Fog Network Probing. Guan, Xiaoding. 01 September 2020
The sixth generation (6G) of wireless communication systems will rely significantly on fog/edge network architectures for service provisioning. To satisfy stringent quality-of-service requirements using dynamically available resources at the edge, new network access schemes are needed. In this paper, we consider a cognitive dynamic edge/fog network where primary users (PUs) may temporarily share their resources and act as fog nodes for secondary users (SUs). We develop strategies for distributed dynamic fog probing so that SUs can discover available connections to access the fog nodes. To handle the large state space of connectivity availability, which covers the availability of channels, computing resources, and fog nodes, as well as the partial observability of these states, we design a novel distributed Deep Q-learning Fog Probing (DQFP) algorithm. Our goal is to develop multi-user strategies for accessing fog nodes in a distributed manner without any centralized scheduling or message passing. By using cooperative and competitive utility functions, we analyze the impact of the multi-user dynamics on the connectivity availability and establish design principles for our DQFP algorithm.
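A generic deep Q-learning agent skeleton of the kind DQFP builds on is sketched below. The network size, state encoding, and reward are placeholder assumptions, and the DQFP-specific probing logic and cooperative/competitive utility functions from the thesis are not reproduced; this only shows the underlying value-learning machinery.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class DQNAgent:
    """Generic deep Q-learning skeleton; DQFP-specific details are omitted."""

    def __init__(self, obs_dim, n_actions, gamma=0.99, lr=1e-3, buffer_size=10_000):
        self.q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                   nn.Linear(64, n_actions))
        self.target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                        nn.Linear(64, n_actions))
        self.target_net.load_state_dict(self.q_net.state_dict())
        self.optim = torch.optim.Adam(self.q_net.parameters(), lr=lr)
        self.buffer = deque(maxlen=buffer_size)   # experience replay
        self.gamma, self.n_actions = gamma, n_actions

    def act(self, obs, epsilon):
        # epsilon-greedy action selection over the learned Q-values
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            return int(self.q_net(torch.as_tensor(obs, dtype=torch.float32)).argmax())

    def store(self, transition):
        # transition = (obs, action, reward, next_obs, done)
        self.buffer.append(transition)

    def learn(self, batch_size=32):
        if len(self.buffer) < batch_size:
            return
        batch = random.sample(self.buffer, batch_size)
        obs, act, rew, nxt, done = map(
            lambda x: torch.as_tensor(x, dtype=torch.float32), zip(*batch))
        q = self.q_net(obs).gather(1, act.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = rew + self.gamma * (1 - done) * self.target_net(nxt).max(1).values
        loss = nn.functional.mse_loss(q, target)
        self.optim.zero_grad()
        loss.backward()
        self.optim.step()
```

In a distributed deployment of the kind described above, each SU would run its own such agent on its local, partial observation, with no message passing between agents.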
|
5 |
Dynamic Structure Adaptation for Communities of Learning Machines. LeJeune, Kennan Clark. 23 May 2022
No description available.
|
6 |
Scaling Multi-Agent Learning in Complex Environments. Zhang, Chongjie. 01 September 2011
Cooperative multi-agent systems (MAS) are finding applications in a wide variety of domains, including sensor networks, robotics, distributed control, collaborative decision support systems, and data mining. A cooperative MAS consists of a group of autonomous agents that interact with one another in order to optimize a global performance measure. A central challenge in cooperative MAS research is to design distributed coordination policies. Designing optimal distributed coordination policies offline is usually not feasible for large-scale complex multi-agent systems, where tens to thousands of agents are involved, communication bandwidth is limited, there are communication delays between agents, and each agent has only a limited partial view of the whole system. This infeasibility is due either to the prohibitive cost of building an accurate decision model, to a dynamically evolving environment, or to intractable computational complexity. This thesis develops a multi-agent reinforcement learning paradigm that allows agents to effectively learn and adapt coordination policies in complex cooperative domains without explicitly building complete decision models. With multi-agent reinforcement learning (MARL), agents explore the environment through trial and error, adapt their behaviors to the dynamics of the uncertain and evolving environment, and improve their performance through experience. To achieve scalability and ensure global performance, the MARL paradigm developed in this thesis restricts the learning of each agent to information locally observed or received from local interactions with a limited number of agents (i.e., neighbors), and exploits non-local interaction information to coordinate the agents' learning processes. This thesis develops new MARL algorithms for agents to learn effectively with limited observations in multi-agent settings and introduces a low-overhead supervisory control framework to collect and integrate non-local information into the learning process of agents to coordinate their learning. More specifically, the contributions of the already completed aspects of this thesis are as follows:
Multi-Agent Learning with Policy Prediction: This thesis introduces the concept of policy prediction and augments the basic gradient-based learning algorithm to achieve two properties: best-response learning and convergence. The convergence of multi-agent learning with policy prediction is proven for a class of static games under the assumption of full observability.
MARL Algorithm with Limited Observability: This thesis develops PGA-APP, a practical multi-agent learning algorithm that extends Q-learning to learn stochastic policies. PGA-APP combines the policy gradient technique with the idea of policy prediction (a sketch of this combination follows the abstract). It allows an agent to learn effectively with limited observability in complex domains in the presence of other learning agents. Empirical results demonstrate that PGA-APP outperforms state-of-the-art MARL techniques in benchmark games.
MARL Application in Cloud Computing: This thesis illustrates how MARL can be applied to optimizing online distributed resource allocation in cloud computing. Empirical results show that the MARL approach performs reasonably well compared to an optimal solution, and better than a centralized myopic allocation approach in some cases.
A General Paradigm for Coordinating MARL: This thesis presents a multi-level supervisory control framework to coordinate and guide the agents' learning process. This framework exploits non-local information and introduces a more global view to coordinate the learning of individual agents without incurring significant overhead or exploding their policy space. Empirical results demonstrate that this coordination significantly improves the speed, quality, and likelihood of MARL convergence in large-scale, complex cooperative multi-agent systems.
An Agent Interaction Model: This thesis proposes a new general agent interaction model. This model formalizes a type of interaction among agents, called joint-event-driven interactions, and defines a measure for capturing the strength of such interactions. Formal analysis reveals the relationship between agent interactions and the performance of individual agents and of the whole system.
Self-Organization for Nearly-Decomposable Hierarchy: This thesis develops a distributed self-organization approach, based on the agent interaction model, that dynamically forms a nearly decomposable hierarchy for large-scale multi-agent systems. This self-organization approach is integrated into the supervisory control framework to automatically evolve supervisory organizations that better coordinate MARL during the learning process. Empirical results show that dynamically evolving supervisory organizations can perform better than static ones.
Automating Coordination for Multi-Agent Learning: We tailor our supervision framework for coordinating MARL in ND-POMDPs. By exploiting structured interaction in ND-POMDPs, this tailored approach distributes the learning of the global joint policy among supervisors and employs DCOP techniques to automatically coordinate distributed learning to ensure global learning performance. We prove that this approach can learn a globally optimal policy for ND-POMDPs with a property called groupwise observability.
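The sketch below captures the spirit of the PGA-APP update described above: a standard Q-learning value update combined with a policy-gradient step that is damped by a prediction term before the policy is projected back onto the simplex. The correction term, step sizes, and projection here are illustrative assumptions; the exact PGA-APP equations and their analysis are given in the thesis.

```python
import numpy as np

def pga_app_style_step(Q, pi, s, a, reward, s_next, alpha=0.1, gamma_rl=0.95,
                       eta=0.01, prediction_len=1.0):
    """One update in the spirit of PGA-APP (policy gradient ascent with
    approximate policy prediction). Hedged sketch: constants and the exact
    form of the prediction correction are assumptions.

    Q:  array [n_states, n_actions] of action values
    pi: array [n_states, n_actions] of stochastic policy probabilities
    """
    # 1) Standard Q-learning value update.
    Q[s, a] += alpha * (reward + gamma_rl * Q[s_next].max() - Q[s, a])

    # 2) Policy-gradient term: advantage of each action under the current policy.
    value = pi[s] @ Q[s]
    advantage = Q[s] - value

    # 3) Policy prediction: damp the gradient where the policy probability is
    #    already high, anticipating how the policy (and others' responses) will move.
    predicted_gradient = advantage - prediction_len * np.abs(advantage) * pi[s]

    # 4) Gradient ascent on the policy, then projection back to the simplex.
    pi[s] = np.clip(pi[s] + eta * predicted_gradient, 1e-6, None)
    pi[s] /= pi[s].sum()
    return Q, pi
```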
|
7 |
Playing is believing: the role of beliefs in multi-agent learning. Chang, Yu-Han; Kaelbling, Leslie P. 01 1900
We propose a new classification for multi-agent learning algorithms, with each league of players characterized by both their possible strategies and possible beliefs. Using this classification, we review the optimality of existing algorithms and discuss some insights that can be gained. We propose an incremental improvement to the existing algorithms that seems to achieve average payoffs that are at least the Nash equilibrium payoffs in the long run against fair opponents. / Singapore-MIT Alliance (SMA)
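A classical example of belief-based play, useful for grounding what "possible beliefs" means in this classification, is fictitious play: maintain an empirical distribution over the opponent's past actions and best respond to it. The sketch below is that baseline, not the incremental improvement proposed in the paper; the payoff values are illustrative.

```python
import numpy as np

def fictitious_play_response(payoff_matrix, opponent_counts):
    """Best response to an empirical belief about the opponent (fictitious play).

    payoff_matrix[i, j]: our payoff when we play i and the opponent plays j.
    opponent_counts[j]:  how often the opponent has played j so far.
    """
    belief = opponent_counts / opponent_counts.sum()   # empirical mixed strategy
    expected = payoff_matrix @ belief                  # expected payoff of each of our actions
    return int(np.argmax(expected))

# Example: matching pennies from the row player's perspective.
payoffs = np.array([[1.0, -1.0],
                    [-1.0, 1.0]])
counts = np.array([7.0, 3.0])   # opponent played action 0 seven times, action 1 three times
print(fictitious_play_response(payoffs, counts))   # best response: action 0
```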
|
8 |
Multi-Agent Reinforcement Learning Approaches for Distributed Job-Shop Scheduling Problems. Gabel, Thomas. 10 August 2009
Decentralized decision-making is an active research topic in artificial intelligence. In a distributed system, a number of individually acting agents coexist. If they strive to accomplish a common goal, the establishment of coordinated cooperation between the agents is of utmost importance. With this in mind, our focus is on multi-agent reinforcement learning (RL) methods which allow for automatically acquiring cooperative policies based solely on a specification of the desired joint behavior of the whole system. The decentralization of the control and observation of the system among independent agents, however, has a significant impact on problem complexity. Therefore, we address the intricacy of learning and acting in multi-agent systems by two complementary approaches. First, we identify a subclass of general decentralized decision-making problems that features regularities in the way the agents interact with one another. We show that the complexity of optimally solving a problem instance from this class is provably lower than solving a general one. Although a lower complexity class may be entered by sticking to certain subclasses of general multi-agent problems, the computational complexity may still be so high that optimally solving it is infeasible. Hence, our second goal is to develop techniques capable of quickly obtaining approximate solutions in the vicinity of the optimum. To this end, we will develop and utilize various model-free reinforcement learning approaches. Many real-world applications are well-suited to be formulated in terms of spatially or functionally distributed entities. Job-shop scheduling represents one such application. We are going to interpret job-shop scheduling problems as distributed sequential decision-making problems, to employ the multi-agent RL algorithms we propose for solving such problems, and to evaluate the performance of our learning approaches in the scope of various established scheduling benchmark problems.
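Interpreting job-shop scheduling as distributed sequential decision-making amounts to treating each machine as an agent that repeatedly chooses which waiting job to process next. The sketch below illustrates that framing with assumed job features, an assumed reward signal, and a simple value learner; the thesis's model-free RL algorithms, state encodings, and benchmark setups are more elaborate.

```python
import random
from collections import defaultdict

class MachineAgent:
    """Sketch of one machine viewed as an independent learning agent that
    repeatedly picks which waiting job to process next. Features and rewards
    are illustrative assumptions, not the thesis's definitions."""

    def __init__(self, epsilon=0.1, alpha=0.05):
        self.values = defaultdict(float)   # learned value of dispatching a job "type"
        self.epsilon = epsilon
        self.alpha = alpha

    @staticmethod
    def job_features(job):
        # Coarse, hand-picked features of a waiting job (assumption):
        # remaining processing-time bucket and number of remaining operations.
        return (job["proc_time"] // 10, job["ops_left"])

    def select_job(self, waiting_jobs):
        # epsilon-greedy dispatching decision over the waiting queue
        if random.random() < self.epsilon:
            return random.choice(waiting_jobs)
        return max(waiting_jobs, key=lambda j: self.values[self.job_features(j)])

    def update(self, dispatched_job, reward):
        # reward could be, e.g., the negative increase in makespan or tardiness
        f = self.job_features(dispatched_job)
        self.values[f] += self.alpha * (reward - self.values[f])
```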
|
9 |
Dynamic opponent modelling in two-player games. Mealing, Richard Andrew. January 2015
This thesis investigates decision-making in two-player imperfect information games against opponents whose actions can affect our rewards, and whose strategies may be based on memories of interaction, or may be changing, or both. The focus is on modelling these dynamic opponents and using the models to learn high-reward strategies. The main contributions of this work are:
1. An approach to learn high-reward strategies in small simultaneous-move games against these opponents. This is done by using a model of the opponent learnt from sequence prediction, with (possibly discounted) rewards learnt from reinforcement learning, to look ahead using explicit tree search (a simplified sketch of such an opponent model follows this list). Empirical results show that this gains higher average rewards per game than state-of-the-art reinforcement learning agents in three simultaneous-move games. They also show that several sequence prediction methods model these opponents effectively, supporting the idea of borrowing such methods from areas such as data compression and string matching.
2. An online expectation-maximisation algorithm that infers an agent's hidden information based on its behaviour in imperfect information games.
3. An approach to learn high-reward strategies in medium-size sequential-move poker games against these opponents. This is done by using a model of the opponent learnt from sequence prediction, which needs its hidden information (inferred by the online expectation-maximisation algorithm), to train a state-of-the-art no-regret learning algorithm by simulating games between the algorithm and the model. Empirical results show that this improves the no-regret learning algorithm's rewards when playing against popular and state-of-the-art algorithms in two simplified poker games.
4. A demonstration that several change detection methods can effectively model changing categorical distributions, with experimental results comparing their accuracies to empirical distributions. These results also show that their models can be used to outperform state-of-the-art reinforcement learning agents in two simultaneous-move games. This supports the idea of modelling changing opponent strategies with change detection methods.
5. Experimental results on the self-play convergence of the empirical distributions of play of sequence prediction and change detection methods to mixed-strategy Nash equilibria. The results show that they converge faster than fictitious play and, in the case of change detection, converge in more cases.
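To illustrate contribution 1, the sketch below uses a deliberately simple order-k Markov counter as the sequence-prediction opponent model; the thesis evaluates more sophisticated predictors drawn from data compression and string matching and combines them with learned rewards and explicit tree search, none of which is reproduced here.

```python
from collections import defaultdict, Counter

class MarkovOpponentModel:
    """Order-k Markov model of an opponent's next action given the recent
    history of its own actions. A simple stand-in for the sequence-prediction
    methods used in the thesis."""

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(Counter)   # context -> counts of the next action
        self.history = []

    def observe(self, opponent_action):
        context = tuple(self.history[-self.order:])
        self.counts[context][opponent_action] += 1
        self.history.append(opponent_action)

    def predict(self):
        context = tuple(self.history[-self.order:])
        dist = self.counts.get(context)
        if not dist:
            return None                       # no data for this context yet
        total = sum(dist.values())
        return {a: c / total for a, c in dist.items()}

# Usage: feed observed opponent actions, then read off a predictive distribution.
model = MarkovOpponentModel(order=1)
for a in ["rock", "paper", "rock", "paper", "rock"]:
    model.observe(a)
print(model.predict())   # after "rock", this opponent has always played "paper"
```

A decision-maker can then search over its own actions against this predictive distribution, which is the role tree search plays in the approach described above.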
|
10 |
Joint Trajectory and Handover Management for UAVs Co-existing with Terrestrial Users: Deep Reinforcement Learning Based Approaches. Deng, Yuhang. January 2024
Integrating unmanned aerial vehicles (UAVs) as aerial user equipment (UEs) into cellular networks is now considered a promising solution for providing extensive wireless connectivity to support UAV-centric commercial or civilian applications. However, the co-existence of UAVs with conventional terrestrial UEs is one of the primary challenges for this solution. Flying at higher altitudes and with a maneuverability advantage, UAVs are able to establish line-of-sight (LoS) connectivity with more base stations (BSs) than terrestrial UEs. Although LoS connectivity reduces the communication delay of UAVs, it also increases the interference that UAVs cause to terrestrial UEs. In scenarios involving multiple UAVs, LoS connectivity can even lead to interference issues among the UAVs themselves. In addition, LoS connectivity leads to extensive overlapping coverage areas of multiple BSs for UAVs, forcing them to perform frequent handovers during flight if a received signal strength (RSS)-based handover policy is employed. The trajectories and BS associations of UAVs, along with their radio resource allocation, are essential design parameters for enabling their seamless integration into cellular networks, with a particular focus on managing the interference levels they generate and reducing the redundant handovers they perform. Hence, this thesis designs two joint trajectory and handover management approaches, for single-UAV and multi-UAV scenarios respectively, aiming to minimize the weighted sum of three key performance indicators (KPIs): transmission delay, up-link interference, and handover numbers. The approaches are based on deep reinforcement learning (DRL) frameworks, with the dueling double deep Q-network (D3QN) and Q-learning with a MIXer network (QMIX) algorithms selected as the training agents, respectively. The choice of these DRL algorithms is motivated by their capability in designing sequential decision-making policies consisting of trajectory design and handover management. Results show that the proposed approaches effectively address the aforementioned challenges while ensuring low transmission delay for cellular-connected UAVs. These results are in contrast to the performance of the benchmark scheme, which directs UAVs to follow the shortest path and perform handovers based on RSS. Specifically, in the single-UAV scenario, the D3QN-based approach reduces the up-link interference by 18% and the handover numbers by 90%, with a 59% increase in transmission delay as compared to the benchmark. The equivalent delay increase is 15 microseconds, which is considered negligible. For the multi-UAV scenario, the QMIX-based approach jointly optimizes the three performance metrics as compared to the benchmark scheme, resulting in a 70% decrease in interference, a 91% decrease in handover numbers, and a 47% reduction in transmission delay. It is noteworthy that an increase in the number of UAVs operating within the same network leads to performance degradation, because the UAVs compete for communication resources and interfere with one another. When transitioning from the single-UAV scenario to the multi-UAV scenario, the performance of the benchmark scheme declines significantly, with an increase of 199% in interference, 89% in handover numbers, and 652% in transmission delay.
In contrast, the proposed QMIX algorithm effectively coordinates multiple UAVs, mitigating the performance degradation and achieving performance similar to the D3QN algorithm applied in the single-UAV scenario: an interference increase of 9%, a handover-number increase of 9%, and a delay increase of 152%. The delay increase is attributed to the reduced communication resources available to each individual UAV, given the constant communication resources of the network.
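The optimization target described above, a weighted sum of transmission delay, up-link interference, and handover count, together with the dueling double-DQN machinery, can be sketched as follows. The weight values, network sizes, and observation encoding are assumptions, and the QMIX mixer used for the multi-UAV case is not reproduced.

```python
import torch
import torch.nn as nn

def uav_reward(delay, interference, handover, w=(1.0, 1.0, 1.0)):
    """Per-step reward as the negative weighted sum of the three KPIs from the
    thesis (transmission delay, up-link interference, handover indicator).
    The weight values here are placeholders."""
    return -(w[0] * delay + w[1] * interference + w[2] * handover)

class DuelingQNet(nn.Module):
    """Dueling head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, obs):
        h = self.body(obs)
        a = self.advantage(h)
        return self.value(h) + a - a.mean(dim=-1, keepdim=True)

def d3qn_target(online_net, target_net, reward, next_obs, done, gamma=0.99):
    """Double-DQN target: the online network selects the next action,
    the target network evaluates it."""
    with torch.no_grad():
        next_action = online_net(next_obs).argmax(dim=-1, keepdim=True)
        next_q = target_net(next_obs).gather(-1, next_action).squeeze(-1)
        return reward + gamma * (1.0 - done) * next_q
```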
|