161 |
Securing multi-robot systems with inter-robot observations and accusations
Wardega, Kacper Tomasz 24 May 2023 (has links)
Multi-robot systems (MRSs) are gaining popularity in industries such as manufacturing, logistics, agriculture, defense, search and rescue, and transportation. These systems involve multiple robots working together towards a shared objective, either autonomously or under human supervision. However, as MRSs operate in uncertain or even adversarial environments, and the sensors and actuators of each robot may be error-prone, they are susceptible to faults and security threats unique to MRSs. Classical techniques from distributed systems cannot detect or mitigate these threats. In this dissertation, novel techniques are proposed to enhance the security and fault-tolerance of MRSs through inter-robot observations and accusations.
A fundamental security property is proposed for MRSs, which ensures that the system supervisor detects forbidden deviations from a desired multi-robot motion plan. Relying solely on self-reported motion information from the robots for monitoring deviations can leave the system vulnerable to attacks from a single compromised robot. The concept of co-observations is introduced: additional data reported to the supervisor to supplement the self-reported motion information. Co-observation-based detection is formalized as a method of identifying deviations from the expected motion plan based on discrepancies in the sequence of co-observations reported. An optimal deviation-detecting motion planning problem is formulated that achieves all the original application objectives while ensuring that all forbidden plan-deviation attacks trigger co-observation-based detection by the supervisor. A secure motion planner based on constraint solving is proposed as a proof-of-concept to implement the deviation-detecting security property.
The security and resilience of MRSs against plan deviation attacks are further improved by limiting the information available to attackers. An efficient algorithm is proposed that verifies the inability of an attacker to stealthily perform forbidden plan deviation attacks with a given motion plan and announcement scheme. Such announcement schemes are referred to as horizon-limiting. An optimal horizon-limiting planning problem is formulated that maximizes planning lookahead while maintaining the announcement scheme as horizon-limiting. Co-observations and horizon-limiting announcements are shown to be efficient and scalable in protecting MRSs, including systems with hundreds of robots, as evidenced by a case study in a warehouse setting.
Lastly, the Decentralized Blocklist Protocol (DBP), a method for designing Byzantine-resilient decentralized MRSs, is introduced. DBP is based on inter-robot accusations and allows cooperative robots to identify misbehavior through co-observations and share this information through the network. The method is adaptive to the number of faulty robots and is widely applicable to various decentralized MRS applications. It also permits fast information propagation, requires fewer cooperative observers of application-specific variables, and reduces the worst-case connectivity requirement, making it more scalable than existing methods. Empirical results demonstrate the scalability and effectiveness of DBP in cooperative target tracking, time synchronization, and localization case studies with hundreds of robots.
The techniques proposed in this dissertation enhance the security and fault-tolerance of MRSs operating in uncertain and adversarial environments, aiding in the development of secure MRSs for emerging applications.
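The co-observation idea described in this abstract can be illustrated with a minimal sketch. It assumes a grid world in which two robots co-observe each other whenever they are within Manhattan distance `radius` at the same time step; the function names, the event encoding `(t, a, b)`, and the distance rule are hypothetical choices for illustration and are not taken from the dissertation.

```python
from itertools import combinations


def expected_co_observations(plan, radius=1):
    """Return the co-observation events implied by a motion plan.

    plan: dict mapping robot id -> list of (x, y) cells, one per time step.
    An event is a tuple (t, a, b) with a < b, meaning robots a and b are
    within `radius` (Manhattan distance, an assumption) at step t.
    """
    events = set()
    horizon = min(len(path) for path in plan.values())
    for t in range(horizon):
        for a, b in combinations(sorted(plan), 2):
            (ax, ay), (bx, by) = plan[a][t], plan[b][t]
            if abs(ax - bx) + abs(ay - by) <= radius:
                events.add((t, a, b))
    return events


def detect_deviation(plan, reported, radius=1):
    """Compare reported co-observations against those the plan implies.

    Returns (missing, unexpected); either set being non-empty is treated
    as evidence of a deviation from the plan.
    """
    expected = expected_co_observations(plan, radius)
    return expected - reported, reported - expected


plan = {
    "r1": [(0, 0), (1, 0), (2, 0)],
    "r2": [(0, 2), (1, 1), (2, 1)],
    "r3": [(5, 5), (5, 4), (5, 3)],
}
# Hypothetical report: r2 left its planned path at t = 2, so r1 never saw it there.
reported = {(1, "r1", "r2")}
missing, unexpected = detect_deviation(plan, reported)
```

In this toy setting the supervisor needs only the plan and the reported events; a planner in the spirit of the abstract would additionally choose paths so that every forbidden deviation changes the expected event set.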
|
162 |
Evolving social behavior of caribou agents in wolf-caribou predator-prey pursuit problem / A study on the evolution of social behavior of caribou agents in the wolf-caribou predator-prey problem / Emergence of collective escaping strategies of various sized teams of empathic caribou agents in the wolf-caribou predator-prey problem
黄 芳葳, Fang Wei Huang 22 March 2019 (has links)
We investigate an approach that applies Genetic Programming to the evolution of optimal escaping strategies of a team of caribou agents in the wolf-caribou predator-prey problem (WCPPP), in which a team of caribou agents attempts to escape from a single yet superior (in terms of sensory abilities, raw speed, and maximum energy) wolf agent in a simulated two-dimensional infinite toroidal world. We empirically verify our hypothesis that the incorporation of empathy in caribou agents significantly improves both the evolution efficiency of the escaping behavior and the effectiveness of such a behavior. This finding may be viewed as a verification of the survival value of empathy and the resulting compassionate behavior of the escaping caribou agents. Moreover, considering the fact that a single caribou cannot escape from the superior wolf, the ability of a team of empathic caribou agents to escape may also be viewed as an illustration of the emergent nature of a successful escaping behavior, in that the team-level properties are more than the mere sum of the properties of the individual entities. Within this context, we also present empirical results that verify the complex (nonlinear) nature of the relationship between the size of the team of caribou agents and the efficiency of their escaping behavior. / Doctor of Philosophy in Engineering / Doshisha University
|
163 |
Limitations and Extensions of the WoLF-PHC Algorithm
Cook, Philip R. 27 September 2007 (has links) (PDF)
Policy Hill Climbing (PHC) is a reinforcement learning algorithm that extends Q-learning to learn probabilistic policies for multi-agent games. WoLF-PHC extends PHC with the "win or learn fast" principle. A proof that PHC will diverge in self-play when playing Shapley's game is given, and WoLF-PHC is shown empirically to diverge as well. Various WoLF-PHC-based modifications were created, evaluated, and compared in an attempt to obtain convergence to the single-shot Nash equilibrium when playing Shapley's game in self-play without using more information than WoLF-PHC uses. Partial Commitment WoLF-PHC (PCWoLF-PHC), which performs best on Shapley's game, is tested on other matrix games and shown to produce satisfactory results.
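For readers unfamiliar with the algorithm, the following is a minimal sketch of a WoLF-PHC learner for a single-state matrix game, following the commonly published description: a Q-learning update, a running average policy, and a hill-climbing step that is small when "winning" and large when "losing". The class name, parameter values, and the restriction to one state with no discounting are assumptions made for illustration; this is not the thesis's code and does not include the PCWoLF-PHC modification.

```python
import random


class WoLFPHC:
    """Single-state WoLF-PHC learner (illustrative sketch, needs n_actions >= 2)."""

    def __init__(self, n_actions, alpha=0.1, delta_win=0.01, delta_lose=0.04, seed=0):
        self.n = n_actions
        self.alpha = alpha            # Q-learning rate
        self.delta_win = delta_win    # cautious step used when winning
        self.delta_lose = delta_lose  # larger step used when losing
        self.q = [0.0] * n_actions
        self.pi = [1.0 / n_actions] * n_actions
        self.avg_pi = list(self.pi)
        self.visits = 0
        self.rng = random.Random(seed)

    def act(self):
        """Sample an action from the current mixed policy."""
        return self.rng.choices(range(self.n), weights=self.pi)[0]

    def update(self, action, reward):
        # Q-learning step; with one state and no discounting the target is the reward.
        self.q[action] += self.alpha * (reward - self.q[action])

        # Running average of the policies played so far.
        self.visits += 1
        for a in range(self.n):
            self.avg_pi[a] += (self.pi[a] - self.avg_pi[a]) / self.visits

        # "Winning" means the current policy's expected value exceeds
        # that of the average policy under the current Q estimates.
        v_now = sum(p * q for p, q in zip(self.pi, self.q))
        v_avg = sum(p * q for p, q in zip(self.avg_pi, self.q))
        delta = self.delta_win if v_now > v_avg else self.delta_lose

        # Hill-climb: shift probability mass toward the greedy action.
        best = max(range(self.n), key=lambda a: self.q[a])
        for a in range(self.n):
            if a != best:
                step = min(self.pi[a], delta / (self.n - 1))
                self.pi[a] -= step
                self.pi[best] += step
```

To study self-play as in the abstract, two such learners would each call `act()`, receive their payoffs from the game matrix, and call `update()` in a loop; whether the resulting policies converge depends on the game.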
|
164 |
Satisficing Theory and Non-Cooperative Games
Nokleby, Matthew S. 18 March 2008 (has links) (PDF)
Satisficing game theory is an alternative to traditional non-cooperative game theory which offers increased flexibility in modeling players' social interactions. However, satisficing players with conflicting attitudes may implement dysfunctional behaviors, leading to poor performance. In this thesis, we present two attempts to "bridge the gap" between satisficing and non-cooperative game theory. First, we present an evolutionary method by which players adapt their attitudes to increase raw payoff, allowing players to overcome dysfunction. We extend the Nash equilibrium concept to satisficing games, showing that the evolutionary method presented leads the players toward an equilibrium in their attitudes. Second, we introduce the conditional utility functions of satisficing theory into an otherwise traditional non-cooperative framework. While the conditional structure allows increased social flexibility in the players' behaviors, players maximize individual utility in the traditional sense, allowing us to apply the Nash equilibrium. We find that, by adjusting players' attitudes, we may alter the Nash equilibria that result.
|
165 |
A Multi-Agent Pickup and Delivery System for Automated Stores with Batched Tasks / A multi-agent system for order handling in automated stores
Holmgren, Evelina, Wijk Stranius, Simon January 2022 (has links)
Throughout today’s society, more and more areas are being automated. Grocery stores, however, have stayed the same for years. Only recently have self-checkout counters and online shopping been adopted in this business area. This thesis aims to take the next step by introducing automated grocery stores using a multi-agent system. Orders are given to the system, and within a small area, multiple agents pick the products in a time-efficient way and deliver them to the customer. This can both increase throughput and decrease the food waste and energy consumption of grocery stores. This thesis investigates existing solutions for the multi-agent pickup and delivery problem. It extends these to the important case of batched tasks in order to improve the customer experience. Batches of tasks represent shopping carts, where fast completion of whole batches gives greater customer satisfaction. This notion is not mentioned in related work, where completion of single tasks is the main goal. Because of this, existing solutions do not accommodate the need for batches or the importance of completing whole batches fast and in somewhat linear order. For this purpose, a new metric called batch ordering weighted error (BOWE) was created that takes these factors into consideration. Using BOWE, one existing algorithm has been extended to prioritize completing whole batches and is now called B-PIBT. This new algorithm significantly improves BOWE and even batch service time in key cases and is superior to the other state-of-the-art algorithms.
|
166 |
Identifying Influential Agents In Social Systems
Maghami, Mahsa 01 January 2014 (has links)
This dissertation addresses the problem of influence maximization in social networks. Influence maximization is applicable to many types of real-world problems, including modeling contagion, technology adoption, and viral marketing. Here we examine an advertisement domain in which the overarching goal is to find the influential nodes in a social network, based on the network structure and the interactions, as targets of advertisement. The assumption is that advertisement budget limits prevent us from sending the advertisement to everybody in the network. Therefore, a wise selection of the people can be beneficial in increasing the product adoption. To model these social systems, agent-based modeling, a powerful tool for the study of phenomena that are difficult to observe within the confines of the laboratory, is used. To analyze marketing scenarios, this dissertation proposes a new method for propagating information through a social system and demonstrates how it can be used to develop a product advertisement strategy in a simulated market. We consider the desire of agents toward purchasing an item as a random variable and solve the influence maximization problem in steady state using an optimization method to assign the advertisement of available products to appropriate messenger agents. Our market simulation 1) accounts for the effects of group membership on agent attitudes, 2) has a network structure that is similar to realistic human systems, and 3) models inter-product preference correlations that can be learned from market data. The results on synthetic data show that this method is significantly better than network analysis methods based on centrality measures. The optimized influence maximization (OIM) method described above has some limitations. For instance, it relies on a global estimation of the interaction among agents in the network, rendering it incapable of handling large networks. 
Although OIM is capable of finding the influential nodes in the social network in an optimized way and targeting them for advertising, in large networks, performing the matrix operations required to find the optimized solution is intractable. To overcome this limitation, we then propose a hierarchical influence maximization (HIM) algorithm for scaling influence maximization to larger networks. In the hierarchical method the network is partitioned into multiple smaller networks that can be solved exactly with optimization techniques, assuming a generalized IC model, to identify a candidate set of seed nodes. The candidate nodes are used to create a distance-preserving abstract version of the network that maintains an aggregate influence model between partitions. The budget limitation for the advertising dictates the algorithm’s stopping point. On synthetic datasets, we show that our method comes close to the optimal node selection, at substantially lower runtime costs. We present results from applying the HIM algorithm to real-world datasets collected from social media sites with large numbers of users (Epinions, SlashDot, and WikiVote) and compare it with two benchmarks, PMIA and DegreeDiscount, to examine the scalability and performance. Our experimental results reveal that HIM scales to larger networks but is outperformed by degree-based algorithms in highly-connected networks. However, HIM performs well in modular networks where the communities are clearly separable with a small number of cross-community edges. This finding suggests that for practical applications it is useful to account for network properties when selecting an influence maximization method.
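As background for the degree-based benchmark named in this abstract, here is a minimal sketch of the DegreeDiscount seed-selection heuristic together with a Monte Carlo step of the independent cascade (IC) model. It assumes an undirected graph stored as a dict of neighbor sets and a uniform propagation probability `p`; the function names and graph encoding are illustrative choices, and the sketch does not implement OIM or HIM.

```python
import random


def degree_discount(graph, k, p):
    """Pick k seeds with the DegreeDiscount heuristic.

    graph: dict mapping node -> set of neighbors (undirected, an assumption).
    p: uniform propagation probability of the IC model.
    """
    degree = {v: len(nbrs) for v, nbrs in graph.items()}
    discounted = dict(degree)            # discounted degree per node
    selected_nbrs = {v: 0 for v in graph}  # number of neighbors already chosen as seeds
    seeds = []
    for _ in range(min(k, len(graph))):
        u = max((v for v in graph if v not in seeds), key=lambda v: discounted[v])
        seeds.append(u)
        for v in graph[u]:
            if v in seeds:
                continue
            selected_nbrs[v] += 1
            t, d = selected_nbrs[v], degree[v]
            # Discount v's degree for neighbors that are already seeds.
            discounted[v] = d - 2 * t - (d - t) * t * p
    return seeds


def simulate_ic(graph, seeds, p, rng):
    """Run one independent-cascade simulation and return the number of active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph[u]:
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


graph = {
    0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0},
    5: {6, 7}, 6: {5}, 7: {5},
}
seeds = degree_discount(graph, k=2, p=0.1)
spread = simulate_ic(graph, seeds, p=0.1, rng=random.Random(0))
```

Averaging `simulate_ic` over many runs gives an estimate of expected spread, which is the usual way such heuristics are compared.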
|
167 |
An integrated data- and capability-driven approach to the reconfiguration of agent-based production systems
Scrimieri, Daniele, Adalat, Omar, Afazov, S., Ratchev, S. 13 December 2022 (has links)
Industry 4.0 promotes highly automated mechanisms for setting up and operating flexible manufacturing systems, using distributed control and data-driven machine intelligence. This paper presents an approach to reconfiguring distributed production systems based on complex product requirements, combining the capabilities of the available production resources. A method for both checking the “realisability” of a product by matching required operations and capabilities, and adapting resources is introduced. The reconfiguration is handled by a multi-agent system, which reflects the distributed nature of the production system and provides an intelligent interface to the user. This is all integrated with a self-adaptation technique for learning how to improve the performance of the production system as part of a reconfiguration. This technique is based on a machine learning algorithm that generalises from past experience on adjustments. The mechanisms of the proposed approach have been evaluated on a distributed robotic manufacturing system, demonstrating their efficacy. Nevertheless, the approach is general and it can be applied to other scenarios. / This work was supported by the SURE Research Projects Fund of the University of Bradford and the European Commission (grant agreement no. 314762). / Research Development Fund Publication Prize Award winner, Nov 2022
|
168 |
Exploiting Structure in Coordinating Multiple Decision Makers
Mostafa, Hala 01 September 2011 (has links)
This thesis is concerned with sequential decision making by multiple agents, whether they are acting cooperatively to maximize team reward or selfishly trying to maximize their individual rewards. The practical intractability of this general problem led to efforts in identifying special cases that admit efficient computation, yet still represent a wide enough range of problems. In our work, we identify the class of problems with structured interactions, where actions of one agent can have non-local effects on the transitions and/or rewards of another agent. We addressed the following research questions: 1) How can we compactly represent this class of problems? 2) How can we efficiently calculate agent policies that maximize team reward (for cooperative agents) or achieve equilibrium (for self-interested agents)? 3) How can we exploit structured interactions to make reasoning about communication offline tractable? For representing our class of problems, we developed a new decision-theoretic model, Event-Driven Interactions with Complex Rewards (EDI-CR), that explicitly represents structured interactions. EDI-CR is a compact yet general representation capable of capturing problems where the degree of coupling among agents ranges from complete independence to complete dependence. For calculating agent policies, we draw on several techniques from the field of mathematical optimization and adapt them to exploit the special structure in EDI-CR. We developed a Mixed Integer Linear Program formulation of EDI-CR with cooperative agents that results in programs much more compact and faster to solve than formulations ignoring structure. We also investigated the use of homotopy methods as an optimization technique, as well as formulation of self-interested EDI-CR as a system of non-linear equations. We looked at the issue of communication in both cooperative and self-interested settings. 
For the cooperative setting, we developed heuristics that assess the impact of potential communication points and add the ones with highest impact to the agents' decision problems. Our heuristics successfully pick communication points that improve team reward while keeping problem size manageable. Also, by controlling the amount of communication introduced by a heuristic, our approach allows us to control the tradeoff between solution quality and problem size. For self-interested agents, we look at an example setting where communication is an integral part of problem solving, but where the self-interested agents have a reason to be reticent (e.g. privacy concerns). We formulate this problem as a game of incomplete information and present a general algorithm for calculating an approximate equilibrium profile in this class of games.
|
169 |
Scaling Multi-Agent Learning in Complex Environments
Zhang, Chongjie 01 September 2011 (has links)
Cooperative multi-agent systems (MAS) are finding applications in a wide variety of domains, including sensor networks, robotics, distributed control, collaborative decision support systems, and data mining. A cooperative MAS consists of a group of autonomous agents that interact with one another in order to optimize a global performance measure. A central challenge in cooperative MAS research is to design distributed coordination policies. Designing optimal distributed coordination policies offline is usually not feasible for large-scale complex multi-agent systems, where tens to thousands of agents are involved, communication bandwidth is limited and communication between agents is delayed, and agents have only limited partial views of the whole system. This infeasibility is due to the prohibitive cost of building an accurate decision model, a dynamically evolving environment, or intractable computational complexity. This thesis develops a multi-agent reinforcement learning paradigm to allow agents to effectively learn and adapt coordination policies in complex cooperative domains without explicitly building the complete decision models. With multi-agent reinforcement learning (MARL), agents explore the environment through trial and error, adapt their behaviors to the dynamics of the uncertain and evolving environment, and improve their performance through experiences. To achieve the scalability of MARL and ensure the global performance, the MARL paradigm developed in this thesis restricts the learning of each agent to using information locally observed or received from local interactions with a limited number of agents (i.e., neighbors) in the system and exploits non-local interaction information to coordinate the learning processes of agents. 
This thesis develops new MARL algorithms for agents to learn effectively with limited observations in multi-agent settings and introduces a low-overhead supervisory control framework to collect and integrate non-local information into the learning process of agents to coordinate their learning. More specifically, the contributions of already completed aspects of this thesis are as follows: Multi-Agent Learning with Policy Prediction: This thesis introduces the concept of policy prediction and augments the basic gradient-based learning algorithm to achieve two properties: best-response learning and convergence. The convergence property of multi-agent learning with policy prediction is proven for a class of static games under the assumption of full observability. MARL Algorithm with Limited Observability: This thesis develops PGA-APP, a practical multi-agent learning algorithm that extends Q-learning to learn stochastic policies. PGA-APP combines the policy gradient technique with the idea of policy prediction. It allows an agent to learn effectively with limited observability in complex domains in the presence of other learning agents. The empirical results demonstrate that PGA-APP outperforms state-of-the-art MARL techniques in both benchmark games. MARL Application in Cloud Computing: This thesis illustrates how MARL can be applied to optimizing online distributed resource allocation in cloud computing. Empirical results show that the MARL approach performs reasonably well, compared to an optimal solution, and better than a centralized myopic allocation approach in some cases. A General Paradigm for Coordinating MARL: This thesis presents a multi-level supervisory control framework to coordinate and guide the agents' learning process. This framework exploits non-local information and introduces a more global view to coordinate the learning process of individual agents without incurring significant overhead and exploding their policy space. 
Empirical results demonstrate that this coordination significantly improves the speed, quality and likelihood of MARL convergence in large-scale, complex cooperative multi-agent systems. An Agent Interaction Model: This thesis proposes a new general agent interaction model. This interaction model formalizes a type of interaction among agents, called joint-event-driven interactions, and defines a measure for capturing the strength of such interactions. Formal analysis reveals the relationship between interactions between agents and the performance of individual agents and the whole system. Self-Organization for Nearly-Decomposable Hierarchy: This thesis develops a distributed self-organization approach, based on the agent interaction model, that dynamically forms a nearly decomposable hierarchy for large-scale multi-agent systems. This self-organization approach is integrated into the supervisory control framework to automatically evolve supervisory organizations that better coordinate MARL during the learning process. Empirical results show that dynamically evolving supervisory organizations can perform better than static ones. Automating Coordination for Multi-Agent Learning: We tailor our supervision framework for coordinating MARL in ND-POMDPs. By exploiting structured interaction in ND-POMDPs, this tailored approach distributes the learning of the global joint policy among supervisors and employs DCOP techniques to automatically coordinate distributed learning to ensure the global learning performance. We prove that this approach can learn a globally optimal policy for ND-POMDPs with a property called groupwise observability.
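The policy-prediction idea mentioned in this abstract can be illustrated on a two-player, two-action matrix game. The sketch below is one plausible reading, stated as an assumption: each player ascends the gradient of its own expected payoff, evaluated at a one-step forecast of the opponent's strategy rather than at the opponent's current strategy. The function name, the step size `eta`, the prediction length `gamma`, and the full-observability setting are illustrative; this is not PGA-APP and not the thesis's code.

```python
def clip01(x):
    """Project a probability back onto [0, 1]."""
    return min(1.0, max(0.0, x))


def gradient_ascent_with_prediction(R, C, alpha, beta, eta=0.01, gamma=0.1, steps=2000):
    """Gradient ascent with policy prediction in a 2x2 game (illustrative sketch).

    R, C: 2x2 payoff matrices of the row and column player.
    alpha, beta: probabilities that the row/column player picks action 0.
    """
    def grad_row(b):
        # d/d(alpha) of the row player's expected payoff at column strategy b.
        return b * (R[0][0] - R[1][0]) + (1 - b) * (R[0][1] - R[1][1])

    def grad_col(a):
        # d/d(beta) of the column player's expected payoff at row strategy a.
        return a * (C[0][0] - C[0][1]) + (1 - a) * (C[1][0] - C[1][1])

    for _ in range(steps):
        # Forecast where the opponent's strategy is heading.
        beta_pred = clip01(beta + gamma * grad_col(alpha))
        alpha_pred = clip01(alpha + gamma * grad_row(beta))
        # Each player climbs its own gradient against the forecast.
        alpha, beta = (
            clip01(alpha + eta * grad_row(beta_pred)),
            clip01(beta + eta * grad_col(alpha_pred)),
        )
    return alpha, beta


# Matching pennies: the unique Nash equilibrium is the mixed strategy (0.5, 0.5).
R = [[1, -1], [-1, 1]]
C = [[-1, 1], [1, -1]]
alpha, beta = gradient_ascent_with_prediction(R, C, alpha=0.7, beta=0.4)
```

With `gamma = 0` the update reduces to plain simultaneous gradient ascent, which cycles around the equilibrium in matching pennies; the prediction term adds damping, which is the intuition behind the convergence property the abstract refers to.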
|
170 |
Multi-Agent Reinforcement Learning for Cooperative Edge Cloud Computing
Ding, Shiyao 26 September 2022 (has links)
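Kyoto University / New-system doctorate by coursework / Doctor of Informatics / 甲第24261号 / 情博第805号 / 新制||情||136 (University Library) / Department of Social Informatics, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor 伊藤 孝行, Professor 吉川 正俊, Professor 神田 崇行, Program-Specific Associate Professor LIN Donghui / Conferred under Article 4, Paragraph 1 of the Degree Regulations / DFAM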
|