Global ETD Search

531	Comparação de desenvolvimento orientado a agentes para jogos educacionais: um estudo de caso / Comparison of agents-oriented development in educational games: a study of case Vítor Manuel Fragoso Ferreira 23 March 2015 (has links) A tecnologia de agentes tem sido reconhecida como um paradigma promissor em sistemas educacionais da nova geração. Entretanto, o esforço e inflexibilidade de algumas metodologias próprias para agentes acarretam num alto custo, tempo e adaptação de escopo. Este trabalho visa avaliar alternativas de desenvolvimento de um jogo educacional médico orientado a agentes, através da aplicação de um estudo de caso, com o intuito de verificar se metodologias próprias para implementação de sistemas multiagentes trazem benefícios no resultado final da implementação do jogo, e também se os resultados alcançados na comparação de processos de desenvolvimento de cunho tradicional e ágil fazem diferença no resultado final. Desta forma, este trabalho compara três metodologias baseadas nos conceitos da Engenharia de Software através de um estudo de caso, sendo elas: O-MaSE que é uma metodologia tradicional de desenvolvimento de sistemas multiagentes e utiliza um processo de desenvolvimento tradicional; AgilePASSI que é baseada no processo de desenvolvimento ágil e específica para sistemas multiagentes; e, por último, Scrum que é uma metodologia ágil, não sendo específica para implementação de sistemas multiagentes / Agent technology has been recognized as a promising paradigm for next-generation educational systems. However, the effort and inflexibility of some agent-specific methodologies entail high cost, long development time, and scope adaptation. 
This work aims to evaluate alternatives for developing an agent-oriented educational medical game by applying a case study, in order to verify whether methodologies specific to the implementation of multi-agent systems benefit the final result of the game's implementation, and whether the choice between traditional and agile development processes makes a difference in the outcome. Thus, this work compares three methodologies based on software engineering concepts through a case study: O-MaSE, a traditional methodology for developing multi-agent systems that uses a traditional development process; AgilePASSI, which is based on an agile development process and is specific to multi-agent systems; and finally Scrum, an agile methodology that is not specific to the implementation of multi-agent systems. Agentes de Software Sistemas Multiagentes Engenharia de Software Metodologias Ágeis Metodologias de Sistemas Multiagentes Scrum O-MaSE AgilePASSI Software agents Multi-agent systems Software Engineering Agile methodologies Multi-agent systems Methodologies Scrum O-MaSE AgilePASSI ENGENHARIA DE SOFTWARE
532	Architecture distribuée à base d’agents pour optimiser la prise en charge des patients dans les services d’urgence en milieu hospitalier / Distributed agent based architecture to optimize the management of patients at the emergency department in hospitals Daknou, Amani 06 July 2011 (has links) Les établissements de santé sont confrontés à de nouveaux défis tels que le vieillissement de la population, la hausse des coûts des soins et les progrès rapides des technologies médicales. Les nouvelles politiques de contrôle du budget des soins ont été introduites pour augmenter l'efficacité, réduire les déchets et remodeler le système de santé. Ces établissements cibles présentent des réseaux complexes incluant de ressources humaines, financières, structurelles et technologiques visant à garantir les meilleurs soins. Ces enjeux concernent d’autant plus les services d’urgence engorgés par l’afflux massif des passages, qui doivent fournir des décisions rapides et assurer le dimensionnement de ses ressources afin de réduire les délais d’attente des patients sans compromettre la qualité de soin. L’objectif de cette thèse est de proposer des solutions appropriées aux services d’urgences permettant d’améliorer la prise en charge des patients en termes de temps d’attente. Nous avons commencé par analyser les problématiques de la filière des urgences afin d’engager une démarche d’amélioration. Par la suite, nous avons modélisé le processus de prise en charge des patients au service d’accueil des urgences à l’aide d’un système multi-agent ouvert et dynamique. Le système proposé permet de fournir une aide à la décision sur la planification de l’activité médicale et l’affectation des ressources humaines dans une unité où on se trouve souvent face à une situation d’urgence nécessitant une prise en charge rapide et efficace. Dans ce contexte, nous abordons le problème réactif d’optimisation de l’ordonnancement des opérations de soin et le problème de coordination du personnel médical. 
Nous nous intéressons au dimensionnement des ressources humaines au SU en adoptant une approche de prise en compte des compétences maîtrisées dans le but de trouver une adéquation avec celles requises par l’activité médicale afin avant tout d’accroitre la qualité, réduire les délais d’attente et permettre de dégager des gains de gestion / Health-care organizations are facing new challenges such as the aging population, the rise of health care costs and the rapid progress of medical technologies. New policies of health care budget control have been introduced to increase efficiency, reduce waste and reshape the entire health care system. Targeted organizations are complex networks of human, financial, structural and technological resources aiming at guaranteeing the best public health care. These issues concern all the more Emergency Departments (ED), which are congested by the massive influx of visits and must provide quick decisions and ensure the sizing of their resources to reduce waiting times for patients without compromising quality of care. The objective of this thesis is to propose appropriate solutions for EDs to improve care for patients in terms of waiting time. We began by analyzing the problems of the emergency department in order to initiate a process of improvement. Subsequently, we modeled the process of care for patients at the ED by using an open and dynamic multi-agent system. The proposed system can provide decision support on the planning of medical activity and the allocation of human resources in a unit where one is often faced with an emergency situation requiring rapid and effective response. In this context, we study the reactive problem of optimizing the scheduling of care operations and the coordination problem of medical staff. We take into account the skills mastered by human resources at the ED in order to find a match with those required by the medical activity. 
This approach aims to increase quality, reduce waiting times, and yield management gains. Prise en charge des urgences Système multi-agent O-MaSE Optimisation Affectation du personnel médical Ordonnancement réactif Coordination Aide à la décision Management of emergencies Multi-agent system O-MaSE Optimization Assignment of medical staff Reactive scheduling Coordination Decision support
533	Architecture de COntrôle/COmmande dédiée aux systèmes Distribués Autonomes (ACO²DA) : application à une plate-forme multi-véhicules / Control and management architecture for distributed autonomous systems : application to multi-vehicles based platform Mouad, Mehdi 31 January 2014 (has links) La complexité associée à la coordination d’un groupe de robots mobiles est traitée dans cette thèse en investiguant plus avant les potentialités des architectures de commande multi-contrôleurs dont le but est de briser la complexité des tâches à exécuter. En effet, les robots mobiles peuvent évoluer dans des environnements très complexes et nécessitent de surcroît une coopération précise et sécurisée pouvant rapidement devenir inextricable. Ainsi, pour maîtriser cette complexité, le contrôleur dédié à la réalisation d’une tâche est décomposé en un ensemble de comportements/contrôleurs élémentaires (évitement d’obstacles et de collision entre les robots, attraction vers une cible, planification, etc.) qui lient les informations capteurs (provenant des capteurs locaux du robot, etc.) aux actionneurs des différentes entités robotiques. La tâche considérée dans cette thèse correspond à la navigation d’un groupe de robots mobiles dans des environnements peu ou pas connus en présence d’obstacles (statiques et dynamiques). La spécificité de l’approche théorique consiste à allier les avantages des architectures multi-contrôleurs à ceux des systèmes multi-agents et spécialement les modèles organisationnels afin d’apporter un haut niveau de coordination entre les agents/robots mobiles. Le groupe de robots mobiles est alors coordonné suivant les différentes normes et spécifications du modèle organisationnel. Ainsi, l’activation d’un comportement élémentaire en faveur d’un autre se fait en respectant les contraintes structurelles des robots en vue d’assurer le maximum de précision et de sécurité des mouvements coordonnés entre les différentes entités mobiles. 
La coopération se fait à travers un agent superviseur (centralisé) de façon à atteindre plus rapidement la destination désirée, les événements inattendus sont gérés quant à eux individuellement par les agents/robots mobiles de façon distribuée. L’élaboration du simulateur ROBOTOPIA nous a permis d’illustrer chacune des contributions de la thèse par un nombre important de simulations. / The difficulty of coordinating a group of mobile robots is addressed in this thesis by investigating control architectures which aim to break down task complexity. In fact, multi-robot navigation may rapidly become inextricable, specifically if it takes place in hazardous and dynamic environments requiring precise and secure cooperation. The considered task is the navigation of a group of mobile robots in unknown environments in the presence of (static and dynamic) obstacles. To overcome its complexity, it is proposed to divide the overall task into a set of basic behaviors/controllers (obstacle avoidance, attraction to a dynamic target, planning, etc.). The applied control is chosen among these controllers according to sensor information (camera, local sensors, etc.). The specificity of the theoretical approach is to combine the benefits of multi-controller control architectures with those of multi-agent organizational models to provide a high level of coordination between mobile agent/robot systems. The group of mobile robots is then coordinated according to the different norms and specifications of the organizational model. Thus, activating one basic behavior in favor of another is done in accordance with the structural constraints of the robots in order to ensure maximum safety and precision of the coordinated movements between robots. Cooperation takes place through a (centralized) supervisor agent to reach the desired destination faster; unexpected events are individually managed by the mobile agents/robots in a distributed way. 
To guarantee the performance criteria of the control architecture, hybrid systems, which allow the control of continuous systems in the presence of discrete events, are explored. In fact, this control allows the different behaviors of the architecture (the continuous part) to be coordinated through the discrete part. The development of the ROBOTOPIA simulator allowed us to illustrate each contribution with numerous simulation results. Systèmes multi-robots mobiles Systèmes multi-agents Évitement d’obstacles Architecture de contrôle/commande Modèles organisationnels multi-agents Simulateur ROBOTOPIA Multi-mobile robots systems Multi-agent systems Obstacle avoidance Control and Management Architecture Multi-agent organizational models ROBOTOPIA simulator
534	Méthodes d’ordonnancement et d’orchestration dynamique des tâches de soins pour optimiser la prise en charge des patients dans les urgences hospitalières / Scheduling and dynamic orchestration methods of care tasks to optimize the management of patients in hospital emergency department Ajmi, Faten 11 July 2019 (has links) Le service des urgences est un important service de soins qui représente le goulot d'étranglement de l'hôpital. Les urgences sont souvent confrontées à des problèmes de tension dans de nombreux pays à travers le monde. L'une des causes de la tension dans les urgences est l'interférence permanente entre trois types de patients : les patients déjà programmés, les patients non programmés et les patients non programmés urgents. Le but de cette thèse est de contribuer à l'étude et au développement d'un système d’aide à la décision pour améliorer la prise en charge des patients aussi bien en mode de fonctionnement normal qu’en mode tension. Deux principaux processus ont été développé. Un processus d’ordonnancement à horizon glissant en utilisant un algorithme mimétique avec l’intégration des opérateurs génétiques contrôlés pour déterminer un calendrier optimal de passage des patients. Le deuxième processus d’orchestration dynamique, à base d’agents communicants, tient compte de la nature dynamique et incertaine de l'environnement des urgences en actualisant continuellement ce calendrier. Cette orchestration pilote en temps réel le workflow du parcours patient, améliore pas à pas les indicateurs de performance durant l'exécution. Grâce aux comportements des agents et aux protocoles de communication, le système proposé a établi un lien direct en temps réel entre les performances requises sur le terrain et les actions afin de diminuer l'impact de la tension. 
Les résultats expérimentaux, mis en œuvre au CHRU de Lille, indiquent que l’application de nos approches permet d’améliorer les indicateurs de performance grâce au pilotage par les agents du workflow en cours d’exécution. / The emergency department is an important care service that represents the hospital's bottleneck. Emergency departments often face overcrowding problems in many countries worldwide. One of the causes of emergency department overcrowding is the permanent interference between three types of arriving patients: already programmed patients, non-programmed patients and urgent non-programmed patients. The aim of this thesis is to contribute to the study and development of a decision support system to improve patient management in both normal and overcrowded operating modes. Two main processes have been developed. The first is a rolling-horizon scheduling process that uses a memetic algorithm with controlled genetic operators to determine an optimal patient schedule. The second, a dynamic orchestration process based on communicating agents, takes into account the dynamic and uncertain nature of the emergency environment by continually updating this schedule. This orchestration steers the patient-pathway workflow in real time and improves the performance indicators step by step during execution. Through agent behaviors and communication protocols, the proposed system establishes a direct real-time link between the required performance and the effective actions in order to decrease the impact of overcrowding. The experimental results, obtained at the Regional University Hospital Center (RUHC) of Lille, indicate that applying our approaches improves the performance indicators, thanks to agent-driven control of the patient-pathway workflows during their execution. 
Système de soins Gestion des tensions Ordonnancement à horizon glissant Orchestration Workflow collaboratif Système multi-agent Tâche à compétences multiples Logistique hospitalière Health care system Overcrowding management Rolling horizon scheduling Orchestration Collaborative workflow Multi-agent systems Multi-skill tasks Hospital logistics
535	DEEP LEARNING BASED MODELS FOR NOVELTY ADAPTATION IN AUTONOMOUS MULTI-AGENT SYSTEMS Marina Wagdy Wadea Haliem (13121685) 20 July 2022 (has links) Autonomous systems are often deployed in dynamic environments and are challenged with unexpected changes (novelties) in the environments where they receive novel data that was not seen during training. Given the uncertainty, they should be able to operate without (or with limited) human intervention and they are expected to (1) Adapt to such changes while still being effective and efficient in performing their multiple tasks. The system should be able to provide continuous availability of its critical functionalities. (2) Make informed decisions independently from any central authority. (3) Be Cognitive: learns the new context, its possible actions, and be rich in knowledge discovery through mining and pattern recognition. (4) Be Reflexive: reacts to novel unknown data as well as to security threats without terminating on-going critical missions. These characteristics combine to create the workflow of the autonomous decision-making process in multi-agent environments, i.e., any action taken by the system must go through these characteristic models to autonomously make an ideal decision based on the situation. In this dissertation, we propose novel learning-based models to enhance the decision-making process in autonomous multi-agent systems where agents are able to detect novelties (i.e., unexpected changes in the environment), and adapt to them in a timely manner. For this purpose, we explore two complex and highly dynamic domains: (1) Transportation Networks (e.g., Ridesharing application): where we develop AdaPool: a novel distributed diurnal-adaptive decision-making framework for multi-agent autonomous vehicles using model-free deep reinforcement learning and change point detection. 
(2) Multi-agent games (e.g., Monopoly): for which we propose a hybrid approach that combines deep reinforcement learning (for frequent but complex decisions) with a fixed-policy approach (for infrequent but straightforward decisions) to facilitate decision-making and it is also adaptive to novelties. (3) Further, we present a domain agnostic approach for decision making without prior knowledge in dynamic environments using Bootstrapped DQN. Finally, to enhance security of autonomous multi-agent systems, (4) we develop a machine learning based resilience testing of address randomization moving target defense. Additionally, to further improve the decision-making process, we present (5) a novel framework for multi-agent deep covering option discovery that is designed to accelerate exploration (which is the first step of decision-making for autonomous agents), by identifying potential collaborative agents and encouraging visiting the under-represented states in their joint observation space. Autonomous agents and multiagent systems Adversarial machine learning Deep learning Reinforcement learning Deep Reinforcement Learning (DRL) Autonomous Systems Transportation Networks Multi-agent Systems Adversarial Machine Learning Multi-agent Games Ridesharing Novelty Adaptation Novelty detection Deep Learning Open World
536	Control Barrier Functions for Formation Control of Leader-follower Multi-agent Systems / Kontrollbarriärfunktioner för Formationskontroll av Leader-follower Multi-agent System Sun, Tianrun January 2023 (has links) This thesis studies formation control for a class of general leader-follower multi-agent systems with Control Barrier Functions (CBFs) such that connectivity maintenance is fulfilled for all the neighboring agents. In leader-follower multi-agent systems, only the leader agents are controlled by the externally designed input, while the followers are guided through their dynamic couplings with the neighboring agents. The main problem is how to keep all adjacent agents within communication distance during the formation process. In this thesis, CBFs are utilized in order to maintain connectivity among the neighboring agents. The thesis first introduces a general first-order leader-follower multi-agent system with appropriate connectivity constraints. All edges in the system are divided into three categories: follower-follower edges, leader-follower edges and leader-leader edges. The three kinds of edges are discussed individually. For each category, the relevant topological conditions and control barrier functions are defined and proved for both tree graphs and general graphs. Several simulation examples are implemented to verify the developed results. Both theory and simulation show that the developed results strongly support the formation control of leader-follower systems with connectivity maintenance. / Denna avhandling studerar formationskontroll för en klass av generella ledare-följare multi-agent-system med kontrollbarriärfunktioner (CBFs) så att anslutningsunderhållet uppfylls för alla angränsande agenter. 
I ledar-följare multi-agent-system är det bara ledaragenterna som styrs av den externt utformade ingången, medan följaren guidas genom sina dynamiska kopplingar med grannagenterna. Huvudproblemet är hur man kan hålla alla intilliggande agenter inom kommunikationsavståndet under bildningsprocessen. I det här examensarbetet används kontrollbarriärfunktioner (CBF) för att upprätthålla förbindelser mellan angränsande agenter. Detta examensarbete introducerar först ett allmänt första ordningens ledare-följare multi-agentsystem med korrekta anslutningsbegränsningar. Alla kanter i systemet är indelade i tre kategorier: efterföljarkanter, ledare-följarkanter och ledare-ledarkanter. Tre olika sorters kanter diskuteras individuellt. För varje kategori definieras och bevisas de relevanta topologiska förhållandena och kontrollbarriärfunktionerna för både trädgrafer och allmänna grafer. Flera simuleringsexempel implementeras för att verifiera de framtagna resultaten. Både teori- och simuleringsresultat visar att de utvecklade resultaten är ett starkt stöd för bildandet av ledare-följare-system för att uppnå anslutningsunderhåll Multi-agent System Leader-follower System Formation Control Control Barrier Function Connectivity Maintenance Multi-agent-system Leader-follower-system Formations Kontroll Kontroll Barriär Funktion Underhåll av Anslutningsmöjligheter Computer and Information Sciences Data- och informationsvetenskap
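The connectivity-maintenance idea summarized in entry 536 can be illustrated with a minimal sketch. It is not the thesis's leader-follower construction: it assumes a single controlled single-integrator agent, one fixed neighbor, and the barrier function h(x) = R^2 - ||x - p||^2, all of which are illustrative choices. The names `cbf_filter` and `simulate`, the gain `alpha`, and the numeric values are hypothetical.

```python
import math


def cbf_filter(u_nom, x, p, radius, alpha=1.0):
    """Minimally modify u_nom so that dh/dt >= -alpha * h holds.

    h(x) = radius^2 - ||x - p||^2 is non-negative while the agent at x
    stays within communication range of the neighbor at p. For
    single-integrator dynamics x' = u, the condition is the half-space
    a . u <= b with a = 2 (x - p) and b = alpha * h, so the closest
    admissible input has a closed form (projection onto the half-space).
    """
    dx, dy = x[0] - p[0], x[1] - p[1]
    h = radius ** 2 - (dx * dx + dy * dy)
    a = (2.0 * dx, 2.0 * dy)
    b = alpha * h
    violation = a[0] * u_nom[0] + a[1] * u_nom[1] - b
    if violation <= 0.0:
        return u_nom  # nominal input already satisfies the constraint
    scale = violation / (a[0] ** 2 + a[1] ** 2)
    return (u_nom[0] - scale * a[0], u_nom[1] - scale * a[1])


def simulate(steps=1000, dt=0.01, radius=2.0):
    """Euler simulation: the nominal input pushes the agent away from p."""
    x, p, u_nom = (0.0, 0.0), (0.0, 0.0), (1.0, 0.0)
    max_dist = 0.0
    for _ in range(steps):
        u = cbf_filter(u_nom, x, p, radius)
        x = (x[0] + dt * u[0], x[1] + dt * u[1])
        max_dist = max(max_dist, math.hypot(x[0] - p[0], x[1] - p[1]))
    return x, max_dist


final_position, max_distance = simulate()
```

In this sketch the agent drifts toward the edge of the communication disc and the filter slows it down so that the distance approaches, but does not exceed, the radius. A multi-edge system would need one such constraint per edge and, in general, a quadratic-program solver.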
537	Consensus Seeking, Formation Keeping, and Trajectory Tracking in Multiple Vehicle Cooperative Control Ren, Wei 08 July 2004 (has links) (PDF) Cooperative control problems for multiple vehicle systems can be categorized as either formation control problems with applications to mobile robots, unmanned air vehicles, autonomous underwater vehicles, satellites, aircraft, spacecraft, and automated highway systems, or non-formation control problems such as task assignment, cooperative transport, cooperative role assignment, air traffic control, cooperative timing, and cooperative search. The cooperative control of multiple vehicle systems poses significant theoretical and practical challenges. For cooperative control strategies to be successful, numerous issues must be addressed. We consider three important and correlated issues: consensus seeking, formation keeping, and trajectory tracking. For consensus seeking, we investigate algorithms and protocols so that a team of vehicles can reach consensus on the values of the coordination data in the presence of imperfect sensors, communication dropout, sparse communication topologies, and noisy and unreliable communication links. The main contribution of this dissertation in this area is that we show necessary and/or sufficient conditions for consensus seeking with limited, unidirectional, and unreliable information exchange under fixed and switching interaction topologies (through either communication or sensing). For formation keeping, we apply a so-called "virtual structure" approach to spacecraft formation flying and multi-vehicle formation maneuvers. As a result, single vehicle path planning and trajectory generation techniques can be employed for the virtual structure while trajectory tracking strategies can be employed for each vehicle. 
The main contribution of this dissertation in this area is that we propose a decentralized architecture for multiple spacecraft formation flying in deep space with formation feedback introduced. This architecture ensures the necessary precision in the presence of actuator saturation, internal and external disturbances, and stringent inter-vehicle communication limitations. A constructive approach based on the satisficing control paradigm is also applied to multi-robot coordination in hardware. For trajectory tracking, we investigate nonlinear tracking controllers for fixed wing unmanned air vehicles and nonholonomic mobile robots with velocity and heading rate constraints. The main contribution of this dissertation in this area is that our proposed tracking controllers are shown to be robust to input uncertainties and measurement noise, and are computationally simple and can be implemented with low-cost, low-power microcontrollers. In addition, our approach allows piecewise continuous reference velocity and heading rate and can be extended to derive a variety of other trajectory tracking strategies. information consensus multi-agent systems cooperative control switched systems graph theory formation keeping formation flying multi-agent coordination unmanned air vehicles constrained control trajectory tracking satisficing control Electrical and Computer Engineering
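As an illustration of the consensus-seeking problem described in entry 537, the following minimal sketch runs a standard discrete-time averaging update over a fixed, bidirectional communication graph. It is not the dissertation's algorithm, which also covers unidirectional and switching topologies; the function names, the step size, and the four-vehicle ring are assumptions made for the example.

```python
def consensus_step(values, neighbors, step=0.2):
    """One synchronous update: x_i <- x_i + step * sum_j (x_j - x_i)."""
    return [
        x_i + step * sum(values[j] - x_i for j in neighbors[i])
        for i, x_i in enumerate(values)
    ]


def run_consensus(values, neighbors, step=0.2, iterations=200):
    """Iterate the update; step should be below 1 / (max neighbor count)."""
    for _ in range(iterations):
        values = consensus_step(values, neighbors, step)
    return values


# Hypothetical ring of four vehicles with bidirectional links.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
initial = [0.0, 4.0, 8.0, 12.0]
final = run_consensus(initial, ring)
```

Because every link in this example is bidirectional, the sum of the values is preserved at each step and the vehicles agree on the average of their initial values. With unidirectional links the agreed value generally differs from the average.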
538	Multi-robot coordination and planning with human-in-the-loop under STL specifications : Centralized and distributed frameworks / Multi-robotkoordination och planering med mänsklig interaktion under STL-specifikationer : Centraliserade och distribuerade ramverk Zhang, Yixiao January 2023 (has links) Recent urbanization and industrialization have brought tremendous pressure and challenges to modern autonomous systems. When considering multiple complex tasks, cooperation and coordination between multiple agents can improve efficiency in a system. In real-world applications, multi-agent systems (MAS) are widely used in various fields, such as robotics, unmanned aerial systems, autonomous vehicles, distributed sensor networks, etc. Unlike traditional MAS systems based on pre-defined algorithms and rules, a special human-in-loop (HIL) based MAS involves human interactions to enhance the system’s adaptability for special scenarios, as well as apply human preferences for robot control. However, existing HIL strategies are primarily based on human involvement at a low level, such as mixed-initiative control and mixed-agent scenarios with both human-driven and intelligent robots. There are fewer investigations on applying HIL in high-level coordination. In particular, designing a coordination strategy for multi-task multi-agent scenarios, which can also deal with real-time human commands, will be one of the key topics of this Master’s thesis project. In this thesis work, different kinds of tasks described by signal temporal logic (STL) are created for agents, which can be enforced by control barrier function (CBF) constraints. Both centralized and distributed frameworks are designed for agent coordination. In detail, the centralized strategy is developed for machine-to-infrastructure (M2I) communication, by using the nonlinear model predictive control (NMPC) method to obtain collision-free trajectories. 
The distributed strategy utilizing graph theory is proposed for machine-to-machine (M2M), in order to reduce computation time by offloading. Most importantly, a HIL model is generated for both frameworks to apply online human commands to the coordination, with a novel task allocation protocol. Simulations and experiments are carried out on both Matlab and Python-based ROS simulators, to show that proposed frameworks can achieve obvious performance advantages in safety, smoothness, and stability for task completion. Numerical results are provided to validate the feasibility and applicability of our algorithms. / Den senaste urbaniseringen och industrialiseringen har medfört enormt tryck och utmaningar för moderna autonoma system. Vid beaktande av flera komplexa uppgifter kan samarbete och samordning mellan flera agenter förbättra effektiviteten i ett system. I verkliga tillämpningar används multiagent-system (MAS) i stor utsträckning inom olika områden, såsom robotik, obemannade luftfarkoster, autonoma fordon, distribuerade sensorsystem etc. Till skillnad från traditionella MAS-system baserade på fördefinierade algoritmer och regler, innebär ett särskilt människa-i-loop (HIL)-baserat MAS mänsklig interaktion för att förbättra systemets anpassningsförmåga till speciella scenarier samt anpassa mänskliga preferenser för robotstyrning. Emellertid är befintliga HIL-strategier främst baserade på mänsklig inblandning på en låg nivå, såsom mixad-initiativkontroll och mixade agentscenarier med både människa-drivna och intelligenta robotar. Det finns färre undersökningar om att tillämpa HIL på högnivåkoordination. Särskilt att utforma en koordineringsstrategi för fleruppgiftsfleragent-scenarier, som också kan hantera mänskliga kommandon i realtid, kommer att vara ett av huvudämnena för detta masterprojekt. 
I detta examensarbete skapas olika typer av uppgifter beskrivna av signaltemporallogik (STL) för agenter, som kan upprätthållas genom styrbarriärfunktions (CBF) -begränsningar. Både centraliserade och distribuerade ramverk utformas för agentkoordination. Mer specifikt utvecklas den centraliserade strategin för maskin-till-infrastruktur (M2I)-kommunikation genom att använda icke-linjär modellprediktiv reglering (NMPC) för att erhålla kollisionsfria trajektorier. Den distribuerade strategin med användning av grafteori föreslås för maskin-till-maskin (M2M) för att minska beräkningstiden genom avlastning. Viktigast av allt genereras en HIL-modell för båda ramverken för att tillämpa online-mänskliga kommandon på koordinationen med en ny protokoll för uppgiftstilldelning. Simuleringar och experiment utförs på både Matlab och Python-baserade ROS-simulatorer för att visa att de föreslagna ramverken kan uppnå tydliga prestandafördelar när det gäller säkerhet, smidighet och stabilitet för uppgiftsslutförande. Numeriska resultat presenteras för att validera genomförbarheten och tillämpligheten hos våra algoritmer. Multi-agent systems Human-in-the-loop systems Signal temporal logic Cooperative control ROS Implementation Multi-agent-system Människa-i-loop-system Signaltemporallogik Samarbetande styrning ROS-implementering Computer and Information Sciences Data- och informationsvetenskap
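Entry 538 specifies tasks in signal temporal logic (STL). A minimal sketch of the standard quantitative (robustness) semantics over a sampled signal may help fix ideas: a predicate "x >= c" has robustness x - c, "always" takes the minimum over the time window, and "eventually" takes the maximum. The function names and sample signals are hypothetical, and the thesis's CBF-based enforcement of such tasks is not reproduced here.

```python
def rho_at_least(signal, threshold):
    """Robustness of the predicate 'signal >= threshold' at each sample."""
    return [x - threshold for x in signal]


def always(rho, start, end):
    """Robustness of G_[start, end]: worst case over the window."""
    return min(rho[start:end + 1])


def eventually(rho, start, end):
    """Robustness of F_[start, end]: best case over the window."""
    return max(rho[start:end + 1])


# Hypothetical samples: distance to an obstacle and progress toward a goal.
obstacle_distance = [2.0, 1.5, 1.25, 1.75]
goal_progress = [0.0, 0.5, 1.0, 0.75]

# "Always keep at least 1.0 from the obstacle" over samples 0..3.
safety = always(rho_at_least(obstacle_distance, 1.0), 0, 3)
# "Eventually reach progress of at least 0.5" over samples 0..3.
reach = eventually(rho_at_least(goal_progress, 0.5), 0, 3)
```

A positive value means the specification is satisfied with that margin, and a negative value means it is violated.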
539	Multi Agent Reinforcement Learning for Game Theory : Financial Graphs / Multi-agent förstärkning lärande för spelteori : Ekonomiska grafer Yu, Bryan January 2021 (has links) We present the rich research potential at the intersection of multi-agent reinforcement learning (MARL), game theory, and financial graphs. We demonstrate how multiple game-theoretic scenarios arise in three-node financial graphs with minor modifications. We highlight six scenarios used in this study. We discuss how to set up an environment for MARL training and evaluation. We first investigate individual games and demonstrate that MARL agents consistently learn Nash Equilibrium strategies. We next investigate mixed games and find again that MARL agents learn Nash Equilibrium strategies given sufficient information and incentive (e.g. prosociality). We find that (1) introducing an embedding layer in the agents' deep network improves learned representations and, as such, learned strategies, (2) MARL agents can learn a variety of complex strategies, and (3) selfishness improves strategies’ fairness and efficiency. Next we introduce populations and find that (1) prosocial members in a population influence the action profile and that (2) complex strategies present in individual scenarios no longer emerge as the population's portfolio of strategies converges to a main diagonal. We identify two challenges that arise in populations, namely (1) identifying a partner’s prosociality and (2) identifying a partner’s identity. We study three information settings which supplement the agents' observation set and find that knowledge of partners' prosociality or identity has negligible impact on how the portfolio of strategies converges. / Vi presenterar den rika forskningspotentialen vid unionen av multi-agent förstärkningslärning (MARL), spelteori och finansiella grafer. Vi demonstrerar hur flera spelteoretiska scenarier uppstår i tre nodgrafikgrafer med mindre ändringar. Vi belyser sex scenarier som används i denna studie. 
Vi diskuterar hur man skapar en miljö för MARL -utbildning och utvärdering. Vi undersöker först enskilda spel och visar att MARL -agenter konsekvent lär sig Nash Equilibrium -strategier. Vi undersöker sedan blandade spel och finner igen att MARL -agenter lär sig Nash Equilibrium -strategier med tillräcklig information och incitament (t.ex. prosocialitet). Vi finner att införandet av ett inbäddande lager i agenternas djupa nätverk förbättrar inlärda representationer och som sådan inlärda strategier, (2) MARL-agenter kan lära sig en mängd komplexa strategier och (3) själviskhet förbättrar strategiernas rättvisa och effektivitet. Därefter introducerar vi populationer och upptäcker att (1) pro sociala medlemmar i en befolkning påverkar åtgärdsprofilen och att (2) komplexa strategier som finns i enskilda scenarier inte längre framkommer när befolkningens portfölj av strategier konvergerar till en huvuddiagonal. Vi identifierar två utmaningar som uppstår i befolkningen; nämligen (1) identifiera partnerns prosocialitet och (2) identifiera partnerns identitet. Vi studerar tre informationsinställningar som kompletterar agents observationsuppsättning och finner att kunskap om partners prosocialitet eller identitet har en försumbar inverkan på hur portföljen av strategier konvergerar. Multi-Agent Reinforcement Learning Reinforcement Learning Game Theory Financial Networks Financial Graphs Machine Learning Financial Graphs Computational Economics Multi-Agent Reinforcement Learning Reinforcement Learning Game Theory Financial Networks Financial Graphs Machine Learning Financial Graphs Computational Economics Computer Sciences Datavetenskap (datalogi)
540	Self-play for human-agent communication Gupta, Abhinav 11 1900 Multi-agent systems provide a framework to play with a population of agents to simulate human-like behavior in artificial environments. They allow us to train artificial agents using self-play so that they can develop strategies to solve problems while collaborating or competing with other agents in an environment. Multi-agent communication imitates this setup: agents are trained to develop emergent languages that are then used to solve cooperative (or mixed) tasks. The eventual goal is to bridge the gap between these emergent languages and natural language for efficient communication with humans. This work aims to augment artificial agents with the capability to use and understand natural language. To this end, I present three articles that explore different facets of this research problem. I investigate and provide algorithms that show how populations and self-play can help in learning diverse strategies that can facilitate human-agent communication. Emergent languages Population-based learning Meta-learning Human-agent interaction Multi-agent communication
