Global ETD Search

241	Prediction of Protein-Protein Interactions Using Deep Learning Techniques Soleymani, Farzan 24 April 2023 Proteins are considered the primary actors in living organisms. Proteins mainly perform their functions by interacting with other proteins. Protein-protein interactions (PPIs) underpin various biological activities such as metabolic cycles, signal transduction, and immune response. PPI identification has been addressed by various experimental methods such as the yeast two-hybrid, mass spectrometry, and protein microarrays, to mention a few. However, due to the sheer number of proteins, experimental methods for finding interacting and non-interacting protein pairs are time-consuming and costly. Therefore, a sequence-based framework called ProtInteract is developed to predict protein-protein interaction. ProtInteract comprises two components: first, a novel autoencoder architecture that encodes each protein's primary structure to a lower-dimensional vector while preserving its underlying sequential pattern by extracting uncorrelated attributes and more expressive descriptors. This leads to faster training of the second network, a deep convolutional neural network (CNN) that receives encoded proteins and predicts their interaction. Three different scenarios formulate the prediction task. In each scenario, the deep CNN predicts the class of a given encoded protein pair. Each class indicates a different range of confidence scores corresponding to the probability of whether a predicted interaction occurs or not. The proposed framework features low computational complexity and a relatively fast response. The present study makes two significant contributions to the field of PPI prediction. Firstly, it addresses the computational challenges posed by the high dimensionality of protein datasets through the use of dimensionality reduction techniques, which extract highly informative sequence attributes. 
Secondly, the proposed framework, ProtInteract, utilises this information to identify the interaction characteristics of a protein based on its amino acid configuration. ProtInteract encodes the protein's primary structure into a lower-dimensional vector space, thereby reducing the computational complexity of PPI prediction. Our results provide evidence of the proposed framework's accuracy and efficiency in predicting protein-protein interactions. Long-short Term Memory Recurrent Neural Networks Protein-Protein Interaction Temporal Convolutional Network Convolutional Neural Network Autoencoder Reinforcement learning actor-critic portfolio management stock market prediction coverage control multi-agent system SARSA Q-learning Graph convolutional neural network GCN state-action-reward-state-action
242	Collaboration in Multi-agent Games : Synthesis of Finite-state Strategies in Games of Imperfect Information / Samarbete i multiagent-spel : Syntes av ändliga strategier i spel med ofullständig information Lundberg, Edvin January 2017 We study games where a team of agents needs to collaborate against an adversary to achieve a common goal. The agents make their moves simultaneously, and they have different perceptions about the system state after each move, due to different sensing capabilities. Each agent can only act based on its own experiences, since no communication is assumed during the game. However, before the game begins, the agents can agree on some strategy. A strategy is winning if it guarantees that the agents achieve their goal regardless of how the opponent acts. Identifying a winning strategy, or determining that none exists, is known as the strategy synthesis problem. In this thesis, we only consider a simple objective where the agents must force the game into a given state. Much of the literature focuses on strategies that assume the agents either (a) can remember everything that they have perceived or (b) can only remember the last thing that they have perceived. The strategy synthesis problem is (in the general case) undecidable in (a) and has exponential running time in (b). We are interested in the middle ground, where agents can have finite memory. Specifically, they should be able to keep a finite-state machine, which they update when they make new observations. In our case, the internal state of each agent represents its knowledge about the state of affairs. In other words, an agent is able to update its knowledge and act based on it. We propose an algorithm for constructing the finite-state machine for each agent, and assigning actions to the internal states before the game begins. Not every winning strategy can be found by the algorithm, but we are convinced that the ones found are valid. 
An important building block for the algorithm is the knowledge-based subset construction (KBSC) used in the literature, which we generalise to games with multiple agents. With our construction, the game can be reduced to another game, still with uncertain state information, but with less or equal uncertainty. The construction can be applied arbitrarily many times, but it appears to stabilise (so that no new knowledge is gained) after only a few steps. We discuss this and other interesting properties of our algorithm in the final chapters of this thesis. / Vi studerar spel där ett lag agenter behöver samarbeta mot en motståndare för att uppnå ett mål. Agenterna agerar samtidigt, och vid varje steg av spelet så har de olika uppfattning om spelets tillstånd. De antas inte kunna kommunicera under spelets gång, så agenterna kan bara agera utifrån sina egna erfarenheter. Innan spelet börjar kan agenterna dock komma överens om en strategi. En sådan strategi är vinnande om den garanterar att agenterna når sitt mål oavsett hur motståndaren beter sig. Att hitta en vinnande strategi är känt som syntesproblemet. I den här avhandlingen behandlar vi endast ett enkelt mål där agenterna måste tvinga in spelet i ett givet tillstånd. Mycket av litteraturen handlar om strategier där agenterna antingen antas (a) kunna minnas allt som de upplevt eller (b) bara kunna minnas det senaste de upplevt. Syntesproblemet är (i det generella fallet) oavgörbart i (a) och tar exponentiell tid i (b). Vi är intresserade av fallet där agenter kan ha ändligt minne. De ska kunna ha en ändlig automat, som de kan uppdatera när de får nya observationer. I vårt fall så representerar det interna tillståndet agentens kunskap om spelets tillstånd. En agent kan då uppdatera sin kunskap och agera utifrån den. Vi föreslår en algoritm som konstruerar en ändlig automat åt varje agent, samt instruktioner för vad agenten ska göra i varje internt tillstånd. 
Varje vinnande strategi kan inte hittas av algoritmen, men vi är övertygade om att de som hittas är giltiga. En viktig byggsten är den kunskapsbaserade delmängdskonstruktionen (KBSC), som vi generaliserar till spel med flera agenter. Med vår konstruktion kan spelet reduceras till ett annat spel som har mindre eller lika mycket osäkerhet. Detta kan göras godtyckligt många gånger, men det verkar som om ingen ny kunskap tillkommer efter bara några gånger. Vi diskuterar detta vidare tillsammans med andra intressanta egenskaper hos algoritmen i de sista kapitlen i avhandlingen. Multi-agent games multiagent games multi-agent system imperfect information imperfect recall collaboration imperfect communication finite-state strategy concurrent games concurrent system strategy synthesis strategy construction automated programming automated problem solving automated collaboration verification knowledge-based subset construction knowledge tracking Computer Sciences Datavetenskap (datalogi)
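As a rough illustration of the knowledge-based subset construction named in record 242, the Python sketch below covers only the classical single-agent case: a knowledge state is the set of game states the agent considers possible, and each action leads to one successor knowledge state per observation. It is not the multi-agent generalisation proposed in the thesis; the function name `kbsc`, the dictionary encoding of the game, and the toy game are all assumptions made for this example.

```python
from collections import deque


def kbsc(initial, actions, delta, obs):
    """Single-agent knowledge-based subset construction (illustrative sketch).

    initial: the start state of the game.
    actions: a list of actions (iterated once per knowledge state).
    delta:   dict mapping (state, action) -> set of possible successor states.
    obs:     dict mapping state -> the observation the agent receives there.
    Returns (start, transitions), where transitions maps
    (knowledge_state, action) -> set of successor knowledge states.
    """
    start = frozenset([initial])
    transitions = {}
    seen = {start}
    queue = deque([start])
    while queue:
        knowledge = queue.popleft()
        for action in actions:
            # Collect every state reachable from any state considered possible.
            successors = set()
            for state in knowledge:
                successors |= delta.get((state, action), set())
            # Group successors by observation; each group is a knowledge state.
            groups = {}
            for state in successors:
                groups.setdefault(obs[state], set()).add(state)
            targets = {frozenset(group) for group in groups.values()}
            transitions[(knowledge, action)] = targets
            for target in targets:
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    return start, transitions


# Hypothetical toy game: states 1 and 2 look the same to the agent.
toy_delta = {(0, "a"): {1, 2}, (1, "a"): {3}, (2, "a"): {2}}
toy_obs = {0: "x", 1: "y", 2: "y", 3: "z"}
toy_start, toy_transitions = kbsc(0, ["a"], toy_delta, toy_obs)
```

In the toy game, playing `a` from the start leaves the agent unsure whether it is in state 1 or 2; playing `a` again resolves the uncertainty, because the two possible outcomes produce different observations.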
243	A multi-agent nudge-based approach for disclosure mitigation online Ben Salem, Rim 08 1900 En 1993, alors qu’Internet faisait ses premiers pas, le New York Times publie un dessin de presse désormais célèbre avec la légende "Sur Internet, personne ne sait que tu es un chien". C’était une façon amusante de montrer qu’Internet offre à ses usagers un espace sûr à l’abri de tout préjugé, sarcasme, ou poursuites judiciaires. C’était aussi une annonce aux internautes qu’ils sont libres de ne montrer de leurs vies privées que ce qu’ils veulent laisser voir. Les années se succèdent pour faire de cette légende une promesse caduque qui n’a pu survivre aux attraits irrésistibles d’aller en ligne. Les principales tentations sont l’anonymat et la possibilité de se créer une identité imaginée, distincte de celle de la réalité. Hélas, la propagation exponentielle des réseaux sociaux a fait chevaucher les identités réelles et fictives des gens. Les usagers ressentent un besoin d’engagement de plus en plus compulsif. L’auto-divulgation bat alors son plein à cause de l’ignorance du public des conséquences de certains comportements. Pour s’attirer l’attention, les gens recourent au partage d’informations personnelles, d’appartenance de tous genres, de vœux, de désirs, etc. Par ailleurs, l’espoir et l’angoisse les incitent aussi à communiquer leurs inquiétudes concernant leurs états de santé et leurs expériences parfois traumatisantes au détriment de la confidentialité de leurs vies privées. L’ambition et l’envie de se distinguer incitent les gens à rendre publics leurs rituels, pratiques ou évènements festifs engageant souvent d’autres individus qui n’ont pas consenti explicitement à la publication du contenu. Des adolescents qui ont grandi à l’ère numérique ont exprimé leurs désapprobations quant à la façon dont leurs parents géraient leurs vies privées lorsqu’ils étaient enfants. Leurs réactions allaient d’une légère gêne à une action de poursuite en justice. 
La divulgation multipartite pose problème. Les professionnels, les artistes ainsi que les activistes de tout horizon ont trouvé aux réseaux sociaux un outil incontournable et efficace pour promouvoir leurs secteurs. Le télétravail qui se propage très rapidement ces dernières années a offert aux employés le confort de travailler dans un environnement familier, ils ont alors tendance à négliger la vigilance "du bureau" exposant ainsi les intérêts de leurs employeurs au danger. Ils peuvent aussi exprimer des opinions personnelles parfois inappropriées leur causant des répercussions néfastes. L’accroissement de l’insécurité liée au manque de vigilance en ligne et à l’ignorance des usagers a mené les chercheurs à puiser dans les domaines de sociologie, des sciences de comportement et de l’économie de la vie privée pour étudier les raisons et les motivations de la divulgation. Le "nudge", comme approche d’intervention pour améliorer le bien-être d’un individu ou d’un groupe de personnes, fut une solution largement adoptée pour la préservation de la vie privée. Deux concepts ont émergé. Le premier a adopté une solution "one-size-fits-all" qui est commune à tous les utilisateurs. Quoique relativement simple à mettre en œuvre et d’une protection satisfaisante de la vie privée, elle était rigide et peu attentive aux conditions individuelles des utilisateurs. Le second a plutôt privilégié les préférences des usagers pour résoudre, même en partie, la question de personnalisation des "nudges". Ce qui a été motivant pour les utilisateurs mais nuisible à leurs confidentialités. Dans cette thèse, l’idée principale est de profiter des mérites des deux concepts en les fusionnant. J’ai procédé à l’exploration de l’économie de la vie privée. Les acteurs de ce secteur sont, autres que le propriétaire de données lui-même, le courtier qui sert d’intermédiaire et l’utilisateur de ces données. 
Le mécanisme d’interaction entre eux est constitué par les échanges de données comme actifs et les compensations monétaires en retour. L’équilibre de cette relation est atteint par la satisfaction de ses parties prenantes. Pour faire de bons choix, l’équitabilité exige que le propriétaire de données ait les connaissances minimales nécessaires dans le domaine et qu’il soit conscient des contraintes qu’il subit éventuellement lors de la prise de décision. À la recherche d’un utilisateur éclairé, j’ai conçu un cadre que j’ai nommé Multipriv. Il englobe les facteurs d’influence sur la perception des gens de la vie privée. J’ai ensuite proposé un système multi-agents basé sur le "nudge" pour l’atténuation de la divulgation en ligne. Son principal composant comprend trois agents. Le premier est l’agent objectif Aegis qui se réfère aux solutions généralisées axées sur la protection des données personnelles. Le second est un agent personnel qui considère le contexte dans lequel se trouve le propriétaire de données. Le dernier est un agent multipartite qui représente les personnes impliquées dans le contenu en copropriété. Pour évaluer le système, une plateforme appelée Cognicy est implémentée et déployée. Elle imite de véritables plateformes de réseaux sociaux par l’offre de la possibilité de créer un profil, publier des statuts, joindre des photos, établir des liens avec d’autres, etc. Sur une population de 150 utilisateurs, ma proposition s’est classée meilleure que l’approche de base non spécifique au contexte en termes de taux d’acceptation des "nudges". Les retours des participants à la fin de leurs sessions expriment une appréciation des explications fournies dans les "nudges" et des outils mis à leur disposition sur la plateforme. / When the internet was in its infancy in 1993, the New York Times published a now-famous cartoon with the caption “On the Internet, nobody knows you’re a dog.” 
It was an amusing way to denote that the internet offers a safe space and a shelter for people to be free of assumptions and to only disclose what they want to be shown of their personal lives. The major appeal to go online was anonymity and the ability to create a whole new persona separate from real life. However, the rising popularity of social media made people’s digital and physical existences collide. Social Networking Sites (SNS) feed the need for compulsive engagement and attention-seeking behaviour. This results in self-disclosure, which is the act of sharing personal information such as hopes, aspirations, fears, thoughts, etc. These platforms are fertile grounds for oversharing health information, traumatic experiences, casual partying habits, and co-owned posts that show or mention individuals other than the sharer. The latter practice is called multiparty disclosure and it is an issue especially when the other people involved do not explicitly consent to the shared content. Adolescents who grew up in the digital age expressed disapproval of how their parents handled their privacy as children. Their reactions ranged from slight embarrassment to pursuing legal action to regain a sense of control. The repercussions of privacy disclosure extend to professional lives since many people work from home nowadays and tend to be more complacent about privacy in their familiar environment. This can be damaging to employees who lose the trust of their employers, which can result in the termination of their contracts. Even when individuals do not disclose information related to their company, their professional lives can suffer the consequences of sharing unseemly posts that should have remained private. For the purpose of addressing the issue of oversharing, many researchers have studied and investigated the reasons and motivations behind it using multiple perspectives such as economics, behavioural science, and sociology. 
After the popularization of nudging as an intervention approach to improve the well-being of an individual or a group of people, there was an emerging interest in applying the concept to privacy preservation. After the initial wave of non-user-specific one-size-fits-all propositions, the scope of research extended to personalized solutions that consider individual preferences. The former are privacy-focused and more straightforward to implement than their personalized counterparts, but they tend to be more rigid and less considerate of individual situations. On the other hand, the latter have the potential to understand users but can end up reinforcing biases and underperforming in their privacy protection objective. The main idea of my proposition is to merge the concepts introduced by the two waves to benefit from the merits of each. Because people exist within a larger ecosystem that governs their personal information, I start by exploring the economics of privacy in which the actors are presented as the data owner (individual), broker, and data user. I explain how they interact with one another through exchanges of data as assets and monetary compensation in return. An equilibrium can be achieved where the user is satisfied with the level of anonymity they are afforded. However, in order to achieve this, the person whose information is used as a commodity needs to be aware and make the best choices for themselves. This is not always the case because users can lack the knowledge to do so or they can be susceptible to contextual biases that warp their decision-making faculty. For this reason, my next objective was to design a framework called Multipriv, which encompasses the factors that influence people’s perception of privacy. Then, I propose a multi-agent nudge-based approach for disclosure mitigation online. Its core component includes an objective agent Aegis that is inspired by privacy-focused one-size-fits-all solutions. 
Furthermore, a personal agent represents the user’s context-specific perception, which is different from simply relying on preferences. Finally, a multiparty agent serves to give the other people involved in the co-owned content a voice. To evaluate the system, a platform called Cognicy is implemented and deployed. It mimics real social media platforms by offering the option of creating a profile, posting status updates, attaching photos, making connections with others, etc. Based on an evaluation using 150 users, my proposition proved superior to the baseline non-context-specific approach in terms of the nudge acceptance rate. Moreover, the feedback submitted by the participants at the end of their session expressed an appreciation of the explanations provided in the nudges, the visual charts, and the tools at their disposition on the platform. Divulgation Économie de la vie privée Informations personnelles Multipriv Système multi-agent Aegis Agent personnel Agent multipartite Copropriété Disclosure Self-disclosure Multiparty disclosure Economics of privacy Personal information Decision-making Multipriv Nudge-based multi-agent system Personal agent Multiparty agent Nudge
244	Deep Reinforcement Learning for Multi-Agent Path Planning in 2D Cost Map Environments : using Unity Machine Learning Agents toolkit Persson, Hannes January 2024 Multi-agent path planning is applied in a wide range of applications in robotics and autonomous vehicles, including aerial vehicles such as drones and other unmanned aerial vehicles (UAVs), to solve tasks in areas like surveillance, search and rescue, and transportation. With today's rapidly evolving technology in the fields of automation and artificial intelligence, multi-agent path planning is becoming increasingly relevant. The main problems encountered in multi-agent path planning are collision avoidance with other agents, obstacle evasion, and pathfinding from a starting point to an endpoint. In this project, the objectives were to create intelligent agents capable of navigating through two-dimensional eight-agent cost map environments to a static target, while avoiding collisions with other agents and simultaneously minimizing the path cost. Reinforcement learning was applied using the development platform Unity and the open-source ML-Agents toolkit, which enables the development of intelligent agents with reinforcement learning inside Unity. Perlin Noise was used to generate the cost maps. The reinforcement learning algorithm Proximal Policy Optimization was used to train the agents. The training was structured as a curriculum with two lessons. The first lesson was designed to teach the agents to reach the target without colliding with other agents or moving out of bounds. The second lesson was designed to teach the agents to minimize the path cost. The project successfully achieved its objectives, which could be determined from visual inspection and by comparing the final model with a baseline model. The baseline model was trained only to reach the target while avoiding collisions, without minimizing the path cost. 
A comparison of the models showed that the final model outperformed the baseline model, reaching an average of 27.6% lower path cost. / Multi-agent-vägsökning används inom en rad olika tillämpningar inom robotik och autonoma fordon, inklusive flygfarkoster såsom drönare och andra obemannade flygfarkoster (UAV), för att lösa uppgifter inom områden som övervakning, sök- och räddningsinsatser samt transport. I dagens snabbt utvecklande teknik inom automation och artificiell intelligens blir multi-agent-vägsökning allt mer relevant. De huvudsakliga problemen som stöts på inom multi-agent-vägsökning är kollisioner med andra agenter, undvikande av hinder och vägsökning från en startpunkt till en slutpunkt. I detta projekt var målen att skapa intelligenta agenter som kan navigera genom tvådimensionella åtta-agents kostnadskartmiljöer till ett statiskt mål, samtidigt som de undviker kollisioner med andra agenter och minimerar vägkostnaden. Metoden förstärkningsinlärning användes genom att utnyttja utvecklingsplattformen Unity och Unitys open-source ML-Agents toolkit, som möjliggör utveckling av intelligenta agenter med förstärkningsinlärning inuti Unity. Perlin Brus användes för att generera kostnadskartorna. Förstärkningsinlärningsalgoritmen Proximal Policy Optimization användes för att träna agenterna. Träningen strukturerades som en läroplan med två lektioner, den första lektionen var utformad för att lära agenterna att nå målet, utan att kollidera med andra agenter eller röra sig utanför gränserna. Den andra lektionen var utformad för att lära agenterna att minimera vägkostnaden. Projektet uppnådde framgångsrikt sina mål, vilket kunde fastställas genom visuell inspektion och genom att jämföra den slutliga modellen med en basmodell. Basmodellen tränades endast för att nå målet och undvika kollisioner, utan att minimera vägkostnaden. 
En jämförelse av modellerna visade att den slutliga modellen överträffade baslinjemodellen, och uppnådde en genomsnittlig 27,6 % lägre vägkostnad. deep reinforcement learning reinforcement learning machine learning path planning cost map ML-agents unity artificial neural networks collision avoidance PPO multi agent multi-agent multi-agent system förstärkningsinlärning djup förstärkningsinlärning fleragentssystem kostnadkarta kostnadskartor artificiella neurala nätverk maskininlärning proximal policy optimization PPO svärmintelligens
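To make the comparison metric in record 244 concrete, the Python sketch below shows one plausible way to score a path on a grid cost map and to express the improvement of one model over another as a relative reduction. The abstract does not define the path cost precisely, so summing the costs of the visited cells is an assumption, as are the function names and the toy map; the numbers are illustrative and are not results from the thesis.

```python
def path_cost(cost_map, path):
    """Sum the cost of every (row, col) cell visited along a path."""
    return sum(cost_map[row][col] for row, col in path)


def relative_reduction(baseline_cost, final_cost):
    """Fraction by which final_cost undercuts baseline_cost."""
    return (baseline_cost - final_cost) / baseline_cost


# Hypothetical 2x3 cost map with one expensive cell in the top row.
toy_map = [
    [1, 9, 1],
    [1, 1, 1],
]
# A direct path crosses the expensive cell; a detour avoids it.
direct = [(0, 0), (0, 1), (0, 2)]
detour = [(0, 0), (1, 0), (1, 1), (1, 2), (0, 2)]

direct_cost = path_cost(toy_map, direct)
detour_cost = path_cost(toy_map, detour)
saving = relative_reduction(direct_cost, detour_cost)
```

The detour is longer in steps but cheaper in accumulated cost, which is the trade-off an agent trained to minimize path cost has to learn.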
245	Scalable Reinforcement Learning for Formation Control with Collision Avoidance : Localized policy gradient algorithm with continuous state and action space / Skalbar Förstärkande Inlärning för Formationskontroll med Kollisionsundvikande : Lokaliserad policygradientalgoritm med kontinuerligt tillstånds- och handlingsutrymme Matoses Gimenez, Andreu January 2023 In the last decades, significant theoretical advances have been made in the field of distributed multi-agent control theory. One of the most common systems that can be modelled as multi-agent systems are the so-called formation control problems, in which a network of mobile agents is controlled to move towards a desired final formation. These problems additionally pose practical challenges, namely limited access to information about the global state of the system, which justify the use of distributed and localized approaches for solving the control problem. The problem is further complicated if partial or no information is known about the dynamic model of the system. A widely used approach for model-free control is reinforcement learning (RL). A fundamental challenge of this approach in this setting is that the state-action space size scales exponentially with the number of agents, rendering the problem intractable for large networks. This thesis presents a scalable and localized reinforcement learning approach to a traditional multi-agent formation control problem, with collision avoidance. A scalable reinforcement learning advantage actor-critic algorithm is presented, based on previous work in the literature. Sub-optimal bounds are calculated for the accumulated reward and policy gradient localized approximations. The algorithm is tested in a two-dimensional setting, with a network of mobile agents following simple integrator dynamics and stochastic localized policies. Neural networks are used to approximate the continuous value functions and policies. 
The formation control with collision avoidance formulation and the algorithm presented show good scalability properties, with a polynomial increase in the number of function approximation parameters with the number of agents. The reduced number of parameters decreases learning time for bigger networks, although the efficiency of computation is decreased compared to state-of-the-art machine learning implementations. The policies obtained achieve probably safe trajectories, although the lack of a dynamic model makes it impossible to guarantee safety. / Under de senaste decennierna har betydande framsteg gjorts inom området för distribuerad multi-agent reglerteori. Ett av de vanligaste systemen som kan modelleras som multiagentsystem är de så kallade formationskontrollproblemen, där ett nätverk av mobila agenter styrs för att röra sig mot en önskad slutlig formation. Dessa problem innebär dessutom praktiska utmaningar, nämligen begränsad tillgång till information om systemets globala tillstånd, vilket motiverar användningen av distribuerade och lokaliserade tillvägagångssätt för att lösa det reglertekniska problemet. Problemet kompliceras ytterligare om delvis eller ingen information är känd om systemets dynamiska modell. Ett allmänt använt tillvägagångssätt för modellfri kontroll är reinforcement learning (RL). En grundläggande utmaning med detta tillvägagångssätt i den här miljön är att storleken på state-action utrymmet skalas exponentiellt med antalet agenter, vilket gör problemet svårlöst för ett stort nätverk. Detta examensarbete presenterar en skalbar och lokaliserad reinforcement learning metod på ett traditionellt reglertekniskt problem med flera agenter, med kollisionsundvikande. En reinforcement learning advantage actor critic algoritm presenteras, baserad på tidigare arbete i litteraturen. Suboptimala gränser beräknas för den ackumulerade belönings- och policygradientens lokaliserade approximationer. Algoritmen testas i en tvådimensionell miljö, med ett nätverk av mobila agenter som följer enkel integratordynamik och stokastiska lokaliserade policyer. 
Neurala nätverk används för att approximera de kontinuerliga värdefunktionerna och policyerna. Den presenterade formationsstyrningen med kollisionsundvikande formulering och algoritmen visar goda skalbarhetsegenskaper, med en polynomisk ökning av antalet funktionsapproximationsparametrar med antalet agenter. Det minskade antalet parametrar minskar inlärningstiden för större nätverk, även om effektiviteten i beräkningen minskar jämfört med avancerade maskininlärningsimplementeringar. De erhållna policyerna uppnår troligen säkra banor även om avsaknaden av dynamisk modell gör det omöjligt att garantera säkerheten. / En las últimas décadas, se han realizado importantes avances teóricos en el campo de la teoría del control multiagente distribuido. Uno de los sistemas más comunes que se pueden modelar como sistemas multiagente son los llamados problemas de control de formación, en los que se controla una red de agentes móviles para alcanzar una formación final deseada. Estos problemas plantean desafíos prácticos como el acceso limitado a la información del estado global del sistema, que justifican el uso de algoritmos distribuidos y locales para resolver el problema de control. El problema se complica aún más si solo se conoce información parcial o nada sobre el modelo dinámico del sistema. Un enfoque ampliamente utilizado para el control sin conocimiento del modelo dinámico es el reinforcement learning (RL). Un desafío fundamental de este método en este entorno es que el tamaño de la acción y el estado aumenta exponencialmente con la cantidad de agentes, lo que hace que el problema sea intratable para una red grande. Esta tesis presenta un algoritmo de RL escalable y local para un problema tradicional de control de formación con múltiples agentes, con prevención de colisiones. Se presenta un algoritmo “advantage actor-critic”, basado en trabajos previos en la literatura. 
Los límites subóptimos se calculan para las aproximaciones locales de la función Q y gradiente de la política. El algoritmo se prueba en un entorno bidimensional, con una red de agentes móviles que siguen una dinámica de integrador simple y políticas estocásticas localizadas. Redes neuronales se utilizan para aproximar las funciones y políticas de valor continuo. La formulación del problema de formación con prevención de colisiones y el algoritmo presentado muestran buenas propiedades de escalabilidad, con un aumento polinómico en el número de parámetros con el número de agentes. El número reducido de parámetros disminuye el tiempo de aprendizaje para redes más grandes, aunque la eficiencia de la computación disminuye en comparación con las implementaciones de ML de última generación. Las políticas obtenidas alcanzan trayectorias probablemente seguras, aunque la falta de un modelo dinámico hace imposible garantizar la completa prevención de colisiones. / A les darreres dècades, s'han realitzat importants avenços teòrics en el camp de la teoria del control multiagent distribuït. Un dels sistemes més comuns que es poden modelar com a sistemes multiagent són els anomenats problemes de control de formació, en els què es controla una xarxa d'agents mòbils per assolir una formació final desitjada. Aquests problemes plantegen reptes pràctics com l'accés limitat a la informació de l'estat global del sistema, que justifiquen l'ús d'algorismes distribuïts i locals per resoldre el problema de control. El problema es complica encara més si només es coneix informació parcial sobre el model dinàmic del sistema. Un mètode àmpliament utilitzat per al control sense coneixement del model dinàmic és el reinforcement learning (RL). Un repte fonamental d'aquest mètode en aquest entorn és que la mida de l'acció i l'estat augmenta exponencialment amb la quantitat d'agents, cosa que fa que el problema sigui intractable per a una xarxa gran. 
Aquesta tesi presenta un algorisme de RL escalable i local per a un problema tradicional de control de formació amb múltiples agents, amb prevenció de col·lisions. Es presenta un algorisme “advantage actor-critic”, basat en treballs previs a la literatura. Els límits subòptims es calculen per a les aproximacions locals de la funció Q i gradient de la política. L'algoritme es prova en un entorn bidimensional, amb una xarxa d'agents mòbils que segueixen una dinàmica d'integrador simple i polítiques estocàstiques localitzades. Xarxes neuronals s'utilitzen per aproximar les funcions i les polítiques de valor continu. La formulació del problema de formació amb prevenció de col·lisions i l'algorisme presentat mostren bones propietats d'escalabilitat, amb un augment polinòmic en el nombre de paràmetres amb el nombre d'agents. El nombre reduït de paràmetres disminueix el temps d'aprenentatge per a les xarxes més grans, encara que l'eficiència de la computació disminueix en comparació amb les implementacions de ML d'última generació. Les polítiques obtingudes aconsegueixen trajectòries probablement segures, tot i que la manca d'un model dinàmic fa impossible garantir la prevenció completa de col·lisions. Control theory Multi-agent systems Distributed systems Formation control Collision avoidance Reinforcement learning Teoria de control Sistemes multiagent Sistemes distribuïts Control de formació Prevenció de col·lisions Reinforcement Learning Reglerteknik Multi-agent system Distribuerade system formationskontroll Kollisionsundvikande Reinforcement learning Teoría de control Sistemas multiagente Sistemas distribuidos Control de formación Prevención de colisiones Reinforcement Learning Control Engineering Reglerteknik Elektroteknik och elektronik
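The scaling argument in record 245 can be illustrated with a simple count. The thesis works with continuous state and action spaces and neural-network approximators, so the Python sketch below is only a discrete, tabular analogy: it compares the number of entries in a joint state-action table with the total for per-agent tables that depend on a fixed-size neighbourhood. The function names and the `neighbourhood` parameter are assumptions made for this example.

```python
def joint_table_size(n_agents, n_states, n_actions):
    """Entries in one table over the joint state-action space of all agents."""
    return (n_states * n_actions) ** n_agents


def localized_table_size(n_agents, n_states, n_actions, neighbourhood):
    """Total entries when each agent keeps a table over its neighbourhood only.

    neighbourhood is the number of agents (including itself) whose states and
    actions each local table depends on; it is assumed fixed as n_agents grows.
    """
    return n_agents * (n_states * n_actions) ** neighbourhood


# Hypothetical sizes: 4 local states, 3 local actions, 10 agents,
# each agent looking at a neighbourhood of 3 agents.
joint = joint_table_size(10, 4, 3)
localized = localized_table_size(10, 4, 3, 3)
```

With a fixed neighbourhood size, the localized total grows linearly in the number of agents, while the joint table grows exponentially.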
246	Distributed Algorithms for Multi-robot Autonomy Zehui Lu (18953791) 02 July 2024 Autonomous robots can perform dangerous and tedious tasks, eliminating the need for human involvement. To deploy an autonomous robot in the field, a typical planning and control hierarchy is used, consisting of a high-level planner, a mid-level motion planner, and a low-level tracking controller. In applications such as simultaneous localization and mapping, package delivery, logistics, and surveillance, a group of autonomous robots can be more efficient and resilient than a single robot. However, deploying a multi-robot team by directly aggregating each robot's planning hierarchy into a larger, centralized hierarchy faces challenges related to scalability, resilience, and real-time computation. Distributed algorithms offer a promising solution for introducing effective coordination within a network of robots, addressing these issues. This thesis explores the application of distributed algorithms in multi-robot systems, focusing on several essential components required to enable distributed multi-robot coordination, both in general terms and for specific applications. Automation engineering Control engineering Field robotics Manufacturing robotics Electrical machines and drives Distributed systems and algorithms Dynamical systems in applications Theoretical and applied mechanics Optimisation Distributed Algorithms Multi Agent System Multi Robot System Distributed Optimization Mission Planning Co-design Algorithms Model Predictive Control Mobile Manipulator Motion Planning and Control Differentiable Dynamics Modeling Motor Co-design
