Global ETD Search

91	Access Point Selection and Clustering Methods with Minimal Switching for Green Cell-Free Massive MIMO Networks He, Qinglong January 2022 (has links) As a novel beyond fifth-generation (5G) concept, cell-free massive MIMO (multiple-input multiple-output) recently has become a promising physical-layer technology where an enormous number of distributed access points (APs), coordinated by a central processing unit (CPU), cooperate to coherently serve a large number of user equipments (UEs) in the same time/frequency resource. However, denser AP deployment in cell-free networks as well as an exponentially growing number of mobile UEs lead to higher power consumption. What is more, similar to conventional cellular networks, cell-free massive MIMO networks are dimensioned to provide the required quality of service (QoS) to the UEs under heavy traffic load conditions, and thus they might be underutilized during low traffic load periods, leading to inefficient use of both spectral and energy resources. Aiming at the implementation of energy-efficient cell-free networks, several approaches have been proposed in the literature, which consider different AP switch ON/OFF (ASO) strategies for power minimization. Different from prior works, this thesis focuses on additional factors other than ASO that have an adverse effect not only on total power consumption but also on implementation complexity and operation cost. For instance, too frequent ON/OFF switching in an AP can lead to tapering off the potential power saving of ASO by incurring extra power consumption due to excessive switching. Indeed, frequent switching of APs might also result in thermal fatigue and serious lifetime degeneration. Moreover, time variations in the AP-UE association in favor of energy saving in a dynamic network bring additional signaling and implementation complexity. Thus, in the first part of the thesis, we propose a multi-objective optimization problem that aims to minimize the total power consumption together with AP switching and AP-UE association variations in comparison to the state of the network in the previous state. The proposed problem is cast in mixed integer quadratic programming form and solved optimally. Our simulation results show that by limiting AP switching (node switching) and AP-UE association reformation switching (link switching), the total power consumption from APs only slightly increases but the number of average switching drops significantly regardless of node switching or link switching. It achieves a good balance on the trade-off between radio power consumption and the side effects excessive switching will bring. In the second part of the thesis, we consider a larger cell-free massive MIMO network by dividing the total area into disjoint network-centric clusters, where the APs in each cluster are connected to a separate CPU. In each cluster, cell-free joint transmission is locally implemented to achieve a scalable network implementation. Motivated by the outcomes of the first part, we reshape our dynamic network simulator to keep the active APs for a given spatial traffic pattern the same as long as the mean arrival rates of the UEs are constant. Moreover, the initially formed AP-UE association for a particular UE is not allowed to change. In that way, we make the number of node and link switching zero throughout the considered time interval. For this dynamic network, we propose a deep reinforcement learning (DRL) framework that learns the policy of maximizing long-term energy efficiency (EE) for a given spatially-varying traffic density. The active AP density of each network-centric cluster and the boundaries of the clusters are learned by the trained agent to maximize the EE. The DRL algorithm is shown to learn a non-trivial joint cluster geometry and AP density with at least 7% improvement in terms of EE compared to the heuristically-developed benchmarks. / Som ett nytt koncept bortom den femte generationen (5G) har cellfri massiv MIMO (multiple input multiple output) nyligen blivit en lovande teknik för det fysiska lagret där ett enormt antal distribuerade åtkomstpunkter (AP), som samordnas av en central processorenhet (CPU), samarbetar för att på ett sammanhängande sätt betjäna ett stort antal användarutrustningar (UE) i samma tids- och frekvensresurs. En tätare utplacering av AP:er i cellfria nät samt ett exponentiellt växande antal mobila användare leder dock till högre energiförbrukning. Dessutom är cellfria massiva MIMO-nät, i likhet med konventionella cellulära nät, dimensionerade för att ge den erforderliga tjänstekvaliteten (QoS) till enheterna under förhållanden med hög trafikbelastning, och därför kan de vara underutnyttjade under perioder med låg trafikbelastning, vilket leder till ineffektiv användning av både spektral- och energiresurser. För att genomföra energieffektiva cellfria nät har flera metoder föreslagits i litteraturen, där olika ASO-strategier (AP switch ON/OFF) beaktas för att minimera energiförbrukningen. Till skillnad från tidigare arbeten fokuserar den här avhandlingen på andra faktorer än ASO som har en negativ effekt inte bara på den totala energiförbrukningen utan också på komplexiteten i genomförandet och driftskostnaden. Till exempel kan alltför frekventa ON/OFF-omkopplingar i en AP leda till att ASO:s potentiella energibesparingar avtar genom extra energiförbrukning på grund av överdriven omkoppling. Frekventa omkopplingar av AP:er kan också leda till termisk trötthet och allvarlig försämring av livslängden. Dessutom medför tidsvariationer i AP-UE-associationen till förmån för energibesparingar i ett dynamiskt nät ytterligare signalering och komplexitet i genomförandet. I den första delen av avhandlingen föreslår vi därför ett optimeringsproblem med flera mål som syftar till att minimera den totala energiförbrukningen tillsammans med växling av AP och variationer i AP-UE-associationen i jämförelse med nätets tillstånd i det föregående läget. Det föreslagna problemet är en blandad helhetsmässig kvadratisk programmering och löses optimalt. Våra simuleringsresultat visar att genom att begränsa växling av AP (node switching) och växling av AP-UE-association (link switching) ökar den totala energiförbrukningen från AP:erna endast något, men antalet genomsnittliga växlingar ökar, oavsett om det rör sig om node switching eller link switching. Det ger en bra balans mellan radiokraftförbrukning och de bieffekter som överdriven växling medför. I den andra delen av avhandlingen tar vi hänsyn till ett större cellfritt massivt MIMO-nätverk genom att dela upp det totala området i disjunkta nätverkscentrerade kluster, där AP:erna i varje kluster är anslutna till en separat CPU. I varje kluster genomförs cellfri gemensam överföring lokalt för att uppnå en skalbar nätverksimplementering. Motiverat av resultaten i den första delen omformar vi vår dynamiska nätverkssimulator så att de aktiva AP:erna för ett givet rumsligt trafikmönster är desamma så länge som den genomsnittliga ankomsthastigheten för de enskilda enheterna är konstant. Dessutom tillåts inte den ursprungligen bildade AP-UE-associationen för en viss användare att förändras. På så sätt gör vi antalet nod- och länkbyten till noll under hela det aktuella tidsintervallet. För detta dynamiska nätverk föreslår vi ett ramverk för djup förstärkningsinlärning (DRL) som lär sig en strategi för att maximera energieffektiviteten på lång sikt för en given rumsligt varierande trafiktäthet. Den aktiva AP-tätheten i varje nätverkscentrerat kluster och klustrens gränser lärs av den utbildade agenten för att maximera EE. Det visas att DRL-algoritmen lär sig en icke-trivial gemensam klustergeometri och AP-täthet med minst 7% förbättring av EE jämfört med de heuristiskt utvecklade riktmärkena. Cell-free massive MIMO multi-objective optimization deep reinforcement learning AP switch ON/OFF energy efficiency Cellfri massiv MIMO multiobjektiv optimering djup förstärkningsinlärning AP switch ON/OFF energieffektivitet Computer and Information Sciences Data- och informationsvetenskap
92	[en] ENABLING AUTONOMOUS DATA ANNOTATION: A HUMAN-IN-THE-LOOP REINFORCEMENT LEARNING APPROACH / [pt] HABILITANDO ANOTAÇÕES DE DADOS AUTÔNOMOS: UMA ABORDAGEM DE APRENDIZADO POR REFORÇO COM HUMANO NO LOOP LEONARDO CARDIA DA CRUZ 10 November 2022 (has links) [pt] As técnicas de aprendizado profundo têm mostrado contribuições significativas em vários campos, incluindo a análise de imagens. A grande maioria dos trabalhos em visão computacional concentra-se em propor e aplicar novos modelos e algoritmos de aprendizado de máquina. Para tarefas de aprendizado supervisionado, o desempenho dessas técnicas depende de uma grande quantidade de dados de treinamento, bem como de dados rotulados. No entanto, a rotulagem é um processo caro e demorado. Uma recente área de exploração são as reduções dos esforços na preparação de dados, deixando-os sem inconsistências, ruídos, para que os modelos atuais possam obter um maior desempenho. Esse novo campo de estudo é chamado de Data-Centric IA. Apresentamos uma nova abordagem baseada em Deep Reinforcement Learning (DRL), cujo trabalho é voltado para a preparação de um conjunto de dados em problemas de detecção de objetos, onde as anotações de caixas delimitadoras são feitas de modo autônomo e econômico. Nossa abordagem consiste na criação de uma metodologia para treinamento de um agente virtual a fim de rotular automaticamente os dados, a partir do auxílio humano como professor desse agente. Implementamos o algoritmo Deep Q-Network para criar o agente virtual e desenvolvemos uma abordagem de aconselhamento para facilitar a comunicação do humano professor com o agente virtual estudante. Para completar nossa implementação, utilizamos o método de aprendizado ativo para selecionar casos onde o agente possui uma maior incerteza, necessitando da intervenção humana no processo de anotação durante o treinamento. Nossa abordagem foi avaliada e comparada com outros métodos de aprendizado por reforço e interação humano-computador, em diversos conjuntos de dados, onde o agente virtual precisou criar novas anotações na forma de caixas delimitadoras. Os resultados mostram que o emprego da nossa metodologia impacta positivamente para obtenção de novas anotações a partir de um conjunto de dados com rótulos escassos, superando métodos existentes. Desse modo, apresentamos a contribuição no campo de Data-Centric IA, com o desenvolvimento de uma metodologia de ensino para criação de uma abordagem autônoma com aconselhamento humano para criar anotações econômicas a partir de anotações escassas. / [en] Deep learning techniques have shown significant contributions in various fields, including image analysis. The vast majority of work in computer vision focuses on proposing and applying new machine learning models and algorithms. For supervised learning tasks, the performance of these techniques depends on a large amount of training data and labeled data. However, labeling is an expensive and time-consuming process. A recent area of exploration is the reduction of efforts in data preparation, leaving it without inconsistencies and noise so that current models can obtain greater performance. This new field of study is called Data-Centric AI. We present a new approach based on Deep Reinforcement Learning (DRL), whose work is focused on preparing a dataset, in object detection problems where the bounding box annotations are done autonomously and economically. Our approach consists of creating a methodology for training a virtual agent in order to automatically label the data, using human assistance as a teacher of this agent. We implemented the Deep Q-Network algorithm to create the virtual agent and developed a counseling approach to facilitate the communication of the human teacher with the virtual agent student. We used the active learning method to select cases where the agent has more significant uncertainty, requiring human intervention in the annotation process during training to complete our implementation. Our approach was evaluated and compared with other reinforcement learning methods and human-computer interaction in different datasets, where the virtual agent had to create new annotations in the form of bounding boxes. The results show that the use of our methodology has a positive impact on obtaining new annotations from a dataset with scarce labels, surpassing existing methods. In this way, we present the contribution in the field of Data-Centric AI, with the development of a teaching methodology to create an autonomous approach with human advice to create economic annotations from scarce annotations. [pt] APRENDIZADO POR REFORCO PROFUNDO [pt] ANOTACOES [pt] AGENTE VIRTUAL [pt] DEEP Q-NETWORK [pt] ACONSELHAMENTO [pt] CONJUNTO DE DADOS [pt] CAIXA DELIMITADORA [en] DEEP REINFORCEMENT LEARNING [en] ANNOTATIONS [en] VIRTUAL AGENT [en] DEEP Q-NETWORK [en] ADVICES [en] DATASET [en] BOUNDING BOX DATASETS
93	Real-time adaptation of robotic knees using reinforcement control Daníel Sigurðarson, Leifur January 2023 (has links) Microprocessor-controlled knees (MPK’s) allow amputees to walk with increasing ease and safety as technology progresses. As an amputee is fitted with a new MPK, the knee’s internal parameters are tuned to the user’s preferred settings in a controlled environment. These parameters determine various gait control settings, such as flexion target angle or swing extension resistance. Though these parameters may work well during the initial fitting, the MPK experiences various internal & external environmental changes throughout its life-cycle, such as product wear, changes in the amputee’s muscle strength, temperature changes, etc. This work investigates the feasibility of using a reinforcement learning (RL) control to adapt the MPK’s swing resistance to consistently induce the amputee’s preferred swing performance in realtime. Three gait features were identified as swing performance indicators for the RL algorithm. Results show that the RL control is able to learn and improve its tuning performance in terms of Mean Absolute Error over two 40-45 minute training sessions with a human-in-the-loop. Additionally, results show promise in using transfer learning to reduce strenuous RL training times. / Mikroprocessorkontrollerade knän (MPK) gör att amputerade kan utföra fysiska aktiviteter med ökad lätthet och säkerhet allt eftersom tekniken fortskrider. När en ny MPK monteras på en amputerad person, anpassas knäts interna parametrar till användarens i ett kontrollerad miljö. Dessa parametrar styr olika gångkontrollinställningar, såsom flexionsmålvinkel eller svängförlängningsmotstånd. Även om parametrarna kan fungera bra under den initiala anpassningen, upplever den MPK olika interna och yttre miljöförändringar under sin hela livscykel, till exempel produktslitage, förändringar i den amputerades muskelstyrka, temperaturförändringar, etc. Detta arbete undersöker möjligheten av, med hjälp av en förstärkningsinlärningskontroll (RL), att anpassa MPK svängmotstånd för att konsekvent inducera den amputerades föredragna svängprestanda i realtid. Tre gångegenskaper identifierades som svingprestandaindikatorer för RL-algoritmen. Resultaten visar att RL-kontrollen kan lära sig och förbättra sin inställningsprestanda i termer av Mean Absolute Error under två 40-45 minuters träningspass med en människa-i-loopen. Dessutom är resultaten lovande när det gäller att använda överföringsinlärning för att minska ansträngande RL-träningstider. Machine learning deep reinforcement learning transfer learning medical device prosthetic prosthesis controls human-in-the-loop Maskininlärning djup förstärkningsinlärning överföringsinlärning medicinsk utrustning protes kontroller människa-i-loopen Computer and Information Sciences Data- och informationsvetenskap
94	Data Harvesting and Path Planning in UAV-aided Internet-of-Things Wireless Networks with Reinforcement Learning : KTH Thesis Report / Datainsamling och vägplanering i UAV-stödda Internet-of-Things trådlösa nätverk med förstärkningsinlärning : KTH Examensrapport Zhang, Yuming January 2023 (has links) In recent years, Unmanned aerial vehicles (UAVs) have developed rapidly due to advances in aerospace technology, and wireless communication systems. As a result of their versatility, cost-effectiveness, and flexibility of deployment, UAVs have been developed to accomplish a variety of large and complex tasks without terrain restrictions, such as battlefield operations, search and rescue under disaster conditions, monitoring, etc. Data collection and offloading missions in The internet of thingss (IoTs) networks can be accomplished with the use of UAVs as network edge nodes. The fundamental challenge in such scenarios is to develop a UAV movement policy that enhances the quality of mission completion and avoids collisions. Real-time learning based on neural networks has been proven to be an effective method for solving decision-making problems in a dynamic, unknown environment. In this thesis, we assume a real-life scenario in which a UAV collects data from Ground base stations (GBSs) without knowing the information of the environment. A UAV is responsible for the MOO including collecting data, avoiding obstacles, path planning, and conserving energy. Two Deep reinforcement learnings (DRLs) approaches were implemented in this thesis and compared. / Under de senaste åren har UAV utvecklats snabbt på grund av framsteg inom flygteknik och trådlösa kommunikationssystem. Som ett resultat av deras mångsidighet, kostnadseffektivitet och flexibilitet i utbyggnaden har UAV:er utvecklats för att utföra en mängd stora och komplexa uppgifter utan terrängrestriktioner, såsom slagfältsoperationer, sök och räddning under katastrofförhållanden, övervakning, etc. Data insamlings- och avlastningsuppdrag i IoT-nätverk kan utföras med användning av UAV:er som nätverkskantnoder. Den grundläggande utmaningen i sådana scenarier är att utveckla en UAV-rörelsepolicy som förbättrar kvaliteten på uppdragets slutförande och undviker kollisioner. Realtidsinlärning baserad på neurala nätverk har visat sig vara en effektiv metod för att lösa beslutsfattande problem i en dynamisk, okänd miljö. I den här avhandlingen utgår vi från ett verkligt scenario där en UAV samlar in data från GBS utan att känna till informationen om miljön. En UAV är ansvarig för MOO inklusive insamling av data, undvikande av hinder, vägplanering och energibesparing. Två DRL-metoder implementerades i denna avhandling och jämfördes. Unmanned aerial vehicle the Internet of Things data harvesting obstacle avoidance path planning deep reinforcement learning Obemannat luftfordon sakernas internet datainsamling undvikande av hinder vägplanering djup förstärkningsinlärning Computer and Information Sciences Data- och informationsvetenskap
95	Stabilizing Q-Learning for continuous control Hui, David Yu-Tung 12 1900 (has links) L'apprentissage profond par renforcement a produit des décideurs qui jouent aux échecs, au Go, au Shogi, à Atari et à Starcraft avec une capacité surhumaine. Cependant, ces algorithmes ont du mal à naviguer et à contrôler des environnements physiques, contrairement aux animaux et aux humains. Manipuler le monde physique nécessite la maîtrise de domaines d'actions continues tels que la position, la vitesse et l'accélération, contrairement aux domaines d'actions discretes dans des jeux de société et de vidéo. L'entraînement de réseaux neuronaux profonds pour le contrôle continu est instable: les agents ont du mal à apprendre et à conserver de bonnes habitudes, le succès est à haute variance sur hyperparamètres, graines aléatoires, même pour la même tâche, et les algorithmes ont du mal à bien se comporter en dehors des domaines dans lesquels ils ont été développés. Cette thèse examine et améliore l'utilisation de réseaux de neurones profonds dans l'apprentissage par renforcement. Le chapitre 1 explique comment le principe d'entropie maximale produit des fonctions d'objectifs pour l'apprentissage supervisé et non supervisé et déduit, à partir de la dynamique d'apprentissage des réseaux neuronaux profonds, certains termes régulisants pour stabiliser les réseaux neuronaux profonds. Le chapitre 2 fournit une justification de l'entropie maximale pour la forme des algorithmes acteur-critique et trouve une configuration d'un algorithme acteur-critique qui s'entraîne le plus stablement. Enfin, le chapitre 3 examine la dynamique d'apprentissage de l'apprentissage par renforcement profond afin de proposer deux améliorations aux réseaux cibles et jumeaux qui améliorent la stabilité et la convergence. Des expériences sont réalisées dans les simulateurs de physique idéale DeepMind Control, MuJoCo et Box2D. / Deep Reinforcement Learning has produced decision makers that play Chess, Go, Shogi, Atari, and Starcraft with superhuman ability. However, unlike animals and humans, these algorithms struggle to navigate and control physical environments. Manipulating the physical world requires controlling continuous action spaces such as position, velocity, and acceleration, unlike the discrete action spaces of board and video games. Training deep neural networks for continuous control is unstable: agents struggle to learn and retain good behaviors, performance is high variance across hyperparameters, random seed, and even multiple runs of the same task, and algorithms struggle to perform well outside the domains they have been developed in. This thesis finds principles behind the success of deep neural networks in other learning paradigms and examines their impact on reinforcement learning for continuous control. Chapter 1 explains how the maximum-entropy principle produces supervised and unsupervised learning loss functions and derives some regularizers used to stabilize deep networks from the training dynamics of deep learning. Chapter 2 provides a maximum-entropy justification for the form of actor-critic algorithms and finds a configuration of an actor-critic algorithm that trains most stably. Finally, Chapter 3 considers the training dynamics of deep reinforcement learning to propose two improvements to target and twin networks that improve stability and convergence. Experiments are performed within the DeepMind Control, MuJoCo, and Box2D ideal-physics simulators. Computer Science Aritifical Intelligence Deep Learning Reinforcement Learning Deep Reinforcement Learning Control Continuous Control Q-Learning MuJoCo Informatique Intelligence Artificielle Apprentissage Profond Apprentissage par Reinforcement Apprentissage par Reinforcement Profond Contrôle Contrôle Continu
96	Reinforcement learning applied to the real world : uncertainty, sample efficiency, and multi-agent coordination Mai, Vincent 12 1900 (has links) L'immense potentiel des approches d'apprentissage par renforcement profond (ARP) pour la conception d'agents autonomes a été démontré à plusieurs reprises au cours de la dernière décennie. Son application à des agents physiques, tels que des robots ou des réseaux électriques automatisés, est cependant confrontée à plusieurs défis. Parmi eux, l'inefficacité de leur échantillonnage, combinée au coût et au risque d'acquérir de l'expérience dans le monde réel, peut décourager tout projet d'entraînement d'agents incarnés. Dans cette thèse, je me concentre sur l'application de l'ARP sur des agents physiques. Je propose d'abord un cadre probabiliste pour améliorer l'efficacité de l'échantillonnage dans l'ARP. Dans un premier article, je présente la pondération BIV (batch inverse-variance), une fonction de perte tenant compte de la variance du bruit des étiquettes dans la régression bruitée hétéroscédastique. La pondération BIV est un élément clé du deuxième article, où elle est combinée avec des méthodes de pointe de prédiction de l'incertitude pour les réseaux neuronaux profonds dans un pipeline bayésien pour les algorithmes d'ARP avec différences temporelles. Cette approche, nommée apprentissage par renforcement à variance inverse (IV-RL), conduit à un entraînement nettement plus rapide ainsi qu'à de meilleures performances dans les tâches de contrôle. Dans le troisième article, l'apprentissage par renforcement multi-agent (MARL) est appliqué au problème de la réponse rapide à la demande, une approche prometteuse pour gérer l'introduction de sources d'énergie renouvelables intermittentes dans les réseaux électriques. En contrôlant la coordination de plusieurs climatiseurs, les agents MARL obtiennent des performances nettement supérieures à celles des approches basées sur des règles. Ces résultats soulignent le rôle potentiel que les agents physiques entraînés par MARL pourraient jouer dans la transition énergétique et la lutte contre le réchauffement climatique. / The immense potential of deep reinforcement learning (DRL) approaches to build autonomous agents has been proven repeatedly in the last decade. Its application to embodied agents, such as robots or automated power systems, is however facing several challenges. Among them, their sample inefficiency, combined to the cost and the risk of gathering experience in the real world, can deter any idea of training embodied agents. In this thesis, I focus on the application of DRL on embodied agents. I first propose a probabilistic framework to improve sample efficiency in DRL. In the first article, I present batch inverse-variance (BIV) weighting, a loss function accounting for label noise variance in heteroscedastic noisy regression. BIV is a key element of the second article, where it is combined with state-of-the-art uncertainty prediction methods for deep neural networks in a Bayesian pipeline for temporal differences DRL algorithms. This approach, named inverse-variance reinforcement learning (IV-RL), leads to significantly faster training as well as better performance in control tasks. In the third article, multi-agent reinforcement learning (MARL) is applied to the problem of fast-timescale demand response, a promising approach to the manage the introduction of intermittent renewable energy sources in power-grids. As MARL agents control the coordination of multiple air conditioners, they achieve significantly better performance than rule-based approaches. These results underline to the potential role that DRL trained embodied agents could take in the energetic transition and the fight against global warming. Uncertainty estimation Estimation d'incertitude Multi agent reinforcement learning Apprentissage par renforcement profond Deep reinforcement learning Heteroscedastic regression Régression hétéroscédastique Demand response Régulation de fréquence Réseau électrique Power grid
97	[en] A FRAMEWORK FOR AUTOMATED VISUAL INSPECTION OF UNDERWATER PIPELINES / [pt] UM FRAMEWORK PARA INSPEÇÃO VISUAL AUTOMATIZADA DE DUTOS SUBAQUÁTICOS EVELYN CONCEICAO SANTOS BATISTA 30 January 2024 (has links) [pt] Em ambientes aquáticos, o uso tradicional de mergulhadores ou veiculos subaquáticos tripulados foi substituído por veículos subaquáticos não tripulados (como ROVs ou AUVs). Com vantagens em termos de redução de riscos de segurança, como exposição à pressão, temperatura ou falta de ar. Além disso, conseguem acessar áreas de extrema profundidade que até então não eram possiveis para o ser humano. Esses veiculos não tripulados são amplamente utilizados para inspeções como as necessárias para o descomissionamento de plataformas de petróleo Neste tipo de fiscalização é necessário analisar as condições do solo, da tu- bulação e, principalmente, se foi criado um ecossistema próximo à tubulação. Grande parte dos trabalhos realizados para a automação desses veículos utilizam diferentes tipos de sensores e GPS para realizar a percepção do ambiente. Devido à complexidade do ambiente de navegação, diferentes algoritmos de controle e automação têm sido testados nesta área, O interesse deste trabalho é fazer com que o autômato tome decisões através da análise de eventos visuais. Este método de pesquisa traz a vantagem de redução de custos para o projeto, visto que as câmeras possuem um preço inferior em relação aos sensores ou dispositivos GPS. A tarefa de inspeção autônoma tem vários desafios: detectar os eventos, processar as imagens e tomar a decisão de alterar a rota em tempo real. É uma tarefa altamente complexa e precisa de vários algoritmos trabalhando juntos para ter um bom desempenho. A inteligência artificial apresenta diversos algoritmos para automatizar, como os baseados em aprendizagem por reforço entre outros na área de detecção e classificação de imagens Esta tese de doutorado consiste em um estudo para criação de um sistema avançado de inspeção autônoma. Este sistema é capaz de realizar inspeções apenas analisando imagens da câmera AUV, usando aprendizagem de reforço profundo profundo para otimizar o planejamento do ponto de vista e técnicas de detecção de novidades. Contudo, este quadro pode ser adaptado a muitas outras tarefas de inspecção. Neste estudo foram utilizados ambientes realistas complexos, nos quais o agente tem o desafio de chegar da melhor forma possível ao objeto de interesse para que possa classificar o objeto. Vale ressaltar, entretanto, que os ambientes de simulação utilizados neste contexto apresentam certo grau de simplicidade carecendo de recursos como correntes marítimas on dinâmica de colisão em seus cenários simulados Ao final deste projeto, o Visual Inspection of Pipelines (VIP) framework foi desenvolvido e testado, apresentando excelentes resultados e ilustrando a viabilidade de redução do tempo de inspeção através da otimização do planejamento do ponto de vista. Esse tipo de abordagem, além de agregar conhecimento ao robô autônomo, faz com que as inspeções subaquáticas exijam pouca presença de ser humano (human-in-the-loop), justificando o uso das técnicas empregadas. / [en] In aquatic environments, the traditional use of divers or manned underwater vehicles has been replaced by unmanned underwater vehicles (such as ROVs or AUVs). With advantages in terms of reducing safety risks, such as exposure to pressure, temperature or shortness of breath. In addition, they are able to access areas of extreme depth that were not possible for humans until then. These unmanned vehicles are widely used for inspections, such as those required for the decommissioning of oil platforms. In this type of inspection, it is necessary to analyze the conditions of the soil, the pipeline and, especially, if an ecosystem was created close to the pipeline. Most of the works carried out for the automation of these vehicles use different types of sensors and GPS to perform the perception of the environment. Due to the complexity of the navigation environment, different control and automation algorithms have been tested in this area. The interest of this work is to make the automaton take decisions through the analysis of visual events. This research method provides the advantage of cost reduction for the project, given that cameras have a lower price compared to sensors or GPS devices. The autonomous inspection task has several challenges: detecting the events, processing the images and making the decision to change the route in real time. It is a highly complex task and needs multiple algorithms working together to perform well. Artificial intelligence presents many algorithms to automate, such as those based on reinforcement learning, among others in the area of image detection and classification. This doctoral thesis consists of a study to create an advanced autonomous inspection system. This system is capable of performing inspections only by analyzing images from the AUV camera, using deep reinforcement learning, and novelty detection techniques. However, this framework can be adapted to many other inspection tasks. In this study, complex realistic environments were used, in which the agent has the challenge of reaching the object of interest in the best possible way so that it can classify the object. It is noteworthy, however, that the simulation environments utilized in this context exhibit a certain degree of simplicity, lacking features like marine currents or collision dynamics in their simulated scenarios. At the conclusion of this project, a Visual Inspection of Pipelines (VIP) framework was developed and tested, showcasing excellent results and illustrating the feasibility of reducing inspection time through the optimization of viewpoint planning. This type of approach, in addition to adding knowledge to the autonomous robot, means that underwater inspections require little pres- ence of a human being (human-in-the-loop), justifying the use of the techniques employed. [pt] CLASSIFICACAO [pt] PLANEJAMENTO DE PONTO DE VISTA [pt] DETECCAO DE ANOMALIA [pt] APRENDIZADO POR REFORCO PROFUNDO [pt] ROV [pt] ROBO AUTONOMO [pt] AUV [pt] FRAMEWORK [en] CLASSIFICATION [en] VIEWPOINT PLANNING [en] ANOMALY DETECTION [en] DEEP REINFORCEMENT LEARNING [en] ROV [en] AUTONOMOUS ROBOT [en] AUV [en] FRAMEWORK
98	Generation and Detection of Adversarial Attacks for Reinforcement Learning Policies Drotz, Axel, Hector, Markus January 2021 (has links) In this project we investigate the susceptibility ofreinforcement rearning (RL) algorithms to adversarial attacks.Adversarial attacks have been proven to be very effective atreducing performance of deep learning classifiers, and recently,have also been shown to reduce performance of RL agents.The goal of this project is to evaluate adversarial attacks onagents trained using deep reinforcement learning (DRL), aswell as to investigate how to detect these types of attacks. Wefirst use DRL to solve two environments from OpenAI’s gymmodule, namely Cartpole and Lunarlander, by using DQN andDDPG (DRL techniques). We then evaluate the performanceof attacks and finally we also train neural networks to detectattacks. The attacks was successful at reducing performancein the LunarLander environment and CartPole environment.The attack detector was very successful at detecting attacks onthe CartPole environment, but performed not quiet as well onLunarLander.We hypothesize that continuous action space environmentsmay pose a greater difficulty for attack detectors to identifypotential adversarial attacks. / I detta projekt undersöker vikänsligheten hos förstärknings lärda (RL) algotritmerför attacker mot förstärknings lärda agenter. Attackermot förstärknings lärda agenter har visat sig varamycket effektiva för att minska prestandan hos djuptförsärknings lärda klassifierare och har nyligen visat sigockså minska prestandan hos förstärknings lärda agenter.Målet med detta projekt är att utvärdera attacker motdjupt förstärknings lärda agenter och försöka utföraoch upptäcka attacker. Vi använder först RL för attlösa två miljöer från OpenAIs gym module CartPole-v0och ContiniousLunarLander-v0 med DQN och DDPG.Vi utvärderar sedan utförandet av attacker och avslutarslutligen med ett möjligt sätt att upptäcka attacker.Attackerna var mycket framgångsrika i att minskaprestandan i både CartPole-miljön och LunarLandermiljön. Attackdetektorn var mycket framgångsrik medatt upptäcka attacker i CartPole-miljön men presteradeinte lika bra i LunarLander-miljön.Vi hypotiserar att miljöer med kontinuerligahandlingsrum kan innebära en större svårighet fören attack identifierare att upptäcka attacker mot djuptförstärknings lärda agenter. / Kandidatexjobb i elektroteknik 2021, KTH, Stockholm Deep Reinforcement Learning Adversarial Attacks Adversarial Attack Detection Fast Gradient Sign Method Deep Deterministic Policy Gradient Deep Q-Learning Likelihood Ratio Test CUSUM Elektroteknik och elektronik
99	Towards Novelty-Resilient AI: Learning in the Open World Trevor A Bonjour (18423153) 22 April 2024 (has links) <p dir="ltr">Current artificial intelligence (AI) systems are proficient at tasks in a closed-world setting where the rules are often rigid. However, in real-world applications, the environment is usually open and dynamic. In this work, we investigate the effects of such dynamic environments on AI systems and develop ways to mitigate those effects. Central to our exploration is the concept of \textit{novelties}. Novelties encompass structural changes, unanticipated events, and environmental shifts that can confound traditional AI systems. We categorize novelties based on their representation, anticipation, and impact on agents, laying the groundwork for systematic detection and adaptation strategies. We explore novelties in the context of stochastic games. Decision-making in stochastic games exercises many aspects of the same reasoning capabilities needed by AI agents acting in the real world. A multi-agent stochastic game allows for infinitely many ways to introduce novelty. We propose an extension of the deep reinforcement learning (DRL) paradigm to develop agents that can detect and adapt to novelties in these environments. To address the sample efficiency challenge in DRL, we introduce a hybrid approach that combines fixed-policy methods with traditional DRL techniques, offering enhanced performance in complex decision-making tasks. We present a novel method for detecting anticipated novelties in multi-agent games, leveraging information theory to discern patterns indicative of collusion among players. Finally, we introduce DABLER, a pioneering deep reinforcement learning architecture that dynamically adapts to changing environmental conditions through broad learning approaches and environment recognition. Our findings underscore the importance of developing AI systems equipped to navigate the uncertainties of the open world, offering promising pathways for advancing AI research and application in real-world settings.</p> Autonomous agents and multiagent systems Knowledge representation and reasoning Modelling and simulation Planning and decision making Context learning Deep learning Neural networks Reinforcement learning Novelty Context MDP CMDP Reinforcement Learning Deep Reinforcement Learning Collusion Novelty Detection Detection Adaptation
100	Optimizing the Fronthaul in C-RAN by Deep Reinforcement Learning : Latency Constrained Fronthaul optimization with Deep Reinforment Learning / Optimering av Fronthaul i C-RAN med Djup Förstärknings Inlärning : Latens begränsad Fronthaul Optimering med Djup Förstärknings Inlärning Grönland, Axel January 2023 (has links) Centralized Radio Access Networks or C-RAN for short is a type of network that aims to centralize perform some of it's computation at centralized locations. Since a lot of functionality is centralized we can show from multiplexing that the centralization leads to lower operating costs. The drawback with C-RAN are the huge bandwidth requirements over the fronthaul. We know that scenarios where all cells experience high load is a very low probability scenario. Since functions are centralized this also allows more adaptability, we can choose to change the communication standard for each cell depending on the load scenario. In this thesis we set out to create such a controller with the use of Deep Reinforcement Learning. The problem overall is difficult due to the complexity of modelling the problem, but also since C-RAN is a relatively new concept in the telecom world. We solved this problem with two traditional reinforcement learning algorithms, DQN and SAC. We define a constraint optimization problem and phrase it in such a way that the problem can be solved with a deep reinforcement learning algorithm. We found that the learning worked pretty well and we can show that our trained policies satisfy the constraint. With these results one could show that resource allocations problems can be solved pretty well by a deep reinforcement learning controller. / Centralized Radio Access Networks eller C-RAN som förkortning är en kommunications nätverk som siktar på att centralisera vissa funktioner i centrala platser. Eftersom mmånga funktioner är centraliserade så kan vi visa från statistisk multiplexing att hög trafik scenarion över många celler är av låg sannolikhet vilket leder till lägre service kostnader. Nackdelen med C-RAN är den höga bandbredds kravet över fronthaulen. Trafik scenarion där alla celler utsäts för hög last är väldigt låg sannolikhet så kan vi dimensionera fronthaulen för att klara mindre än det värsta trafik scenariot. Eftersom funktioner är centralizerade så tillåter det även att vi kan adaptivt anpassa resurser för trafiken. I denna uppsats så kommer vi att skapa en sådan kontroller med djup reinforcement learning. Problemet är komplext att modellera och C-RAN är ett relativt nytt concept i telecom världen. Vi löser detta problem med två traditionella algoritmer, deep Q networks(DQN) och soft actor critic(SAC). Vi definierar ett vilkorligt optimerings problem och visar hur det kan formuleras som ett inlärnings problem. Vi visar att denna metod funkar rätt bra som en lösning till problemet och att den uppfyller bivilkoren. Våra resultat visar att resurs allokerings problem kan lösas nära optimalitet med reinforcement learning. Machine learning Deep Reinforcement Learning C-RAN Fronthaul Performance Evaluation Maskininlärning Djup Förstärkningslärning C-RAN Fronthaul Prestanda Utvärdering Computer Sciences Datavetenskap (datalogi) Information Systems Telecommunications Telekommunikation

Search results