  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Multi-Agent Reinforcement Learning: Analysis and Application

Paulo Cesar Heredia (12428121) 20 April 2022
With the increasing availability of data and the rise of networked systems such as autonomous vehicles, drones, and smart grids, the application of data-driven machine learning methods to multi-agent systems has become an important topic. In particular, reinforcement learning has gained a lot of popularity due to its similarities with optimal control, with the potential of allowing us to develop optimal control systems using only observed data and without the need for a model of a system's state dynamics. In this thesis work, we explore the application of reinforcement learning to multi-agent systems, which is known as multi-agent reinforcement learning (MARL). We have developed algorithms that address some challenges in the cooperative setting of MARL. We have also worked on better understanding the convergence guarantees of some known multi-agent reinforcement learning algorithms, which combine reinforcement learning with distributed consensus methods. And, with the aim of making MARL better suited to real-world problems, we have also developed algorithms to address some practical challenges with MARL and have applied MARL to a real-world problem.
In the first part of this thesis, we focus on developing algorithms to address some open problems in MARL. One of these challenges is learning with output feedback, which is known as partial observability in the reinforcement learning literature. One of the main assumptions of reinforcement learning in the single-agent case is that the agent can fully observe the state of the plant it is controlling (the “plant” is often referred to as the “environment” in the reinforcement learning literature; we will use these terms interchangeably). In the single-agent case this assumption can be reasonable, since it only requires one agent to fully observe its environment. In the multi-agent setting, however, this assumption would require all agents to fully observe the state; furthermore, since each agent could affect the plant (or environment) with its actions, the assumption would also require that agents know the actions of other agents. We have also developed algorithms to address practical issues that may arise when applying reinforcement learning (RL) or MARL to large-scale real-world systems. One such algorithm is a distributed reinforcement learning algorithm that allows us to learn in cases where the states and actions are both continuous and of large dimensionality, which is the case for many real-world applications. Without the ability to handle continuous states and actions, many algorithms require discretization, which can become impractical for high-dimensional systems. We have also developed a distributed reinforcement learning algorithm that addresses the data scalability of RL. By data scalability we mean how to learn from a very large dataset that cannot be efficiently processed by a single agent with limited resources.
In the second part of this thesis, we provide a finite-sample analysis of some distributed reinforcement learning algorithms. By finite-sample analysis, we mean that we provide an upper bound on the squared error of the algorithm at a given iteration. Equivalently, since each iteration uses one data sample, we provide an upper bound on the squared error for a given number of data samples used. This type of analysis had been missing in the MARL literature, where most works have provided only asymptotic results for their proposed algorithms, which only tell us how the algorithmic error behaves as the number of samples used goes to infinity.
The third part of this thesis focuses on applications to real-world systems. We have explored a real-world problem, namely transactive energy systems (TES), which can be represented as a multi-agent system. We have applied various reinforcement learning algorithms with the aim of learning an optimal control policy for this system. Through simulations, we have compared the performance of these algorithms and have illustrated the effect of partial observability (output feedback) when compared to full state feedback.
In the last part, we present some other work; specifically, we present a distributed observer that aims to address learning with output feedback by estimating the state. The proposed algorithm is designed so that we do not require a complete model of the state dynamics; instead, we use a parameterized model whose parameters are estimated along with the state.
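The distinction the abstract draws between asymptotic and finite-sample guarantees can be written out schematically. The symbols below are assumptions introduced only for illustration and are not taken from the thesis: \theta_k stands for the algorithm's estimate after k iterations (one data sample each), \theta^{*} for the target it should converge to, and B(k) for some explicit bound. The thesis's actual bounds may take a different form.

```latex
% Asymptotic guarantee: only the limiting behaviour is characterized.
\lim_{k \to \infty} \mathbb{E}\!\left[ \lVert \theta_k - \theta^{*} \rVert^{2} \right] = 0

% Finite-sample guarantee: an explicit bound holds at every iteration k,
% i.e. after k data samples have been used.
\mathbb{E}\!\left[ \lVert \theta_k - \theta^{*} \rVert^{2} \right] \le B(k),
\qquad k = 1, 2, \dots
```

A finite-sample statement of this kind tells us how many samples suffice to reach a given squared error, which the limit alone does not.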
2

PERFORMANCE ASSURANCE FOR CLOUD-NATIVE APPLICATIONS

Zabad, Bassam January 2021
Preserving the performance of cloud services according to service level agreements (SLAs) is one of the most important challenges in cloud infrastructure. Since the workload is always increasing or decreasing, managing cloud resources efficiently is an important challenge in satisfying non-functional requirements such as high availability and cost. Although common approaches like predictive autoscaling could solve this problem, they are still not very efficient because of constraints such as requiring a workload pattern as training data. Reinforcement learning (RL) can be considered a significant solution to this problem. Even though reinforcement learning needs some time to stabilize and needs many trials to decide the value of factors like the discount rate, this approach can adapt to a dynamic workload. In this thesis, through a controlled experiment research method, we show how a model-free reinforcement learning algorithm like Q-learning can adapt to a dynamic workload by applying horizontal autoscaling to keep the performance of cloud services at the required level. Furthermore, the Amazon Web Services (AWS) platform is used to demonstrate the efficiency of the Q-learning algorithm in dealing with a dynamic workload and achieving high availability.
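As a rough illustration of how tabular Q-learning can drive horizontal autoscaling, here is a minimal sketch on a toy simulated workload. It is not the thesis's AWS setup: the replica limits, per-replica capacity, workload levels, reward weights, and all function names are hypothetical assumptions chosen for the example.

```python
import random

# Hypothetical toy setup: every name and number here is an illustrative assumption.
MIN_REPLICAS, MAX_REPLICAS = 1, 5
CAPACITY_PER_REPLICA = 10      # requests per step one replica can serve (assumed)
ACTIONS = (-1, 0, 1)           # scale in, hold, scale out
LOAD_LEVELS = (5, 25, 45)      # discretized workload levels (assumed)


def reward(replicas, load):
    """Penalize unmet demand (a proxy for SLA violations) and each running replica (cost)."""
    unmet = max(0, load - replicas * CAPACITY_PER_REPLICA)
    return -1.0 * unmet - 1.0 * replicas


def step(replicas, action):
    """Apply a scaling action, clamped to the allowed replica range."""
    return min(MAX_REPLICAS, max(MIN_REPLICAS, replicas + action))


def train(episodes=3000, steps=30, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over states (load level index, replica count)."""
    rng = random.Random(seed)
    q = {(l, r): [0.0] * len(ACTIONS)
         for l in range(len(LOAD_LEVELS))
         for r in range(MIN_REPLICAS, MAX_REPLICAS + 1)}
    for _ in range(episodes):
        load_idx = rng.randrange(len(LOAD_LEVELS))
        replicas = rng.randint(MIN_REPLICAS, MAX_REPLICAS)
        for _ in range(steps):
            state = (load_idx, replicas)
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            next_replicas = step(replicas, ACTIONS[a])
            # The reward is measured against the current load after scaling.
            r = reward(next_replicas, LOAD_LEVELS[load_idx])
            # The workload occasionally jumps to a random level (dynamic workload).
            if rng.random() < 0.1:
                next_load_idx = rng.randrange(len(LOAD_LEVELS))
            else:
                next_load_idx = load_idx
            next_state = (next_load_idx, next_replicas)
            # Standard Q-learning update toward the bootstrapped target.
            q[state][a] += alpha * (r + gamma * max(q[next_state]) - q[state][a])
            load_idx, replicas = next_load_idx, next_replicas
    return q


def greedy_action(q, load_idx, replicas):
    """Return the scaling action with the highest learned Q-value."""
    values = q[(load_idx, replicas)]
    return ACTIONS[max(range(len(ACTIONS)), key=lambda i: values[i])]
```

After calling `q = train()`, `greedy_action(q, load_idx, replicas)` gives the learned scaling decision for a given state. In a real deployment the state would come from monitoring metrics and the action would call the platform's scaling API, neither of which is modeled here.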
3

Increasing Policy Network Size Does Not Guarantee Better Performance in Deep Reinforcement Learning

Zachery Peter Berg (12455928) 25 April 2022
The capacity of deep reinforcement learning policy networks has been found to affect the performance of trained agents. It has been observed that policy networks with more parameters have better training performance and generalization ability than smaller networks. In this work, we find cases where this does not hold true. We observe unimodal variance in the zero-shot test return of policies of varying width, which accompanies a drop in both train and test return. Empirically, we demonstrate mostly monotonically increasing performance or mostly optimal performance as the width of deep policy networks increases, except near the variance mode. Finally, we find a scenario where larger networks have increasing performance up to a point, then decreasing performance. We hypothesize that these observations align with the theory of double descent in supervised learning, although with specific differences.
4

Training an Adversarial Non-Player Character with an AI Demonstrator : Applying Unity ML-Agents

Jlali, Yousra Ramdhana January 2022
Background. Game developers are continuously searching for new ways of populating their vast game worlds with competent and engaging Non-Player Characters (NPCs), and researchers believe Deep Reinforcement Learning (DRL) might be the solution for emergent behavior. Consequently, fusing NPCs with DRL practices has surged in recent years; however, proposed solutions rarely outperform traditional script-based NPCs. Objectives. This thesis explores a novel method of developing an adversarial DRL NPC by combining Reinforcement Learning (RL) algorithms. Our goal is to produce an agent that surpasses its script-based opponents by first mimicking their actions. Methods. The experiment commences with Imitation Learning (IL) before proceeding with supplementary DRL training, where the agent is expected to improve its strategies. Lastly, we make all agents participate in 100-deathmatch tournaments to statistically evaluate and differentiate their deathmatch performances. Results. Statistical tests reveal that the agents reliably differ from one another and that our learning agent performed poorly in comparison to its script-based opponents. Conclusions. Based on our computed statistics, we conclude that our solution was unsuccessful in developing a talented hostile DRL agent, as it was unable to show any form of proficiency in deathmatches. No further improvements could be applied to our ML agent due to time constraints. However, we believe our outcome can be used as a stepping-stone for future experiments within this branch of research.
5

Network layer reliability and security in energy harvesting wireless sensor networks

Yang, Jing 08 December 2023
Wireless sensor networks (WSNs) have become pivotal in precision agriculture, environmental monitoring, and smart healthcare applications. However, the challenges of energy consumption and security, particularly concerning the reliance on large battery-operated nodes, pose significant hurdles for these networks. Energy-harvesting wireless sensor networks (EH-WSNs) emerged as a solution, enabling nodes to replenish energy from the environment remotely. Yet, the transition to EH-WSNs brought forth new obstacles in ensuring reliable and secure data transmission. In our initial study, we tackled the intermittent connectivity issue prevalent in EH-WSNs due to the dynamic behavior of energy harvesting nodes. Rapid shifts between ON and OFF states led to frequent changes in network topology, causing reduced link stability. To counter this, we introduced the hybrid routing method (HRM), amalgamating grid-based and opportunistic-based routing. HRM incorporated a packet fragmentation mechanism and cooperative localization for both static and mobile networks. Simulation results demonstrated HRM's superior performance, enhancing key metrics such as throughput, packet delivery ratio, and energy consumption in comparison to existing energy-aware adaptive opportunistic routing approaches. Our second study focused on countering emerging threats, particularly the malicious energy attack (MEA), which remotely powers specific nodes to manipulate routing paths. We developed intelligent energy attack methods utilizing Q-learning and Policy Gradient techniques. These methods enhanced attacking capabilities across diverse network settings without requiring internal network information. Simulation results showcased the efficacy of our intelligent methods in diverting traffic loads through compromised nodes, highlighting their superiority over traditional approaches. In our third study, we developed a deep learning-based two-stage framework to detect MEAs. 
Utilizing a stacked residual network (SR-Net) for global classification and a stacked LSTM network (SL-Net) to pinpoint specific compromised nodes, our approach demonstrated high detection accuracy. By deploying trained models as defenses, our method outperformed traditional threshold filtering techniques, emphasizing its accuracy in detecting MEAs and securing EH-WSNs. In summary, our research significantly advances the reliability and security of EH-WSNs, particularly focusing on enhancing the network layer. These findings offer promising avenues for securing the future of wireless sensor technologies.
