Global ETD Search

1	Multi-Agent Reinforcement Learning: Analysis and Application Paulo Cesar Heredia (12428121) 20 April 2022
With the increasing availability of data and the rise of networked systems such as autonomous vehicles, drones, and smart grids, the application of data-driven machine learning methods to multi-agent systems has become an important topic. In particular, reinforcement learning has gained a lot of popularity due to its similarities with optimal control, with the potential of allowing us to develop optimal control systems using only observed data and without the need for a model of a system's state dynamics. In this thesis work, we explore the application of reinforcement learning to multi-agent systems, which is known as multi-agent reinforcement learning (MARL). We have developed algorithms that address some challenges in the cooperative setting of MARL. We have also worked on better understanding the convergence guarantees of some known multi-agent reinforcement learning algorithms, which combine reinforcement learning with distributed consensus methods. And, with the aim of making MARL better suited to real-world problems, we have also developed algorithms to address some practical challenges with MARL and we have applied MARL to a real-world problem.
In the first part of this thesis, we focus on developing algorithms to address some open problems in MARL. One of these challenges is learning with output feedback, which is known as partial observability in the reinforcement learning literature. One of the main assumptions of reinforcement learning in the single-agent case is that the agent can fully observe the state of the plant it is controlling (the “plant” is often referred to as the “environment” in the reinforcement learning literature; we use these terms interchangeably). In the single-agent case this assumption can be reasonable, since it only requires one agent to fully observe its environment. In the multi-agent setting, however, this assumption would require all agents to fully observe the state and, furthermore, since each agent could affect the plant (or environment) with its actions, the assumption would also require that agents know the actions of other agents. We have also developed algorithms to address practical issues that may arise when applying reinforcement learning (RL) or MARL to large-scale real-world systems. One such algorithm is a distributed reinforcement learning algorithm that allows us to learn in cases where the states and actions are both continuous and of large dimensionality, which is the case for many real-world applications. Without the ability to handle continuous states and actions, many algorithms require discretization, which can become impractical for high-dimensional systems. We have also developed a distributed reinforcement learning algorithm that addresses the data scalability of RL. By data scalability we mean how to learn from a very large dataset that cannot be efficiently processed by a single agent with limited resources.
In the second part of this thesis, we provide a finite-sample analysis of some distributed reinforcement learning algorithms. By finite-sample analysis, we mean that we provide an upper bound on the squared error of the algorithm at a given iteration. Equivalently, since each iteration uses one data sample, we provide an upper bound on the squared error for a given number of data samples used. This type of analysis had been missing in the MARL literature, where most works have only provided asymptotic results for their proposed algorithms, which only tell us how the algorithmic error behaves as the number of samples used goes to infinity.
The third part of this thesis focuses on applications to real-world systems. We have explored a real-world problem, namely transactive energy systems (TES), which can be represented as a multi-agent system. We have applied various reinforcement learning algorithms with the aim of learning an optimal control policy for this system. Through simulations, we have compared the performance of these algorithms and have illustrated the effect of partial observability (output feedback) when compared to full state feedback.
In the last part, we present some other work: a distributed observer that aims to address learning with output feedback by estimating the state. The proposed algorithm is designed so that we do not require a complete model of the state dynamics; instead, we use a parameterized model whose parameters are estimated along with the state.
Aerospace Engineering multi-agent systems reinforcement learning (RL) Distributed Algorithms
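The first abstract refers to algorithms that combine reinforcement learning with distributed consensus methods but does not spell them out. As a rough illustration of the consensus ingredient only, the sketch below repeatedly averages scalar estimates held by three agents on a line graph. The weight matrix, the agent count, and the function names are assumptions made for this example and are not taken from the thesis; in a full algorithm each agent would presumably interleave a local learning update with this averaging step.

```python
def consensus_step(estimates, weights):
    """One averaging round: agent i mixes its estimate with its neighbours' estimates."""
    n = len(estimates)
    return [sum(weights[i][j] * estimates[j] for j in range(n)) for i in range(n)]


def run_consensus(estimates, weights, rounds):
    """Apply the averaging step a fixed number of times."""
    for _ in range(rounds):
        estimates = consensus_step(estimates, weights)
    return estimates


# Hypothetical doubly stochastic weights for a 3-agent line graph (1 - 2 - 3).
# Every row and every column sums to 1, so the network average is preserved.
WEIGHTS = [
    [2 / 3, 1 / 3, 0.0],
    [1 / 3, 1 / 3, 1 / 3],
    [0.0, 1 / 3, 2 / 3],
]

local_estimates = [1.0, 4.0, 7.0]  # made-up local values, one per agent
agreed = run_consensus(local_estimates, WEIGHTS, rounds=50)
```

With these weights the three estimates drift toward the average of the initial values, which is the property consensus-based schemes rely on when agents can only talk to their neighbours.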
2	Reinforcement Learning for the Cybersecurity of Grid-Forming and Grid-Following Inverters Kwiatkowski, Brian Michael 06 December 2024 The U.S. movement toward clean energy generation has increased the number of installed inverter-based resources (IBRs) in the grid, introducing new challenges in IBR control and cybersecurity. IBRs receive their set points through a communication link, which may expose them to cyber threats. Previous work has developed various techniques to detect and mitigate cyberattacks on IBRs, developing schemes for new inverters being installed in the grid. This work focuses on developing model-free control techniques for IBRs already installed in the grid, without the need to access IBR internal control parameters. The proposed method is tested for both grid-forming and grid-following inverter control. Separate detection and mitigation algorithms are used to enhance the accuracy of the proposed method. The proposed method is tested using the modified CIGRE 14-bus North American grid with 7 IBRs in PSCAD/EMTDC. Finally, the performance of the detection algorithm is tested under normal grid transients, such as set point changes, load changes, and short-circuit faults, to make sure the proposed detection method does not produce false positives. / Master of Science / The increasing presence of renewable energy resources such as photovoltaic and solar has introduced new challenges to the grid as the United States shifts toward clean energy. Those resources rely on devices called inverters to transform the energy to match the conditions of the grid. Inverters receive instructions to change their values before making the connection, making them potentially vulnerable to cyberattacks. While there has been progress in developing protection methods for inverters, existing inverters require additional protection to ensure their safe and reliable function. This work proposes a way to improve the reliability of existing inverters without changing the values of their internal settings. The method, tested under several conditions, successfully detects and counters potential cyberattacks without mistaking normal grid operations, such as adjustments in demand and short-circuit events, for attacks. Cyberattack inverter-based resources (IBR) power system control reinforcement learning (RL) renewable energy sources
3	Building A More Efficient Mobile Vision System Through Adaptive Video Analytics Junpeng Guo (20349582) 17 December 2024
Mobile vision is becoming the norm, transforming our daily lives. It powers numerous applications, enabling seamless interactions between the digital and physical worlds, such as augmented reality, real-time object detection, and many others. The popularity of mobile vision has spurred advancements from both computer vision (CV) and mobile edge computing (MEC) communities. The former focuses on improving analytics accuracy through the use of proper deep neural networks (DNNs), while the latter addresses the resource limitations of mobile environments by coordinating tasks between mobile and edge devices, determining which data to transmit and process to enable real-time performance.
Despite recent advancements, existing approaches typically integrate the functionalities of the two camps at a basic task level. They rely on a uniform on-device processing scheme that streams the same type of data and uses the same DNN model for identical CV tasks, regardless of the analytical complexity of the current input, input size, or latency requirements. This lack of adaptability to dynamic contexts limits their ability to achieve optimal efficiency in scenarios involving diverse source data, varying computational resources, and differing application requirements.
Our approach seeks to move beyond task-level adaptation by emphasizing customized optimizations tailored to dynamic use scenarios. This involves three key adaptive strategies: dynamically compressing source data based on contextual information, selecting the appropriate computing model (e.g., DNN or sub-DNN) for the vision task, and establishing a feedback mechanism for context-aware runtime tuning. Additionally, for scenarios involving movable cameras, the feedback mechanism guides the data capture process to further enhance performance. These innovations are explored across three use cases categorized by the capture device: one stationary camera, one moving camera, and cross-camera analytics.
My dissertation begins with a stationary camera scenario, where we improve efficiency by adapting to the use context on both the device and edge sides. On the device side, we explore a broader compression space and implement adaptive compression based on data context. Specifically, we leverage changes in confidence scores as feedback to guide on-device compression, progressively reducing data volume while preserving the accuracy of visual analytics. On the edge side, instead of training a specialized DNN for each deployment scenario, we adaptively select the best-fit sub-network for the given context. A shallow sub-network is used to “test the waters”, accelerating the search for a deep sub-network that maximizes analytical accuracy while meeting latency requirements.
Next, we explore scenarios involving a moving camera, such as those mounted on drones. These introduce new challenges, including increased data encoding demands due to camera movement and degraded analytics performance (e.g., tracking) caused by changing perspectives. To address these issues, we leverage drone-specific domain knowledge to optimize compression for object detection by applying global motion compensation and assigning different resolutions at a tile-granularity level based on the far-near effect. Furthermore, we tackle the more complex task of object tracking and following, where the analytics results directly influence the drone’s navigation. To enable effective target following with minimal processing overhead, we design an adaptive frame rate tracking mechanism that dynamically adjusts based on changing contexts.
Last but not least, we extend the work to cross-camera analytics, focusing on coordination between one stationary ground-based camera and one moving aerial camera. The primary challenge lies in addressing significant misalignments (e.g., scale, rotation, and lighting variations) between the two perspectives. To overcome these issues, we propose a multi-exit matching mechanism that prioritizes local feature matching while incorporating global features and additional cues, such as color and location, to refine matches as needed. This approach ensures accurate identification of the same target across viewpoints while minimizing computational overhead by dynamically adapting to the complexity of the matching task.
While the current work primarily addresses ideal conditions, assuming favorable weather, optimal lighting, and reliable network performance, it establishes a solid foundation for future innovations in adaptive video processing under more challenging conditions. Future efforts will focus on enhancing robustness against adversarial factors, such as sensing data drift and transmission losses. Additionally, we plan to explore multi-camera coordination and multimodal data integration, leveraging the growing potential of large language models to further advance this field.
Computer vision Mobile computing Video Analytics Mobile Computing
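The third abstract describes using changes in confidence scores as feedback to reduce data volume progressively. One way to picture that idea, under heavy assumptions, is a loop that steps down through compression levels until the detection confidence falls too far below a reference. The function name, the quality levels, the tolerance, and the confidence table below are all hypothetical; the dissertation's actual mechanism is not described in enough detail here to reproduce it.

```python
def pick_quality(confidence_at, levels, max_drop=0.05):
    """Return the lowest quality level whose confidence stays close to the reference.

    confidence_at: callable mapping a quality level to a confidence score
                   (stands in for running the detector on compressed data).
    levels: quality levels ordered from highest quality to lowest.
    max_drop: largest confidence loss tolerated relative to the best level.
    """
    reference = confidence_at(levels[0])
    chosen = levels[0]
    for level in levels[1:]:
        if reference - confidence_at(level) > max_drop:
            break  # compressing further would hurt the analytics too much
        chosen = level
    return chosen


# Made-up confidence scores per quality level, for illustration only.
FAKE_CONFIDENCE = {90: 0.92, 70: 0.91, 50: 0.89, 30: 0.80}
selected = pick_quality(FAKE_CONFIDENCE.get, [90, 70, 50, 30])
```

The sketch stops at the first level that violates the tolerance, which matches a progressive search; a real system would also need to handle scene changes by resetting the reference.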
4	PERFORMANCE ASSURANCE FOR CLOUD-NATIVE APPLICATIONS Zabad, Bassam January 2021 Preserving the performance of cloud services according to service level agreements (SLAs) is one of the most important challenges in cloud infrastructure. Since the workload is always increasing or decreasing, managing cloud resources efficiently is an important challenge in satisfying non-functional requirements such as high availability and cost. Although common approaches like predictive autoscaling can address this problem, they remain inefficient because of constraints such as requiring a workload pattern as training data. Reinforcement learning (RL) can be considered a significant solution to this problem. Even though reinforcement learning needs some time to stabilize and needs many trials to decide the value of factors like the discount rate, this approach can adapt to a dynamic workload. In this thesis, through a controlled experiment research method, we show how a model-free reinforcement learning algorithm like Q-learning can adapt to a dynamic workload by applying horizontal autoscaling to keep the performance of cloud services at the required level. Furthermore, the Amazon Web Services (AWS) platform is used to demonstrate the efficiency of the Q-learning algorithm in dealing with dynamic workloads and achieving high availability. Performance of cloud services dynamic workload cloud infrastructure reinforcement learning (RL) machine learning service level agreements (SLAs) Amazon web services (AWS) Software Engineering Programvaruteknik
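The fourth abstract names Q-learning applied to horizontal autoscaling. The sketch below shows the standard tabular Q-learning update in a toy scaling setting. The state encoding (load level, replica count), the three scaling actions, the reward shape, and the random workload are all assumptions chosen for illustration; none of them come from the thesis, and no AWS API is involved.

```python
import random

ACTIONS = (-1, 0, 1)  # hypothetical: remove a replica, hold, add a replica
MAX_REPLICAS = 3


def reward(load, replicas):
    """Toy reward: a large penalty when under-provisioned, plus a per-replica cost."""
    sla_penalty = 10.0 if replicas < load else 0.0
    return -sla_penalty - 1.0 * replicas


def q_update(q, state, action, r, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update; gamma is the discount rate."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (r + gamma * best_next - old)
    return q[(state, action)]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the scaling actions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def train(steps=5000, epsilon=0.1, seed=0):
    """Learn a Q-table against a randomly changing toy workload."""
    rng = random.Random(seed)
    q = {}
    load, replicas = 1, 1
    for _ in range(steps):
        state = (load, replicas)
        action = choose_action(q, state, epsilon, rng)
        replicas = min(MAX_REPLICAS, max(1, replicas + action))
        r = reward(load, replicas)
        load = rng.randint(1, MAX_REPLICAS)  # workload shifts at random
        q_update(q, state, action, r, (load, replicas))
    return q
```

Calling `train()` returns a dictionary keyed by `((load, replicas), action)`; a greedy policy can be read off by taking the highest-valued action in each state. The need to tune `alpha`, `gamma`, and `epsilon` by trial is the kind of cost the abstract alludes to.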
5	Increasing Policy Network Size Does Not Guarantee Better Performance in Deep Reinforcement Learning Zachery Peter Berg (12455928) 25 April 2022
The capacity of deep reinforcement learning policy networks has been found to affect the performance of trained agents. It has been observed that policy networks with more parameters have better training performance and generalization ability than smaller networks. In this work, we find cases where this does not hold true. We observe unimodal variance in the zero-shot test return of varying-width policies, which accompanies a drop in both train and test return. Empirically, we demonstrate mostly monotonically increasing performance or mostly optimal performance as the width of deep policy networks increases, except near the variance mode. Finally, we find a scenario where larger networks have increasing performance up to a point, then decreasing performance. We hypothesize that these observations align with the theory of double descent in supervised learning, although with specific differences.
Theoretical Computer Science Deep Reinforcement Learning (DRL) Reinforcement Learning (RL) Double descent Policy network size bias-variance tradeoff Reinforcement Learning Generalization overparameterization
6	Training an Adversarial Non-Player Character with an AI Demonstrator : Applying Unity ML-Agents Jlali, Yousra Ramdhana January 2022 Background. Game developers are continuously searching for new ways of populating their vast game worlds with competent and engaging Non-Player Characters (NPCs), and researchers believe Deep Reinforcement Learning (DRL) might be the solution for emergent behavior. Consequently, fusing NPCs with DRL practices has surged in recent years; however, proposed solutions rarely outperform traditional script-based NPCs. Objectives. This thesis explores a novel method of developing an adversarial DRL NPC by combining Reinforcement Learning (RL) algorithms. Our goal is to produce an agent that surpasses its script-based opponents by first mimicking their actions. Methods. The experiment commences with Imitation Learning (IL) before proceeding with supplementary DRL training where the agent is expected to improve its strategies. Lastly, we make all agents participate in 100-deathmatch tournaments to statistically evaluate and differentiate their deathmatch performances. Results. Statistical tests reveal that the agents reliably differ from one another and that our learning agent performed poorly in comparison to its script-based opponents. Conclusions. Based on our computed statistics, we conclude that our solution was unsuccessful in developing a talented hostile DRL agent, as it was unable to show any form of proficiency in deathmatches. No further improvements could be applied to our ML agent due to time constraints. However, we believe our outcome can be used as a stepping-stone for future experiments within this branch of research.
Artificial intelligence (AI) reinforcement learning (RL) imitation learning (IL) non-player character (NPC) deathmatch Computer Sciences Datavetenskap (datalogi) Software Engineering Programvaruteknik Computer Engineering Datorteknik Computer and Information Sciences Data- och informationsvetenskap
7	Network layer reliability and security in energy harvesting wireless sensor networks Yang, Jing 08 December 2023 Wireless sensor networks (WSNs) have become pivotal in precision agriculture, environmental monitoring, and smart healthcare applications. However, the challenges of energy consumption and security, particularly concerning the reliance on large battery-operated nodes, pose significant hurdles for these networks. Energy-harvesting wireless sensor networks (EH-WSNs) emerged as a solution, enabling nodes to replenish energy from the environment remotely. Yet, the transition to EH-WSNs brought forth new obstacles in ensuring reliable and secure data transmission. In our initial study, we tackled the intermittent connectivity issue prevalent in EH-WSNs due to the dynamic behavior of energy harvesting nodes. Rapid shifts between ON and OFF states led to frequent changes in network topology, causing reduced link stability. To counter this, we introduced the hybrid routing method (HRM), amalgamating grid-based and opportunistic-based routing. HRM incorporated a packet fragmentation mechanism and cooperative localization for both static and mobile networks. Simulation results demonstrated HRM's superior performance, improving key metrics such as throughput, packet delivery ratio, and energy consumption in comparison to existing energy-aware adaptive opportunistic routing approaches. Our second study focused on countering emerging threats, particularly the malicious energy attack (MEA), which remotely powers specific nodes to manipulate routing paths. We developed intelligent energy attack methods utilizing Q-learning and Policy Gradient techniques. These methods enhanced attacking capabilities across diverse network settings without requiring internal network information. Simulation results showcased the efficacy of our intelligent methods in diverting traffic loads through compromised nodes, highlighting their superiority over traditional approaches. In our third study, we developed a deep learning-based two-stage framework to detect MEAs. Utilizing a stacked residual network (SR-Net) for global classification and a stacked LSTM network (SL-Net) to pinpoint specific compromised nodes, our approach demonstrated high detection accuracy. By deploying trained models as defenses, our method outperformed traditional threshold filtering techniques, emphasizing its accuracy in detecting MEAs and securing EH-WSNs. In summary, our research significantly advances the reliability and security of EH-WSNs, particularly focusing on enhancing the network layer. These findings offer promising avenues for securing the future of wireless sensor technologies. Hybrid Routing Method (HRM) Intermittent Connectivity Energy Harvesting Malicious Energy Attack (MEA) Reinforcement Learning (RL) Malicious Energy Attack Detection Deep Learning (DL)
