Task Offloading and Resource Allocation Using Deep Reinforcement Learning
Zhang, Kaiyi, 01 December 2020
Rapid urbanization poses huge challenges to people's daily lives, such as traffic congestion, environmental pollution, and public safety. Mobile Internet of Things (MIoT) applications serving smart cities promise innovative and enhanced public services such as air pollution monitoring, enhanced road safety, and city resource metering and management. These applications rely on a number of energy-constrained MIoT units (MUs) (e.g., robots and drones) to continuously sense, capture, and process data and images from their environments to produce immediate adaptive actions (e.g., triggering alarms, controlling machinery, and communicating with citizens). In this thesis, we consider a scenario where a battery-constrained MU executes a number of time-sensitive data processing tasks whose arrival times and sizes are stochastic in nature. These tasks can be executed locally on the device, or offloaded to one of the nearby edge servers or to a cloud data center within a mobile edge computing (MEC) infrastructure. We first formulate the problem of making optimal offloading decisions that minimize the cost of current and future tasks as a constrained Markov decision process (CMDP) that accounts for the constraints of the MU battery and the limited resources reserved on the MEC infrastructure by the application providers. Then, we relax the CMDP into a regular Markov decision process (MDP) using Lagrangian primal-dual optimization. We then develop an advantage actor-critic (A2C) algorithm, a model-free deep reinforcement learning (DRL) method, to train the MU to solve the relaxed problem. The MU needs to be trained only once to learn optimal offloading policies, which can be reused as long as there are no large changes in the MU environment. Simulation results show that the proposed algorithm improves performance over offloading decision schemes that optimize only instantaneous costs.
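The Lagrangian relaxation mentioned in this abstract can be illustrated with a small sketch. It is not the thesis's implementation: the two actions, their costs, the energy budget, and the step size are hypothetical values chosen only to show how a multiplier turns a constrained cost into an unconstrained one and how dual ascent adjusts that multiplier.

```python
# Illustrative sketch only; all names and numbers below are assumptions.

def lagrangian_cost(task_cost, energy_used, energy_budget, lam):
    """Per-step cost of the relaxed MDP: the original cost plus a penalty
    proportional to how far energy use exceeds the budget."""
    return task_cost + lam * (energy_used - energy_budget)


def dual_update(lam, avg_energy_used, energy_budget, step_size=0.05):
    """Projected dual ascent: raise lam when the constraint is violated on
    average, lower it otherwise, and never let it become negative."""
    return max(0.0, lam + step_size * (avg_energy_used - energy_budget))


# Hypothetical actions mapped to (task_cost, energy_used).
ACTIONS = {"local": (1.0, 3.0), "offload": (2.0, 1.0)}
ENERGY_BUDGET = 2.0


def greedy_action(lam):
    """Pick the action with the lowest penalized cost for a given lam."""
    return min(
        ACTIONS,
        key=lambda a: lagrangian_cost(*ACTIONS[a], ENERGY_BUDGET, lam),
    )


if __name__ == "__main__":
    lam = 0.0
    for _ in range(40):
        action = greedy_action(lam)
        lam = dual_update(lam, ACTIONS[action][1], ENERGY_BUDGET)
    print(action, round(lam, 2))
```

In this toy setting the multiplier rises while the energy-hungry action is chosen and falls once the cheaper-in-energy action takes over, so it hovers near the value at which the two penalized costs are equal. In a DRL setting the greedy choice would be replaced by the learned policy.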
Approaches for Efficient Autonomous Exploration using Deep Reinforcement Learning
Thomas Molnar (8735079), 24 April 2020
For autonomous exploration of complex and unknown environments, existing Deep Reinforcement Learning (Deep RL) approaches struggle to generalize from computer simulations to real world instances. Deep RL methods typically exhibit low sample efficiency, requiring a large amount of data to develop an optimal policy function for governing an agent's behavior. RL agents expect well-shaped and frequent rewards to receive feedback for updating policies. Yet in real world instances, rewards and feedback tend to be infrequent and sparse.
For sparse reward environments, an intrinsic reward generator can be utilized to facilitate progression towards an optimal policy function. The proposed Augmented Curiosity Modules (ACMs) extend the Intrinsic Curiosity Module (ICM) by Pathak et al. These modules utilize depth image and optical flow predictions with intrinsic rewards to improve sample efficiency. Additionally, the proposed Capsules Exploration Module (Caps-EM) pairs a Capsule Network, rather than a Convolutional Neural Network, architecture with an A2C algorithm. This provides a more compact architecture without the need for intrinsic rewards, which the ICM and ACMs require. Tested using ViZDoom for experimentation in visually rich and sparse feature scenarios, both the Depth-Augmented Curiosity Module (D-ACM) and Caps-EM improve autonomous exploration performance and sample efficiency over the ICM. The Caps-EM is superior, using 44% and 83% fewer trainable network parameters than the ICM and D-ACM, respectively. On average across all “My Way Home” scenarios, the Caps-EM converges to a policy function with 1141% and 437% time improvements over the ICM and D-ACM, respectively.
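As a rough illustration of the intrinsic-reward idea behind curiosity modules such as the ICM, the sketch below scores a transition by the squared error of a forward model's prediction in feature space. The function names and the scaling factor `eta` are assumptions for illustration; the feature encoder and forward model themselves are omitted, and nothing here reproduces the ACM or Caps-EM architectures.

```python
import numpy as np


def intrinsic_reward(predicted_next_features, actual_next_features, eta=0.01):
    """Curiosity bonus: scaled squared prediction error of a forward model.

    A poorly predicted transition (novel state) yields a larger bonus.
    `eta` is a hypothetical scaling factor.
    """
    error = np.asarray(predicted_next_features, dtype=float) - np.asarray(
        actual_next_features, dtype=float
    )
    return 0.5 * eta * float(np.sum(error ** 2))


def total_reward(extrinsic, intrinsic):
    """Reward seen by the policy: sparse task reward plus curiosity bonus."""
    return extrinsic + intrinsic


if __name__ == "__main__":
    bonus = intrinsic_reward([1.0, 0.0], [0.0, 0.0], eta=2.0)
    print(total_reward(0.0, bonus))
```

When the environment gives no extrinsic reward for long stretches, the bonus still provides a learning signal, which is the motivation the abstract gives for intrinsic reward generators.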
Mobile Robot Obstacle Avoidance based on Deep Reinforcement Learning
Feng, Shumin, January 2018
Obstacle avoidance is one of the core problems in the field of autonomous navigation. An obstacle avoidance approach is developed for the navigation task of a reconfigurable multi-robot system named STORM, which stands for Self-configurable and Transformable Omni-Directional Robotic Modules. Various mathematical models have been developed in previous work in this field to avoid collision for such robots. In this work, the proposed collision avoidance algorithm is trained via Deep Reinforcement Learning (DRL), which enables the robot to learn by itself from its experiences, and then fit a mathematical model by updating the parameters of a neural network. The trained neural network is capable of choosing an action directly from the input sensor data. A virtual STORM locomotion module was trained to explore a Gazebo simulation environment without collision, using the proposed collision avoidance strategies based on DRL. The mathematical model of the avoidance algorithm was derived from the simulation and then applied to the prototype of the locomotion module and validated via experiments. A universal software architecture was also designed for the STORM modules. The software architecture has extensible and reusable features that improve the design efficiency and enable parallel development. / Master of Science / In this thesis, an obstacle avoidance approach is described to enable autonomous navigation of a reconfigurable multi-robot system, STORM. The Self-configurable and Transformable Omni-Directional Robotic Modules (STORM) is a novel approach towards heterogeneous swarm robotics. The system has two types of robotic modules, namely the locomotion module and the manipulation module. Each module is able to navigate and perform tasks independently. In addition, the systems are designed to autonomously dock together to perform tasks that the modules individually are unable to accomplish.
The proposed obstacle avoidance approach is designed for the modules of STORM, but can be applied to mobile robots in general. In contrast to the existing collision avoidance approaches, the proposed algorithm was trained via deep reinforcement learning (DRL). This enables the robot to learn by itself from its experiences, and then fit a mathematical model by updating the parameters of a neural network. In order to avoid damage to the real robot during the learning phase, a virtual robot was trained inside a Gazebo simulation environment with obstacles. The mathematical model for the collision avoidance strategy obtained through DRL was then validated on a locomotion module prototype of STORM. This thesis also introduces the overall STORM architecture and provides a brief overview of the generalized software architecture designed for the STORM modules. The software architecture has expandable and reusable features that apply well to the swarm architecture while allowing for design efficiency and parallel development.
Cost and Power Loss Aware Coalitions under Uncertainty in Transactive Energy Systems
Sadeghi, Mohammad, 02 June 2022
The need to cope with the rapid transformation of the conventional electrical grid into the future smart grid, with multiple connected microgrids, has led to the investigation of optimal smart grid architectures. The main components of the future smart grids such as generators, substations, controllers, smart meters and collector nodes are evolving; however, truly effective integration of these elements into the microgrid context to guarantee intelligent and dynamic functionality across the whole smart grid remains an open issue. Energy trading is a significant part of this integration. In microgrids, energy trading refers to the use of surplus energy in one microgrid to satisfy the demand of another microgrid or a group of microgrids that form a microgrid community. Different techniques are employed to manage the energy trading process such as optimization-based and conventional game-theoretical methods, which bring about several challenges including complexity, scalability and ability to learn dynamic environments. A common challenge among all of these methods is adapting to changing circumstances. Optimization methods, for example, show promising performance in static scenarios where the optimal solution is achieved for a specific snapshot of the system. However, to use such a technique in a dynamic environment, finding the optimal solutions for all the time slots is needed, which imposes a significant complexity. Challenges such as this can be best addressed using game theory techniques empowered with machine learning methods across grid infrastructure and microgrid communities. In this thesis, novel Bayesian coalitional game theory-based and Bayesian reinforcement learning-based coalition formation algorithms are proposed, which allow the microgrids to exchange energy with their coalition members while minimizing the associated cost and power loss. 
In addition, a deep reinforcement learning scheme is developed to address the problem of large convergence time resulting from the sizeable state-action space of the methods mentioned above. The proposed algorithms can ideally overcome the uncertainty in the system. The advantages of the proposed methods are highlighted by comparing them with the conventional coalitional game theory-based techniques, Q-learning-based technique, random coalition formation, as well as with the case with no coalitions. The results show the superiority of the proposed methods in terms of power loss and cost minimization in dynamic environments.
Reliable Low Latency Machine Learning for Resource Management in Wireless Networks
Taleb Zadeh Kasgari, Ali, 30 March 2022
Next-generation wireless networks must support a plethora of new applications ranging from the Internet of Things to virtual reality. Each one of these emerging applications has unique rate, reliability, and latency requirements that substantially differ from traditional services such as video streaming. Hence, there is a need for designing an efficient resource management framework that takes into account different components that can affect the resource usage, including less obvious factors such as human behavior that contribute to the resource usage of the system. The use of machine learning for modeling these components in a resource management system is a promising solution. This is because many hidden factors might contribute to the resource usage pattern of users or machine-type devices that can only be modeled using an end-to-end machine learning solution. Therefore, machine learning algorithms can be used either for modeling a complex factor such as the human brain's delay perception or for designing an end-to-end resource management system. The overarching goal of this dissertation is to develop and deploy machine learning frameworks that are suitable to model the various components of a wireless resource management system that must provide reliable and low latency service to the users. First, by explicitly modeling the limitations of the human brain, a concrete measure for the delay perception of human users in a wireless network is introduced. Then, a new probabilistic model for this delay perception is learned based on the brain features of a human user. Given the learned model for the delay perception of the human brain, a brain-aware resource management algorithm is proposed for allocating radio resources to human users while minimizing the transmit power and taking into account the reliability of both machine-type devices and human users.
Next, a novel experienced deep reinforcement learning (deep-RL) framework is proposed to provide model-free resource allocation for ultra-reliable low latency communication (URLLC) in the downlink of a wireless network. The proposed experienced deep-RL framework can guarantee high end-to-end reliability and low end-to-end latency, under explicit data rate constraints, for each wireless user without any models of or assumptions on the users' traffic. In particular, in order to enable the deep-RL framework to account for extreme network conditions and operate in highly reliable systems, a new approach based on generative adversarial networks (GANs) is proposed. After that, the problem of network slicing is studied in the context of a wireless system having a time-varying number of users that require two types of slices: reliable low latency (RLL) and self-managed (capacity limited) slices. To address this problem, a novel control framework for stochastic optimization is proposed based on the Lyapunov drift-plus-penalty method. This new framework enables the system to minimize power, maintain slice isolation, and provide reliable and low latency end-to-end communication for RLL slices. Then, a novel concept of three-dimensional (3D) cellular networks that integrate drone base stations (drone-BSs) and cellular-connected drone users (drone-UEs) is introduced. For this new 3D cellular architecture, a novel framework for network planning for drone-BSs as well as latency-minimal cell association for drone-UEs is proposed. For network planning, a tractable method for drone-BS deployment based on the notion of truncated octahedron shapes is proposed that ensures full coverage for a given space with the minimum number of drone-BSs. In addition, to characterize frequency planning in such 3D wireless networks, an analytical expression for the feasible integer frequency reuse factors is derived.
Subsequently, an optimal 3D cell association scheme is developed for which the drone-UEs' latency, considering transmission, computation, and backhaul delays, is minimized. Finally, the concept of super environments is introduced. After formulating this concept mathematically, it is shown that any two Markov decision processes (MDPs) can be members of a super environment if sufficient additional state space is added. Then the effect of this additional state space on model-free and model-based deep-RL algorithms is investigated. Next, the tradeoff caused by adding the extra state space on the speed of convergence and the optimality of the solution is discussed. In summary, this dissertation led to the development of machine learning algorithms for statistically modeling complex parts in the resource management system. Also, it developed a model-free controller that can control the resource management system reliably, with low latency, and optimally. / Doctor of Philosophy / Next-generation wireless networks must support a plethora of new applications ranging from the Internet of Things to virtual reality. Each one of these emerging applications has unique requirements that substantially differ from traditional services such as video streaming. Hence, there is a need for designing a new and efficient resource management framework that takes into account different components that can affect the resource usage, including less obvious factors such as human behavior that contribute to the resource usage of the system. The use of machine learning for modeling these components in a resource management system is a promising solution. This is because of the data-driven nature of machine learning algorithms that can help us to model many hidden factors that might contribute to the resource usage pattern of users or devices. These hidden factors can only be modeled using an end-to-end machine learning solution.
By end-to-end, we mean the system only relies on its observation of the quality of service (QoS) for users. Therefore, machine learning algorithms can be used either for modeling a complex factor such as the human brain's delay perception or for designing an end-to-end resource management system. The overarching goal of this dissertation is to develop and deploy machine learning frameworks that are suitable to model the various components of a wireless resource management system that must provide reliable and low latency service to the users.
ACADIA: Efficient and Robust Adversarial Attacks Against Deep Reinforcement Learning
Ali, Haider, 05 January 2023
Existing adversarial algorithms for Deep Reinforcement Learning (DRL) have largely focused on identifying an optimal time to attack a DRL agent. However, little work has explored injecting efficient adversarial perturbations in DRL environments. We propose a suite of novel DRL adversarial attacks, called ACADIA, representing AttaCks Against Deep reInforcement leArning. ACADIA provides a set of efficient and robust perturbation-based adversarial attacks to disturb the DRL agent's decision-making based on novel combinations of techniques utilizing momentum, ADAM optimizer (i.e., Root Mean Square Propagation or RMSProp), and initial randomization. These kinds of DRL attacks with novel integration of such techniques have not been studied in the existing Deep Neural Networks (DNNs) and DRL research. We consider two well-known DRL algorithms, Deep-Q Learning Network (DQN) and Proximal Policy Optimization (PPO), under Atari games and MuJoCo where both targeted and non-targeted attacks are considered with or without the state-of-the-art defenses in DRL (i.e., RADIAL and ATLA). Our results demonstrate that the proposed ACADIA outperforms existing gradient-based counterparts under a wide range of experimental settings. ACADIA is nine times faster than the state-of-the-art Carlini and Wagner (CW) method with better performance under defenses of DRL. / Master of Science / Artificial Intelligence (AI) techniques such as Deep Neural Networks (DNNs) and Deep Reinforcement Learning (DRL) are prone to adversarial attacks. For example, a perturbed stop sign can force a self-driving car's AI algorithm to increase the speed rather than stop the vehicle. There has been little work developing attacks and defenses against DRL. In DRL, a DNN-based policy decides to take an action based on the observation of the environment and gets the reward in feedback for its improvements. We perturb that observation to attack the DRL agent.
There are two main aspects to developing an attack on DRL. One aspect is to identify an optimal time to attack (when-to-attack?). The second aspect is to identify an efficient method to attack (how-to-attack?). To answer the second aspect, we propose a suite of novel DRL adversarial attacks, called ACADIA, representing AttaCks Against Deep reInforcement leArning. We consider two well-known DRL algorithms, Deep-Q Learning Network (DQN) and Proximal Policy Optimization (PPO), under DRL environments of Atari games and MuJoCo where both targeted and non-targeted attacks are considered with or without state-of-the-art defenses. Our results demonstrate that the proposed ACADIA outperforms state-of-the-art perturbation methods under a wide range of experimental settings. ACADIA is nine times faster than the state-of-the-art Carlini and Wagner (CW) method with better performance under the defenses of DRL.
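To make the idea of perturbing an agent's observation concrete, here is a generic sketch of a momentum-based iterative sign-gradient perturbation with a random start. It is not ACADIA: the linear softmax policy `W`, the L-infinity budget `eps`, and every parameter value are assumptions chosen so that the gradient can be written out by hand without a deep learning library.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def loss_gradient(W, obs, action):
    """Gradient of the cross-entropy loss -log p[action] w.r.t. obs,
    for a hypothetical linear policy with logits W @ obs."""
    p = softmax(W @ obs)
    p[action] -= 1.0
    return W.T @ p


def momentum_attack(W, obs, eps=0.1, steps=10, mu=0.9, seed=0):
    """Non-targeted attack: push the policy away from its clean action
    while keeping the perturbation inside an L-infinity ball of radius eps."""
    rng = np.random.default_rng(seed)
    action = int(np.argmax(W @ obs))                     # clean action
    adv = obs + rng.uniform(-eps, eps, size=obs.shape)   # random start
    g = np.zeros_like(obs)                               # momentum buffer
    alpha = eps / steps                                  # per-step size
    for _ in range(steps):
        grad = loss_gradient(W, adv, action)
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)
        adv = adv + alpha * np.sign(g)                   # ascend the loss
        adv = np.clip(adv, obs - eps, obs + eps)         # project to ball
    return adv


if __name__ == "__main__":
    W = np.eye(2)
    obs = np.array([0.6, 0.5])
    print(momentum_attack(W, obs, eps=0.2))
```

With a real DRL agent the gradient would come from automatic differentiation through the policy network rather than from a closed-form expression.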
Reinforcement Learning for Hydrobatic AUVs / Reinforcement learning för Hydrobatiska AUV
Woźniak, Grzegorz, January 2022
This master's thesis focuses on developing a Reinforcement Learning (RL) controller that successfully performs hydrobatic maneuvers on an Autonomous Underwater Vehicle (AUV). This work also aims to analyze the robustness of the RL controller, as well as provide a comparison between RL algorithms and Proportional Integral Derivative (PID) control. Training of the algorithms is initially conducted in a NumPy simulation in Python. We show how to model the Equations of Motion (EOM) of the AUV and how to use them to train the RL controllers. We use the Stable-Baselines3 RL framework and create a training environment with OpenAI Gym. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm offers good performance in the simulation. The following maneuvers are studied: trim control, waypoint following, and an inverted pendulum. We test the maneuvers both in the NumPy simulation and in the Stonefish simulator. We also test the robustness of the RL trim controller by simulating noise in the state feedback. Lastly, we run the RL trim controller on real AUV hardware called SAM. We show that the RL algorithm trained in the NumPy simulator can achieve similar performance to the PID controller in the Stonefish simulator. We generate a policy that can perform the trim control and the inverted pendulum maneuver in the NumPy simulation. We show that we can generate a robust policy that executes other types of maneuvers by providing a parameterized cost function to the RL algorithm. We discuss the results of every maneuver we perform with the SAM AUV and the advantages and disadvantages of this control method applied to underwater robotics. We conclude that RL can be used to create policies that perform hydrobatic maneuvers. This data-driven approach can be applied in the future to more complex problems in underwater robotics.
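For readers unfamiliar with the PID baseline that the RL controller is compared against, a minimal discrete-time PID loop is sketched below. The first-order plant, the gains, and the time step are hypothetical and stand in for the AUV dynamics, which are not reproduced here.

```python
class PID:
    """Textbook discrete PID controller (illustrative, not the thesis code)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        """Return the control signal for one sampling period."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(controller, setpoint=1.0, steps=2000, dt=0.01):
    """Drive a hypothetical first-order plant dx/dt = -x + u with forward
    Euler integration and return the final state."""
    x = 0.0
    for _ in range(steps):
        u = controller.step(setpoint, x)
        x += dt * (u - x)
    return x


if __name__ == "__main__":
    final_state = simulate(PID(kp=2.0, ki=2.0, kd=0.0, dt=0.01))
    print(round(final_state, 3))
```

The integral term is what removes the steady-state offset here; an RL policy has no such explicit structure and must learn comparable behavior from the reward or cost signal.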
Using Deep Reinforcement Learning For Adaptive Traffic Control in Four-Way Intersections
Jörneskog, Gustav; Kandelan, Josef, January 2019
The consequences of traffic congestion include increased travel time, fuel consumption, and the number of crashes. Studies suggest that most traffic delays are due to nonrecurring traffic congestion. Adaptive traffic control using real-time data is effective in dealing with nonrecurring traffic congestion. Many adaptive traffic control algorithms used today are deterministic and prone to human error and limitation. Reinforcement learning allows the development of an optimal traffic control policy in an unsupervised manner. We have implemented a reinforcement learning algorithm that only requires information about the number of vehicles and the mean speed of each incoming road to streamline traffic in a four-way intersection. The reinforcement learning algorithm is evaluated against a deterministic algorithm and a fixed-time control schedule. Furthermore, it was tested whether reinforcement learning can be trained to prioritize emergency vehicles while maintaining good traffic flow. The reinforcement learning algorithm obtains a lower average time in the system than the deterministic algorithm in eight out of nine experiments. Moreover, the reinforcement learning algorithm achieves a lower average time in the system than the fixed-time schedule in all experiments. At best, the reinforcement learning algorithm performs 13% better than the deterministic algorithm and 39% better than the fixed-time schedule. Moreover, the reinforcement learning algorithm could prioritize emergency vehicles while maintaining good traffic flow.
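The kind of state and update such a controller might use can be sketched as follows. This is an assumption-laden illustration: a tabular Q-learning update stands in for the deep network, and the phase names, bin sizes, and reward are hypothetical. Only the choice of inputs (vehicle count and mean speed per incoming road) comes from the abstract.

```python
import random
from collections import defaultdict

# Hypothetical signal phases for a four-way intersection.
ACTIONS = ("north_south_green", "east_west_green")


def encode_state(vehicle_counts, mean_speeds, count_bin=5, speed_bin=10.0):
    """Discretize per-road vehicle counts and mean speeds into a tuple."""
    counts = tuple(min(c // count_bin, 3) for c in vehicle_counts)
    speeds = tuple(min(int(s // speed_bin), 3) for s in mean_speeds)
    return counts + speeds


def choose_action(q_table, state, epsilon, rng=random):
    """Epsilon-greedy selection over the signal phases."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One-step Q-learning update toward the bootstrapped target."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])


if __name__ == "__main__":
    q = defaultdict(float)
    s = encode_state([12, 0, 3, 7], [4.0, 30.0, 12.5, 9.9])
    # Hypothetical reward: negative time spent in the system.
    q_update(q, s, "north_south_green", -4.0, s)
    print(choose_action(q, s, epsilon=0.0))
```

A reward defined as the negative time vehicles spend in the system would match the evaluation metric named in the abstract, though the reward actually used in the thesis is not stated here.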
Deep Reinforcement Learning for Intelligent Road Maintenance in Small Island Developing States Vulnerable to Climate Change : Using Artificial Intelligence to Adapt Communities to Climate Change
Elvira, Boman, January 2018
The consequences of climate change are already noticeable in small island developing states. Road networks are crucial for a functioning society, and are particularly vulnerable to extreme weather, floods, landslides and other effects of climate change. Road systems in small island developing states are therefore in special need of climate adaptation efforts. Climate adaptation of road systems also has to be cost-efficient since these small island states have limited economic resources. Recent advances in deep reinforcement learning, a subfield of artificial intelligence, have shown that intelligent agents can achieve superhuman performance on a number of tasks, setting hopes high for possible future applications of the algorithms. To investigate whether deep reinforcement learning is suitable for climate adaptation of road maintenance systems, a simulator has been set up, together with three deep reinforcement learning agents and two non-intelligent agents for performance comparisons. The results of the project indicate that deep reinforcement learning is suitable for use in intelligent road maintenance systems for climate adaptation in small island developing states.
Extension on Adaptive MAC Protocol for Space Communications
Li, Max Hongming, 06 December 2018
This work devises a novel approach for mitigating the effects of Catastrophic Forgetting in Deep Reinforcement Learning-based cognitive radio engine implementations employed in space communication applications. Previous implementations of cognitive radio space communication systems utilized a moving-window-based online learning method, which discards part of its understanding of the environment each time the window is moved. This act of discarding is called Catastrophic Forgetting. This work investigated two ways to control the forgetting process more systematically: a recursive training technique that implements forgetting in a more controlled manner, and an ensemble learning technique in which each member of the ensemble represents the engine's understanding over a certain period of time. Both of these techniques were integrated into a cognitive radio engine proof-of-concept and delivered to the SDR platform on the International Space Station. The results were then compared to the results from the original proof-of-concept. In this comparison, the ensemble learning technique showed promise across different communication channel contexts.