Global ETD Search

1	Task Offloading and Resource Allocation Using Deep Reinforcement Learning Zhang, Kaiyi 01 December 2020 (has links) Rapid urbanization poses huge challenges to people's daily lives, such as traffic congestion, environmental pollution, and public safety. Mobile Internet of Things (MIoT) applications serving smart cities bring the promise of innovative and enhanced public services such as air pollution monitoring, enhanced road safety and city resources metering and management. These applications rely on a number of energy constrained MIoT units (MUs) (e.g., robots and drones) to continuously sense, capture and process data and images from their environments to produce immediate adaptive actions (e.g., triggering alarms, controlling machinery and communicating with citizens). In this thesis, we consider a scenario where a battery constrained MU executes a number of time-sensitive data processing tasks whose arrival times and sizes are stochastic in nature. These tasks can be executed locally on the device, offloaded to one of the nearby edge servers or to a cloud data center within a mobile edge computing (MEC) infrastructure. We first formulate the problem of making optimal offloading decisions that minimize the cost of current and future tasks as a constrained Markov decision process (CMDP) that accounts for the constraints of the MU battery and the limited resources reserved on the MEC infrastructure by the application providers. Then, we relax the CMDP problem into a regular Markov decision process (MDP) using Lagrangian primal-dual optimization. We then develop an advantage actor-critic (A2C) algorithm, a model-free deep reinforcement learning (DRL) method, to train the MU to solve the relaxed problem. The MU can be trained once to learn optimal offloading policies that are repeatedly employed as long as there are no large changes in the MU environment. 
Simulation results show that the proposed algorithm outperforms offloading decision schemes that optimize only instantaneous costs. Offloading Deep reinforcement learning
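The Lagrangian primal-dual relaxation described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the two example constraints (battery and reserved MEC resources), and the step size are assumptions chosen for illustration. The idea is that each constraint is folded into the per-step cost with a non-negative multiplier, and the multipliers are raised when a constraint is violated on average.

```python
def relaxed_cost(task_cost, constraint_costs, budgets, multipliers):
    """Per-step cost of the relaxed MDP: cost + sum_i lambda_i * (g_i - b_i)."""
    penalty = sum(
        lam * (g - b)
        for lam, g, b in zip(multipliers, constraint_costs, budgets)
    )
    return task_cost + penalty


def dual_ascent_step(multipliers, avg_constraint_costs, budgets, step_size):
    """Projected subgradient ascent on the multipliers (kept non-negative)."""
    return [
        max(0.0, lam + step_size * (g - b))
        for lam, g, b in zip(multipliers, avg_constraint_costs, budgets)
    ]


if __name__ == "__main__":
    # Hypothetical setup: one multiplier for battery use, one for MEC resources.
    lam = [0.0, 0.0]
    budgets = [1.0, 2.0]
    avg_usage = [1.5, 1.0]  # first constraint violated, second satisfied
    lam = dual_ascent_step(lam, avg_usage, budgets, step_size=0.1)
    print(lam)
    print(relaxed_cost(3.0, avg_usage, budgets, lam))
```

In a full primal-dual loop, a policy learner (A2C in the abstract) would minimize `relaxed_cost` for fixed multipliers, alternating with calls to `dual_ascent_step`.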
2	Reliable deep reinforcement learning: stable training and robust deployment Queeney, James 30 August 2023 (has links) Deep reinforcement learning (RL) represents a data-driven framework for sequential decision making that has demonstrated the ability to solve challenging control tasks. This data-driven, learning-based approach offers the potential to improve operations in complex systems, but only if it can be trusted to produce reliable performance both during training and upon deployment. These requirements have hindered the adoption of deep RL in many real-world applications. In order to overcome the limitations of existing methods, this dissertation introduces reliable deep RL algorithms that deliver (i) stable training from limited data and (ii) robust, safe deployment in the presence of uncertainty. The first part of the dissertation addresses the interactive nature of deep RL, where learning requires data collection from the environment. This interactive process can be expensive, time-consuming, and dangerous in many real-world settings, which motivates the need for reliable and efficient learning. We develop deep RL algorithms that guarantee stable performance throughout training, while also directly considering data efficiency in their design. These algorithms are supported by novel policy improvement lower bounds that account for finite-sample estimation error and sample reuse. The second part of the dissertation focuses on the uncertainty present in real-world applications, which can impact the performance and safety of learned control policies. In order to reliably deploy deep RL in the presence of uncertainty, we introduce frameworks that incorporate safety constraints and provide robustness to general disturbances in the environment. Importantly, these frameworks make limited assumptions on the training process, and can be implemented in settings that require real-world interaction for training. 
This motivates deep RL algorithms that deliver robust, safe performance at deployment time, while only using standard data collection from a single training environment. Overall, this dissertation contributes new techniques to overcome key limitations of deep RL for real-world decision making and control. Experiments across a variety of continuous control tasks demonstrate the effectiveness of our algorithms. Engineering Deep reinforcement learning
3	Beats, Bots, and Bananas: Modeling reinforcement learning of sensorimotor synchronization Ommi, Yassaman January 2024 (has links) This thesis investigates the computational principles underlying sensorimotor synchronization (SMS) through the novel application of deep reinforcement learning (RL). SMS, the coordination of rhythmic movement with external stimuli, is essential for human activities like music performance and social interaction, yet its neural mechanisms and learning processes are not fully understood. We present a computational framework utilizing recurrent neural networks with Long Short-Term Memory (LSTM) units, trained via RL, to model SMS behavior. This approach allows for the exploration of how different reward structures shape the acquisition and execution of synchronization skills. Our model is evaluated on both steady-state synchronization and perturbation response tasks, paralleling human SMS studies. Key findings reveal that agents trained with a combined reward—minimizing next-beat asynchrony and maintaining interval accuracy—exhibit human-like adaptive behaviors. Notably, these agents exhibited asymmetric error correction, making larger adjustments for late versus early taps, a phenomenon documented in human subjects. This suggests that such asymmetry may arise from the inherent reward structure of the task rather than from specific neural architectures. While our model did not consistently reproduce the negative mean asynchrony observed in human steady-state tapping, it demonstrated anticipatory behavior in response to perturbations. This offers new insights into how the brain might learn and execute rhythmic tasks, indicating that anticipatory strategies in human synchronization could naturally arise from processing rewards and timing errors. Our work contributes to the growing integration of machine learning techniques with cognitive neuroscience, offering new computational insights into the acquisition of timing skills. 
It establishes a flexible framework that can be extended in future investigations to study more complex rhythms, coordination between individuals, and even the neural basis of rhythm perception and production. / Thesis / Master of Science (MSc) / Have you ever wondered how we naturally tap our foot in time with music? This thesis investigates this human ability, known as sensorimotor synchronization, using artificial intelligence. By creating artificial agents that learn to tap along with a steady beat through reinforcement learning (like a person tapping to a metronome), we aimed to understand how the brain acquires this skill. Our experiments showed that how we define success significantly affects how the agents learn the skill. Notably, when we rewarded both precise timing and consistent tapping, the agents' behavior closely resembled that of humans. They even exhibited a human-like pattern in error correction, making larger adjustments when tapping too late rather than too early. This research offers new insights into how our brains process and learn rhythm and timing. It also lays the groundwork for developing AI systems capable of replicating human-like timing behaviors, with potential applications in music technology and robotics. Deep Reinforcement Learning Sensorimotor Synchronization
4	Approaches for Efficient Autonomous Exploration using Deep Reinforcement Learning Thomas Molnar (8735079) 24 April 2020 (has links) For autonomous exploration of complex and unknown environments, existing Deep Reinforcement Learning (Deep RL) approaches struggle to generalize from computer simulations to real world instances. Deep RL methods typically exhibit low sample efficiency, requiring a large amount of data to develop an optimal policy function for governing an agent's behavior. RL agents expect well-shaped and frequent rewards to receive feedback for updating policies. Yet in real world instances, rewards and feedback tend to be infrequent and sparse. For sparse reward environments, an intrinsic reward generator can be utilized to facilitate progression towards an optimal policy function. The proposed Augmented Curiosity Modules (ACMs) extend the Intrinsic Curiosity Module (ICM) by Pathak et al. These modules utilize depth image and optical flow predictions with intrinsic rewards to improve sample efficiency. Additionally, the proposed Capsules Exploration Module (Caps-EM) pairs a Capsule Network, rather than a Convolutional Neural Network, architecture with an A2C algorithm. This provides a more compact architecture without need for intrinsic rewards, which the ICM and ACMs require. Tested using ViZDoom for experimentation in visually rich and sparse feature scenarios, both the Depth-Augmented Curiosity Module (D-ACM) and Caps-EM improve autonomous exploration performance and sample efficiency over the ICM. The Caps-EM is superior, using 44% and 83% fewer trainable network parameters than the ICM and D-ACM, respectively. On average across all “My Way Home” scenarios, the Caps-EM converges to a policy function with 1141% and 437% time improvements over the ICM and D-ACM, respectively. Computer Engineering deep reinforcement learning Capsule Network exploration performance Autonomous
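The role of an intrinsic reward generator in sparse-reward settings can be sketched as follows. This is an illustrative simplification, not the thesis's ACM or Caps-EM code: in the ICM formulation, the curiosity bonus is a scaled squared error between the predicted and the actual feature encoding of the next state. The feature vectors, the scale `eta`, and the function names here are assumptions, and a real module would learn the features and the forward model with neural networks.

```python
def intrinsic_reward(predicted_next_features, actual_next_features, eta=0.01):
    """Curiosity bonus: (eta / 2) * squared forward-model prediction error."""
    squared_error = sum(
        (p - a) ** 2
        for p, a in zip(predicted_next_features, actual_next_features)
    )
    return 0.5 * eta * squared_error


def total_reward(extrinsic, predicted_next_features, actual_next_features, eta=0.01):
    """Reward seen by the agent: sparse task reward plus curiosity bonus."""
    return extrinsic + intrinsic_reward(
        predicted_next_features, actual_next_features, eta
    )


if __name__ == "__main__":
    # Hypothetical two-dimensional feature encodings of the next state.
    predicted = [1.0, 2.0]
    actual = [1.0, 4.0]
    # The extrinsic reward is zero on most steps in a sparse-reward task.
    print(total_reward(0.0, predicted, actual, eta=0.5))
```

Transitions the forward model predicts poorly yield a larger bonus, so the agent is nudged toward unfamiliar states even when the environment returns no reward.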
5	Mobile Robot Obstacle Avoidance based on Deep Reinforcement Learning Feng, Shumin January 2018 (has links) Obstacle avoidance is one of the core problems in the field of autonomous navigation. An obstacle avoidance approach is developed for the navigation task of a reconfigurable multi-robot system named STORM, which stands for Self-configurable and Transformable Omni-Directional Robotic Modules. Various mathematical models have been developed in previous work in this field to avoid collision for such robots. In this work, the proposed collision avoidance algorithm is trained via Deep Reinforcement Learning, which enables the robot to learn by itself from its experiences, and then fit a mathematical model by updating the parameters of a neural network. The trained neural network is capable of choosing an action directly from the input sensor data. A virtual STORM locomotion module was trained to explore a Gazebo simulation environment without collision, using the proposed collision avoidance strategies based on DRL. The mathematical model of the avoidance algorithm was derived from the simulation and then applied to the prototype of the locomotion module and validated via experiments. A universal software architecture was also designed for the STORM modules. The software architecture has extensible and reusable features that improve the design efficiency and enable parallel development. / Master of Science / In this thesis, an obstacle avoidance approach is described to enable autonomous navigation of a reconfigurable multi-robot system, STORM. The Self-configurable and Transformable Omni-Directional Robotic Modules (STORM) is a novel approach towards heterogeneous swarm robotics. The system has two types of robotic modules, namely the locomotion module and the manipulation module. Each module is able to navigate and perform tasks independently. 
In addition, the systems are designed to autonomously dock together to perform tasks that the modules individually are unable to accomplish. The proposed obstacle avoidance approach is designed for the modules of STORM, but can be applied to mobile robots in general. In contrast to the existing collision avoidance approaches, the proposed algorithm was trained via deep reinforcement learning (DRL). This enables the robot to learn by itself from its experiences, and then fit a mathematical model by updating the parameters of a neural network. In order to avoid damage to the real robot during the learning phase, a virtual robot was trained inside a Gazebo simulation environment with obstacles. The mathematical model for the collision avoidance strategy obtained through DRL was then validated on a locomotion module prototype of STORM. This thesis also introduces the overall STORM architecture and provides a brief overview of the generalized software architecture designed for the STORM modules. The software architecture has expandable and reusable features that apply well to the swarm architecture while allowing for design efficiency and parallel development. Robotic Systems Neural Networks Obstacle Avoidance Deep Reinforcement Learning
6	Cost and Power Loss Aware Coalitions under Uncertainty in Transactive Energy Systems Sadeghi, Mohammad 02 June 2022 (has links) The need to cope with the rapid transformation of the conventional electrical grid into the future smart grid, with multiple connected microgrids, has led to the investigation of optimal smart grid architectures. The main components of the future smart grids such as generators, substations, controllers, smart meters and collector nodes are evolving; however, truly effective integration of these elements into the microgrid context to guarantee intelligent and dynamic functionality across the whole smart grid remains an open issue. Energy trading is a significant part of this integration. In microgrids, energy trading refers to the use of surplus energy in one microgrid to satisfy the demand of another microgrid or a group of microgrids that form a microgrid community. Different techniques are employed to manage the energy trading process such as optimization-based and conventional game-theoretical methods, which bring about several challenges including complexity, scalability and ability to learn dynamic environments. A common challenge among all of these methods is adapting to changing circumstances. Optimization methods, for example, show promising performance in static scenarios where the optimal solution is achieved for a specific snapshot of the system. However, to use such a technique in a dynamic environment, finding the optimal solutions for all the time slots is needed, which imposes a significant complexity. Challenges such as this can be best addressed using game theory techniques empowered with machine learning methods across grid infrastructure and microgrid communities. 
In this thesis, novel Bayesian coalitional game theory-based and Bayesian reinforcement learning-based coalition formation algorithms are proposed, which allow the microgrids to exchange energy with their coalition members while minimizing the associated cost and power loss. In addition, a deep reinforcement learning scheme is developed to address the problem of large convergence time resulting from the sizeable state-action space of the methods mentioned above. The proposed algorithms can ideally overcome the uncertainty in the system. The advantages of the proposed methods are highlighted by comparing them with the conventional coalitional game theory-based techniques, Q-learning-based technique, random coalition formation, as well as with the case with no coalitions. The results show the superiority of the proposed methods in terms of power loss and cost minimization in dynamic environments. Microgrid Bayesian Reinforcement Learning Energy Trading Deep Reinforcement Learning
7	A Comparative Study on Service Migration for Mobile Edge Computing Based on Deep Learning Park, Sung woon 15 June 2023 (has links) Over the past few years, Deep Learning (DL), a promising technology leading the next generation of intelligent environments, has attracted significant attention and has been intensively utilized in various fields in the fourth industrial revolution era. The applications of Deep Learning in the area of Mobile Edge Computing (MEC) have achieved remarkable outcomes. Among several functionalities of MEC, the service migration frameworks have been proposed to overcome the shortcomings of the traditional methodologies in supporting high-mobility users with real-time responses. The service migration in MEC is a complex optimization problem that considers several dynamic environmental factors to make an optimal decision on whether, when, and where to migrate. In line with the trend, various service migration frameworks based on a variety of optimization algorithms have been proposed to overcome the limitations of the traditional methodologies. However, a more sophisticated and realistic model is needed, one that reduces the computational complexity and inefficiency of existing frameworks. Therefore, an efficient service migration mechanism that is able to capture the environmental variables comprehensively is required. In this thesis, we propose an enhanced service migration model to address user proximity issues. First, we introduce innovative service migration models for single-user and multi-user scenarios to overcome the users’ proximity issue while enforcing the service execution efficiency. Second, we formulate the service migration process as a complicated optimization problem and utilize Deep Reinforcement Learning (DRL) to estimate the optimal policy to minimize the migration cost, transaction cost, and consumed energy jointly. 
Lastly, we compare the proposed models with existing migration methodologies through analytical simulations from various aspects. The numerical results demonstrate that the proposed models can estimate the optimal policy despite the computational complexity caused by the dynamic environment and high-mobility users. Mobile edge computing Service migration Deep reinforcement learning Migration cost
8	Reliable Low Latency Machine Learning for Resource Management in Wireless Networks Taleb Zadeh Kasgari, Ali 30 March 2022 (has links) Next-generation wireless networks must support a plethora of new applications ranging from the Internet of Things to virtual reality. Each one of these emerging applications has unique rate, reliability, and latency requirements that substantially differ from traditional services such as video streaming. Hence, there is a need for designing an efficient resource management framework that takes into account different components that can affect the resource usage, including less obvious factors such as human behavior that contribute to the resource usage of the system. The use of machine learning for modeling the mentioned components in a resource management system is a promising solution. This is because many hidden factors might contribute to the resource usage pattern of users or machine-type devices that can only be modeled using an end-to-end machine learning solution. Therefore, machine learning algorithms can be used either for modeling a complex factor such as the human brain's delay perception or for designing an end-to-end resource management system. The overarching goal of this dissertation is to develop and deploy machine learning frameworks that are suitable to model the various components of a wireless resource management system that must provide reliable and low latency service to the users. First, by explicitly modeling the limitations of the human brain, a concrete measure for the delay perception of human users in a wireless network is introduced. Then, a new probabilistic model for this delay perception is learned based on the brain features of a human user. 
Given the learned model for the delay perception of the human brain, a brain-aware resource management algorithm is proposed for allocating radio resources to human users while minimizing the transmit power and taking into account the reliability of both machine type devices and human users. Next, a novel experienced deep reinforcement learning (deep-RL) framework is proposed to provide model-free resource allocation for ultra reliable low latency communication (URLLC) in the downlink of a wireless network. The proposed, experienced deep-RL framework can guarantee high end-to-end reliability and low end-to-end latency, under explicit data rate constraints, for each wireless user without any models of or assumptions on the users' traffic. In particular, in order to enable the deep-RL framework to account for extreme network conditions and operate in highly reliable systems, a new approach based on generative adversarial networks (GANs) is proposed. After that, the problem of network slicing is studied in the context of a wireless system having a time-varying number of users that require two types of slices: reliable low latency (RLL) and self-managed (capacity limited) slices. To address this problem, a novel control framework for stochastic optimization is proposed based on the Lyapunov drift-plus-penalty method. This new framework enables the system to minimize power, maintain slice isolation, and provide reliable and low latency end-to-end communication for RLL slices. Then, a novel concept of three-dimensional (3D) cellular networks, that integrate drone base stations (drone-BS) and cellular-connected drone users (drone-UEs), is introduced. For this new 3D cellular architecture, a novel framework for network planning for drone-BSs as well as latency-minimal cell association for drone-UEs is proposed. 
For network planning, a tractable method for drone-BSs' deployment based on the notion of truncated octahedron shapes is proposed that ensures full coverage for a given space with a minimum number of drone-BSs. In addition, to characterize frequency planning in such 3D wireless networks, an analytical expression for the feasible integer frequency reuse factors is derived. Subsequently, an optimal 3D cell association scheme is developed for which the drone-UEs' latency, considering transmission, computation, and backhaul delays, is minimized. Finally, the concept of super environments is introduced. After formulating this concept mathematically, it is shown that any two Markov decision processes (MDPs) can be members of a super environment if sufficient additional state space is added. Then the effect of this additional state space on model-free and model-based deep-RL algorithms is investigated. Next, the tradeoff caused by adding the extra state space on the speed of convergence and the optimality of the solution is discussed. In summary, this dissertation led to the development of machine learning algorithms for statistically modeling complex parts in the resource management system. Also, it developed a model-free controller that can control the resource management system reliably, with low latency, and optimally. / Doctor of Philosophy / Next-generation wireless networks must support a plethora of new applications ranging from the Internet of Things to virtual reality. Each one of these emerging applications has unique requirements that substantially differ from traditional services such as video streaming. Hence, there is a need for designing a new and efficient resource management framework that takes into account different components that can affect the resource usage, including less obvious factors such as human behavior that contribute to the resource usage of the system. 
The use of machine learning for modeling the mentioned components in a resource management system is a promising solution. This is because of the data-driven nature of machine learning algorithms that can help us to model many hidden factors that might contribute to the resource usage pattern of users or devices. These hidden factors can only be modeled using an end-to-end machine learning solution. By end-to-end, we mean the system only relies on its observation of the quality of service (QoS) for users. Therefore, machine learning algorithms can be used either for modeling a complex factor such as the human brain's delay perception or for designing an end-to-end resource management system. The overarching goal of this dissertation is to develop and deploy machine learning frameworks that are suitable to model the various components of a wireless resource management system that must provide reliable and low latency service to the users. Machine Learning Deep Reinforcement Learning Wireless Resource Management
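The Lyapunov drift-plus-penalty method named in this abstract can be illustrated with a generic sketch. It does not reproduce the dissertation's network-slicing controller: the action tuples, the weight `v`, and the function names are assumptions. A virtual queue accumulates constraint violations, and in each time slot the controller picks the action that minimizes the weighted penalty (for example, power) plus the queue-weighted constraint excess.

```python
def choose_action(actions, queue, v):
    """Pick the action minimizing v * penalty + queue * constraint_excess.

    actions: list of (name, penalty, constraint_excess) tuples, where
    constraint_excess > 0 means the action violates the long-term constraint.
    """
    return min(actions, key=lambda a: v * a[1] + queue * a[2])


def update_queue(queue, constraint_excess):
    """Virtual queue update: Q(t+1) = max(Q(t) + excess, 0)."""
    return max(queue + constraint_excess, 0.0)


if __name__ == "__main__":
    # Hypothetical actions: low power misses the latency target slightly,
    # high power meets it with slack.
    actions = [("low_power", 1.0, 0.5), ("high_power", 3.0, -1.0)]
    queue = 0.0
    for _ in range(10):
        name, _penalty, excess = choose_action(actions, queue, v=2.0)
        queue = update_queue(queue, excess)
        print(name, queue)
```

A larger `v` favors a lower penalty at the price of a longer virtual queue, which is the usual tradeoff in this family of methods.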
9	Deep Reinforcement Learning for Next Generation Wireless Networks with Echo State Networks Chang, Hao-Hsuan 26 August 2021 (has links) This dissertation considers a deep reinforcement learning (DRL) setting under the practical challenges of real-world wireless communication systems. The non-stationary and partially observable wireless environments make the learning and the convergence of the DRL agent challenging. One way to facilitate learning in partially observable environments is to combine recurrent neural network (RNN) and DRL to capture temporal information inherent in the system, which is referred to as deep recurrent Q-network (DRQN). However, training DRQN is known to be challenging, requiring a large amount of training data to achieve convergence. In many targeted wireless applications in the 5G and future 6G wireless networks, the available training data is very limited. Therefore, it is important to develop DRL strategies that are capable of capturing the temporal correlation of the dynamic environment while requiring only limited training overhead. In this dissertation, we design efficient DRL frameworks by utilizing echo state network (ESN), which is a special type of RNN where only the output weights are trained. To be specific, we first introduce the deep echo state Q-network (DEQN) by adopting ESN as the kernel of deep Q-networks. Next, we introduce federated ESN-based policy gradient (Fed-EPG) approach that enables multiple agents to collaboratively learn a shared policy to achieve the system goal. We designed computationally efficient training algorithms by utilizing the special structure of ESNs, which have the advantage of learning a good policy in a short time with little training data. Theoretical analyses are conducted for DEQN and Fed-EPG approaches to show the convergence properties and to provide a guide to hyperparameter tuning. 
Furthermore, we evaluate the performance under the dynamic spectrum sharing (DSS) scenario, which is a key enabling technology that aims to utilize the precious spectrum resources more efficiently. Compared to a conventional spectrum management policy that usually grants a fixed spectrum band to a single system for exclusive access, DSS allows the secondary system to dynamically share the spectrum with the primary system. Our work sheds light on the real deployments of DRL techniques in next generation wireless systems. / Doctor of Philosophy / Model-free reinforcement learning (RL) algorithms such as Q-learning are widely used because they can learn the policy directly through interactions with the environment without estimating a model of the environment, which is useful when the underlying system model is complex. Q-learning performs poorly for large-scale models because the training has to update every element in a large Q-table, which makes training difficult or even impossible. Therefore, deep reinforcement learning (DRL) exploits the powerful deep neural network to approximate the Q-table. Furthermore, a deep recurrent Q-network (DRQN) is introduced to facilitate learning in partially observable environments. However, DRQN training requires a large amount of training data and a long training time to achieve convergence, which is impractical in wireless systems with non-stationary environments and limited training data. Therefore, in this dissertation, we introduce two efficient DRL approaches: deep echo state Q-network (DEQN) and federated ESN-based policy gradient (Fed-EPG) approaches. Theoretical analyses of DEQN and Fed-EPG are conducted to provide the convergence properties and guidelines for designing hyperparameters. 
We evaluate and demonstrate the performance benefits of the DEQN and Fed-EPG under the dynamic spectrum sharing (DSS) scenario, which is a critical technology to efficiently utilize the precious spectrum resources in 5G and future 6G wireless networks. Deep Reinforcement Learning Echo State Network Dynamic Spectrum Sharing
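The lay summary in this entry notes that tabular Q-learning must maintain an entry for every state-action pair. A minimal sketch of the standard tabular update makes that concrete. This is textbook Q-learning, not the dissertation's DEQN or Fed-EPG algorithms, and the state names, action names, and hyperparameters are assumptions for illustration.

```python
from collections import defaultdict


def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


if __name__ == "__main__":
    # Hypothetical spectrum-access example: two channels, idle or busy.
    actions = ["use_channel_1", "use_channel_2"]
    q_table = defaultdict(float)  # one entry per (state, action) pair
    q_learning_update(q_table, "idle", "use_channel_1", 1.0, "busy", actions)
    print(dict(q_table))
```

The table grows with the product of the number of states and actions, which is the scaling problem that motivates replacing it with a function approximator.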
10	ACADIA: Efficient and Robust Adversarial Attacks Against Deep Reinforcement Learning Ali, Haider 05 January 2023 (has links) Existing adversarial algorithms for Deep Reinforcement Learning (DRL) have largely focused on identifying an optimal time to attack a DRL agent. However, little work has explored injecting efficient adversarial perturbations in DRL environments. We propose a suite of novel DRL adversarial attacks, called ACADIA, representing AttaCks Against Deep reInforcement leArning. ACADIA provides a set of efficient and robust perturbation-based adversarial attacks to disturb the DRL agent's decision-making based on novel combinations of techniques utilizing momentum, ADAM optimizer (i.e., Root Mean Square Propagation or RMSProp), and initial randomization. These kinds of DRL attacks with novel integration of such techniques have not been studied in the existing Deep Neural Networks (DNNs) and DRL research. We consider two well-known DRL algorithms, Deep-Q Learning Network (DQN) and Proximal Policy Optimization (PPO), under Atari games and MuJoCo where both targeted and non-targeted attacks are considered with or without the state-of-the-art defenses in DRL (i.e., RADIAL and ATLA). Our results demonstrate that the proposed ACADIA outperforms existing gradient-based counterparts under a wide range of experimental settings. ACADIA is nine times faster than the state-of-the-art Carlini and Wagner (CW) method with better performance under defenses of DRL. / Master of Science / Artificial Intelligence (AI) techniques such as Deep Neural Networks (DNN) and Deep Reinforcement Learning (DRL) are prone to adversarial attacks. For example, a perturbed stop sign can force a self-driving car's AI algorithm to increase the speed rather than stop the vehicle. There has been little work developing attacks and defenses against DRL. 
In DRL, a DNN-based policy decides to take an action based on the observation of the environment and receives a reward as feedback for its improvement. We perturb that observation to attack the DRL agent. There are two main aspects to developing an attack on DRL. One aspect is to identify an optimal time to attack (when-to-attack?). The second aspect is to identify an efficient method to attack (how-to-attack?). To address the second aspect, we propose a suite of novel DRL adversarial attacks, called ACADIA, representing AttaCks Against Deep reInforcement leArning. We consider two well-known DRL algorithms, Deep-Q Learning Network (DQN) and Proximal Policy Optimization (PPO), under DRL environments of Atari games and MuJoCo where both targeted and non-targeted attacks are considered with or without state-of-the-art defenses. Our results demonstrate that the proposed ACADIA outperforms state-of-the-art perturbation methods under a wide range of experimental settings. ACADIA is nine times faster than the state-of-the-art Carlini and Wagner (CW) method with better performance under the defenses of DRL. Secure AI Deep Reinforcement Learning Adversarial Learning Adversarial Attacks
