41 |
Information Freshness: How To Achieve It and Its Impact On Low-Latency Autonomous Systems. Choudhury, Biplav, 03 June 2022.
In the context of wireless communications, low-latency autonomous systems continue to grow in importance. Low-latency communication is essential in applications such as (i) vehicular networks, whose safety performance depends on how recently each vehicle has been updated on its neighboring vehicles' locations, (ii) IoT networks, where updates from IoT devices need to be aggregated appropriately at the monitoring station before the information gets stale so that temporal and spatial information can be extracted, and (iii) smart grids, where sensors and controllers need to track the most recent state of the system to tune system parameters dynamically. Each of these applications differs in the connectivity between the source and the destination. First, vehicular networks involve a broadcast network where each vehicle broadcasts its packets to all the other vehicles. Secondly, in UAV-assisted IoT networks, packets generated at multiple IoT devices are transmitted to a final destination via relays. Finally, in the smart grid and distributed systems in general, each source can have varying and unique destinations. Therefore, in terms of connectivity, they can be categorized as one-to-all, all-to-one, and a variable relationship between the number of sources and destinations. Additionally, the applications differ in the impact of mobility, the importance of a reduced Age of Information (AoI), whether AoI is measured in a centralized or distributed manner, etc. Thus, the wide variety of application requirements makes it challenging to develop scheduling schemes that universally address minimizing the AoI.
All these applications involve generating time-stamped status updates at a source which are then transmitted to their destination over a wireless medium. The timely reception of these updates at the destination decides the operating state of the system. This is because the fresher the information at the destination, the better its awareness of the system state for making better control decisions. This freshness of information is not the same as maximizing the throughput or minimizing the delay. While ideally throughput can be maximized by sending data as fast as possible, this may saturate the receiver resulting in queuing, contention, and other delays. On the other hand, these delays can be minimized by sending updates slowly, but this may cause high inter-arrival times. Therefore, a new metric called the Age of Information (AoI) has been proposed to measure the freshness of information that can account for many facets that influence data availability. In simple terms, AoI is measured at the destination as the time elapsed since the generation time of the most recently received update. Therefore AoI is able to incorporate both the delay and the inter-packet arrival time. This makes it a much better metric to measure end-to-end latency, and hence characterize the performance of such time-sensitive systems. These basic characteristics of AoI are explained in detail in Chapter 1. Overall, the main contribution of this dissertation is developing scheduling and resource allocation schemes targeted at improving the AoI of various autonomous systems having different types of connectivity, namely vehicular networks, UAV-assisted IoT networks, and smart grids, and then characterizing and quantifying the benefits of a reduced AoI from the application perspective.
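To make the definition above concrete, the following short Python snippet (an illustration added here, not code from the dissertation; the function name, sampling step, and example numbers are assumptions) computes the AoI trajectory at a destination from a list of time-stamped updates and averages it over a horizon.

# Illustrative sketch: Age of Information at a destination.
# AoI(t) = t - g(t), where g(t) is the generation time of the most recently
# received update at time t.
def aoi_trace(updates, horizon, step=0.1):
    """updates: list of (generation_time, reception_time) pairs."""
    trace, t = [], 0.0
    while t <= horizon:
        received = [g for (g, r) in updates if r <= t]   # updates delivered by time t
        age = t - max(received) if received else t       # before the first update, age grows from 0
        trace.append((t, age))
        t += step
    return trace

# Example: updates generated at t=1 and t=4, received at t=2 and t=6.
trace = aoi_trace([(1.0, 2.0), (4.0, 6.0)], horizon=8.0)
average_aoi = sum(age for (_, age) in trace) / len(trace)

This reproduces the characteristic sawtooth behavior: the age drops to the delay of a freshly received update and then grows linearly until the next one arrives.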
In the first contribution, we look into minimizing AoI for the case of broadcast networks having one-to-all connectivity between the source and destination devices by considering the case of vehicular networks. While vehicular networks have been studied in terms of AoI minimization, the impact of mobility and the benefit of a reduced AoI from the application perspective have not been investigated. The mobility of the vehicles is realistically modeled using the Simulation of Urban Mobility (SUMO) software to account for overtaking, lane changes, etc. We propose a safety metric that indicates the collision risk of a vehicle and conduct a simulation-based study in the ns-3 simulator to examine its relation to AoI. We see that the broadcast rate in a Dedicated Short Range Communications (DSRC) network that minimizes the system AoI also has the least collision risk, signifying that reducing AoI improves the on-road safety of the vehicles. However, we also show that this relationship is not universally true and that the mobility of the vehicles becomes a crucial aspect. Therefore, we propose a new metric called the Trackability-aware AoI (TAoI), which ensures that vehicles with unpredictable mobility broadcast at a faster rate while vehicles that are predictable broadcast at a reduced rate. The results obtained show that minimizing TAoI provides much better on-road safety as compared to plain AoI minimization, which points to the importance of mobility in such applications.
In the second contribution, we focus on networks with all-to-one connectivity, where packets from multiple sources are transmitted to a single destination, by taking the example of IoT networks. Here, multiple IoT devices measure a physical phenomenon and transmit these measurements to a central base station (BS). However, under certain scenarios, the BS and IoT devices are unable to communicate directly, and this necessitates the use of UAVs as relays. This creates a two-hop scenario that has not been studied for AoI minimization in UAV networks: in the first hop, packets are sampled from the IoT devices to the UAV, and in the second hop they are updated from the UAVs to the BS. Such networks are called UAV-assisted IoT networks. We show that under ideal conditions, with a generate-at-will traffic generation model and lossless wireless channels, the Maximal Age Difference (MAD) scheduler is the optimal AoI-minimizing scheduler. When the ideal conditions do not apply and more practical conditions are considered, a reinforcement learning (RL)-based scheduler that can account for packet generation patterns and channel qualities is desirable. Therefore, we propose a Deep Q-Network (DQN)-based scheduler, which outperforms MAD and all other schedulers under general conditions. However, the DQN-based scheduler suffers from scalability issues in large networks. Therefore, another type of RL algorithm, Proximal Policy Optimization (PPO), is proposed for larger networks. Additionally, the PPO-based scheduler can account for changes in the network conditions, which the DQN-based scheduler was not able to do. This ensures the trained model can be deployed in environments that might differ from the training environment.
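The MAD policy mentioned above can be summarized in a few lines. The sketch below (an illustration with assumed variable names, not code from the dissertation) makes one scheduling decision at the relay: for each source it compares the AoI currently seen at the base station with the age of the freshest packet of that source held at the UAV, and serves the source offering the largest age reduction.

# Illustrative sketch: one Maximal Age Difference (MAD) scheduling decision.
# age_at_bs[i]  = AoI of source i as currently seen at the base station
# age_at_uav[i] = age of the freshest packet of source i buffered at the UAV
def mad_select(age_at_bs, age_at_uav):
    diffs = {i: age_at_bs[i] - age_at_uav[i] for i in age_at_bs}
    return max(diffs, key=diffs.get)            # source with the largest age reduction

# Example: source 2 would reduce the destination age the most, so it is served next.
chosen = mad_select({1: 5.0, 2: 9.0, 3: 4.0}, {1: 1.0, 2: 2.0, 3: 3.5})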
In the final contribution, AoI is studied in networks with varying connectivity between the source and destination devices. A typical example of such a distributed network is the smart grid, where multiple devices exchange state information to ensure the grid operates in a stable state. To investigate AoI minimization and its impact on the smart grid, a co-simulation platform is designed where the 5G network is modeled in Python and the smart grid is modeled in PSCAD/MATLAB. In the first part of the study, the suitability of 5G in supporting smart grid operations is investigated. Based on the encouraging results that 5G can support a smart grid, we focus on the schedulers at the 5G RAN to minimize the AoI. It is seen that the AoI-based schedulers provide much better stability compared to traditional 5G schedulers such as proportional fair and round-robin. However, the MAD scheduler, which has been shown to be optimal for a variety of scenarios, is no longer optimal as it cannot account for the connectivity among the devices. Additionally, distributed networks with heterogeneous sources will, in addition to the varying connectivity, have different packet sizes requiring different numbers of resource blocks (RBs) to transmit, different packet generation patterns, channel conditions, etc. This motivates an RL-based approach. Hence, we propose a DQN-based scheduler that can take these factors into account, and results show that it outperforms all other schedulers in all considered conditions. / Doctor of Philosophy / Age of Information (AoI) is an exciting new metric as it is able to characterize the freshness of information, where freshness means how representative the information is of the current system state. Therefore, it is being actively investigated for a variety of autonomous systems that rely on having the most up-to-date information on the current state. Some examples are vehicular networks, UAV networks, and smart grids. Vehicular networks need the real-time location of their neighbor vehicles to make maneuver decisions, UAVs have to collect the most recent information from IoT devices for monitoring purposes, and devices in a smart grid need to ensure that they have the most recent information on the desired system state. From a communication point of view, each of these scenarios presents a different type of connectivity between the source and the destination. First, the vehicular network is a broadcast network where each vehicle broadcasts its packets to every other vehicle. Secondly, in the UAV network, multiple devices transmit their packets to a single destination via a relay. Finally, in the smart grid and distributed networks in general, every source can have different and unique destinations. In these applications, AoI becomes a natural choice to measure the system performance, as the fresher the information at the destination, the better its awareness of the system state, which allows it to make better control decisions to reach the desired objective.
Therefore, in this dissertation, we use mathematical analysis and simulation-based approaches to investigate different scheduling and resource allocation policies to improve the AoI for the above-mentioned scenarios. We also show that a reduced AoI improves the system performance, i.e., better on-road safety for vehicular networks and better stability for smart grid applications. The results obtained in this dissertation show that communication and networking protocols for time-sensitive, low-latency applications have to be optimized to improve AoI. This is in contrast to most modern-day communication protocols, which are targeted at improving the throughput or minimizing the delay.
|
42 |
Enabling IoV Communication through Secure Decentralized Clustering using Federated Deep Reinforcement Learning. Scott, Chandler, 01 August 2024.
The Internet of Vehicles (IoV) holds immense potential for revolutionizing transportation systems by facilitating seamless vehicle-to-vehicle and vehicle-to-infrastructure communication. However, challenges such as congestion, pollution, and security persist, particularly in rural areas with limited infrastructure. Existing centralized solutions are impractical in such environments due to latency and privacy concerns. To address these challenges, we propose a decentralized clustering algorithm enhanced with Federated Deep Reinforcement Learning (FDRL). Our approach enables low-latency communication, competitive packet delivery ratios, and cluster stability while preserving data privacy. Additionally, we introduce a trust-based security framework for IoV environments, integrating a central authority and trust engine to establish secure communication and interaction among vehicles and infrastructure components. Through these innovations, we contribute to safer, more efficient, and trustworthy IoV deployments, paving the way for widespread adoption and realizing the transformative potential of IoV technologies.
|
43 |
Real-Time Resource Optimization for Wireless Networks. Huang, Yan, 11 January 2021.
Resource allocation in modern wireless networks is constrained by increasingly stringent real-time requirements. Such real-time requirements typically come from, among others, the short coherence time on a wireless channel, the small time resolution for resource allocation in the OFDM-based radio frame structure, or the low-latency requirements of delay-sensitive applications. An optimal resource allocation solution is useful only if it can be determined and applied to the network entities within its expected time. For today's wireless networks such as 5G NR, this expected time (or real-time requirement) can be as low as 1 ms or even 100 μs. Most existing resource optimization solutions for wireless networks do not explicitly take the real-time requirement as a constraint when developing solutions. In fact, the mainstream of research works relies on asymptotic complexity analysis for designing solution algorithms. Asymptotic complexity analysis is only concerned with how the computational complexity grows as the input size increases (as in the big-O notation). It cannot capture the real-time requirement that is measured in wall-clock time. As a result, existing approaches such as exact or approximate optimization techniques from operations research are usually not useful in wireless networks in the field. Similarly, many problem-specific heuristic solutions with polynomial-time asymptotic complexities may suffer a similar fate if their running times are not tested in actual wall-clock time.
To address the limitations of existing approaches, this dissertation presents novel real-time solution designs to two types of optimization problems in wireless networks: i) problems that have closed-form mathematical models, and ii) problems that cannot be modeled in closed-form. For the first type of problems, we propose a novel approach that consists of (i) problem decomposition, which breaks an original optimization problem into a large number of small and independent sub-problems, (ii) search intensification, which identifies the most promising problem sub-space and selects a small set of sub-problems to match the available GPU processing cores, and (iii) GPU-based large-scale parallel processing, which solves the selected sub-problems in parallel and finds a near-optimal solution to the original problem. The efficacy of this approach has been illustrated by our solutions to the following two problems.
• Real-Time Scheduling to Achieve Fair LTE/Wi-Fi Coexistence: We investigate a resource optimization problem for the fair coexistence between LTE and Wi-Fi in the unlicensed spectrum. The real-time requirement for finding the optimal channel division and LTE resource allocation solution is on a 1 ms time scale. This problem involves the optimal division of transmission time for LTE and Wi-Fi across multiple unlicensed bands, and the resource allocation among LTE users within the LTE's "ON" periods. We formulate this optimization problem as a mixed-integer linear program and prove its NP-hardness. Then, by exploiting the unique problem structure, we propose a real-time solution design that is based on problem decomposition and GPU-based parallel processing techniques. Results from an implementation on the NVIDIA GPU/CUDA platform demonstrate that the proposed solution can achieve a near-optimal objective and meet the 1 ms timing requirement in 4G LTE.
• An Ultrafast GPU-based Proportional Fair Scheduler for 5G NR: We study the popular proportional-fair (PF) scheduling problem in a 5G NR environment. The real-time requirement for determining the optimal (with respect to the PF objective) resource allocation and MCS selection solution is 125 μs (under 5G numerology 3). In this problem, we need to allocate frequency-time resource blocks on an operating channel and assign a modulation and coding scheme (MCS) for each active user in the cell. We present GPF+, a GPU-based real-time PF scheduler. With GPF+, the original PF optimization problem is decomposed into a large number of small and independent sub-problems. We then employ a cross-entropy based search intensification technique to identify the most promising problem sub-space and select a small set of sub-problems to fit into a GPU. After solving the selected sub-problems in parallel using GPU cores, we find the best sub-problem solution and use it as the final scheduling solution. Evaluation results show that GPF+ is able to provide near-optimal PF performance in a 5G cell while meeting the 125 μs real-time requirement.
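The cross-entropy based search intensification used in GPF+ can be sketched at a high level as follows. This is an illustration only, not the GPF+ code: candidate sub-problems are sampled from a per-user categorical distribution, scored, and the distribution is refit toward the best-scoring (elite) candidates. All parameter names and values below are assumptions, and a CPU loop stands in for the GPU evaluation.

# Illustrative sketch: cross-entropy based selection over candidate sub-problems.
import numpy as np

def cross_entropy_select(evaluate, n_users, n_choices,
                         pop=512, elite_frac=0.1, iters=5, rng=None):
    """evaluate(candidate) -> scalar score; candidate is an array of per-user choices."""
    rng = rng or np.random.default_rng(0)
    probs = np.full((n_users, n_choices), 1.0 / n_choices)   # per-user sampling distribution
    for _ in range(iters):
        # sample candidate sub-problems and score them (done on GPU cores in GPF+)
        cands = np.stack([[rng.choice(n_choices, p=probs[u]) for u in range(n_users)]
                          for _ in range(pop)])
        scores = np.array([evaluate(c) for c in cands])
        elites = cands[np.argsort(scores)[-int(pop * elite_frac):]]
        for u in range(n_users):                              # refit toward the elites
            counts = np.bincount(elites[:, u], minlength=n_choices)
            probs[u] = counts / counts.sum()
    return cands[np.argmax(scores)]

In the dissertation's setting, scoring each candidate corresponds to solving one small sub-problem on a GPU core; here the evaluate() function is left abstract.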
For the second type of problems, where there is no closed-form mathematical formulation, we propose to employ model-free deep learning (DL) or deep reinforcement learning (DRL) techniques along with judicious consideration of timing requirement throughout the design. Under DL/DRL, we employ deep function approximators (neural networks) to learn the unknown objective function of an optimization problem, approximate an optimal algorithm to find resource allocation solutions, or discover important mapping functions related to the resource optimization. To meet the real-time requirement, we propose to augment DL or DRL methods with optimization techniques at the input or output of the deep function approximators to reduce their complexities and computational time. Under this approach, we study the following two problems:
• A DRL-based Approach to Dynamic eMBB/URLLC Multiplexing in 5G NR: We study the problem of dynamic multiplexing of eMBB and URLLC on the same channel through preemptive resource puncturing. The real-time requirement for determining the optimal URLLC puncturing solution is 1 ms (under 5G numerology 0). A major challenge in solving this problem is that it cannot be modeled using closed-form mathematical expressions. To address this issue, we develop a model-free DRL approach which employs a deep neural network to learn an optimal algorithm to allocate the URLLC puncturing over the operating channel, with the objective of minimizing the adverse impact from URLLC traffic on eMBB. Our contributions include a novel learning method that exploits the intrinsic properties of the URLLC puncturing optimization problem to achieve a fast and stable learning convergence, and a mechanism to ensure feasibility of the deep neural network's output puncturing solution. Experimental results demonstrate that our DRL-based solution significantly outperforms state-of-the-art algorithms proposed in the literature and meets the 1 ms real-time requirement for dynamic multiplexing.
• A DL-based Link Adaptation for eMBB/URLLC Multiplexing in 5G NR: We investigate MCS selection for eMBB traffic under the impact of URLLC preemptive puncturing. The real-time requirement for determining the optimal MCSs for all eMBB transmissions scheduled in a transmission interval is 125 μs (under 5G numerology 3). The objective is to have eMBB meet a given block-error rate (BLER) target under the adverse impact of URLLC puncturing. Since this problem cannot be mathematically modeled in closed-form, we propose a DL-based solution design that uses a deep neural network to learn and predict the BLER of a transmission under each MCS level. Then, based on the BLER predictions, an optimal MCS can be found for each transmission that achieves the BLER target. To meet the 5G real-time requirement, we implement this design through a hybrid CPU and GPU architecture to minimize the execution time. Extensive experimental results show that our design can select the optimal MCS under the impact of preemptive puncturing and meet the 125 μs timing requirement. / Doctor of Philosophy / In modern wireless networks such as 4G LTE and 5G NR, the optimal allocation of radio resources must be performed within a real-time requirement of 1 ms or even 100 μs. Such a real-time requirement comes from the physical properties of wireless channels, the short time resolution for resource allocation defined in the wireless communication standards, and the low-latency requirements of delay-sensitive applications.
The real-time requirement, although necessary for wireless networks in the field, has hardly been considered a key constraint for solution design in the research community. Existing solutions in the literature mostly consider theoretical computational complexities rather than actual computation time as measured by a wall clock.
To address the limitations of existing approaches, this dissertation presents real-time solution designs to two types of optimization problems in wireless networks: i) problems that have mathematical models, and ii) problems that cannot be modeled mathematically. For the first type of problems, we propose a novel approach that consists of (i) problem decomposition, (ii) search intensification, and (iii) GPU-based large-scale parallel processing techniques. The efficacy of this approach has been illustrated by our solutions to the following two problems.
• Real-Time Scheduling to Achieve Fair LTE/Wi-Fi Coexistence: We investigate a resource optimization problem for the fair coexistence between LTE and Wi-Fi users in the same (unlicensed) spectrum. The real-time requirement for finding the optimal LTE resource allocation solution is on 1 ms time scale.
• An Ultrafast GPU-based Proportional Fair Scheduler for 5G NR: We study the popular proportional-fair (PF) scheduling problem in a 5G NR environment. The real-time requirement for determining the optimal resource allocation and modulation and coding scheme (MCS) for each user is 125 μs.
For the second type of problems, where there is no mathematical formulation, we propose to employ model-free deep learning (DL) or deep reinforcement learning (DRL) techniques along with judicious consideration of timing requirement throughout the design. Under this approach, we study the following two problems:
• A DRL-based Approach to Dynamic eMBB/URLLC Multiplexing in 5G NR: We study the problem of dynamic multiplexing of eMBB and URLLC on the same channel through preemptive resource puncturing. The real-time requirement for determining the optimal URLLC puncturing solution is 1 ms.
• A DL-based Link Adaptation for eMBB/URLLC Multiplexing in 5G NR: We investigate MCS selection for eMBB traffic under the impact of URLLC preemptive puncturing. The real-time requirement for determining the optimal MCSs for all eMBB transmissions scheduled in a transmission interval is 125 μs.
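To make the link-adaptation step described in this last problem concrete: once a neural network has predicted the BLER of each MCS level for an upcoming eMBB transmission, the selection itself reduces to picking the highest-rate MCS whose predicted BLER still meets the target. The snippet below is a simplified sketch with assumed names and example numbers; the BLER predictor and the hybrid CPU/GPU pipeline are not reproduced here.

# Illustrative sketch: choose an MCS from per-MCS BLER predictions.
# Higher MCS indices carry more bits per resource element, so the usual link
# adaptation rule is to pick the highest MCS that still meets the BLER target.
def select_mcs(predicted_bler, bler_target=0.1):
    """predicted_bler: list indexed by MCS level, e.g. the output of a BLER-predicting DNN."""
    feasible = [mcs for mcs, bler in enumerate(predicted_bler) if bler <= bler_target]
    return max(feasible) if feasible else 0    # fall back to the most robust MCS

# Example with assumed predictions for five MCS levels under URLLC puncturing:
chosen_mcs = select_mcs([0.001, 0.01, 0.06, 0.2, 0.5])   # selects MCS 2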
|
44 |
End-to-End Autonomous Driving with Deep Reinforcement Learning in Simulation Environments. Wang, Bingyu, 10 April 2024.
In the rapidly evolving field of autonomous driving, the integration of Deep Reinforcement Learning (DRL) promises significant advancements towards achieving reliable and efficient vehicular systems. This study presents a comprehensive examination of DRL’s application within a simulated autonomous driving context, with a focus on the nuanced impact of representation learning parameters on the performance of end-to-end models. An overview of the theoretical underpinnings of machine learning, deep learning, and reinforcement learning is provided, laying the groundwork for their application in autonomous driving scenarios. The methodology outlines a detailed framework for training autonomous vehicles in the Duckietown simulation environment, employing both non-end-to-end and end-to-end models to investigate the effectiveness of various reinforcement learning algorithms and representation learning techniques.
At the heart of this research are extensive simulation experiments designed to evaluate the Proximal Policy Optimization (PPO) algorithm’s effectiveness within the established framework. The study delves into reward structures and the impact of representation learning parameters on the performance of end-to-end models. A critical comparison of the models in the validation chapter highlights the significant role of representation learning parameters in the outcomes of DRL-based autonomous driving systems.
The findings reveal that meticulous adjustment of representation learning parameters markedly influences the end-to-end training process. Notably, image segmentation techniques significantly enhance feature recognizability and model performance.

Contents
List of Figures
List of Tables
List of Abbreviations
List of Symbols
1 Introduction
1.1 Autonomous Driving Overview
1.2 Problem Description
1.3 Research Structure
2 Research Background
2.1 Theoretical Basis
2.1.1 Machine Learning
2.1.2 Deep Learning
2.1.3 Reinforcement Learning
2.2 Related Work
3 Methodology
3.1 Problem Definition
3.2 Simulation Platform
3.3 Observation Space
3.3.1 Observation Space of Non-end-to-end model
3.3.2 Observation Space of end-to-end model
3.4 Action Space
3.5 Reward Shaping
3.5.1 Speed Penalty
3.5.2 Position Reward
3.6 Map and Training Dataset
3.6.1 Map Design
3.6.2 Training Dataset
3.7 Variational Autoencoder Structure
3.7.1 Mathematical Foundation for VAE
3.8 Reinforcement Learning Framework
3.8.1 Actor-Critic Method
3.8.2 Policy Gradient
3.8.3 Trust Region Policy Optimization
3.8.4 Proximal Policy Optimization
4 Simulation Experiments
4.1 Experimental Setup
4.2 Representation Learning Model
4.3 End-to-end Model
5 Result
6 Validation and Evaluation
6.1 Validation of End-to-end Model
6.2 Evaluation of End-to-end Model
6.2.1 Comparison with Baselines
6.2.2 Comparison with Different Representation Learning Model
7 Conclusion and Future Work
7.1 Summary
7.2 Future Research
|
45 |
AI-Based Self-Adaptive Software in a 5G Simulation. Jönsson, Axel; Hammarhjelm, Erik, January 2024.
5G has emerged to revolutionize the telecommunications industry. With its many possibilities, there are also great challenges, such as maintaining the increased complexity of the many parameters in these new networks. It is a common practice to test new features of the networks before employing them, and this is often done in a simulated environment. The task of this thesis was to investigate if self-adaptive software, in simulations at Ericsson, could dynamically change the bandwidth to increase the net throughput while minimizing the packet loss, i.e. to maximize the overall quality of service on the network, without the need of human intervention. A simple simulation of a 5G network was created to train and test the effect of two proposed AI models. The models tested were Proximal Policy Optimization and Deep Deterministic Policy Gradient, where the former model showed promising results while the latter did not yield any significant improvements compared to the benchmarks. The study indicates that self-adaptive software, in simulated environments, can effectively be achieved by using AI while increasing the quality of service.
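The abstract does not give the exact simulation or reward used at Ericsson, so the following is only a toy sketch of the kind of interaction loop it describes: an agent picks a bandwidth level, the simulated network returns throughput and packet loss, and the reward trades the two off. All models, names, and weights below are assumptions.

# Illustrative toy sketch: bandwidth self-adaptation for quality of service.
import random

def step_network(bandwidth_mhz, offered_load_mbps):
    # toy stand-in for the 5G simulation: linear capacity model (assumption)
    capacity_mbps = 10.0 * bandwidth_mhz
    throughput = min(offered_load_mbps, capacity_mbps)
    packet_loss = max(0.0, offered_load_mbps - capacity_mbps) / offered_load_mbps
    return throughput, packet_loss

def reward(throughput, packet_loss, loss_weight=500.0):
    # quality-of-service objective: favor throughput, penalize packet loss
    return throughput - loss_weight * packet_loss

# A trained PPO (or DDPG) policy would map the observed load to a bandwidth
# action; a random policy is used here only to illustrate the loop.
for _ in range(3):
    load = random.uniform(50.0, 300.0)
    bandwidth = random.choice([5, 10, 20])        # action: bandwidth in MHz
    throughput, packet_loss = step_network(bandwidth, load)
    r = reward(throughput, packet_loss)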
|
46 |
Dynamic Maze Puzzle Navigation Using Deep Reinforcement Learning. Chiu, Luisa Shu Yi, 01 September 2024.
The implementation of deep reinforcement learning in mobile robotics offers a great solution for the development of autonomous mobile robots to efficiently complete tasks and transport objects. Reinforcement learning continues to show impressive potential in robotics applications through self-learning and biological plausibility. Despite its advancements, challenges remain in applying these machine learning techniques in dynamic environments. This thesis explores the performance of Deep Q-Networks (DQN), using images as an input, for mobile robot navigation in dynamic maze puzzles and aims to contribute to advancements in deep reinforcement learning applications for simulated and real-life robotic systems. This project is a step towards implementation in a hardware-based system. The proposed approach uses a DQN algorithm with experience replay and an epsilon-greedy annealing schedule. Experiments are conducted to train DQN agents in static and dynamic maze environments, and various reward functions and training strategies are explored to optimize learning outcomes. In this context, the dynamic aspect involves training the agent on fixed mazes and then testing its performance on modified mazes, where obstacles like walls alter previously optimal paths to the goal. In game play, the agent achieved a 100% win rate in both 4x4 and 10x10 static mazes, successfully making it to the goal regardless of slip conditions. The rewards obtained during the game-play episodes indicate that the agent took the optimal path in all 100 episodes of the 4x4 maze without the slip condition, whereas it took the shortest, most optimal path in 99 out of 100 episodes in the 4x4 maze with the slip condition. Compared to the 4x4 maze, the agent more frequently chose sub-optimal paths in the larger 10x10 maze, as indicated by the number of times the agent maximized the rewards obtained. In the 10x10 static maze game play, the agent took the optimal path in 96 out of 100 episodes for the no-slip condition, while it took the shortest path in 93 out of 100 episodes for the slip condition. In the dynamic maze experiment, the agent successfully solved 7 out of 8 mazes with a 100% win rate in both the original and modified maze environments. The results indicate that adequate exploration, well-designed reward functions, and diverse training data significantly impacted both training performance and game-play outcomes. The findings suggest that DQN approaches are plausible solutions to stochastic outcomes, but expanding upon the proposed method and further research are needed to improve this methodology. This study highlights the need for further efforts in improving deep reinforcement learning applications in dynamic environments.
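The two training ingredients named above, experience replay and an epsilon-greedy annealing schedule, can be sketched as follows. This is a generic illustration with assumed hyperparameters, not the thesis code; the Q-network itself is abstracted into a list of per-action values.

# Illustrative sketch: experience replay and linearly annealed epsilon-greedy exploration.
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience replay: store transitions, sample mini-batches."""
    def __init__(self, capacity=50000):
        self.buffer = deque(maxlen=capacity)
    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))
    def sample(self, batch_size=32):
        return random.sample(list(self.buffer), batch_size)

def epsilon(step, eps_start=1.0, eps_end=0.05, anneal_steps=100000):
    """Linearly annealed exploration rate."""
    frac = min(1.0, step / anneal_steps)
    return eps_start + frac * (eps_end - eps_start)

def choose_action(q_values, step, n_actions=4):
    """Epsilon-greedy action selection over the Q-network's output."""
    if random.random() < epsilon(step):
        return random.randrange(n_actions)                       # explore
    return max(range(n_actions), key=lambda a: q_values[a])      # exploit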
|
47 |
Autonomous Navigation with Deep Reinforcement Learning in Carla Simulator. Wang, Peilin, 08 December 2023.
With the rapid development of autonomous driving and artificial intelligence technology, end-to-end autonomous driving technology has become a research hotspot. This thesis aims to explore the application of deep reinforcement learning to the realization of end-to-end autonomous driving. We built a deep reinforcement learning virtual environment in the Carla simulator and, based on it, trained a policy model to control a vehicle along a preplanned route. For the deep reinforcement learning algorithm, we used Proximal Policy Optimization due to its stable performance. Considering the complexity of end-to-end autonomous driving, we also carefully designed a comprehensive reward function to train the policy model more efficiently. The model inputs for this study are of two types: first, real-time road information and vehicle state data obtained from the Carla simulator, and second, real-time images captured by the vehicle's front camera. In order to understand the influence of different input information on the training effect and model performance, we conducted a detailed comparative analysis. The test results showed that the accuracy and significance of the information have a significant impact on the learning effect of the agent, which in turn has a direct impact on the performance of the model. Through this study, we have not only confirmed the potential of deep reinforcement learning in the field of end-to-end autonomous driving, but also provided an important reference for future research and development of related technologies.
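The abstract mentions a carefully designed comprehensive reward function but does not give its form. As a purely illustrative sketch of the kind of shaped reward commonly used for route following, the snippet below combines progress, lane keeping, heading alignment, and terminal penalties; every term and weight is an assumption, not the thesis design.

# Illustrative sketch: a shaped reward for following a preplanned route.
def driving_reward(speed_mps, lateral_offset_m, heading_error_rad,
                   collided, off_route,
                   w_speed=0.05, w_lat=1.0, w_head=0.5):
    if collided or off_route:
        return -100.0                              # terminal penalty (assumed value)
    reward = w_speed * speed_mps                   # encourage forward progress
    reward -= w_lat * abs(lateral_offset_m)        # stay close to the planned route
    reward -= w_head * abs(heading_error_rad)      # keep heading aligned with the route
    return reward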
|
48 |
Design and Validation of a Myoelectric Bilateral Cable-driven Upper Body Exosuit and a Deep Reinforcement Learning-based Motor Controller for an Upper Extremity Simulator. Fu, Jirui, 01 January 2024.
Upper limb work-related musculoskeletal disorders (WMSDs) present a significant health risk to industrial workers. To address this, rigid-body exoskeletons have been widely used in industrial settings to mitigate these risks. Exosuits offer advantages such as reduced weight, lower inertia, and no need for precise joint alignment; however, they remain in the early stages of development, especially for reducing muscular effort in repetitive and forceful tasks such as heavy lifting and overhead work. This study introduces a multiple degrees-of-freedom cable-driven upper limb bilateral exosuit for human power augmentation. Two control schemes were developed and compared: an IMU-based controller and a myoelectric controller that compensates for the joint torque exerted by the wearer. The results of preliminary experiments showed a substantial reduction in muscular effort with the exosuit's assistance, with the myoelectric control scheme exhibiting reduced operational delay.
In parallel, the neuromusculoskeletal modeling and simulator (NMMS) has been widely applied in various fields. Most research works implement a PD-based internal model of the human central nervous system to simulate the generated muscle activation. However, the PD-based internal models in recent works are tuned using empirical data from human subject experiments. In this dissertation, an off-policy DRL algorithm, Deep Deterministic Policy Gradient, was implemented to tune the PD-based internal model of the human central nervous system. Compared to conventional approaches, the DRL-based auto-tuner can learn the optimal policy through trial and error, which does not require human subject experiments or empirical data. The experiments in this work showed promising results for this DRL-based auto-tuner for the internal model of the human central nervous system.
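The auto-tuning idea above can be illustrated with a minimal sketch: the agent's continuous action is a pair of PD gains, and the reward is the negative tracking error produced by the simulated internal model, so no human-subject data enters the loop. The function names, the simulator interface, and the reward form below are assumptions, not the dissertation's implementation.

# Illustrative sketch: casting PD-gain tuning as a reinforcement learning problem.
import numpy as np

def pd_torque(kp, kd, angle_error, angle_error_rate):
    # the PD internal model whose gains the DRL agent tunes
    return kp * angle_error + kd * angle_error_rate

def episode_reward(simulate_episode, kp, kd):
    """simulate_episode(kp, kd) -> (target_angles, actual_angles) as arrays."""
    target, actual = simulate_episode(kp, kd)
    tracking_error = np.mean((np.asarray(target) - np.asarray(actual)) ** 2)
    return -tracking_error          # smaller tracking error gives a larger reward

# A DDPG actor would output the continuous action (kp, kd) for the next episode
# and be updated from (state, action, reward) tuples collected this way.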
|
49 |
Explainability in Deep Reinforcement Learning. Keller, Jonas, 29 October 2024.
With the combination of Reinforcement Learning (RL) and Artificial Neural Networks (ANNs), Deep Reinforcement Learning (DRL) agents are shifted towards being non-interpretable black-box models. Developers of DRL agents, however, could benefit from enhanced interpretability of the agents' behavior, especially during the training process. Improved interpretability could enable developers to make informed adaptations, leading to better overall performance. The explainability methods Partial Dependence Plot (PDP), Accumulated Local Effects (ALE) and SHapley Additive exPlanations (SHAP) were considered to provide insights into how an agent's behavior evolves during training. Additionally, a decision tree as a surrogate model was considered to enhance the interpretability of a trained agent. In a case study, the methods were tested on a Deep Deterministic Policy Gradient (DDPG) agent that was trained in an Obstacle Avoidance (OA) scenario. PDP, ALE and SHAP were evaluated with respect to their ability to provide explanations as well as the feasibility of their application in terms of computational overhead. The decision tree was evaluated with respect to its ability to approximate the agent's policy as a post-hoc method. Results demonstrated that PDP, ALE and SHAP were able to provide valuable explanations during the training. Each method contributed additional information with its individual advantages. However, the decision tree failed to approximate the agent's actions effectively enough to be used as a surrogate model. A minimal sketch of the partial dependence computation is given after the table of contents below.

List of Figures
List of Tables
List of Abbreviations
1 Introduction
2 Foundations
2.1 Machine Learning
2.1.1 Deep Learning
2.2 Reinforcement Learning
2.2.1 Markov Decision Process
2.2.2 Limitations of Optimal Solutions
2.2.3 Deep Reinforcement Learning
2.3 Explainability
2.3.1 Obstacles for Explainability Methods
3 Applied Explainability Methods
3.1 Real-Time Methods
3.1.1 Partial Dependence Plot
3.1.1.1 Incremental Partial Dependence Plots for Dynamic Modeling Scenarios
3.1.1.2 PDP-based Feature Importance
3.1.2 Accumulated Local Effects
3.1.3 SHapley Additive exPlanations
3.2 Post-Hoc Method: Global Surrogate Model
4 Case Study: Obstacle Avoidance
4.1 Environment Representation
4.2 Agent
4.3 Application Settings
5 Results
5.1 Problems of the Incremental Partial Dependence Plot
5.2 Real-Time Methods
5.2.1 Feature Importance
5.2.2 Computational Overhead
5.3 Global Surrogate Model
6 Discussion
7 Conclusion
Bibliography
Appendix
A Incremental Partial Dependence Results
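As referenced in the abstract above, a one-dimensional partial dependence computation for a trained policy or Q-function can be sketched as follows. This is a generic illustration with assumed names, not the thesis implementation: the feature of interest is clamped to each grid value while all other state features keep their observed values, and the model output is averaged.

# Illustrative sketch: partial dependence of a model's output on one state feature.
import numpy as np

def partial_dependence(model_predict, states, feature_idx, grid):
    """model_predict maps an array of states to an array of outputs (e.g. actions);
    states is an (n_samples, n_features) array of observed states."""
    pd_values = []
    for value in grid:
        modified = states.copy()
        modified[:, feature_idx] = value       # clamp one feature to the grid value
        pd_values.append(np.mean(model_predict(modified)))
    return np.array(pd_values)                 # plot against grid to obtain the PDP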
|
50 |
Intelligent autoscaling in Kubernetes: the impact of container performance indicators in model-free DRL methods. Praturlon, Tommaso, January 2023.
A key challenge in the field of cloud computing is to automatically scale software containers in a way that accurately matches the demand for the services they run. To manage such components, container orchestrator tools such as Kubernetes are employed, and in the past few years, researchers have attempted to optimise its autoscaling mechanism with different approaches. Recent studies have showcased the potential of Actor-Critic Deep Reinforcement Learning (DRL) methods in container orchestration, demonstrating their effectiveness in various use cases. However, despite the availability of solutions that integrate multiple container performance metrics to evaluate autoscaling decisions, a critical gap exists in understanding how model-free DRL algorithms interact with a state space based on those metrics. Thus, the primary objective of this thesis is to investigate the impact of the state space definition on the performance of model-free DRL methods in the context of horizontal autoscaling within Kubernetes clusters. In particular, our findings reveal distinct behaviours associated with various sets of metrics. Notably, those sets that exclusively incorporate parameters present in the reward function demonstrate superior effectiveness. Furthermore, our results provide valuable insights when compared to related works, as our experiments demonstrate that a careful metric selection can lead to remarkable Service Level Agreement (SLA) compliance, with as low as 0.55% violations and even surpassing baseline performance in certain scenarios.
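The thesis does not list its exact metric sets here, so the following is only an illustrative sketch of how a state vector for a model-free DRL autoscaler might be assembled from per-container performance indicators; all metric names, scales, and thresholds are assumptions.

# Illustrative sketch: building the observation for a DRL-based horizontal autoscaler.
from dataclasses import dataclass

@dataclass
class ContainerMetrics:
    cpu_util: float        # fraction of requested CPU in use
    mem_util: float        # fraction of requested memory in use
    request_rate: float    # incoming requests per second
    p95_latency_ms: float  # tail response time
    replicas: int          # current number of pods

def build_state(m: ContainerMetrics, sla_latency_ms: float = 200.0):
    """Normalize raw metrics into the observation handed to the DRL agent."""
    return [
        m.cpu_util,
        m.mem_util,
        m.request_rate / 1000.0,              # rough normalization (assumed scale)
        m.p95_latency_ms / sla_latency_ms,    # values above 1.0 signal SLA violation risk
        m.replicas / 10.0,
    ]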
|