<p dir="ltr">Reinforcement learning (RL) has made significant progress in recent years, driven in large part by deep RL. However, training RL algorithms effectively remains challenging, particularly with respect to sample efficiency and generalization. This thesis addresses these challenges by developing RL algorithms that generalize to unseen environments and adapt to dynamic conditions, thereby expanding the practical applicability of RL to real-world tasks. The first contribution is a set of novel policy optimization techniques that enhance the generalization capabilities of RL agents. These include the Thinker method, which employs style transfer to diversify observation trajectories, and Bootstrap Advantage Estimation, which improves policy and value function learning through augmented data. Both methods outperform existing data augmentation and policy optimization techniques on standard benchmarks. Additionally, this thesis introduces Robust Policy Optimization, a method that enhances exploration in policy gradient-based RL by perturbing action distributions. It addresses limitations of traditional methods, such as entropy collapse and primacy bias, resulting in improved sample efficiency and adaptability in continuous action spaces. The thesis further explores natural language descriptions as an alternative to image-based state representations in RL. By leveraging large language models, this approach enhances interpretability and generalization in tasks involving complex visual observations. Furthermore, this work contributes to semi-autonomous teleoperated robotic surgery by developing systems capable of performing complex surgical tasks remotely, even under challenging conditions such as communication delays and data scarcity.
The DESK dataset, created as part of this work, supports knowledge transfer across different robotic platforms, further enhancing the capabilities of these systems. Overall, the advancements presented in this thesis are significant steps toward more robust, adaptable, and efficient autonomous agents. These contributions have broad implications for real-world applications, including autonomous systems, robotics, and safety-critical tasks such as medical surgery.</p>
Identifier | oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/27188148 |
Date | 08 October 2024 |
Creators | Md Masudur Rahman (19818171) |
Source Sets | Purdue University |
Detected Language | English |
Type | Text, Thesis |
Rights | CC BY 4.0 |
Relation | https://figshare.com/articles/thesis/ENHANCING_POLICY_OPTIMIZATION_FOR_IMPROVED_SAMPLE_EFFICIENCY_AND_GENERALIZATION_IN_DEEP_REINFORCEMENT_LEARNING/27188148 |