271 |
Individual differences in structure learning
Newlin, Philip, 13 May 2022
Humans have a tendency to impute structure spontaneously, even in simple learning tasks; however, the way they approach structure learning can vary drastically. The present study sought to determine why individuals learn structure differently. One hypothesized explanation for differences in structure learning is individual differences in cognitive control. Cognitive control allows individuals to maintain representations of a task and may interact with reinforcement learning systems. It was expected that individual differences in the propensity to apply cognitive control, which shares component processes with hierarchical reinforcement learning, might explain how individuals learn structure differently in a simple structure learning task. Results showed that proactive control and model-based control explained differences in the rate at which individuals applied structure learning.
|
272 |
Adversarial Reinforcement Learning for Control System Design: A Deep Reinforcement Learning Approach
Yang, Zhaoyuan, 15 August 2018
No description available.
|
273 |
Deep Reinforcement Learning for Open Multiagent System
Zhu, Tianxing, 20 September 2022
No description available.
|
274 |
COMPARING AND CONTRASTING THE USE OF REINFORCEMENT LEARNING TO DRIVE AN AUTONOMOUS VEHICLE AROUND A RACETRACK IN UNITY AND UNREAL ENGINE 5
Muhammad Hassan Arshad (16899882), 05 April 2024
<p dir="ltr">The concept of reinforcement learning has become increasingly relevant in learning- based applications, especially in the field of autonomous navigation, because of its fundamental nature to operate without the necessity of labeled data. However, the infeasibility of training reinforcement learning based autonomous navigation applications in a real-world setting has increased the popularity of researching and developing on autonomous navigation systems by creating simulated environments in game engine platforms. This thesis investigates the comparative performance of Unity and Unreal Engine 5 within the framework of a reinforcement learning system applied to autonomous race car navigation. A rudimentary simulated setting featuring a model car navigating a racetrack is developed, ensuring uniformity in environmental aspects across both Unity and Unreal Engine 5. The research employs reinforcement learning with genetic algorithms to instruct the model car in race track navigation; while the tools and programming methods for implementing reinforcement learning vary between the platforms, the fundamental concept of reinforcement learning via genetic algorithms remains consistent to facilitate meaningful comparisons. The implementation includes logging of key performance variables during run times on each platform. A comparative analysis of the performance data collected demonstrates Unreal Engine's superior performance across the collected variables. These findings contribute insights to the field of autonomous navigation systems development and reinforce the significance of choosing an optimal underlying simulation platform for reinforcement learning applications.</p>
|
275 |
Agent Contribution in Multi-Agent Reinforcement Learning: A Case Study in Remote Electrical Tilt
Emanuelsson, William, January 2024
As multi-agent reinforcement learning (MARL) continues to evolve and find applications in complex real-world systems, the imperative for explainability in these systems becomes increasingly critical. Central to enhancing this explainability is tackling the credit assignment problem, a key challenge in MARL that involves quantifying the individual contributions of agents toward a common goal. In addressing this challenge, this thesis introduces and explores the application of Local and Global Shapley Values (LSV and GSV) within MARL contexts. These novel adaptations of the traditional Shapley value from cooperative game theory are investigated particularly in the context of optimizing remote electrical tilt in telecommunications antennas. Using both predator-prey and remote electrical tilt environments, the study delves into local and global explanations, examining how the Shapley value can illuminate changes in agent contributions over time and across different states, as well as aggregate these insights over multiple episodes. The research findings demonstrate that the use of Shapley values enhances the understanding of individual agent behaviors, offers insights into policy suboptimalities and environmental nuances, and aids in identifying agent redundancies, a feature with potential applications in energy savings in real-world systems. Altogether, this thesis highlights the considerable potential of employing the Shapley value as a tool in explainable MARL.
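For readers unfamiliar with the credit-assignment machinery, the sketch below computes exact Shapley values by averaging each agent's marginal contribution over all coalitions. It is a generic illustration under assumptions: the coalition_value callable, the toy value function, and the antenna names are invented here, and the thesis's local and global variants (LSV and GSV) would define coalition values from MARL rollouts at a single state or aggregated over episodes.

import itertools
import math

def shapley_values(agents, coalition_value):
    """Exact Shapley value for each agent.
    `coalition_value(frozenset)` should return the value achieved when only
    the agents in the coalition act with their learned policies (in the MARL
    context this would come from rollouts; here it is any callable)."""
    n = len(agents)
    phi = {a: 0.0 for a in agents}
    for a in agents:
        others = [x for x in agents if x != a]
        for r in range(len(others) + 1):
            for S in itertools.combinations(others, r):
                S = frozenset(S)
                weight = (math.factorial(len(S)) * math.factorial(n - len(S) - 1)
                          / math.factorial(n))
                phi[a] += weight * (coalition_value(S | {a}) - coalition_value(S))
    return phi

# Toy usage: value = number of cooperating agents, with a bonus if all cooperate.
agents = ["antenna_1", "antenna_2", "antenna_3"]
v = lambda S: len(S) + (2.0 if len(S) == len(agents) else 0.0)
print(shapley_values(agents, v))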
|
276 |
Statistical Methods for Offline Deep Reinforcement Learning
Danyang Wang (18414336), 20 April 2024
<p dir="ltr">Reinforcement learning (RL) has been a rapidly evolving field of research over the past years, enhancing developments in areas such as artificial intelligence, healthcare, and education, to name a few. Regardless of the success of RL, its inherent online learning nature presents obstacles for its real-world applications, since in many settings, online data collection with the latest learned policy can be expensive and/or dangerous (such as robotics, healthcare, and autonomous driving). This challenge has catalyzed research into offline RL, which involves reinforcement learning from previously collected static datasets, without the need for further online data collection. However, most existing offline RL methods depend on two key assumptions: unconfoundedness and positivity (also known as the full-coverage assumption), which frequently do not hold in the context of static datasets. </p><p dir="ltr">In the first part of this dissertation, we simultaneously address these two challenges by proposing a novel policy learning algorithm: PESsimistic CAusal Learning (PESCAL). We utilize the mediator variable based on Front-Door Criterion, to remove the confounding bias. Additionally, we adopt the pessimistic principle to tackle the distributional shift problem induced by the under-coverage issue. This issue refers to the mismatch of distributions between the action distributions induced by candidate policies, and the policy that generates the observational data (known as the behavior policy). Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function, to partially mitigate the issue of distributional shift. This insight significantly simplifies our algorithm, by circumventing the challenging task of sequential uncertainty quantification for the estimated Q-function. Moreover, we provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.</p><p dir="ltr">In the second part of this dissertation, in contrast to the first part, which approaches the distributional shift issue implicitly by penalizing the value function as a whole, we explicitly constrain the learned policy to not deviate significantly from the behavior policy, while still enabling flexible adjustment of the degree of constraints. Building upon the offline reinforcement learning algorithm, TD3+BC \cite{fujimoto2021minimalist}, we propose a model-free actor-critic algorithm with an adjustable behavior cloning (BC) term. We employ an ensemble of networks to quantify the uncertainty of the estimated value function, thus addressing the issue of overestimation. Moreover, we introduce a method that is both convenient and intuitively simple for controlling the degree of BC, through a Bernoulli random variable based on the user-specified confidence level for different offline datasets. Our proposed algorithm, named Ensemble-based Actor Critic with Adaptive Behavior Cloning (EABC), is straightforward to implement, exhibits low variance, and achieves strong performance across all D4RL benchmarks.</p>
|
277 |
Complexity and problem solving: A tale of two systems
Andersson, Marcus, January 2018
The purpose of this thesis is to investigate whether increasing the complexity of a problem makes a difference for a learning system with two interacting parts. The two parts of the learning system are modelled after the Actor and Critic of the Actor-Critic algorithm, within the reinforcement learning framework. The results indicate that no difference can be found in the relative performance of the Actor and Critic parts when the complexity of the problem is increased. These results could be due to technical difficulties in comparing the environments and the algorithms; the difference in complexity would then be non-uniform in an unknowable way and unreliable as a basis for comparison. If, on the other hand, the change in complexity is uniform, this could point to an actual difference in how the Actor and the Critic each handle different types of complexity. Further studies with a controlled increase in complexity are needed to establish which of these scenarios is most likely. In the discussion, an idea is presented of using the Actor-Critic framework as a model for better understanding the success rate of psychological treatments.
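As a point of reference for the two parts being compared, the sketch below implements a minimal tabular one-step actor-critic on a toy chain MDP: the critic learns state values from TD errors, and the actor adjusts softmax policy preferences in the direction of the same TD errors. It is a generic textbook-style illustration under assumed hyperparameters, not the setup used in the thesis.

import numpy as np

# Minimal tabular one-step actor-critic on a toy 5-state chain MDP.
rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 5, 2                # actions: 0 = left, 1 = right
GAMMA, ALPHA_V, ALPHA_PI = 0.99, 0.1, 0.05

theta = np.zeros((N_STATES, N_ACTIONS))   # actor: policy preferences
V = np.zeros(N_STATES)                    # critic: state values

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1      # reaching the right end terminates

for episode in range(500):
    s, done = 0, False
    while not done:
        probs = softmax(theta[s])
        a = rng.choice(N_ACTIONS, p=probs)
        s2, r, done = step(s, a)
        td_error = r + (0.0 if done else GAMMA * V[s2]) - V[s]   # critic signal
        V[s] += ALPHA_V * td_error                               # critic update
        grad_log = -probs
        grad_log[a] += 1.0                                       # d log pi / d theta[s]
        theta[s] += ALPHA_PI * td_error * grad_log               # actor update
        s = s2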
|
278 |
MODEL-FREE ALGORITHMS FOR CONSTRAINED REINFORCEMENT LEARNING IN DISCOUNTED AND AVERAGE REWARD SETTINGS
Qinbo Bai (19804362), 07 October 2024
<p dir="ltr">Reinforcement learning (RL), which aims to train an agent to maximize its accumulated reward through time, has attracted much attention in recent years. Mathematically, RL is modeled as a Markov Decision Process, where the agent interacts with the environment step by step. In practice, RL has been applied to autonomous driving, robotics, recommendation systems, and financial management. Although RL has been greatly studied in the literature, most proposed algorithms are model-based, which requires estimating the transition kernel. To this end, we begin to study the sample efficient model-free algorithms under different settings.</p><p dir="ltr">Firstly, we propose a conservative stochastic primal-dual algorithm in the infinite horizon discounted reward setting. The proposed algorithm converts the original problem from policy space to the occupancy measure space, which makes the non-convex problem linear. Then, we advocate the use of a randomized primal-dual approach to achieve O(\eps^-2) sample complexity, which matches the lower bound.</p><p dir="ltr">However, when it comes to the infinite horizon average reward setting, the problem becomes more challenging since the environment interaction never ends and can’t be reset, which makes reward samples not independent anymore. To solve this, we design an epoch-based policy-gradient algorithm. In each epoch, the whole trajectory is divided into multiple sub-trajectories with an interval between each two of them. Such intervals are long enough so that the reward samples are asymptotically independent. By controlling the length of trajectory and intervals, we obtain a good gradient estimator and prove the proposed algorithm achieves O(T^3/4) regret bound.</p>
|
279 |
Remembering how to walk - Using Active Dendrite Networks to Drive Physical Animations
Henriksson, Klas, January 2023
Creating embodied agents capable of performing a wide range of tasks in different types of environments has been a longstanding challenge in deep reinforcement learning. A novel network architecture introduced in 2021, the Active Dendrite Network [A. Iyer et al., “Avoiding Catastrophe: Active Dendrites Enable Multi-Task Learning in Dynamic Environments”], designed to create sparse subnetworks for different tasks, showed promising multi-tasking performance on the Meta-World [T. Yu et al., “Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning”] multi-tasking benchmark. This thesis further explores the performance of this novel architecture in a multi-tasking environment focused on physical animations and locomotion. Specifically, we implement and compare the architecture to the commonly used Multi-Layer Perceptron (MLP) architecture on a multi-task reinforcement learning problem in a video-game setting: training a hexapedal agent on a set of locomotion tasks involving moving at different speeds, turning and standing still. The evaluation focused on two areas: (1) assessing the average overall performance of the Active Dendrite Network relative to the MLP on a set of locomotion scenarios featuring our behaviour sets and environments, and (2) assessing the relative impact Active Dendrite Networks have on transfer learning between related tasks by comparing their performance on novel behaviours shortly after training a related behaviour. Our findings suggest that the Active Dendrite Network can make better use of limited network capacity compared to the MLP: it outperformed the MLP by ∼18% on our benchmark when network capacity was limited. When both networks have sufficient capacity, however, there is little difference between the two. We further find that Active Dendrite Networks have very similar transfer-learning capabilities to the MLP in our benchmarks.
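To make the gating mechanism concrete, the sketch below shows one active-dendrite layer in the spirit of Iyer et al. (2021): each unit's feedforward response is modulated by the dendritic segment that responds most strongly to a task-context vector, followed by a k-winner-take-all step that keeps the representation sparse. The shapes, random weights, and k value are assumptions for illustration; the thesis's actual network and training setup are not reproduced here.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def active_dendrite_layer(x, context, W, b, D, k=10):
    """Sketch of one active-dendrite layer with assumed shapes:
    W (n_units, n_in), b (n_units,), D (n_units, n_segments, n_context).
    Each unit's feedforward response is gated by the dendritic segment that
    responds most strongly (in magnitude) to the context vector, then a
    k-winner-take-all keeps only the top-k units active."""
    feedforward = W @ x + b                          # (n_units,)
    seg = D @ context                                # (n_units, n_segments)
    idx = np.abs(seg).argmax(axis=1)                 # strongest segment per unit
    chosen = seg[np.arange(seg.shape[0]), idx]       # (n_units,)
    gated = feedforward * sigmoid(chosen)            # dendritic modulation
    out = np.zeros_like(gated)
    top_k = np.argsort(gated)[-k:]                   # k-winner-take-all sparsity
    out[top_k] = gated[top_k]
    return out

# Toy usage with random weights and a random task-context vector.
rng = np.random.default_rng(0)
n_in, n_units, n_segments, n_context = 32, 64, 8, 16
x, c = rng.normal(size=n_in), rng.normal(size=n_context)
W, b = rng.normal(size=(n_units, n_in)) * 0.1, np.zeros(n_units)
D = rng.normal(size=(n_units, n_segments, n_context)) * 0.1
h = active_dendrite_layer(x, c, W, b, D, k=10)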
|
280 |
Towards Novelty-Resilient AI: Learning in the Open World
Trevor A Bonjour (18423153), 22 April 2024
<p dir="ltr">Current artificial intelligence (AI) systems are proficient at tasks in a closed-world setting where the rules are often rigid. However, in real-world applications, the environment is usually open and dynamic. In this work, we investigate the effects of such dynamic environments on AI systems and develop ways to mitigate those effects. Central to our exploration is the concept of \textit{novelties}. Novelties encompass structural changes, unanticipated events, and environmental shifts that can confound traditional AI systems. We categorize novelties based on their representation, anticipation, and impact on agents, laying the groundwork for systematic detection and adaptation strategies. We explore novelties in the context of stochastic games. Decision-making in stochastic games exercises many aspects of the same reasoning capabilities needed by AI agents acting in the real world. A multi-agent stochastic game allows for infinitely many ways to introduce novelty. We propose an extension of the deep reinforcement learning (DRL) paradigm to develop agents that can detect and adapt to novelties in these environments. To address the sample efficiency challenge in DRL, we introduce a hybrid approach that combines fixed-policy methods with traditional DRL techniques, offering enhanced performance in complex decision-making tasks. We present a novel method for detecting anticipated novelties in multi-agent games, leveraging information theory to discern patterns indicative of collusion among players. Finally, we introduce DABLER, a pioneering deep reinforcement learning architecture that dynamically adapts to changing environmental conditions through broad learning approaches and environment recognition. Our findings underscore the importance of developing AI systems equipped to navigate the uncertainties of the open world, offering promising pathways for advancing AI research and application in real-world settings.</p>
|