141 |
Increasing Policy Network Size Does Not Guarantee Better Performance in Deep Reinforcement LearningZachery Peter Berg (12455928) 25 April 2022 (has links)
<p>The capacity of deep reinforcement learning policy networks has been found to affect the performance of trained agents. It has been observed that policy networks with more parameters have better training performance and generalization ability than smaller networks. In this work, we find cases where this does not hold true. We observe unimodal variance in the zero-shot test return of varying width policies, which accompanies a drop in both train and test return. Empirically, we demonstrate mostly monotonically increasing performance or mostly optimal performance as the width of deep policy networks increase, except near the variance mode. Finally, we find a scenario where larger networks have increasing performance up to a point, then decreasing performance. We hypothesize that these observations align with the theory of double descent in supervised learning, although with specific differences.</p>
|
142 |
Biased Exploration in Offline Hierarchical Reinforcement LearningMiller, Eric D. 26 January 2021 (has links)
No description available.
|
143 |
Model-Free Reinforcement Learning for Hierarchical OO-MDPsGoldblatt, John Dallan 23 May 2022 (has links)
No description available.
|
144 |
ENHANCING POLICY OPTIMIZATION FOR IMPROVED SAMPLE EFFICIENCY AND GENERALIZATION IN DEEP REINFORCEMENT LEARNINGMd Masudur Rahman (19818171) 08 October 2024 (has links)
<p dir="ltr">The field of reinforcement learning has made significant progress in recent years, with deep reinforcement learning (RL) being a major contributor. However, there are still challenges associated with the effective training of RL algorithms, particularly with respect to sample efficiency and generalization. This thesis aims to address these challenges by developing RL algorithms capable of generalizing to unseen environments and adapting to dynamic conditions, thereby expanding the practical applicability of RL in real-world tasks. The first contribution of this thesis is the development of novel policy optimization techniques that enhance the generalization capabilities of RL agents. These techniques include the Thinker method, which employs style transfer to diversify observation trajectories, and Bootstrap Advantage Estimation, which improves policy and value function learning through augmented data. These methods have demonstrated superior performance in standard benchmarks, outperforming existing data augmentation and policy optimization techniques. Additionally, this thesis introduces Robust Policy Optimization, a method that enhances exploration in policy gradient-based RL by perturbing action distributions. This method addresses the limitations of traditional methods, such as entropy collapse and primacy bias, resulting in improved sample efficiency and adaptability in continuous action spaces. The thesis further explores the potential of natural language descriptions as an alternative to image-based state representations in RL. This approach enhances interpretability and generalization in tasks involving complex visual observations by leveraging large language models. Furthermore, this work contributes to the field of semi-autonomous teleoperated robotic surgery by developing systems capable of performing complex surgical tasks remotely, even under challenging conditions such as communication delays and data scarcity. The creation of the DESK dataset supports knowledge transfer across different robotic platforms, further enhancing the capabilities of these systems. Overall, the advancements presented in this thesis represent significant steps toward developing more robust, adaptable, and efficient autonomous agents. These contributions have broad implications for various real-world applications, including autonomous systems, robotics, and safety-critical tasks such as medical surgery.</p>
|
145 |
Reinforcement-learning-based autonomous vehicle navigation in a dynamically changing environmentNgai, Chi-kit., 魏智傑. January 2007 (has links)
published_or_final_version / abstract / Electrical and Electronic Engineering / Doctoral / Doctor of Philosophy
|
146 |
APPLICATION OF SWARM AND REINFORCEMENT LEARNING TECHNIQUES TO REQUIREMENTS TRACINGSultanov, Hakim 01 January 2013 (has links)
Today, software has become deeply woven into the fabric of our lives. The quality of the software we depend on needs to be ensured at every phase of the Software Development Life Cycle (SDLC). An analyst uses the requirements engineering process to gather and analyze system requirements in the early stages of the SDLC. An undetected problem at the beginning of the project can carry all the way through to the deployed product.
The Requirements Traceability Matrix (RTM) serves as a tool to demonstrate how requirements are addressed by the design and implementation elements throughout the entire software development lifecycle. Creating an RTM matrix by hand is an arduous task. Manual generation of an RTM can be an error prone process as well.
As the size of the requirements and design document collection grows, it becomes more challenging to ensure proper coverage of the requirements by the design elements, i.e., assure that every requirement is addressed by at least one design element. The techniques used by the existing requirements tracing tools take into account only the content of the documents to establish possible links. We expect that if we take into account the relative order of the text around the common terms within the inspected documents, we may discover candidate links with a higher accuracy.
The aim of this research is to demonstrate how we can apply machine learning algorithms to software requirements engineering problems. This work addresses the problem of requirements tracing by viewing it in light of the Ant Colony Optimization (ACO) algorithm and a reinforcement learning algorithm. By treating the documents as the starting (nest) and ending points (sugar piles) of a path and the terms used in the documents as connecting nodes, a possible link can be established and strengthened by attracting more agents (ants) onto a path between the two documents by using pheromone deposits. The results of the work show that ACO and RL can successfully establish links between two sets of documents.
|
147 |
Q-Learning: Ett sätt att lära agenter att spela fotboll / Q-Learning: A way to tach agents to play footballEkelund, Kalle January 2013 (has links)
Den artificiella intelligensen i spel brukar ofta använda sig utav regelbaserade tekniker för dess beteende. Detta har gjort att de artificiella agenterna blivit förutsägbara, vilket är väldigt tydligt för sportspel. Det här arbetet har utvärderat ifall inlärningstekniken Q-learning är bättre på att spela fotboll än en regelbaserade tekniken tillståndsmaskin. För att utvärdera detta har en förenklad fotbollssimulering skapats. Där de båda lagen har använts sig av varsin teknik. De båda lagen har sedan spelat 100 matcher mot varandra för att se vilket lag/teknik som är bäst. Statistik ifrån matcherna har använts som undersökningsresultat. Resultatet visar att Q-learning är en bättre teknik då den vinner flest match och skapar flest chanser under matcherna. Diskussionen efteråt handlar om hur användbart Q-learning är i ett spelsammanhang.
|
148 |
Models and metaphors in neuroscience : the role of dopamine in reinforcement learning as a case studyKyle, Robert January 2012 (has links)
Neuroscience makes use of many metaphors in its attempt to explain the relationship between our brain and our behaviour. In this thesis I contrast the most commonly used metaphor - that of computation driven by neuron action potentials - with an alternative view which seeks to understand the brain in terms of an agent learning from the reward signalled by neuromodulators. To explore this reinforcement learning model I construct computational models to assess one of its key claims — that the neurotransmitter dopamine signals unexpected reward, and that this signal is used by the brain to learn control of our movements and drive goal-directed behaviour. In this thesis I develop a selection of computational models that are motivated by either theoretical concepts or experimental data relating to the effects of dopamine. The first model implements a published dopamine-modulated spike timing-dependent plasticity mechanism but is unable to correctly solve the distal reward problem. I analyse why this model fails and suggest solutions. The second model, more closely linked to the empirical data attempts to investigate the relative contributions of firing rate and synaptic conductances to synaptic plasticity. I use experimental data to estimate how model neurons will be affected by dopamine modulation, and use the resulting computational model to predict the effect of dopamine on synaptic plasticity. The results suggest that dopamine modulation of synaptic conductances is more significant than modulation of excitability. The third model demonstrates how simple assumptions about the anatomy of the basal ganglia, and the electrophysiological effects of dopamine modulation can lead to reinforcement learning like behaviour. The model makes the novel prediction that working memory is an emergent feature of a reinforcement learning process. In the course of producing these models I find that both theoretically and empirically based models suffer from methodological problems that make it difficult to adequately support such fundamental claims as the reinforcement learning hypothesis. The conclusion that I draw from the modelling work is that it is neither possible, nor desirable to falsify the theoretical models used in neuroscience. Instead I argue that models and metaphors can be valued by how useful they are, independently of their truth. As a result I suggest that we ought to encourage a plurality of models and metaphors in neuroscience. In Chapter 7 I attempt to put this into practice by reviewing the other transmitter systems that modulate dopamine release, and use this as a basis for exploring the context of dopamine modulation and reward-driven behaviour. I draw on evidence to suggest that dopamine modulation can be seen as part of an extended stress response, and that the function of dopamine is to encourage the individual to engage in behaviours that take it away from homeostasis. I also propose that the function of dopamine can be interpreted in terms of behaviourally defining self and non-self, much in the same way as inflammation and antibody responses are said to do in immunology.
|
149 |
A Coordinated Reinforcement Learning Framework for Multi-Agent Virtual EnvironmentsSause, William 01 January 2013 (has links)
The growing popularity of online virtual communities such as Second Life and ActiveWorlds demands the presence of intelligent agents to assist users in their daily online activities (e.g., exploring, shopping, and socializing). As these virtual environments become more crowded, multiple agents are needed to support the increasing number of users. Multi-agent environments, however, can suffer from the problem of resource competition among agents. It is therefore necessary that agents within multi-agent environments include a coordination mechanism to prevent unrealistic behaviors. Moreover, it is essential that these agents exhibit some form of intelligence, or the ability to learn, to support realism as well as to eliminate the need for developers to write separate scripts for each task the agents are required to perform. This research presents a coordinated reinforcement learning framework which can be used to develop task-oriented intelligent agents in multi-agent virtual environments. The framework contains a combination of a "next available agent" coordination model and a reinforcement learning model consisting of existing temporal difference reinforcement learning algorithms. Furthermore, the framework supports evaluations of reinforcement learning algorithms to determine which methods are best suited for task-oriented intelligent agents in dynamic, multi-agent virtual environments.
To assess the effectiveness of the temporal difference reinforcement algorithms used in this study (Q-learning and Sarsa), experiments were conducted that measured an agent's ability to learn three tasks commonly performed by workers in a café environment. These tasks were basic sandwich making (BSM), complex sandwich making (CSM), and dynamic sandwich making (DSM). The BSM task consisted of four steps. The CSM and DSM tasks contained an additional fifth step. The agent learned the BSM and CSM tasks from scratch while the DSM task was learned after the agent became skillful in BSM. The measurements used to evaluate the efficiency of the Q-learning and Sarsa algorithms were the percentage of successful and optimally successful episodes performed by the agent and the average number of time steps taken by the agent to complete a successful episode. The experiments were run using both a fixed (FEP) and variable (VEP) ε-greedy probability rate. Results showed that the Sarsa reinforcement learning algorithm, on average, outperformed the Q-learning algorithm in almost all experiments except when measuring the percentage of successfully completed episodes using FEP for CSM and DSM, in which Sarsa performed almost equally as well as Q-learning. Overall, experiments utilizing VEP resulted in higher percentages of successes and optimal successes, and showed convergence to the optimal policy when measuring the average number of time steps per successful episode.
|
150 |
Dynamic movement primitives andreinforcement learning for adapting alearned skillLundell, Jens January 2016 (has links)
Traditionally robots have been preprogrammed to execute specific tasks. Thisapproach works well in industrial settings where robots have to execute highlyaccurate movements, such as when welding. However, preprogramming a robot isalso expensive, error prone and time consuming due to the fact that every featuresof the task has to be considered. In some cases, where a robot has to executecomplex tasks such as playing the ball-in-a-cup game, preprogramming it mighteven be impossible due to unknown features of the task. With all this in mind,this thesis examines the possibility of combining a modern learning framework,known as Learning from Demonstrations (LfD), to first teach a robot how toplay the ball-in-a-cup game by demonstrating the movement for the robot, andthen have the robot to improve this skill by itself with subsequent ReinforcementLearning (RL). The skill the robot has to learn is demonstrated with kinestheticteaching, modelled as a dynamic movement primitive, and subsequently improvedwith the RL algorithm Policy Learning by Weighted Exploration with the Returns.Experiments performed on the industrial robot KUKA LWR4+ showed that robotsare capable of successfully learning a complex skill such as playing the ball-in-a-cupgame. / Traditionellt sett har robotar blivit förprogrammerade för att utföra specifika uppgifter.Detta tillvägagångssätt fungerar bra i industriella miljöer var robotar måsteutföra mycket noggranna rörelser, som att svetsa. Förprogrammering av robotar ärdock dyrt, felbenäget och tidskrävande eftersom varje aspekt av uppgiften måstebeaktas. Dessa nackdelar kan till och med göra det omöjligt att förprogrammeraen robot att utföra komplexa uppgifter som att spela bollen-i-koppen spelet. Medallt detta i åtanke undersöker den här avhandlingen möjligheten att kombinera ettmodernt ramverktyg, kallat inläraning av demonstrationer, för att lära en robothur bollen-i-koppen-spelet ska spelas genom att demonstrera uppgiften för denoch sedan ha roboten att själv förbättra sin inlärda uppgift genom att användaförstärkande inlärning. Uppgiften som roboten måste lära sig är demonstreradmed kinestetisk undervisning, modellerad som dynamiska rörelseprimitiver, ochsenare förbättrad med den förstärkande inlärningsalgoritmen Policy Learning byWeighted Exploration with the Returns. Experiment utförda på den industriellaKUKA LWR4+ roboten visade att robotar är kapabla att framgångsrikt lära sigspela bollen-i-koppen spelet
|
Page generated in 0.09 seconds