141

Game Players Using Distributional Reinforcement Learning

Pettersson, Adam, Pei Purroy, Francesc January 2024 (has links)
Reinforcement learning (RL) algorithms aim to identify optimal action sequences for an agent in a given environment, traditionally by maximizing the expected reward received from the environment as the agent takes actions and transitions between states. This thesis explores a distributional approach to RL, replacing the expected reward function with the full distribution over the possible rewards received, known as the value distribution. We focus on the quantile regression distributional RL algorithm (QR-DQN) introduced by Dabney et al. (2017), which models the value distribution by representing its quantiles. Using this information about the value distribution, we modify the QR-DQN algorithm to enhance the agent's risk sensitivity. Our risk-averse algorithm is evaluated against the original QR-DQN in the Atari 2600 and Gymnasium environments, specifically the games Breakout, Pong, Lunar Lander and Cartpole. Results indicate that the risk-averse variant performs comparably in terms of reward while exhibiting increased robustness and risk aversion. Potential refinements of the risk-averse algorithm are presented.
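The quantile-regression step that QR-DQN is built around can be sketched roughly as follows. This is an illustrative reconstruction based on Dabney et al. (2017) rather than code from the thesis; the tensor names, shapes, and the Huber threshold kappa are assumptions.

```python
import torch
import torch.nn.functional as F

def quantile_huber_loss(pred_quantiles, target_quantiles, kappa=1.0):
    """pred_quantiles: (batch, N) quantile estimates for the chosen action.
    target_quantiles: (batch, N) Bellman targets r + gamma * Z(s', a*)."""
    n = pred_quantiles.shape[1]
    # Quantile midpoints tau_i = (i + 0.5) / N for the N fixed fractions.
    tau = (torch.arange(n, dtype=pred_quantiles.dtype,
                        device=pred_quantiles.device) + 0.5) / n        # (N,)
    # Pairwise TD errors: td[b, j, i] = target_j - prediction_i.
    td = target_quantiles.unsqueeze(2) - pred_quantiles.unsqueeze(1)    # (batch, N, N)
    huber = F.huber_loss(pred_quantiles.unsqueeze(1).expand_as(td),
                         target_quantiles.unsqueeze(2).expand_as(td),
                         reduction="none", delta=kappa)
    # Asymmetric weight |tau - 1{td < 0}| turns the Huber loss into quantile regression.
    weight = torch.abs(tau.view(1, 1, n) - (td.detach() < 0).float())
    return (weight * huber).sum(dim=2).mean(dim=1).mean()
```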
142

Hluboké posilovaná učení a řešení pohybu robotu typu had / Deep reinforcement learning and snake-like robot locomotion design

Kočí, Jakub January 2020 (has links)
This master's thesis discusses the application of reinforcement learning to deep learning tasks. The theoretical part covers the basics of artificial neural networks and reinforcement learning, and describes the theoretical model underlying the reinforcement learning process: Markov decision processes. Several techniques of interest are illustrated on conventional reinforcement learning algorithms, and some widely used deep reinforcement learning algorithms are described as well. The practical part consists of implementing a model of the robot and its environment, together with the deep reinforcement learning system itself.
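For context, a minimal sketch of the kind of deep Q-learning update such a system builds on is shown below; it is not the thesis implementation, and the network sizes, hyperparameters, and batch layout are placeholders.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)                     # Q(s, .) for every action

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    s, a, r, s_next, done = batch                  # a: long tensor of action indices
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bellman target: r + gamma * max_a' Q_target(s', a') for non-terminal s'.
        target = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```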
143

Increasing Policy Network Size Does Not Guarantee Better Performance in Deep Reinforcement Learning

Zachery Peter Berg (12455928) 25 April 2022 (has links)
The capacity of deep reinforcement learning policy networks has been found to affect the performance of trained agents. It has been observed that policy networks with more parameters have better training performance and generalization ability than smaller networks. In this work, we find cases where this does not hold true. We observe unimodal variance in the zero-shot test return of varying-width policies, which accompanies a drop in both train and test return. Empirically, we demonstrate mostly monotonically increasing performance or mostly optimal performance as the width of deep policy networks increases, except near the variance mode. Finally, we find a scenario where larger networks have increasing performance up to a point, then decreasing performance. We hypothesize that these observations align with the theory of double descent in supervised learning, although with specific differences.
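One plausible way to set up the width sweep described above is sketched below; the widths, depth, and observation/action sizes are illustrative assumptions, not the author's configuration.

```python
import torch.nn as nn

def make_policy(obs_dim, n_actions, width):
    """Two-hidden-layer policy whose capacity is controlled only by `width`."""
    return nn.Sequential(
        nn.Linear(obs_dim, width), nn.Tanh(),
        nn.Linear(width, width), nn.Tanh(),
        nn.Linear(width, n_actions),               # action logits
    )

# Train each policy elsewhere, then record train and zero-shot test return
# across the sweep to look for the variance mode described above.
policies = {w: make_policy(obs_dim=8, n_actions=4, width=w)
            for w in (8, 16, 32, 64, 128, 256, 512)}
```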
144

Biased Exploration in Offline Hierarchical Reinforcement Learning

Miller, Eric D. 26 January 2021 (has links)
No description available.
145

Model-Free Reinforcement Learning for Hierarchical OO-MDPs

Goldblatt, John Dallan 23 May 2022 (has links)
No description available.
146

ENHANCING POLICY OPTIMIZATION FOR IMPROVED SAMPLE EFFICIENCY AND GENERALIZATION IN DEEP REINFORCEMENT LEARNING

Md Masudur Rahman (19818171) 08 October 2024 (has links)
<p dir="ltr">The field of reinforcement learning has made significant progress in recent years, with deep reinforcement learning (RL) being a major contributor. However, there are still challenges associated with the effective training of RL algorithms, particularly with respect to sample efficiency and generalization. This thesis aims to address these challenges by developing RL algorithms capable of generalizing to unseen environments and adapting to dynamic conditions, thereby expanding the practical applicability of RL in real-world tasks. The first contribution of this thesis is the development of novel policy optimization techniques that enhance the generalization capabilities of RL agents. These techniques include the Thinker method, which employs style transfer to diversify observation trajectories, and Bootstrap Advantage Estimation, which improves policy and value function learning through augmented data. These methods have demonstrated superior performance in standard benchmarks, outperforming existing data augmentation and policy optimization techniques. Additionally, this thesis introduces Robust Policy Optimization, a method that enhances exploration in policy gradient-based RL by perturbing action distributions. This method addresses the limitations of traditional methods, such as entropy collapse and primacy bias, resulting in improved sample efficiency and adaptability in continuous action spaces. The thesis further explores the potential of natural language descriptions as an alternative to image-based state representations in RL. This approach enhances interpretability and generalization in tasks involving complex visual observations by leveraging large language models. Furthermore, this work contributes to the field of semi-autonomous teleoperated robotic surgery by developing systems capable of performing complex surgical tasks remotely, even under challenging conditions such as communication delays and data scarcity. The creation of the DESK dataset supports knowledge transfer across different robotic platforms, further enhancing the capabilities of these systems. Overall, the advancements presented in this thesis represent significant steps toward developing more robust, adaptable, and efficient autonomous agents. These contributions have broad implications for various real-world applications, including autonomous systems, robotics, and safety-critical tasks such as medical surgery.</p>
147

Reinforcement-learning-based autonomous vehicle navigation in a dynamically changing environment

Ngai, Chi-kit., 魏智傑. January 2007 (has links)
published_or_final_version / abstract / Electrical and Electronic Engineering / Doctoral / Doctor of Philosophy
148

APPLICATION OF SWARM AND REINFORCEMENT LEARNING TECHNIQUES TO REQUIREMENTS TRACING

Sultanov, Hakim 01 January 2013 (has links)
Today, software has become deeply woven into the fabric of our lives. The quality of the software we depend on needs to be ensured at every phase of the Software Development Life Cycle (SDLC). An analyst uses the requirements engineering process to gather and analyze system requirements in the early stages of the SDLC. An undetected problem at the beginning of the project can carry all the way through to the deployed product. The Requirements Traceability Matrix (RTM) serves as a tool to demonstrate how requirements are addressed by the design and implementation elements throughout the entire software development lifecycle. Creating an RTM by hand is an arduous task, and manual generation can be an error-prone process as well. As the size of the requirements and design document collection grows, it becomes more challenging to ensure proper coverage of the requirements by the design elements, i.e., to assure that every requirement is addressed by at least one design element. The techniques used by existing requirements tracing tools take into account only the content of the documents to establish possible links. We expect that if we also take into account the relative order of the text around the common terms within the inspected documents, we may discover candidate links with higher accuracy. The aim of this research is to demonstrate how machine learning algorithms can be applied to software requirements engineering problems. This work addresses the problem of requirements tracing by viewing it in light of the Ant Colony Optimization (ACO) algorithm and a reinforcement learning algorithm. By treating the documents as the starting points (nests) and ending points (sugar piles) of a path, and the terms used in the documents as connecting nodes, a possible link can be established and strengthened by attracting more agents (ants) onto a path between the two documents through pheromone deposits. The results of the work show that ACO and RL can successfully establish links between two sets of documents.
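An illustrative sketch of the pheromone bookkeeping behind such a tracing approach follows; the data structures, evaporation rate, and deposit rule are assumptions for exposition, not the thesis implementation.

```python
from collections import defaultdict

pheromone = defaultdict(float)   # (requirement_id, design_id) -> pheromone level
EVAPORATION = 0.1                # fraction of pheromone lost each iteration
DEPOSIT = 1.0                    # base pheromone laid down per completed path

def evaporate():
    # Weak, rarely reinforced links fade away over time.
    for link in pheromone:
        pheromone[link] *= (1.0 - EVAPORATION)

def deposit(requirement_id, design_id, path_quality=1.0):
    # Shorter or higher-similarity term paths can deposit more pheromone.
    pheromone[(requirement_id, design_id)] += DEPOSIT * path_quality

# After many iterations, links whose pheromone exceeds a threshold become the
# candidate entries of the Requirements Traceability Matrix (RTM).
```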
149

Q-Learning: Ett sätt att lära agenter att spela fotboll / Q-Learning: A way to teach agents to play football

Ekelund, Kalle January 2013 (has links)
The artificial intelligence in games often relies on rule-based techniques for its behaviour. This has made artificial agents predictable, which is especially apparent in sports games. This work evaluated whether the learning technique Q-learning plays football better than a rule-based technique, the state machine. To evaluate this, a simplified football simulation was created in which the two teams each used one of the techniques. The teams then played 100 matches against each other to determine which team/technique is best. Statistics from the matches were used as the study's results. The results show that Q-learning is the better technique, as it wins the most matches and creates the most scoring chances during the matches. The concluding discussion considers how useful Q-learning is in a game context.
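A minimal sketch of the tabular Q-learning update and epsilon-greedy action selection that such a football agent builds on is given below; the action set, state encoding, and hyperparameters are placeholders rather than values from the thesis.

```python
import random
from collections import defaultdict

ACTIONS = ["move_to_ball", "pass", "shoot", "defend"]   # illustrative action set
Q = defaultdict(float)                                   # (state, action) -> value
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def choose_action(state):
    if random.random() < EPSILON:                        # explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])     # exploit

def update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
```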
150

Models and metaphors in neuroscience : the role of dopamine in reinforcement learning as a case study

Kyle, Robert January 2012 (has links)
Neuroscience makes use of many metaphors in its attempt to explain the relationship between our brain and our behaviour. In this thesis I contrast the most commonly used metaphor - that of computation driven by neuron action potentials - with an alternative view which seeks to understand the brain in terms of an agent learning from the reward signalled by neuromodulators. To explore this reinforcement learning model I construct computational models to assess one of its key claims — that the neurotransmitter dopamine signals unexpected reward, and that this signal is used by the brain to learn control of our movements and drive goal-directed behaviour. In this thesis I develop a selection of computational models that are motivated by either theoretical concepts or experimental data relating to the effects of dopamine. The first model implements a published dopamine-modulated spike timing-dependent plasticity mechanism but is unable to correctly solve the distal reward problem. I analyse why this model fails and suggest solutions. The second model, more closely linked to the empirical data attempts to investigate the relative contributions of firing rate and synaptic conductances to synaptic plasticity. I use experimental data to estimate how model neurons will be affected by dopamine modulation, and use the resulting computational model to predict the effect of dopamine on synaptic plasticity. The results suggest that dopamine modulation of synaptic conductances is more significant than modulation of excitability. The third model demonstrates how simple assumptions about the anatomy of the basal ganglia, and the electrophysiological effects of dopamine modulation can lead to reinforcement learning like behaviour. The model makes the novel prediction that working memory is an emergent feature of a reinforcement learning process. In the course of producing these models I find that both theoretically and empirically based models suffer from methodological problems that make it difficult to adequately support such fundamental claims as the reinforcement learning hypothesis. The conclusion that I draw from the modelling work is that it is neither possible, nor desirable to falsify the theoretical models used in neuroscience. Instead I argue that models and metaphors can be valued by how useful they are, independently of their truth. As a result I suggest that we ought to encourage a plurality of models and metaphors in neuroscience. In Chapter 7 I attempt to put this into practice by reviewing the other transmitter systems that modulate dopamine release, and use this as a basis for exploring the context of dopamine modulation and reward-driven behaviour. I draw on evidence to suggest that dopamine modulation can be seen as part of an extended stress response, and that the function of dopamine is to encourage the individual to engage in behaviours that take it away from homeostasis. I also propose that the function of dopamine can be interpreted in terms of behaviourally defining self and non-self, much in the same way as inflammation and antibody responses are said to do in immunology.
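The reward-prediction-error computation that the dopamine hypothesis maps onto can be sketched in a few lines; this is a generic temporal-difference update for illustration, not one of the thesis models, and the discount and learning rate are assumptions.

```python
GAMMA = 0.95      # temporal discount factor
ALPHA = 0.05      # learning rate

V = {}            # state -> learned value estimate

def td_step(state, reward, next_state):
    v_s = V.get(state, 0.0)
    v_next = V.get(next_state, 0.0)
    delta = reward + GAMMA * v_next - v_s    # the "dopamine-like" prediction error
    V[state] = v_s + ALPHA * delta           # value learning driven by that error
    return delta
```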
