131

The interaction of working memory and uncertainty (mis)estimation in context-dependent outcome estimation

Li Xin Lim (9230078) 13 November 2023 (has links)
<p dir="ltr">In the context of reinforcement learning, extensive research has shown how reinforcement learning was facilitated by the estimation of uncertainty to improve the ability to make decisions. However, the constraints imposed by the limited observation of the process of forming environment representation have seldom been a subject of discussion. Thus, the study intended to demonstrate that when incorporating a limited memory into uncertainty estimation, individuals potentially misestimate outcomes and environmental statistics. The study included a computational model that included the process of active working memory and lateral inhibition in working memory (WM) to describe how the relevant information was chosen and stored to form estimations of uncertainty in forming outcome expectations. The active working memory maintained relevant information not just by the recent memory, but also with utility. With the relevant information stored in WM, the model was able to estimate expected uncertainty and perceived volatility and detect contextual changes or dynamics in the outcome structure. Two experiments to investigate limitations in information availability and uncertainty estimation were carried out. The first experiment investigated the impact of cognitive loading on the reliance on memories to form outcome estimation. The findings revealed that introducing cognitive loading diminished the reliance on memory for uncertainty estimations and lowered the expected uncertainty, leading to an increased perception of environmental volatility. The second experiment investigated the ability to detect changes in outcome noise under different conditions of outcome exposure. The study found differences in the mechanisms used for detecting environmental changes in various conditions. Through the experiments and model fitting, the study showed that the misestimation of uncertainties was reliant on individual experiences and relevant information stored in WM under a limited capacity.</p>
132

REINFORCEMENT LEARNING FOR CONCAVE OBJECTIVES AND CONVEX CONSTRAINTS

Mridul Agarwal (13171941) 29 July 2022 (has links)
Formulating RL with MDPs typically works for a single objective; hence, such formulations are not readily applicable when policies need to optimize multiple objectives, or to satisfy constraints while maximizing one or more objectives, which can often conflict. Further, many applications such as robotics or autonomous driving do not allow constraints to be violated even during training. Existing algorithms do not simultaneously combine multiple objectives with zero constraint violations, sample efficiency, and low computational complexity. To this end, we study sample-efficient reinforcement learning with a concave objective and convex constraints, where an agent maximizes a concave, Lipschitz-continuous function of multiple objectives while satisfying a convex constraint on costs. For this setup, we provide a posterior sampling algorithm that solves a convex optimization problem for the stationary distribution of states and actions. Using a Bellman-error-based analysis, we show that the algorithm obtains a near-optimal Bayesian regret bound in the number of interactions with the environment. Moreover, under the assumption that slack policies exist, we design an algorithm that solves for conservative policies that do not violate constraints and still achieves the near-optimal regret bound. We also show that the algorithm performs significantly better than the existing algorithm for MDPs with finite states and finite actions.
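One common way to pose this kind of problem, sketched here as a general formulation rather than the thesis's exact program, is as a convex optimization over the state-action occupancy measure rho, with vector rewards r_1, ..., r_K, cost c, a concave f, and a convex g:

```latex
\begin{aligned}
\max_{\rho \ge 0}\quad & f\!\Big(\textstyle\sum_{s,a}\rho(s,a)\,r_1(s,a),\ \dots,\ \textstyle\sum_{s,a}\rho(s,a)\,r_K(s,a)\Big)\\
\text{s.t.}\quad & g\!\Big(\textstyle\sum_{s,a}\rho(s,a)\,c(s,a)\Big)\le 0,\\
& \textstyle\sum_{a}\rho(s',a)=\textstyle\sum_{s,a}\rho(s,a)\,P(s'\mid s,a)\quad\forall s',\qquad \textstyle\sum_{s,a}\rho(s,a)=1 .
\end{aligned}
```

Because f is concave, g is convex, and the flow constraints are linear, the program is convex; in a posterior sampling scheme the unknown transition kernel P is replaced by a draw from its posterior before the program is solved.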
133

Hluboké posilovaná učení a řešení pohybu robotu typu had / Deep reinforcement learning and snake-like robot locomotion design

Kočí, Jakub January 2020 (has links)
This master's thesis discusses the application of reinforcement learning to deep learning tasks. The theoretical part covers the basics of artificial neural networks and reinforcement learning, and describes the theoretical model of the reinforcement learning process, Markov processes. Some interesting techniques are illustrated on conventional reinforcement learning algorithms, and several widely used deep reinforcement learning algorithms are described as well. The practical part consists of implementing a model of the robot and its environment, and the deep reinforcement learning system itself.
134

Increasing Policy Network Size Does Not Guarantee Better Performance in Deep Reinforcement Learning

Zachery Peter Berg (12455928) 25 April 2022 (has links)
The capacity of deep reinforcement learning policy networks has been found to affect the performance of trained agents. It has been observed that policy networks with more parameters have better training performance and generalization ability than smaller networks. In this work, we find cases where this does not hold true. We observe unimodal variance in the zero-shot test return of varying-width policies, which accompanies a drop in both train and test return. Empirically, we demonstrate mostly monotonically increasing performance, or mostly optimal performance, as the width of deep policy networks increases, except near the variance mode. Finally, we find a scenario where larger networks have increasing performance up to a point, then decreasing performance. We hypothesize that these observations align with the theory of double descent in supervised learning, although with specific differences.
135

Biased Exploration in Offline Hierarchical Reinforcement Learning

Miller, Eric D. 26 January 2021 (has links)
No description available.
136

Model-Free Reinforcement Learning for Hierarchical OO-MDPs

Goldblatt, John Dallan 23 May 2022 (has links)
No description available.
137

Reinforcement-learning-based autonomous vehicle navigation in a dynamically changing environment

Ngai, Chi-kit., 魏智傑. January 2007 (has links)
Published or final version / abstract / Electrical and Electronic Engineering / Doctoral / Doctor of Philosophy
138

APPLICATION OF SWARM AND REINFORCEMENT LEARNING TECHNIQUES TO REQUIREMENTS TRACING

Sultanov, Hakim 01 January 2013 (has links)
Today, software has become deeply woven into the fabric of our lives. The quality of the software we depend on needs to be ensured at every phase of the Software Development Life Cycle (SDLC). An analyst uses the requirements engineering process to gather and analyze system requirements in the early stages of the SDLC. An undetected problem at the beginning of the project can carry all the way through to the deployed product. The Requirements Traceability Matrix (RTM) serves as a tool to demonstrate how requirements are addressed by the design and implementation elements throughout the entire software development lifecycle. Creating an RTM by hand is an arduous task, and manual generation of an RTM can be an error-prone process as well. As the size of the requirements and design document collection grows, it becomes more challenging to ensure proper coverage of the requirements by the design elements, i.e., to assure that every requirement is addressed by at least one design element. The techniques used by existing requirements tracing tools take into account only the content of the documents to establish possible links. We expect that if we also take into account the relative order of the text around the common terms within the inspected documents, we may discover candidate links with higher accuracy. The aim of this research is to demonstrate how machine learning algorithms can be applied to software requirements engineering problems. This work addresses the problem of requirements tracing by viewing it in light of the Ant Colony Optimization (ACO) algorithm and a reinforcement learning (RL) algorithm. By treating the documents as the starting points (nests) and ending points (sugar piles) of a path, and the terms used in the documents as connecting nodes, a possible link can be established and strengthened by attracting more agents (ants) onto the path between the two documents via pheromone deposits. The results of the work show that ACO and RL can successfully establish links between two sets of documents.
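The pheromone mechanism described above can be sketched in a few lines. The code below is a toy illustration under simplifying assumptions (links form only through exact shared terms, deposits are scaled by term overlap, and all parameters are arbitrary); it is not the algorithm evaluated in the dissertation, which also employs a reinforcement learning component not shown here.

```python
import random
from collections import defaultdict

def ant_colony_trace(requirements, designs, n_ants=200, evaporation=0.05, seed=0):
    """Illustrative sketch only: ants walk requirement -> shared term -> design
    element and deposit pheromone on the (requirement, design) pair they connect.
    Stronger trails suggest candidate traceability links. The dicts map IDs to
    raw text; parameters are arbitrary and not taken from the dissertation."""
    rng = random.Random(seed)
    pheromone = defaultdict(float)
    req_terms = {r: set(text.lower().split()) for r, text in requirements.items()}
    des_terms = {d: set(text.lower().split()) for d, text in designs.items()}

    for _ in range(n_ants):
        r = rng.choice(list(requirements))
        shared = {d: req_terms[r] & terms for d, terms in des_terms.items()}
        candidates = [d for d, common in shared.items() if common]  # reachable via a common term
        if not candidates:
            continue
        # prefer trails that already carry pheromone (plus a small base attractiveness)
        weights = [1.0 + pheromone.get((r, d), 0.0) for d in candidates]
        d = rng.choices(candidates, weights=weights, k=1)[0]
        for key in pheromone:                # evaporate all trails a little...
            pheromone[key] *= (1.0 - evaporation)
        pheromone[(r, d)] += len(shared[d])  # ...then deposit, scaled by term overlap

    return sorted(pheromone.items(), key=lambda kv: -kv[1])
```

Calling ant_colony_trace on small dictionaries of requirement and design-element texts returns candidate (requirement, design) pairs ranked by trail strength.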
139

Q-Learning: Ett sätt att lära agenter att spela fotboll / Q-Learning: A way to teach agents to play football

Ekelund, Kalle January 2013 (has links)
Artificial intelligence in games often relies on rule-based techniques for agent behaviour. This has made artificial agents predictable, which is especially evident in sports games. This work evaluated whether the learning technique Q-learning is better at playing football than a rule-based technique, a state machine. To evaluate this, a simplified football simulation was created in which each of the two teams used one of the techniques. The two teams then played 100 matches against each other to determine which team/technique is best. Statistics from the matches were used as the study's results. The results show that Q-learning is the better technique, as it wins the most matches and creates the most chances during the matches. The concluding discussion concerns how useful Q-learning is in a game context.
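For reference, the tabular Q-learning update at the core of the technique compared above looks like the sketch below. The state and action encodings, reward function, and parameter values used in the thesis's football simulation are not specified in the abstract, so everything concrete in the snippet is an illustrative assumption.

```python
import random
from collections import defaultdict

def q_learning_step(Q, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.9):
    """One tabular Q-learning update: move Q(s, a) toward the bootstrapped target
    r + gamma * max_a' Q(s', a'). Q is a defaultdict(float) keyed by (state, action)."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])

def epsilon_greedy(Q, state, actions, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

# Example setup (hypothetical action names, not the thesis's):
# Q = defaultdict(float); actions = ["move_to_ball", "pass", "shoot", "defend"]
```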
140

Models and metaphors in neuroscience : the role of dopamine in reinforcement learning as a case study

Kyle, Robert January 2012 (has links)
Neuroscience makes use of many metaphors in its attempt to explain the relationship between our brain and our behaviour. In this thesis I contrast the most commonly used metaphor - that of computation driven by neuron action potentials - with an alternative view which seeks to understand the brain in terms of an agent learning from the reward signalled by neuromodulators. To explore this reinforcement learning model I construct computational models to assess one of its key claims — that the neurotransmitter dopamine signals unexpected reward, and that this signal is used by the brain to learn control of our movements and drive goal-directed behaviour. In this thesis I develop a selection of computational models that are motivated by either theoretical concepts or experimental data relating to the effects of dopamine. The first model implements a published dopamine-modulated spike timing-dependent plasticity mechanism but is unable to correctly solve the distal reward problem. I analyse why this model fails and suggest solutions. The second model, more closely linked to the empirical data attempts to investigate the relative contributions of firing rate and synaptic conductances to synaptic plasticity. I use experimental data to estimate how model neurons will be affected by dopamine modulation, and use the resulting computational model to predict the effect of dopamine on synaptic plasticity. The results suggest that dopamine modulation of synaptic conductances is more significant than modulation of excitability. The third model demonstrates how simple assumptions about the anatomy of the basal ganglia, and the electrophysiological effects of dopamine modulation can lead to reinforcement learning like behaviour. The model makes the novel prediction that working memory is an emergent feature of a reinforcement learning process. In the course of producing these models I find that both theoretically and empirically based models suffer from methodological problems that make it difficult to adequately support such fundamental claims as the reinforcement learning hypothesis. The conclusion that I draw from the modelling work is that it is neither possible, nor desirable to falsify the theoretical models used in neuroscience. Instead I argue that models and metaphors can be valued by how useful they are, independently of their truth. As a result I suggest that we ought to encourage a plurality of models and metaphors in neuroscience. In Chapter 7 I attempt to put this into practice by reviewing the other transmitter systems that modulate dopamine release, and use this as a basis for exploring the context of dopamine modulation and reward-driven behaviour. I draw on evidence to suggest that dopamine modulation can be seen as part of an extended stress response, and that the function of dopamine is to encourage the individual to engage in behaviours that take it away from homeostasis. I also propose that the function of dopamine can be interpreted in terms of behaviourally defining self and non-self, much in the same way as inflammation and antibody responses are said to do in immunology.
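The reward-prediction-error reading of dopamine discussed above is usually formalised as a temporal-difference error. The sketch below shows that quantity in its simplest tabular form, purely as an illustration of the hypothesis under discussion rather than a reimplementation of any of the thesis's three models.

```python
def td_prediction_error(V, state, reward, next_state, gamma=0.95):
    """Temporal-difference error: how much better or worse the outcome was than
    expected. In the dopamine-as-reward-prediction-error reading, a positive delta
    corresponds to a phasic dopamine burst and a negative delta to a dip."""
    return reward + gamma * V.get(next_state, 0.0) - V.get(state, 0.0)

def td_update(V, state, reward, next_state, alpha=0.1, gamma=0.95):
    """Nudge the value estimate of the current state by the prediction error."""
    delta = td_prediction_error(V, state, reward, next_state, gamma)
    V[state] = V.get(state, 0.0) + alpha * delta
    return delta
```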
