
Autonomous inter-task transfer in reinforcement learning domains

Taylor, Matthew Edmund 07 September 2012
Reinforcement learning (RL) methods have become popular in recent years because of their ability to solve complex tasks with minimal feedback. While these methods have had experimental successes and have been shown to exhibit some desirable properties in theory, the basic learning algorithms have often been found slow in practice. Therefore, much of the current RL research focuses on speeding up learning by taking advantage of domain knowledge, or by better utilizing agents’ experience. The ambitious goal of transfer learning, when applied to RL tasks, is to accelerate learning on some target task after training on a different, but related, source task. This dissertation demonstrates that transfer learning methods can successfully improve learning in RL tasks via experience from previously learned tasks. Transfer learning can increase RL’s applicability to difficult tasks by allowing agents to generalize their experience across learning problems. This dissertation presents inter-task mappings, the first transfer mechanism in this area to successfully enable transfer between tasks with different state variables and actions. Inter-task mappings have subsequently been used by a number of transfer researchers. A set of six transfer learning algorithms is then introduced. While these transfer methods differ in terms of what base RL algorithms they are compatible with, what type of knowledge they transfer, and what their strengths are, all utilize the same inter-task mapping mechanism. These transfer methods can all successfully use mappings constructed by a human from domain knowledge, but there may be situations in which domain knowledge is unavailable, or insufficient, to describe how two given tasks are related. We therefore also study how inter-task mappings can be learned autonomously by leveraging existing machine learning algorithms.
Our methods use classification and regression techniques to successfully discover similarities between data gathered in pairs of tasks, culminating in what is currently one of the most robust mapping-learning algorithms for RL transfer. Combining transfer methods with these similarity-learning algorithms allows us to empirically demonstrate the plausibility of autonomous transfer. We fully implement these methods in four domains (each with different salient characteristics), show that transfer can significantly improve an agent’s ability to learn in each domain, and explore the limits of transfer’s applicability.

Reinforcement learning with parameterized actions

Masson, Warwick Anthony January 2016
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science. Johannesburg, 2016. / In order to complete real-world tasks, autonomous robots require a mix of fine-grained control and high-level skills. A robot requires a wide range of skills to handle a variety of different situations, but must also be able to adapt its skills to handle a specific situation. Reinforcement learning is a machine learning paradigm for learning to solve tasks by interacting with an environment. Current methods in reinforcement learning focus on agents with either a fixed number of discrete actions, or a continuous set of actions. We consider the problem of reinforcement learning with parameterized actions: discrete actions with continuous parameters. At each step the agent must select both which action to use and which parameters to use with that action. By representing actions in this way, we have the high-level skills given by discrete actions and adaptability given by the parameters for each action. We introduce the Q-PAMDP algorithm for model-free learning in parameterized action Markov decision processes. Q-PAMDP alternates learning which discrete actions to use in each state and then which parameters to use in those states. We show that under weak assumptions, Q-PAMDP converges to a local maximum. We compare Q-PAMDP with a direct policy search approach in the goal and Platform domains. Q-PAMDP outperforms direct policy search in both domains. / TG2016
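
The two-part choice described in this abstract (first a discrete action, then that action's continuous parameters) can be illustrated with a minimal sketch. This is not the Q-PAMDP algorithm itself, which the abstract does not specify in detail; the function name, the dictionaries of value functions and parameter policies, and the example actions are all hypothetical.

```python
def select_parameterized_action(state, q_values, param_policies):
    """Pick a discrete action greedily, then ask that action's
    parameter policy for its continuous parameters.

    q_values:       dict mapping action name -> function(state) -> float
    param_policies: dict mapping action name -> function(state) -> tuple
    Both dictionaries are illustrative assumptions, not the thesis's API.
    """
    # Step 1: choose the discrete action with the highest estimated value.
    action = max(q_values, key=lambda a: q_values[a](state))
    # Step 2: choose the continuous parameters for that action only.
    params = param_policies[action](state)
    return action, params


# Hypothetical example: two actions with hand-written value estimates.
q_values = {"run": lambda s: 1.0, "jump": lambda s: 2.0}
param_policies = {
    "run": lambda s: (0.5 * s,),      # one parameter, e.g. a speed
    "jump": lambda s: (s, 2.0 * s),   # two parameters, e.g. height and length
}
chosen = select_parameterized_action(3.0, q_values, param_policies)
```

Keeping one parameter policy per discrete action means each action can have a different number of parameters, which matches the description of discrete actions that each carry their own continuous parameters.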

Representation discovery using a fixed basis in reinforcement learning

Wookey, Dean Stephen January 2016
A thesis presented for the degree of Doctor of Philosophy, School of Computer Science and Applied Mathematics. University of the Witwatersrand, South Africa. 26 August 2016. / In the reinforcement learning paradigm, an agent learns by interacting with its environment. At each state, the agent receives a numerical reward. Its goal is to maximise the discounted sum of future rewards. One way it can do this is through learning a value function: a function which maps states to the discounted sum of future rewards. With an accurate value function and a model of the environment, the agent can take the optimal action in each state. In practice, however, the value function is approximated, and performance depends on the quality of the approximation. Linear function approximation is a commonly used approximation scheme, where the value function is represented as a weighted sum of basis functions or features. In continuous state environments, there are infinitely many such features to choose from, introducing the new problem of feature selection. Existing algorithms such as OMP-TD are slow to converge, scale poorly to high dimensional spaces, and have not been generalised to the online learning case. We introduce heuristic methods for reducing the search space in high dimensions that significantly reduce computational costs and also act as regularisers. We extend these methods and introduce feature regularisation for incremental feature selection in the batch learning case, and show that introducing a smoothness prior is effective with our SSOMP-TD and STOMP-TD algorithms. Finally, we generalise OMP-TD and our algorithms to the online case and evaluate them empirically. / LG2017
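
The two quantities this abstract relies on, the discounted sum of future rewards and a value function written as a weighted sum of basis functions, can be sketched as follows. The Gaussian basis, the one-dimensional state, and all function names are assumptions chosen for illustration; the abstract does not say which basis the thesis uses, and this sketch does not implement OMP-TD or any feature-selection step.

```python
import math


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def rbf_features(state, centres, width):
    """Hypothetical fixed basis: one Gaussian bump per centre (1-D state)."""
    return [math.exp(-((state - c) ** 2) / (2.0 * width ** 2)) for c in centres]


def linear_value(state, weights, centres, width):
    """Approximate V(state) as a weighted sum of the basis functions."""
    phi = rbf_features(state, centres, width)
    return sum(w * f for w, f in zip(weights, phi))


example_return = discounted_return([1.0, 1.0, 1.0], 0.5)
example_value = linear_value(0.0, [2.0, 0.0], [0.0, 1.0], 1.0)
```

In this form, feature selection amounts to deciding which centres (and hence which basis functions) are included before the weights are fitted.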

Discovering hierarchy in reinforcement learning

Hengst, Bernhard. January 2003
Thesis (Ph. D.)--University of New South Wales, 2003. / Also available online.

Learning MDP action models via discrete mixture trees

Wynkoop, Michael S. January 1900
Thesis (M.S.)--Oregon State University, 2009. / Printout. Includes bibliographical references (leaves 43-44). Also available on the World Wide Web.

Autonomous inter-task transfer in reinforcement learning domains

Taylor, Matthew Edmund. January 1900
Thesis (Ph. D.)--University of Texas at Austin, 2008. / Vita. Includes bibliographical references.

Model-based approximation methods for reinforcement learning

Wang, Xin. January 1900
Thesis (Ph. D.)--Oregon State University, 2007. / Printout. Includes bibliographical references (leaves 187-198). Also available on the World Wide Web.

Structured exploration for reinforcement learning

Jong, Nicholas K. 18 December 2012
Reinforcement Learning (RL) offers a promising approach towards achieving the dream of autonomous agents that can behave intelligently in the real world. Instead of requiring humans to determine the correct behaviors or sufficient knowledge in advance, RL algorithms allow an agent to acquire the necessary knowledge through direct experience with its environment. Early algorithms guaranteed convergence to optimal behaviors in limited domains, giving hope that simple, universal mechanisms would allow learning agents to succeed at solving a wide variety of complex problems. In practice, the field of RL has struggled to apply these techniques successfully to the full breadth and depth of real-world domains. This thesis extends the reach of RL techniques by demonstrating the synergies among certain key developments in the literature. The first of these developments is model-based exploration, which facilitates theoretical convergence guarantees in finite problems by explicitly reasoning about an agent's certainty in its understanding of its environment. A second branch of research studies function approximation, which generalizes RL to infinite problems by artificially limiting the degrees of freedom in an agent's representation of its environment. The final major advance that this thesis incorporates is hierarchical decomposition, which seeks to improve the efficiency of learning by endowing an agent's knowledge and behavior with the gross structure of its environment. Each of these ideas has intuitive appeal and sustains substantial independent research efforts, but this thesis defines the first RL agent that combines all their benefits in the general case. In showing how to combine these techniques effectively, this thesis investigates the twin issues of generalization and exploration, which lie at the heart of efficient learning. 
This thesis thus lays the groundwork for the next generation of RL algorithms, which will allow scientific agents to know when it suffices to estimate a plan from current data and when to accept the potential cost of running an experiment to gather new data.

The hardware implementation of an artificial neural network using stochastic pulse rate encoding principles

Glover, John Sigsworth January 1995
In this thesis the development of a hardware artificial neuron device and artificial neural network using stochastic pulse rate encoding principles is considered. After a review of neural network architectures and algorithmic approaches suitable for hardware implementation, a critical review of hardware techniques which have been considered in analogue and digital systems is presented. New results are presented demonstrating the potential of two learning schemes which adapt by the use of a single reinforcement signal. The techniques for computation using stochastic pulse rate encoding are presented and extended with novel circuits relevant to the hardware implementation of an artificial neural network. The generation of random numbers is the key to the encoding of data into the stochastic pulse rate domain. The formation of random numbers and multiple random bit sequences from a single PRBS generator has been investigated. Two techniques, Simulated Annealing and Genetic Algorithms, have been applied successfully to the problem of optimising the configuration of a PRBS random number generator for the formation of multiple random bit sequences and hence random numbers. A complete hardware design for an artificial neuron using stochastic pulse rate encoded signals has been described, designed, simulated, fabricated and tested before configuration of the device into a network to perform simple test problems. The implementation has shown that the processing elements of the artificial neuron are small and simple, but that there can be a significant overhead for the encoding of information into the stochastic pulse rate domain. The stochastic artificial neuron has the capability of on-line weight adaptation. The implementation of reinforcement schemes using the stochastic neuron as a basic element is discussed.
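
As a rough software illustration of the encoding idea in this abstract, a value between 0 and 1 can be represented by the fraction of 1s in a random bit stream, and two independent streams can then be multiplied with a bitwise AND. This is the standard unipolar scheme from stochastic computing, assumed here for illustration; the abstract does not state which circuits the thesis uses, and Python's `random.Random` stands in for the PRBS generator it mentions. All function names are hypothetical.

```python
import random


def encode(p, n_bits, rng):
    """Encode probability p as a bit stream whose mean pulse rate is about p."""
    return [1 if rng.random() < p else 0 for _ in range(n_bits)]


def decode(bits):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)


def multiply(bits_a, bits_b):
    """Bitwise AND of two independent streams estimates the product."""
    return [a & b for a, b in zip(bits_a, bits_b)]


rng = random.Random(0)            # seeded software stand-in for a PRBS source
stream_a = encode(0.5, 20000, rng)
stream_b = encode(0.4, 20000, rng)
product_estimate = decode(multiply(stream_a, stream_b))  # close to 0.5 * 0.4
```

The sketch also shows the trade-off the abstract reports: the arithmetic element is a single gate per bit, while the cost lies in generating long, independent random streams to encode each value accurately.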

Reinforcement learning in commercial computer games

Coggan, Melanie. January 2008
The goal of this thesis is to explore the use of reinforcement learning (RL) in commercial computer games. Although RL has been applied with success to many types of board games and non-game simulated environments, there has been little work in applying RL to the most popular genres of games: first-person shooters, role-playing games, and real-time strategies. In this thesis we use a first-person shooter environment to create computer players, or bots, that learn to play the game using reinforcement learning techniques. / We have created three experimental bots: ChaserBot, ItemBot and HybridBot. The first two bots each focus on a different aspect of the first-person shooter genre, and learn using basic RL. ChaserBot learns to chase down and shoot an enemy player. ItemBot, on the other hand, learns how to pick up the items (weapons, ammunition, armor) that are available, scattered on the ground, for the players to improve their arsenal. Both of these bots become reasonably proficient at their assigned task. Our goal for the third bot, HybridBot, was to create a bot that both chases and shoots an enemy player and goes after the items in the environment. Unlike the two previous bots, which only have primitive actions available (strafing right or left, moving forward or backward, etc.), HybridBot uses options. At any state, it may choose either the player chasing option or the item gathering option. These options' internal policies are determined by the data learned by ChaserBot and ItemBot. HybridBot uses reinforcement learning to learn which option to pick at a given state. / Each bot learns to perform its given tasks. We compare the three bots' ability to gather items, and ChaserBot's and HybridBot's ability to chase their opponent. HybridBot's results are of particular interest as it outperforms ItemBot at picking up items by a large amount. However, none of our experiments yielded bots that are competitive with human players.
We discuss the reasons for this and suggest improvements for future work that could lead to competitive reinforcement learning bots.
