1 
Autonomous inter-task transfer in reinforcement learning domains / Taylor, Matthew Edmund. 07 September 2012
Reinforcement learning (RL) methods have become popular in recent years because of their ability to solve complex tasks with minimal feedback. While these methods have had experimental successes and have been shown to exhibit some desirable properties in theory, the basic learning algorithms have often been found slow in practice. Therefore, much of the current RL research focuses on speeding up learning by taking advantage of domain knowledge, or by better utilizing agents’ experience. The ambitious goal of transfer learning, when applied to RL tasks, is to accelerate learning on some target task after training on a different, but related, source task. This dissertation demonstrates that transfer learning methods can successfully improve learning in RL tasks via experience from previously learned tasks. Transfer learning can increase RL’s applicability to difficult tasks by allowing agents to generalize their experience across learning problems. This dissertation presents inter-task mappings, the first transfer mechanism in this area to successfully enable transfer between tasks with different state variables and actions. Inter-task mappings have subsequently been used by a number of transfer researchers. A set of six transfer learning algorithms is then introduced. While these transfer methods differ in terms of which base RL algorithms they are compatible with, what type of knowledge they transfer, and what their strengths are, all use the same inter-task mapping mechanism. These transfer methods can all successfully use mappings constructed by a human from domain knowledge, but there may be situations in which domain knowledge is unavailable, or insufficient, to describe how two given tasks are related. We therefore also study how inter-task mappings can be learned autonomously by leveraging existing machine learning algorithms.
Our methods use classification and regression techniques to successfully discover similarities between data gathered in pairs of tasks, culminating in what is currently one of the most robust mapping-learning algorithms for RL transfer. Combining transfer methods with these similarity-learning algorithms allows us to empirically demonstrate the plausibility of autonomous transfer. We fully implement these methods in four domains (each with different salient characteristics), show that transfer can significantly improve an agent’s ability to learn in each domain, and explore the limits of transfer’s applicability.
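To make the idea of an inter-task mapping concrete, here is a minimal Python sketch of one way such a mapping could seed a target task's value function from a learned source task. It assumes a source Q-function that can be queried directly, a list `counterpart` giving, for each source state variable, the target state variable that plays its role (one choice of preimage under the state-variable mapping), and a dictionary `chi_a` from target actions to source actions. All of these names are hypothetical, and the sketch is not one of the dissertation's six transfer algorithms.

```python
from typing import Callable, Dict, Sequence, Tuple

State = Tuple[float, ...]
QFunction = Callable[[State, str], float]

def make_target_q(source_q: QFunction,
                  counterpart: Sequence[int],
                  chi_a: Dict[str, str]) -> QFunction:
    """Build an initial target-task Q-function from a learned source-task one.

    counterpart[i]: index of the target state variable that stands in for
    source state variable i (a hypothetical encoding of the state-variable mapping).
    chi_a: maps every target action, including novel ones, to a source action.
    """
    def target_q(state: State, action: str) -> float:
        # Express the target state in source-task terms, then query the source values
        source_state = tuple(state[j] for j in counterpart)
        return source_q(source_state, chi_a[action])
    return target_q
```

A novel target action (for example a hypothetical "jump") simply borrows the values of whichever source action it is mapped to; learning in the target task then refines these initial estimates.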

2 
Reinforcement learning with parameterized actions / Masson, Warwick Anthony. January 2016
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science. Johannesburg, 2016. / In order to complete real-world tasks, autonomous robots require a mix of fine-grained control and
high-level skills. A robot requires a wide range of skills to handle a variety of different situations, but
must also be able to adapt its skills to handle a specific situation. Reinforcement learning is a machine
learning paradigm for learning to solve tasks by interacting with an environment. Current methods in
reinforcement learning focus on agents with either a fixed number of discrete actions, or a continuous
set of actions.
We consider the problem of reinforcement learning with parameterized actions—discrete actions with
continuous parameters. At each step the agent must select both which action to use and which parameters
to use with that action. By representing actions in this way, we obtain high-level skills from the discrete
actions and adaptability from the parameters of each action.
We introduce the Q-PAMDP algorithm for model-free learning in parameterized action Markov decision
processes. Q-PAMDP alternates between learning which discrete action to use in each state and learning
which parameters to use with it in those states. We show that, under weak assumptions, Q-PAMDP converges
to a local maximum. We compare Q-PAMDP with a direct policy search approach in the goal and Platform
domains. Q-PAMDP outperforms direct policy search in both domains.
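The two-stage choice behind a parameterized action can be sketched as follows. This minimal Python illustration assumes NumPy, linear Q-values over state features for the discrete choice, and a Gaussian parameter policy with a linear mean for each action; the class name and both parameterizations are hypothetical. It covers only action selection: Q-PAMDP's alternation between learning the discrete-action values and improving the parameter policy is not shown.

```python
import numpy as np

class ParameterizedPolicy:
    """Hypothetical linear PAMDP policy: a discrete action plus continuous parameters."""

    def __init__(self, n_features, param_dims, seed=0):
        # Q(s, a) = q_weights[a] . phi(s): values used to pick the discrete action
        self.q_weights = np.zeros((len(param_dims), n_features))
        # Mean parameters for action a: theta[a] @ phi(s), one row per parameter dimension
        self.theta = [np.zeros((d, n_features)) for d in param_dims]
        self.rng = np.random.default_rng(seed)

    def act(self, phi, epsilon=0.1, sigma=0.1):
        # Stage 1: epsilon-greedy choice of the discrete action
        if self.rng.random() < epsilon:
            a = int(self.rng.integers(len(self.theta)))
        else:
            a = int(np.argmax(self.q_weights @ phi))
        # Stage 2: Gaussian exploration around that action's linear parameter mean
        mean = self.theta[a] @ phi
        x = mean + sigma * self.rng.standard_normal(mean.shape)
        return a, x
```

Each discrete action may carry a parameter vector of a different size (`param_dims`), which is what distinguishes this setting from a single continuous action space.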

3 
Representation discovery using a fixed basis in reinforcement learning / Wookey, Dean Stephen. January 2016
A thesis presented for the degree of Doctor of Philosophy, School of Computer Science and Applied Mathematics. University of the Witwatersrand, South Africa. 26 August 2016. / In the reinforcement learning paradigm, an agent learns by interacting with its environment. At each
state, the agent receives a numerical reward. Its goal is to maximise the discounted sum of future rewards.
One way it can do this is by learning a value function: a function that maps states to the discounted
sum of future rewards. With an accurate value function and a model of the environment, the agent can
take the optimal action in each state. In practice, however, the value function is approximated, and
performance depends on the quality of the approximation. Linear function approximation is a commonly
used approximation scheme, where the value function is represented as a weighted sum of basis functions
or features. In continuous state environments, there are infinitely many such features to choose from,
introducing the new problem of feature selection. Existing algorithms such as OMP-TD are slow to
converge, scale poorly to high dimensional spaces, and have not been generalised to the online learning
case. We introduce heuristic methods for reducing the search space in high dimensions that significantly
reduce computational costs and also act as regularisers. We extend these methods and introduce feature
regularisation for incremental feature selection in the batch learning case, and show that introducing a
smoothness prior is effective with our SSOMP-TD and STOMP-TD algorithms. Finally, we generalise
OMP-TD and our algorithms to the online case and evaluate them empirically.
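The feature-selection problem can be illustrated with a greedy batch procedure in the spirit of OMP-TD: at each step, add the candidate feature most correlated with the current temporal-difference residual, then refit the weights with LSTD on the selected features. This Python sketch assumes NumPy, sampled feature matrices for states and successor states, a correlation threshold `beta`, and a small ridge term for numerical stability. It is a simplified illustration, not the SSOMP-TD or STOMP-TD algorithms from the thesis.

```python
import numpy as np

def lstd(phi, phi_next, rewards, gamma, ridge=1e-6):
    """Batch LSTD weights for the given feature columns (ridge keeps A invertible)."""
    a = phi.T @ (phi - gamma * phi_next) + ridge * np.eye(phi.shape[1])
    b = phi.T @ rewards
    return np.linalg.solve(a, b)

def omp_td(phi, phi_next, rewards, gamma=0.9, beta=1e-3, max_features=None):
    """Greedy selection from a fixed dictionary of candidate features.

    phi, phi_next: (n_samples, n_candidates) feature values at s and s'.
    Returns the selected column indices and their LSTD weights.
    """
    n, k = phi.shape
    max_features = k if max_features is None else max_features
    active = []
    w = np.zeros(0)
    while len(active) < max_features:
        # TD (Bellman) residual of the current approximation on the sampled transitions
        residual = rewards + gamma * (phi_next[:, active] @ w) - phi[:, active] @ w
        # Correlation of every candidate feature with that residual
        scores = np.abs(phi.T @ residual) / n
        scores[active] = -np.inf  # never reselect an active feature
        best = int(np.argmax(scores))
        if scores[best] <= beta:
            break  # no remaining feature explains enough of the residual
        active.append(best)
        w = lstd(phi[:, active], phi_next[:, active], rewards, gamma)
    return active, w
```

The threshold `beta` acts as a crude regulariser: features are only added while they still explain a meaningful part of the residual.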

4 
Discovering hierarchy in reinforcement learning / Hengst, Bernhard. January 2003
Thesis (Ph. D.), University of New South Wales, 2003. / Also available online.

5 
Learning MDP action models via discrete mixture trees / Wynkoop, Michael S. January 1900
Thesis (M.S.), Oregon State University, 2009. / Printout. Includes bibliographical references (leaves 43-44). Also available on the World Wide Web.

6 
Autonomous inter-task transfer in reinforcement learning domains / Taylor, Matthew Edmund. January 1900
Thesis (Ph. D.), University of Texas at Austin, 2008. / Vita. Includes bibliographical references.

7 
Model-based approximation methods for reinforcement learning / Wang, Xin. January 1900
Thesis (Ph. D.), Oregon State University, 2007. / Printout. Includes bibliographical references (leaves 187-198). Also available on the World Wide Web.

8 
Structured exploration for reinforcement learning / Jong, Nicholas K. 18 December 2012
Reinforcement Learning (RL) offers a promising approach towards achieving the dream of autonomous agents that can behave intelligently in the real world. Instead of requiring humans to determine the correct behaviors or sufficient knowledge in advance, RL algorithms allow an agent to acquire the necessary knowledge through direct experience with its environment. Early algorithms guaranteed convergence to optimal behaviors in limited domains, giving hope that simple, universal mechanisms would allow learning agents to succeed at solving a wide variety of complex problems. In practice, the field of RL has struggled to apply these techniques successfully to the full breadth and depth of real-world domains.
This thesis extends the reach of RL techniques by demonstrating the synergies among certain key developments in the literature. The first of these developments is model-based exploration, which facilitates theoretical convergence guarantees in finite problems by explicitly reasoning about an agent's certainty in its understanding of its environment. A second branch of research studies function approximation, which generalizes RL to infinite problems by artificially limiting the degrees of freedom in an agent's representation of its environment. The final major advance that this thesis incorporates is hierarchical decomposition, which seeks to improve the efficiency of learning by endowing an agent's knowledge and behavior with the gross structure of its environment.
Each of these ideas has intuitive appeal and sustains substantial independent research efforts, but this thesis defines the first RL agent that combines all their benefits in the general case. In showing how to combine these techniques effectively, this thesis investigates the twin issues of generalization and exploration, which lie at the heart of efficient learning. This thesis thus lays the groundwork for the next generation of RL algorithms, which will allow scientific agents to know when it suffices to estimate a plan from current data and when to accept the potential cost of running an experiment to gather new data.
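Model-based exploration of the kind described above can be sketched, in the tabular case only, with an R-MAX-style optimistic model: state-action pairs seen fewer than `m` times are treated as unknown and valued at the maximum possible return, which drives the agent to try them. The Python below assumes NumPy arrays of visit counts, reward sums, and transition counts; the names and thresholds are illustrative, and the thesis's combination of exploration with function approximation and hierarchy is not shown.

```python
import numpy as np

def rmax_q(counts, reward_sums, next_counts, gamma=0.95, m=5, r_max=1.0, iters=500):
    """Q-values from an optimistic empirical model, in the style of R-MAX.

    counts[s, a]: visits; reward_sums[s, a]: summed rewards;
    next_counts[s, a, s2]: observed transition counts.
    """
    v_max = r_max / (1.0 - gamma)          # largest achievable discounted return
    known = counts >= m                    # pairs with enough data to trust the model
    safe = np.maximum(counts, 1)           # avoid dividing by zero for unvisited pairs
    r_hat = reward_sums / safe
    p_hat = next_counts / safe[:, :, None]
    q = np.full(counts.shape, v_max, dtype=float)
    for _ in range(iters):
        v = q.max(axis=1)
        # Bellman backup on the empirical model: (S, A, S) @ (S,) -> (S, A)
        q_new = r_hat + gamma * (p_hat @ v)
        # Unknown pairs stay optimistic, so a greedy agent is drawn to explore them
        q = np.where(known, q_new, v_max)
    return q
```

Acting greedily with respect to these values is what makes exploration directed: the agent heads for unknown pairs only when the optimistic payoff outweighs what it already knows how to earn.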

9 
The hardware implementation of an artificial neural network using stochastic pulse rate encoding principles / Glover, John Sigsworth. January 1995
In this thesis the development of a hardware artificial neuron device and artificial neural network using stochastic pulse rate encoding principles is considered. After a review of neural network architectures and algorithmic approaches suitable for hardware implementation, a critical review of hardware techniques which have been considered in analogue and digital systems is presented. New results are presented demonstrating the potential of two learning schemes which adapt by the use of a single reinforcement signal. The techniques for computation using stochastic pulse rate encoding are presented and extended with novel circuits relevant to the hardware implementation of an artificial neural network. The generation of random numbers is the key to the encoding of data into the stochastic pulse rate domain. The formation of random numbers and multiple random bit sequences from a single PRBS generator has been investigated. Two techniques, Simulated Annealing and Genetic Algorithms, have been applied successfully to the problem of optimising the configuration of a PRBS random number generator for the formation of multiple random bit sequences and hence random numbers. A complete hardware design for an artificial neuron using stochastic pulse rate encoded signals has been described, designed, simulated, fabricated and tested, and the device was then configured into a network to perform simple test problems. The implementation has shown that the processing elements of the artificial neuron are small and simple, but that there can be a significant overhead for the encoding of information into the stochastic pulse rate domain. The stochastic artificial neuron is capable of online weight adaptation. The implementation of reinforcement schemes using the stochastic neuron as a basic element is discussed.
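Two ideas from this abstract can be sketched in Python. In unipolar stochastic pulse rate encoding, a value p in [0, 1] becomes a bit stream whose bits are 1 with probability p, so a bitwise AND of two independent streams multiplies their values. A hardware PRBS source is commonly a linear feedback shift register; `lfsr16` below is a standard 16-bit maximal-length example with taps 16, 14, 13 and 11. The encoding sketch uses Python's `random.Random` for the two streams instead, because deriving several independent streams from one short register is nontrivial (part of what the thesis investigates). All function names are hypothetical.

```python
import random

def lfsr16(seed=0xACE1):
    """16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (maximal length: period 65535)."""
    state = seed
    while True:
        # XOR of the tapped bits becomes the new most significant bit
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        yield state

def encode(p, n, rng):
    """Unipolar stochastic encoding: each of n bits is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def stochastic_multiply(p, q, n=10000, seed=1):
    # Independently seeded streams; for independent bits, P(a AND b) = p * q
    a = encode(p, n, random.Random(seed))
    b = encode(q, n, random.Random(seed + 1))
    return sum(x & y for x, y in zip(a, b)) / n
```

The multiplier itself is a single AND gate per bit, which illustrates why the processing elements are small while the cost shifts to generating good random streams.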

10 
Reinforcement learning in commercial computer games / Coggan, Melanie. January 2008
The goal of this thesis is to explore the use of reinforcement learning (RL) in commercial computer games. Although RL has been applied with success to many types of board games and non-game simulated environments, there has been little work in applying RL to the most popular genres of games: first-person shooters, role-playing games, and real-time strategy games. In this thesis we use a first-person shooter environment to create computer players, or bots, that learn to play the game using reinforcement learning techniques. / We have created three experimental bots: ChaserBot, ItemBot and HybridBot. The first two bots each focus on a different aspect of the first-person shooter genre, and learn using basic RL. ChaserBot learns to chase down and shoot an enemy player. ItemBot, on the other hand, learns how to pick up the items (weapons, ammunition, armor) that are scattered on the ground for the players to improve their arsenal. Both of these bots become reasonably proficient at their assigned task. Our goal for the third bot, HybridBot, was to create a bot that both chases and shoots an enemy player and goes after the items in the environment. Unlike the two previous bots, which only have primitive actions available (strafing right or left, moving forward or backward, etc.), HybridBot uses options. In any state, it may choose either the player-chasing option or the item-gathering option. These options' internal policies are determined by the data learned by ChaserBot and ItemBot. HybridBot uses reinforcement learning to learn which option to pick in a given state. / Each bot learns to perform its given tasks. We compare the three bots' ability to gather items, and ChaserBot's and HybridBot's ability to chase their opponent. HybridBot's results are of particular interest, as it outperforms ItemBot at picking up items by a large margin. However, none of our experiments yielded bots that are competitive with human players.
We discuss the reasons for this and suggest improvements for future work that could lead to competitive reinforcement learning bots.
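HybridBot's choice between the chasing and gathering options is a decision over temporally extended actions. The abstract does not give the update rule, so this Python sketch assumes standard SMDP Q-learning over options: when an option terminates after k primitive steps, its value moves toward the discounted reward collected during the option plus γ^k times the best option value in the state where it ended. The function name and the option labels used with it are hypothetical.

```python
def smdp_q_update(q, state, option, rewards, next_state, options, alpha=0.1, gamma=0.99):
    """One SMDP Q-learning step after `option` ran for len(rewards) primitive steps.

    q: dict mapping (state, option) -> value; missing entries are treated as 0.0.
    rewards: the per-step rewards received while the option was executing.
    """
    k = len(rewards)
    # Discounted reward accumulated inside the option
    ret = sum((gamma ** i) * r for i, r in enumerate(rewards))
    # Bootstrap from the best option available where this one terminated
    best_next = max(q.get((next_state, o), 0.0) for o in options)
    old = q.get((state, option), 0.0)
    q[(state, option)] = old + alpha * (ret + (gamma ** k) * best_next - old)
    return q[(state, option)]
```

Because the discount is applied per primitive step, a long-running option is not over-credited relative to a short one, which matters when the two options take very different amounts of time to complete.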
