1 
Autonomous inter-task transfer in reinforcement learning domains / Taylor, Matthew Edmund 07 September 2012
Reinforcement learning (RL) methods have become popular in recent years because of their ability to solve complex tasks with minimal feedback. While these methods have had experimental successes and have been shown to exhibit some desirable properties in theory, the basic learning algorithms have often been found slow in practice. Therefore, much of the current RL research focuses on speeding up learning by taking advantage of domain knowledge, or by better utilizing agents’ experience. The ambitious goal of transfer learning, when applied to RL tasks, is to accelerate learning on some target task after training on a different, but related, source task. This dissertation demonstrates that transfer learning methods can successfully improve learning in RL tasks via experience from previously learned tasks. Transfer learning can increase RL’s applicability to difficult tasks by allowing agents to generalize their experience across learning problems. This dissertation presents inter-task mappings, the first transfer mechanism in this area to successfully enable transfer between tasks with different state variables and actions. Inter-task mappings have subsequently been used by a number of transfer researchers. A set of six transfer learning algorithms is then introduced. While these transfer methods differ in terms of what base RL algorithms they are compatible with, what type of knowledge they transfer, and what their strengths are, all utilize the same inter-task mapping mechanism. These transfer methods can all successfully use mappings constructed by a human from domain knowledge, but there may be situations in which domain knowledge is unavailable, or insufficient, to describe how two given tasks are related. We therefore also study how inter-task mappings can be learned autonomously by leveraging existing machine learning algorithms.
Our methods use classification and regression techniques to successfully discover similarities between data gathered in pairs of tasks, culminating in what is currently one of the most robust mapping-learning algorithms for RL transfer. Combining transfer methods with these similarity-learning algorithms allows us to empirically demonstrate the plausibility of autonomous transfer. We fully implement these methods in four domains (each with different salient characteristics), show that transfer can significantly improve an agent’s ability to learn in each domain, and explore the limits of transfer’s applicability. / text

2 
Model-based approximation methods for reinforcement learning / Wang, Xin. January 1900
Thesis (Ph. D.)--Oregon State University, 2007. / Printout. Includes bibliographical references (leaves 187-198). Also available on the World Wide Web.

3 
Autonomous inter-task transfer in reinforcement learning domains / Taylor, Matthew Edmund. January 1900
Thesis (Ph. D.)--University of Texas at Austin, 2008. / Vita. Includes bibliographical references.

4 
Learning MDP action models via discrete mixture trees / Wynkoop, Michael S. January 1900
Thesis (M.S.)--Oregon State University, 2009. / Printout. Includes bibliographical references (leaves 43-44). Also available on the World Wide Web.

5 
Discovering hierarchy in reinforcement learning / Hengst, Bernhard. January 2003
Thesis (Ph. D.)--University of New South Wales, 2003. / Also available online.

6 
Reinforcement learning with parameterized actions / Masson, Warwick Anthony January 2016
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science. Johannesburg, 2016. / In order to complete real-world tasks, autonomous robots require a mix of fine-grained control and
high-level skills. A robot requires a wide range of skills to handle a variety of different situations, but
must also be able to adapt its skills to handle a specific situation. Reinforcement learning is a machine
learning paradigm for learning to solve tasks by interacting with an environment. Current methods in
reinforcement learning focus on agents with either a fixed number of discrete actions, or a continuous
set of actions.
We consider the problem of reinforcement learning with parameterized actions: discrete actions with
continuous parameters. At each step the agent must select both which action to use and which parameters
to use with that action. By representing actions in this way, we have the high-level skills given by discrete
actions and the adaptability given by the parameters for each action.
We introduce the Q-PAMDP algorithm for model-free learning in parameterized action Markov decision
processes. Q-PAMDP alternates between learning which discrete actions to use in each state and learning which
parameters to use in those states. We show that under weak assumptions, Q-PAMDP converges to a
local maximum. We compare Q-PAMDP with a direct policy search approach in the Goal and Platform
domains. Q-PAMDP outperforms direct policy search in both domains. / TG2016
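
The two-part choice described in this abstract (first a discrete action, then a continuous parameter for it) can be sketched roughly as follows. This is only an illustration of action selection, not the thesis's learning algorithm; the class name, the action names "kick" and "run", and the value and parameter functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class ParameterizedPolicy:
    """Greedy policy over discrete actions, each with one continuous parameter."""
    # Hypothetical action-value estimates: action name -> estimate for a state.
    q_values: Dict[str, Callable[[float], float]]
    # Hypothetical parameter policies: action name -> parameter for a state.
    parameters: Dict[str, Callable[[float], float]]

    def act(self, state: float) -> Tuple[str, float]:
        # Step 1: choose the discrete action with the highest estimated value.
        action = max(self.q_values, key=lambda name: self.q_values[name](state))
        # Step 2: choose the continuous parameter for the selected action.
        return action, self.parameters[action](state)


# Illustrative (made-up) value and parameter functions over a scalar state.
policy = ParameterizedPolicy(
    q_values={"kick": lambda s: 1.0 - s, "run": lambda s: s},
    parameters={"kick": lambda s: 0.5 * s, "run": lambda s: 2.0 * s},
)
```

Calling `policy.act(state)` returns an `(action, parameter)` pair, which is the form of action the abstract describes.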

7 
Representation discovery using a fixed basis in reinforcement learning / Wookey, Dean Stephen January 2016
A thesis presented for the degree of Doctor of Philosophy, School of Computer Science and Applied Mathematics. University of the Witwatersrand, South Africa. 26 August 2016. / In the reinforcement learning paradigm, an agent learns by interacting with its environment. At each
state, the agent receives a numerical reward. Its goal is to maximise the discounted sum of future rewards.
One way it can do this is by learning a value function: a function that maps states to the discounted
sum of future rewards. With an accurate value function and a model of the environment, the agent can
take the optimal action in each state. In practice, however, the value function is approximated, and
performance depends on the quality of the approximation. Linear function approximation is a commonly
used approximation scheme, where the value function is represented as a weighted sum of basis functions
or features. In continuous-state environments, there are infinitely many such features to choose from,
introducing the new problem of feature selection. Existing algorithms such as OMP-TD are slow to
converge, scale poorly to high-dimensional spaces, and have not been generalised to the online learning
case. We introduce heuristic methods for reducing the search space in high dimensions that significantly
reduce computational costs and also act as regularisers. We extend these methods and introduce feature
regularisation for incremental feature selection in the batch learning case, and show that introducing a
smoothness prior is effective with our SSOMP-TD and STOMP-TD algorithms. Finally, we generalise
OMP-TD and our algorithms to the online case and evaluate them empirically. / LG2017
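
As a rough illustration of two definitions used in this abstract (the discounted sum of future rewards, and a value function written as a weighted sum of basis functions), a minimal Python sketch follows. The polynomial basis, the weights, and the function names are assumptions chosen for illustration; they are not the features or the feature-selection algorithms from the thesis.

```python
from typing import Callable, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of rewards, with the reward at step t weighted by gamma**t."""
    total = 0.0
    # Accumulate backwards so each step is one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def linear_value(state: float,
                 weights: Sequence[float],
                 basis: Sequence[Callable[[float], float]]) -> float:
    """Approximate V(state) as a weighted sum of basis functions."""
    return sum(w * phi(state) for w, phi in zip(weights, basis))


# Hypothetical polynomial basis over a one-dimensional state: 1, s, s^2.
POLY_BASIS = [lambda s: 1.0, lambda s: s, lambda s: s * s]
```

Feature selection, in these terms, is the question of which functions to place in a list like `POLY_BASIS`; the sketch fixes that list by hand and does not attempt to select it.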

8 
Discovering hierarchy in reinforcement learning / Hengst, Bernhard, Computer Science & Engineering, Faculty of Engineering, UNSW January 2003
This thesis addresses the open problem of automatically discovering hierarchical structure in reinforcement learning. Current algorithms for reinforcement learning fail to scale as problems become more complex. Many complex environments empirically exhibit hierarchy and can be modeled as interrelated subsystems, each in turn with hierarchic structure. Subsystems are often repetitive in time and space, meaning that they reoccur as components of different tasks or occur multiple times in different circumstances in the environment. A learning agent may sometimes scale to larger problems if it successfully exploits this repetition. Evidence suggests that a bottom-up approach that repeatedly finds building blocks at one level of abstraction and uses them as background knowledge at the next level of abstraction makes learning in many complex environments tractable. An algorithm, called HEXQ, is described that automatically decomposes and solves a multidimensional Markov decision problem (MDP) by constructing a multilevel hierarchy of interlinked subtasks without being given the model beforehand. The effectiveness and efficiency of the HEXQ decomposition depend largely on the choice of representation in terms of the variables, their temporal relationship, and whether the problem exhibits a type of constrained stochasticity. The algorithm is first developed for stochastic shortest path problems and then extended to infinite horizon problems. The operation of the algorithm is demonstrated using a number of examples including a taxi domain, various navigation tasks, the Towers of Hanoi and a larger sporting problem. The main contributions of the thesis are the automation of (1) decomposition, (2) subgoal identification, and (3) discovery of hierarchical structure for MDPs with states described by a number of variables or features.
It points the way to further scaling opportunities that encompass approximations, partial observability, selective perception, relational representations and planning. The longer-term research aim is to train rather than program intelligent agents.

9 
Q-Learning for Robot Control / Gaskett, Chris, cgaskett@it.jcu.edu.au January 2002
Q-Learning is a method for solving reinforcement learning problems. Reinforcement learning problems require improvement of behaviour based on received rewards. Q-Learning has the potential to reduce robot programming effort and increase the range of robot abilities. However, most current Q-learning systems are not suitable for robotics problems: they treat continuous variables, for example speeds or positions, as discretised values. Discretisation does not allow smooth control and does not fully exploit sensed information. A practical algorithm must also cope with real-time constraints, sensing and actuation delays, and incorrect sensor data.
This research describes an algorithm that deals with continuous state and action variables without discretising. The algorithm is evaluated with vision-based mobile robot and active head gaze control tasks. As well as learning the basic control tasks, the algorithm learns to compensate for delays in sensing and actuation by predicting the behaviour of its environment. Although the learned dynamic model is implicit in the controller, it is possible to extract some aspects of the model. The extracted models are compared to theoretically derived models of environment behaviour.
The difficulty of working with robots motivates development of methods that reduce experimentation time. This research exploits Q-learning's ability to learn by passively observing the robot's actions rather than necessarily controlling the robot. This is a valuable tool for shortening the duration of learning experiments.

10 
Structured exploration for reinforcement learning / Jong, Nicholas K. 18 December 2012
Reinforcement Learning (RL) offers a promising approach towards achieving the dream of autonomous agents that can behave intelligently in the real world. Instead of requiring humans to determine the correct behaviors or sufficient knowledge in advance, RL algorithms allow an agent to acquire the necessary knowledge through direct experience with its environment. Early algorithms guaranteed convergence to optimal behaviors in limited domains, giving hope that simple, universal mechanisms would allow learning agents to succeed at solving a wide variety of complex problems. In practice, the field of RL has struggled to apply these techniques successfully to the full breadth and depth of real-world domains.
This thesis extends the reach of RL techniques by demonstrating the synergies among certain key developments in the literature. The first of these developments is model-based exploration, which facilitates theoretical convergence guarantees in finite problems by explicitly reasoning about an agent's certainty in its understanding of its environment. A second branch of research studies function approximation, which generalizes RL to infinite problems by artificially limiting the degrees of freedom in an agent's representation of its environment. The final major advance that this thesis incorporates is hierarchical decomposition, which seeks to improve the efficiency of learning by endowing an agent's knowledge and behavior with the gross structure of its environment.
Each of these ideas has intuitive appeal and sustains substantial independent research efforts, but this thesis defines the first RL agent that combines all their benefits in the general case. In showing how to combine these techniques effectively, this thesis investigates the twin issues of generalization and exploration, which lie at the heart of efficient learning. This thesis thus lays the groundwork for the next generation of RL algorithms, which will allow scientific agents to know when it suffices to estimate a plan from current data and when to accept the potential cost of running an experiment to gather new data. / text
