1 |
Reinforcement learning with parameterized actions / Masson, Warwick Anthony. January 2016
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science. Johannesburg, 2016. / In order to complete real-world tasks, autonomous robots require a mix of fine-grained control and
high-level skills. A robot requires a wide range of skills to handle a variety of different situations, but
must also be able to adapt its skills to handle a specific situation. Reinforcement learning is a machine
learning paradigm for learning to solve tasks by interacting with an environment. Current methods in
reinforcement learning focus on agents with either a fixed number of discrete actions, or a continuous
set of actions.
We consider the problem of reinforcement learning with parameterized actions: discrete actions with
continuous parameters. At each step the agent must select both which action to use and which parameters
to use with that action. By representing actions in this way, we have the high-level skills given by discrete
actions and the adaptability given by the parameters for each action.
We introduce the Q-PAMDP algorithm for model-free learning in parameterized action Markov decision
processes. Q-PAMDP alternates learning which discrete actions to use in each state and then which
parameters to use in those states. We show that under weak assumptions, Q-PAMDP converges to a
local maximum. We compare Q-PAMDP with a direct policy search approach in the goal and Platform
domains. Q-PAMDP outperforms direct policy search in both domains. / TG2016
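As a rough illustration of the action representation described in this abstract (not the Q-PAMDP algorithm itself), a parameterized action can be sketched as a discrete action name paired with a tuple of continuous parameters. All names here (`ParameterizedAction`, `select_action`, the `kick`/`run` actions and their values) are hypothetical and chosen only for the example.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ParameterizedAction:
    """A discrete action together with its continuous parameters."""
    name: str
    params: Tuple[float, ...]


def select_action(
    q_values: Dict[str, float],
    param_policy: Dict[str, Tuple[float, ...]],
    epsilon: float = 0.0,
    rng: Optional[random.Random] = None,
) -> ParameterizedAction:
    """Pick a discrete action (epsilon-greedy over assumed Q-values),
    then attach the parameters the assumed parameter policy gives for it."""
    rng = rng or random.Random(0)
    if rng.random() < epsilon:
        # Explore: choose a discrete action uniformly at random.
        name = rng.choice(sorted(q_values))
    else:
        # Exploit: choose the discrete action with the highest value.
        name = max(q_values, key=q_values.get)
    return ParameterizedAction(name, param_policy[name])


# Hypothetical values for a single state.
q = {"kick": 1.0, "run": 0.5}
params = {"kick": (0.3, 12.0), "run": (1.5,)}
action = select_action(q, params)
```

With `epsilon` left at zero the choice is purely greedy, so the agent selects both the discrete action and the parameters attached to it in one step.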
|
2 |
Autonomous inter-task transfer in reinforcement learning domains / Taylor, Matthew Edmund. 07 September 2012
Reinforcement learning (RL) methods have become popular in recent years because of their ability to solve complex tasks with minimal feedback. While these methods have had experimental successes and have been shown to exhibit some desirable properties in theory, the basic learning algorithms have often been found slow in practice. Therefore, much of the current RL research focuses on speeding up learning by taking advantage of domain knowledge, or by better utilizing agents’ experience. The ambitious goal of transfer learning, when applied to RL tasks, is to accelerate learning on some target task after training on a different, but related, source task. This dissertation demonstrates that transfer learning methods can successfully improve learning in RL tasks via experience from previously learned tasks. Transfer learning can increase RL’s applicability to difficult tasks by allowing agents to generalize their experience across learning problems. This dissertation presents inter-task mappings, the first transfer mechanism in this area to successfully enable transfer between tasks with different state variables and actions. Inter-task mappings have subsequently been used by a number of transfer researchers. A set of six transfer learning algorithms are then introduced. While these transfer methods differ in terms of what base RL algorithms they are compatible with, what type of knowledge they transfer, and what their strengths are, all utilize the same inter-task mapping mechanism. These transfer methods can all successfully use mappings constructed by a human from domain knowledge, but there may be situations in which domain knowledge is unavailable, or insufficient, to describe how two given tasks are related. We therefore also study how inter-task mappings can be learned autonomously by leveraging existing machine learning algorithms. 
Our methods use classification and regression techniques to successfully discover similarities between data gathered in pairs of tasks, culminating in what is currently one of the most robust mapping-learning algorithms for RL transfer. Combining transfer methods with these similarity-learning algorithms allows us to empirically demonstrate the plausibility of autonomous transfer. We fully implement these methods in four domains (each with different salient characteristics), show that transfer can significantly improve an agent’s ability to learn in each domain, and explore the limits of transfer’s applicability. / text
|
3 |
Discovering hierarchy in reinforcement learning / Hengst, Bernhard. January 2003
Thesis (Ph. D.)--University of New South Wales, 2003. / Also available online.
|
4 |
Learning MDP action models via discrete mixture trees / Wynkoop, Michael S. January 1900
Thesis (M.S.)--Oregon State University, 2009. / Printout. Includes bibliographical references (leaves 43-44). Also available on the World Wide Web.
|
5 |
Autonomous inter-task transfer in reinforcement learning domains / Taylor, Matthew Edmund. January 1900
Thesis (Ph. D.)--University of Texas at Austin, 2008. / Vita. Includes bibliographical references.
|
6 |
Model-based approximation methods for reinforcement learning / Wang, Xin. January 1900
Thesis (Ph. D.)--Oregon State University, 2007. / Printout. Includes bibliographical references (leaves 187-198). Also available on the World Wide Web.
|
7 |
Representation discovery using a fixed basis in reinforcement learning / Wookey, Dean Stephen. January 2016
A thesis presented for the degree of Doctor of Philosophy, School of Computer Science and Applied Mathematics. University of the Witwatersrand, South Africa. 26 August 2016. / In the reinforcement learning paradigm, an agent learns by interacting with its environment. At each
state, the agent receives a numerical reward. Its goal is to maximise the discounted sum of future rewards.
One way it can do this is through learning a value function, a function that maps states to the discounted
sum of future rewards. With an accurate value function and a model of the environment, the agent can
take the optimal action in each state. In practice, however, the value function is approximated, and
performance depends on the quality of the approximation. Linear function approximation is a commonly
used approximation scheme, where the value function is represented as a weighted sum of basis functions
or features. In continuous state environments, there are infinitely many such features to choose from,
introducing the new problem of feature selection. Existing algorithms such as OMP-TD are slow to
converge, scale poorly to high-dimensional spaces, and have not been generalised to the online learning
case. We introduce heuristic methods for reducing the search space in high dimensions that significantly
reduce computational costs and also act as regularisers. We extend these methods and introduce feature
regularisation for incremental feature selection in the batch learning case, and show that introducing a
smoothness prior is effective with our SSOMP-TD and STOMP-TD algorithms. Finally we generalise
OMP-TD and our algorithms to the online case and evaluate them empirically. / LG2017
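For reference, the two quantities this abstract describes in words can be written out as follows. This is the standard textbook formulation rather than notation taken from the thesis; the symbols are assumptions: \gamma is the discount factor, r_{t+1} the reward received after step t, \phi_i the basis functions (features), and w_i their weights.

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \,\middle|\, s_0 = s\right],
\qquad
\hat{V}(s) = \sum_{i=1}^{n} w_i \,\phi_i(s)
```

Feature selection, in this notation, is the problem of choosing which \phi_i to include in the sum.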
|
8 |
Transfer in reinforcement learning / Alexander, John W. January 2015
The problem of developing skill repertoires autonomously in robotics and artificial intelligence is becoming ever more pressing. Currently, the issues of how to apply prior knowledge to new situations and which knowledge to apply have not been sufficiently studied. We present a transfer setting where a reinforcement learning agent faces multiple problem-solving tasks drawn from an unknown generative process, where each task has similar dynamics. The task dynamics are changed by varying the transition function between states. The tasks are presented sequentially, with the latest task presented considered as the target for transfer. We describe two approaches to solving this problem. Firstly, we present an algorithm for transfer of the function encoding the state-action value, defined as value function transfer. This algorithm uses the value function of a source policy to initialise the policy of a target task. We varied the type of basis the algorithm used to approximate the value function. Empirical results in several well-known domains showed that the learners benefited from the transfer in the majority of cases. Results also showed that the radial basis performed better in general than the Fourier basis. However, contrary to expectation, the Fourier basis benefited most from the transfer. Secondly, we present an algorithm for learning an informative prior which encodes beliefs about the underlying dynamics shared across all tasks. We call this agent the Informative Prior agent (IP). The prior is learnt through experience and captures the commonalities in the transition dynamics of the domain and allows for a quantification of the agent's uncertainty about these. By using a sparse distribution of the uncertainty in the dynamics as a prior, the IP agent can successfully learn a model of 1) the set of feasible transitions rather than the set of possible transitions, and 2) the likelihood of each of the feasible transitions.
Analysis focusing on the accuracy of the learned model showed that IP had a very good accuracy bound, which is expressible in terms of only the permissible error and the diffusion, a factor that describes the concentration of the prior mass around the truth, and which decreases as the number of tasks experienced grows. The empirical evaluation of IP showed that an agent which uses the informative prior outperforms several existing Bayesian reinforcement learning algorithms on tasks with shared structure in a domain where multiple related tasks were presented only once to the learners. IP is a step towards the autonomous acquisition of behaviours in artificial intelligence. IP also provides a contribution towards the analysis of exploration and exploitation in the transfer paradigm.
|
9 |
The use of apprenticeship learning via inverse reinforcement learning for musical composition / Messer, Orry. 04 February 2015
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science. 14 August 2014. / Reinforcement learning is a branch of machine learning wherein an agent is given rewards which it uses
to guide its learning. These rewards are often manually specified in terms of a reward function. The
agent performs a certain action in a particular state and is given a reward accordingly (where a state is
a configuration of the environment). A problem arises when the reward function is either difficult or
impossible to manually specify. Apprenticeship learning via inverse reinforcement learning can be used
in these cases in order to ascertain a reward function, given a set of expert trajectories. The research
presented in this document used apprenticeship learning in order to ascertain a reward function in a
musical context. The agent then optimized its performance in terms of this reward function. This was
accomplished by presenting the learning agents with pieces of music composed by the author. These
were the expert trajectories from which the learning agent discovered a reward function. This reward
function allowed the agents to attempt to discover an optimal strategy for maximizing its value.
Three learning agents were created: two drum-beat generating agents and one melody-composing
agent. The first two agents were used to recreate expert drum-beats as well as generate new
drum-beats. The melody agent was used to generate new melodies given a set of expert melodies. The
results show that apprenticeship learning can be used both to recreate expert musical pieces and to
generate new musical pieces which are similar to the expert musical pieces. Further, the results using
the melody agent indicate that the agent has learned to generate new melodies in a given key, without
having been given explicit information about key signatures.
|
10 |
Crowd behavioural simulation via multi-agent reinforcement learning / Lim, Sheng Yan. January 2016
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science. Johannesburg, 2015. / Crowd simulation can be thought of as a group of entities interacting with one another. Traditionally,
an animated entity would require precise scripts so that it can function in a virtual
environment autonomously. Previous crowd simulation methods have been used in real-world
applications, but these methods are not learning agents and are therefore unable to adapt and
change their behaviours. State-of-the-art crowd simulation methods include flow-based, particle
and strategy-based models. A reinforcement learning agent could learn how to navigate,
behave and interact in an environment without explicit design. Then a group of reinforcement
learning agents should be able to act in a way that simulates a crowd. This thesis investigates
the believability of crowd behavioural simulation via three multi-agent reinforcement learning
methods. The methods are Q-learning in the multi-agent Markov decision process model, joint
state-action Q-learning, and the joint state value iteration algorithm. The three learning methods are
able to produce believable and realistic crowd behaviours.
|