A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science. Johannesburg, 2016.

In order to complete real-world tasks, autonomous robots require a mix of fine-grained control and
high-level skills. A robot requires a wide range of skills to handle a variety of different situations, but
must also be able to adapt its skills to handle a specific situation. Reinforcement learning is a machine
learning paradigm for learning to solve tasks by interacting with an environment. Current methods in
reinforcement learning focus on agents with either a fixed number of discrete actions, or a continuous
set of actions.
We consider the problem of reinforcement learning with parameterized actions—discrete actions with
continuous parameters. At each step the agent must select both which action to use and which parameters
to use with that action. By representing actions in this way, we obtain both the high-level skills given by the discrete
actions and the adaptability given by the parameters of each action.
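
As a minimal illustrative sketch (not taken from the dissertation), the two-part choice made at each step can be written in Python as follows. The names `ParameterizedAction`, `select`, `action_value`, and `parameter_policy` are hypothetical, and the assumption that each action has a single bounded parameter is made only to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class ParameterizedAction:
    """A discrete action plus the bounds of its one continuous parameter (assumed)."""
    name: str
    low: float
    high: float

    def clip(self, value: float) -> float:
        # Keep a proposed parameter inside the action's allowed range.
        return max(self.low, min(self.high, value))


def select(
    state: float,
    actions: Sequence[ParameterizedAction],
    action_value: Callable[[float, str], float],      # hypothetical value estimate
    parameter_policy: Callable[[float, str], float],  # hypothetical parameter chooser
) -> Tuple[str, float]:
    # First choice: the discrete action with the highest estimated value.
    best = max(actions, key=lambda a: action_value(state, a.name))
    # Second choice: the continuous parameter to use with that action.
    return best.name, best.clip(parameter_policy(state, best.name))
```

For example, with two hypothetical actions `kick` (parameter in [0, 1]) and `move` (parameter in [-1, 1]), `select` returns a pair such as `("move", 1.0)`: the discrete action supplies the skill, and the parameter adapts it to the current state.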
We introduce the Q-PAMDP algorithm for model-free learning in parameterized action Markov decision
processes. Q-PAMDP alternates between learning which discrete actions to use in each state and which
parameters to use in those states. We show that under weak assumptions, Q-PAMDP converges to a
local maximum. We compare Q-PAMDP with a direct policy search approach in the Goal and Platform
domains. Q-PAMDP outperforms direct policy search in both domains. / TG2016
Identifier | oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:wits/oai:wiredspace.wits.ac.za:10539/21639 |
Date | January 2016 |
Creators | Masson, Warwick Anthony |
Source Sets | South African National ETD Portal |
Language | English |
Detected Language | English |
Type | Thesis |
Format | Online resource (46 leaves), application/pdf, image/jpeg, application/pdf, application/pdf |