Alexander, John W.
The problem of developing skill repertoires autonomously in robotics and artificial intelligence is becoming ever more pressing. Currently, the issues of how to apply prior knowledge to new situations and which knowledge to apply have not been sufficiently studied. We present a transfer setting where a reinforcement learning agent faces multiple problem solving tasks drawn from an unknown generative process, where each task has similar dynamics. The task dynamics are changed by varying in the transition function between states. The tasks are presented sequentially with the latest task presented considered as the target for transfer. We describe two approaches to solving this problem. Firstly we present an algorithm for transfer of the function encoding the stateaction value, defined as value function transfer. This algorithm uses the value function of a source policy to initialise the policy of a target task. We varied the type of basis the algorithm used to approximate the value function. Empirical results in several well known domains showed that the learners benefited from the transfer in the majority of cases. Results also showed that the Radial basis performed better in general than the Fourier. However contrary to expectation the Fourier basis benefited most from the transfer. Secondly, we present an algorithm for learning an informative prior which encodes beliefs about the underlying dynamics shared across all tasks. We call this agent the Informative Prior agent (IP). The prior is learnt though experience and captures the commonalities in the transition dynamics of the domain and allows for a quantification of the agent's uncertainty about these. By using a sparse distribution of the uncertainty in the dynamics as a prior, the IP agent can successfully learn a model of 1) the set of feasible transitions rather than the set of possible transitions, and 2) the likelihood of each of the feasible transitions. Analysis focusing on the accuracy of the learned model showed that IP had a very good accuracy bound, which is expressible in terms of only the permissible error and the diffusion, a factor that describes the concentration of the prior mass around the truth, and which decreases as the number of tasks experienced grows. The empirical evaluation of IP showed that an agent which uses the informative prior outperforms several existing Bayesian reinforcement learning algorithms on tasks with shared structure in a domain where multiple related tasks were presented only once to the learners. IP is a step towards the autonomous acquisition of behaviours in artificial intelligence. IP also provides a contribution towards the analysis of exploration and exploitation in the transfer paradigm.
Page generated in 0.1735 seconds