A thesis submitted to the Faculty of Science, School of Computational and Applied Mathematics, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, South Africa, July 2015.

Reinforcement learning agents solve tasks by finding policies that maximise their reward
over time. The policy can be found from the value function, which represents the value
of each state-action pair. In continuous state spaces, the value function must be approximated.
Often, this is done using a fixed linear combination of basis functions across all dimensions.
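For concreteness, a fixed linear basis of this kind might look as follows. This is a minimal sketch, not the thesis's implementation. It assumes a Fourier cosine basis over states scaled to [0, 1]^d, one weight vector per discrete action, and a semi-gradient SARSA(0) update. The names `fourier_features` and `LinearQ` are hypothetical.

```python
import itertools
import math


def fourier_features(state, order):
    """Fixed Fourier cosine basis over a state scaled to [0, 1]^d.

    Assumption: this basis is chosen for illustration only. There is one
    feature per integer vector c in {0..order}^d, phi_c(s) = cos(pi * c . s).
    """
    d = len(state)
    coeffs = itertools.product(range(order + 1), repeat=d)
    return [math.cos(math.pi * sum(c_i * s_i for c_i, s_i in zip(c, state)))
            for c in coeffs]


class LinearQ:
    """Q(s, a) approximated as w_a . phi(s), with the same fixed phi everywhere."""

    def __init__(self, n_actions, state_dim, order=3):
        self.order = order
        n_features = (order + 1) ** state_dim
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def features(self, state):
        return fourier_features(state, self.order)

    def value(self, state, action):
        phi = self.features(state)
        return sum(w_i * p_i for w_i, p_i in zip(self.w[action], phi))

    def sarsa_update(self, s, a, r, s_next, a_next,
                     alpha=0.1, gamma=0.99, terminal=False):
        """One semi-gradient SARSA(0) step on w_a; returns the TD error."""
        target = r if terminal else r + gamma * self.value(s_next, a_next)
        delta = target - self.value(s, a)
        phi = self.features(s)
        self.w[a] = [w_i + alpha * delta * p_i
                     for w_i, p_i in zip(self.w[a], phi)]
        return delta
```

Because the basis is fixed, the number of features grows as (order + 1)^d, which is the dimensional scaling problem that motivates adaptive schemes.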
We introduce and demonstrate the wavelet basis for reinforcement learning, a basis
function scheme competitive with state-of-the-art fixed bases. We extend two online
adaptive tiling schemes to wavelet functions and show that they improve performance
across standard domains. Finally, we introduce the Multiscale Adaptive Wavelet Basis
(MAWB), a wavelet-based adaptive basis scheme which is dimensionally scalable and insensitive
to the initial level of detail. This scheme adaptively grows the basis function
set by combining across dimensions, or splitting within a dimension, those candidate functions
which have a high estimated projection onto the Bellman error. Several novel
measures are used to compute this estimate.
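The split-and-combine idea can be illustrated with a small sketch, which rests on several assumptions not stated in the abstract. Haar scaling functions stand in for the wavelet family. A candidate's score is the normalised sample correlation between the candidate and observed TD errors; this is a simple proxy for its projection onto the Bellman error, not one of the thesis's novel measures. States are assumed to be scaled to [0, 1)^d, and all function names are hypothetical.

```python
import math


def haar_scaling(x, j, k):
    """1-D Haar scaling function phi_{j,k}(x) = 2^(j/2) * 1[0 <= 2^j x - k < 1].

    Assumption: Haar is used only as the simplest wavelet family.
    """
    u = (2 ** j) * x - k
    return 2 ** (j / 2) if 0.0 <= u < 1.0 else 0.0


def evaluate(func, state):
    """A basis function is a dict {dim: (j, k)}; its value is the tensor
    product of 1-D scaling functions over the dimensions it names."""
    value = 1.0
    for dim, (j, k) in func.items():
        value *= haar_scaling(state[dim], j, k)
    return value


def split(func, dim):
    """Refine within one dimension: (j, k) -> (j + 1, 2k) and (j + 1, 2k + 1)."""
    j, k = func[dim]
    return [{**func, dim: (j + 1, 2 * k)}, {**func, dim: (j + 1, 2 * k + 1)}]


def combine(f, g):
    """Combine across dimensions: the product of two functions defined on
    disjoint sets of dimensions, or None if their dimensions overlap."""
    if set(f) & set(g):
        return None
    return {**f, **g}


def projection_score(func, states, td_errors):
    """Hypothetical measure: |sum phi(s) delta| / sqrt(sum phi(s)^2)."""
    num = sum(evaluate(func, s) * d for s, d in zip(states, td_errors))
    norm = math.sqrt(sum(evaluate(func, s) ** 2 for s in states))
    return abs(num) / norm if norm > 0 else 0.0


def best_candidate(basis, states, td_errors):
    """Generate split and combine candidates from the current (non-empty)
    basis and return the one with the highest score."""
    candidates = []
    for f in basis:
        for dim in f:
            candidates.extend(split(f, dim))
    for i, f in enumerate(basis):
        for g in basis[i + 1:]:
            c = combine(f, g)
            if c is not None:
                candidates.append(c)
    candidates = [c for c in candidates if c not in basis]
    return max(candidates,
               key=lambda c: projection_score(c, states, td_errors))
```

In this toy setup, if the TD error is concentrated where the first state variable is below 0.5, the finer function `{0: (1, 0)}` outscores both the coarse product `{0: (0, 0), 1: (0, 0)}` and the splits along the second dimension. The basis therefore refines only where the value estimate is poor, rather than everywhere at once.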