Adaptive value function approximation in reinforcement learning using wavelets

A thesis submitted to the Faculty of Science, School of Computational and Applied Mathematics, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, South Africa, July 2015.

Reinforcement learning agents solve tasks by finding policies that maximise their reward
over time. The policy can be derived from the value function, which represents the value
of each state-action pair. In continuous state spaces, the value function must be approximated.
Often, this is done using a linear combination of a fixed set of basis functions defined
across all dimensions.
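
To make linear value function approximation with a wavelet basis concrete, here is a minimal Python sketch. It assumes NumPy, states normalised to [0, 1]^d, and the Haar scaling function as the wavelet; the thesis may use a different wavelet family. It builds a fixed tensor-product basis at one resolution level and applies a standard Sarsa update to per-action weights, so that Q(s, a) = W[a] . phi(s). The names `wavelet_features` and `sarsa_update` are hypothetical illustrations, not the thesis's code.

```python
import itertools

import numpy as np


def haar_scaling(x):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere (an assumed wavelet choice)."""
    return np.where((x >= 0.0) & (x < 1.0), 1.0, 0.0)


def wavelet_features(state, level):
    """Tensor-product Haar scaling-function features at one resolution level.

    Assumes `state` lies in [0, 1]^d. Each dimension uses the 2**level translates
    phi_{level,k}(x) = 2**(level/2) * phi(2**level * x - k); the basis is every
    product across dimensions, so there are (2**level) ** d features.
    """
    state = np.asarray(state, dtype=float)
    n = 2 ** level
    scale = 2.0 ** (level / 2.0)
    # Clip just below 1 so a state exactly on the upper boundary stays in the last cell.
    u = np.clip(state, 0.0, 1.0 - 1e-12) * n
    ks = np.arange(n)
    per_dim = scale * haar_scaling(u[:, None] - ks[None, :])  # shape (d, n)
    return np.array([
        np.prod([per_dim[i, k] for i, k in enumerate(combo)])
        for combo in itertools.product(range(n), repeat=len(state))
    ])


def sarsa_update(W, phi_s, a, reward, phi_next, a_next,
                 alpha=0.1, gamma=0.99, terminal=False):
    """One Sarsa step for Q(s, a) = W[a] . phi(s); returns new weights and TD error."""
    q_next = 0.0 if terminal else W[a_next] @ phi_next
    delta = reward + gamma * q_next - W[a] @ phi_s
    W = W.copy()
    W[a] += alpha * delta * phi_s
    return W, delta


phi = wavelet_features([0.25, 0.75], level=1)  # 4 features for d = 2
W = np.zeros((2, phi.size))                    # weights for two actions
W, delta = sarsa_update(W, phi, a=1, reward=1.0, phi_next=phi, a_next=0,
                        alpha=0.5, terminal=True)
```

The feature count grows as (2**level)**d, which illustrates why a fixed tensor-product basis scales poorly as the number of state dimensions increases.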
We introduce and demonstrate the wavelet basis for reinforcement learning, a basis
function scheme that is competitive with state-of-the-art fixed bases. We extend two online
adaptive tiling schemes to wavelet functions and show that they improve performance
across standard domains. Finally, we introduce the Multiscale Adaptive Wavelet Basis
(MAWB), a wavelet-based adaptive basis scheme that is dimensionally scalable and insensitive
to the initial level of detail. This scheme adaptively grows the basis function
set by combining across dimensions, or splitting within a dimension, those candidate functions
that have a high estimated projection onto the Bellman error. A number of novel
measures are used to compute this estimate.
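
The adaptive growth step can be sketched in the same hypothetical setting (NumPy, states in [0, 1]^d, Haar scaling functions). This is not MAWB and does not reproduce the thesis's novel measures: as a stand-in, it scores each candidate by the normalised inner product between its values and sampled TD errors, a common proxy for the projection onto the Bellman error. Candidates are generated by splitting a function in one dimension into its two children one level finer, or by multiplying two basis functions defined on disjoint dimensions. All function names are hypothetical.

```python
import itertools

import numpy as np


def haar_scaling(x):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere (an assumed wavelet choice)."""
    return np.where((x >= 0.0) & (x < 1.0), 1.0, 0.0)


def evaluate(func, states):
    """Evaluate a basis function on an (N, d) array of states in [0, 1]^d.

    `func` is a tuple of (dim, level, k) triples: a product of 1-D scaled,
    translated Haar scaling functions; dimensions not listed contribute 1.
    """
    states = np.clip(np.asarray(states, dtype=float), 0.0, 1.0 - 1e-12)
    out = np.ones(len(states))
    for dim, level, k in func:
        out *= 2.0 ** (level / 2.0) * haar_scaling(2 ** level * states[:, dim] - k)
    return out


def split_candidates(func):
    """Refine one dimension of `func` to the next level: children 2k and 2k + 1."""
    cands = []
    for i, (dim, level, k) in enumerate(func):
        for child in (2 * k, 2 * k + 1):
            new = list(func)
            new[i] = (dim, level + 1, child)
            cands.append(tuple(sorted(new)))
    return cands


def combine_candidates(basis):
    """Products of pairs of basis functions that share no dimension."""
    cands = []
    for f, g in itertools.combinations(basis, 2):
        if {d for d, _, _ in f}.isdisjoint({d for d, _, _ in g}):
            cands.append(tuple(sorted(f + g)))
    return cands


def projection_score(func, states, td_errors):
    """|<f, delta>| / ||f|| over the samples: a proxy for the Bellman-error projection."""
    f = evaluate(func, states)
    norm = np.sqrt(np.sum(f ** 2))
    return 0.0 if norm == 0.0 else abs(np.dot(f, td_errors)) / norm


def best_candidate(basis, states, td_errors):
    """Return the highest-scoring new candidate and its score (None if there are none)."""
    cands = set(combine_candidates(basis))
    for func in basis:
        cands.update(split_candidates(func))
    cands -= set(basis)
    if not cands:
        return None, 0.0
    score, func = max((projection_score(c, states, td_errors), c) for c in cands)
    return func, score


basis = [((0, 0, 0),), ((1, 0, 0),)]  # one coarse function per dimension
states = np.array([[0.1, 0.1], [0.1, 0.6], [0.6, 0.1], [0.6, 0.6]])
td_errors = np.array([1.0, 1.0, 0.0, 0.0])  # error concentrated where x0 < 0.5
func, score = best_candidate(basis, states, td_errors)
```

Here the winning candidate is the dimension-0 function refined to the half-interval where the error lies. In a full agent, the chosen candidate would presumably be added to the basis with a zero initial weight once its score passes some threshold; the threshold and the sampling of TD errors are left open in this sketch.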

http://hdl.handle.net/10539/19298

Wavelets (Mathematics)

Reinforcement learning.

Identifier	oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:wits/oai:wiredspace.wits.ac.za:10539/19298
Date	January 2016
Creators	Mitchley, Michael
Source Sets	South African National ETD Portal
Language	English
Detected Language	English
Type	Thesis
Format	application/pdf
