## Representation discovery using a fixed basis in reinforcement learning

A thesis presented for the degree of Doctor of Philosophy, School of Computer Science and Applied Mathematics, University of the Witwatersrand, South Africa. 26 August 2016.

In the reinforcement learning paradigm, an agent learns by interacting with its environment. At each state, the agent receives a numerical reward. Its goal is to maximise the discounted sum of future rewards. One way it can do this is through learning a value function: a function that maps states to the discounted sum of future rewards. With an accurate value function and a model of the environment, the agent can take the optimal action in each state. In practice, however, the value function is approximated, and performance depends on the quality of the approximation. Linear function approximation is a commonly used approximation scheme, where the value function is represented as a weighted sum of basis functions or features. In continuous state environments, there are infinitely many such features to choose from, introducing the new problem of feature selection. Existing algorithms such as OMP-TD are slow to converge, scale poorly to high-dimensional spaces, and have not been generalised to the online learning case. We introduce heuristic methods for reducing the search space in high dimensions that significantly reduce computational costs and also act as regularisers. We extend these methods and introduce feature regularisation for incremental feature selection in the batch learning case, and show that introducing a smoothness prior is effective with our SSOMP-TD and STOMP-TD algorithms. Finally, we generalise OMP-TD and our algorithms to the online case and evaluate them empirically.

LG2017
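The two quantities the abstract relies on, the discounted sum of future rewards and a value function written as a weighted sum of basis functions, can be illustrated with a minimal Python sketch. It is not taken from the thesis: the cosine Fourier basis is an assumed example of a fixed basis (the record does not say which basis the thesis uses), and the names `discounted_return`, `fourier_features` and `linear_value` are hypothetical.

```python
import math
from itertools import product


def discounted_return(rewards, gamma):
    """Discounted sum r_0 + gamma*r_1 + gamma^2*r_2 + ... of a finite reward list."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def fourier_features(state, order):
    """Cosine Fourier features of a state whose components are scaled to [0, 1].

    Assumed example basis, not necessarily the one used in the thesis.
    One feature is produced per integer coefficient vector in {0..order}^d.
    """
    coefficient_vectors = product(range(order + 1), repeat=len(state))
    return [
        math.cos(math.pi * sum(c * s for c, s in zip(coeffs, state)))
        for coeffs in coefficient_vectors
    ]


def linear_value(weights, features):
    """Approximate value as a weighted sum of feature activations."""
    return sum(w * f for w, f in zip(weights, features))
```

In this sketch a `d`-dimensional state with a given `order` yields `(order + 1) ** d` candidate features, so the candidate set grows exponentially with the number of state dimensions. That growth is one way to see why selecting a small subset of features becomes important in high-dimensional spaces.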

Identifier | oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:wits/oai:wiredspace.wits.ac.za:10539/21642 |

Date | January 2016 |

Creators | Wookey, Dean Stephen |

Source Sets | South African National ETD Portal |

Language | English |

Detected Language | English |

Type | Thesis |

Format | Online resource (v, 74 leaves), application/pdf |
