Many tasks can be posed as the problem of responding to the states of an external world with actions that maximise the reward received over time. Algorithms that reliably solve such problems exist, but their worst-case complexities typically grow faster than linearly with the size of the state space in which a task is to be performed. Because even simple tasks can involve enormous numbers of states, applying such algorithms is often impractical. This thesis examines reinforcement learning algorithms that effectively learn to perform tasks by constructing mappings from states to suitable actions. In problems involving large numbers of states, these algorithms usually must construct approximate, rather than exact, solutions, and the primary issue examined in the thesis is how the complexity of constructing adequate approximations scales as the size of the state space increases. The vast majority of reinforcement learning algorithms operate by constructing estimates of the long-term value of states and using these estimates to select actions. The potential effects of errors in such estimates are examined and shown to be severe. Empirical results are presented which suggest that minor errors are likely to result in significant losses in many problems, and which indicate where such losses are most likely to occur. The complexity of constructing estimates accurate enough to prevent significant losses is also examined empirically and shown to be substantial.
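The claim that minor errors in value estimates can cause significant losses can be illustrated with a minimal sketch. The two-state problem below is hypothetical and is not taken from the thesis; the discount factor, the error bound, the reward values, and all names (`TRANSITIONS`, `greedy_action`, `policy_return`) are assumptions chosen for illustration. Each estimate is wrong by only `EPS`, yet the greedy policy built on those estimates repeats a slightly bad action forever, and discounting accumulates the damage.

```python
GAMMA = 0.99  # discount factor (assumed for illustration)
EPS = 0.01    # size of the error in each value estimate (assumed)

# Hypothetical deterministic problem: TRANSITIONS[state][action] = (reward, next_state).
# In state 0, "stay" costs a little on every step; "move" reaches the cost-free state 1.
TRANSITIONS = {
    0: {"stay": (-1.9 * GAMMA * EPS, 0), "move": (0.0, 1)},
    1: {"stay": (0.0, 1)},
}

TRUE_VALUES = {0: 0.0, 1: 0.0}   # optimal long-term value of each state
ESTIMATES = {0: +EPS, 1: -EPS}   # estimates, each wrong by exactly EPS


def greedy_action(state, values):
    """Pick the action with the best one-step backed-up value under `values`."""
    def backed_up(action):
        reward, next_state = TRANSITIONS[state][action]
        return reward + GAMMA * values[next_state]
    return max(TRANSITIONS[state], key=backed_up)


def policy_return(start, values, steps=5000):
    """Discounted return of acting greedily with respect to `values`."""
    total, discount, state = 0.0, 1.0, start
    for _ in range(steps):
        action = greedy_action(state, values)
        reward, state = TRANSITIONS[state][action]
        total += discount * reward
        discount *= GAMMA
    return total


optimal = policy_return(0, TRUE_VALUES)   # greedy on exact values
degraded = policy_return(0, ESTIMATES)    # greedy on slightly wrong values
loss = optimal - degraded

print(f"estimate error: {EPS}, loss in return: {loss:.3f}")
```

Under these assumptions the loss is roughly 1.9 * GAMMA * EPS / (1 - GAMMA), which is more than a hundred times the error in the estimates. The amplification grows as the discount factor approaches 1.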
Identifier | oai:union.ndltd.org:ADTP/230226 |
Creators | McDonald, Matthew A. F |
Publisher | University of Western Australia. Dept. of Computer Science |
Source Sets | Australasian Digital Theses Program |
Language | English |
Detected Language | English |
Rights | Copyright Matthew A.F. McDonald, http://www.itpo.uwa.edu.au/UWA-Computer-And-Software-Use-Regulations.html |