Intelligent agents are designed to interact with, and learn about, their environment so that they can act purposefully towards a goal. One class of problems encountered in building such agents is learning how to respond to dynamic systems with a continuous state space. The goals of this dissertation are to develop a framework for understanding the behaviour of partitioned dynamic systems with continuous underlying state, and to translate this framework into algorithms which adaptively form a partition of the continuous space such that the partitioned system is more easily learned and controlled, and such that the control law may be easily explained in intuitive ways. Currently, algorithms which learn a control policy for partitioned continuous state space systems treat the partitioned system as an approximation to a Markov chain. I give conditions for the partitioned system to be a Markov chain, a semi-Markov process and a new class of system, a weak-semi-Markov process. The weak-semi-Markov model is shown to model partitioned dynamic systems with greater economy than the other models surveyed. The behaviour of a partitioned state space system in the area around the region boundaries is also considered. I use the theory of sliding surfaces and some heuristic arguments to recommend region boundary shape and position. The concept of 'staying on the boundary' then becomes a robust and relatively easy subgoal within the control algorithm. The concept of 'reaching the sliding surface' as a subgoal is used as the basis for an intuitive explanation of the learnt controller. I present an algorithm based on this concept which explains the behaviour of a learnt controller in ways not previously available to machine learning algorithms. Finally, the Markov property and the theory of sliding mode control are used as the basis of a class of recursive algorithms. These algorithms adaptively find a partition, and simultaneously use this partition in conjunction with one of five reinforcement learning algorithms to find a control policy based on that partition. This technique is shown to work very well in learning, controlling and explaining a variety of physical systems, from a monorail to a container crane.
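To make the idea of treating a partitioned continuous system as an approximate Markov chain concrete, the following is a minimal sketch and not the dissertation's algorithm. It assumes a hypothetical one-dimensional plant, a fixed hand-chosen partition, and one fixed action per region; the names `region_index`, `step` and `estimate_transitions` are illustrative only.

```python
import bisect
import random


def region_index(x, boundaries):
    """Return the index of the partition region containing x.

    boundaries is a sorted list of interior cut points, so there are
    len(boundaries) + 1 regions.
    """
    return bisect.bisect_right(boundaries, x)


def step(x, u, dt=0.05, noise=0.01, rng=random):
    """Hypothetical first-order plant dx/dt = -x + u with small noise."""
    return x + dt * (-x + u) + rng.gauss(0.0, noise)


def estimate_transitions(boundaries, policy, n_steps=20000, seed=0):
    """Count region-to-region transitions and row-normalise the counts.

    policy[r] is the (assumed) fixed action applied while in region r.
    Rows for regions that are never visited are left as all zeros.
    """
    rng = random.Random(seed)
    n = len(boundaries) + 1
    counts = [[0] * n for _ in range(n)]
    x = rng.uniform(-1.0, 1.0)
    for _ in range(n_steps):
        r = region_index(x, boundaries)
        x_next = step(x, policy[r], rng=rng)
        counts[r][region_index(x_next, boundaries)] += 1
        x = x_next
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


if __name__ == "__main__":
    # Actions on either side of the cut at 0.3 push the state towards it,
    # so the trajectory tends to stay near that boundary.
    for row in estimate_transitions([-0.3, 0.3], [1.0, 1.0, -1.0]):
        print(row)
```

The estimated matrix is only an empirical summary of region-to-region moves. Whether the partitioned system is actually a Markov chain depends on conditions such as those the dissertation gives, which this sketch does not check.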
Identifier | oai:union.ndltd.org:ADTP/187869 |
Date | January 2004 |
Creators | McGarity, Michael, Computer Science & Engineering, Faculty of Engineering, UNSW |
Publisher | Awarded by: University of New South Wales. School of Computer Science and Engineering |
Source Sets | Australasian Digital Theses Program |
Language | English |
Detected Language | English |
Rights | Copyright Michael McGarity, http://unsworks.unsw.edu.au/copyright |