A number of techniques are used to solve reinforcement learning
problems, but very few have been developed for and tested on highly reconfigurable
systems cast as reinforcement learning problems. A reconfigurable system
is a vehicle (air, ground, or water) or collection of vehicles that can change its
geometrical features, i.e., shape or formation, to perform tasks that it could
not otherwise accomplish. These systems tend to be optimized for several operating
conditions, and then controllers are designed to reconfigure the system from one operating
condition to another. Q-learning, an unsupervised episodic learning technique
that solves the reinforcement learning problem, is an attractive control methodology
for reconfigurable systems. It has been successfully applied to a myriad of control
problems, and a number of variations have been developed to avoid or alleviate
some limitations of earlier versions of this approach. This dissertation describes the
development of three modular enhancements to the Q-learning algorithm that solve
some of the unique problems that arise when working with this class of systems, such
as the complex interaction of reconfigurable parameters and computationally intensive
models of the systems. A multi-resolution state-space discretization method is developed
that adaptively rediscretizes the state-space by progressively finer grids around
one or more distinct regions of interest within the state or learning space. A genetic
algorithm that autonomously selects the basis functions used to approximate
the action-value function is applied periodically throughout the learning
process. Policy comparison is added to monitor the policy encoded in the
action-value function, preventing unnecessary episodes at each level of discretization.
This approach is validated on several problems, including an inverted pendulum, a reconfigurable
airfoil, and a reconfigurable wing. Results show that the multi-resolution
state-space discretization method reduces the number of state-action pairs required
to achieve a specific goal, often by an order of magnitude, and that policy comparison
prevents unnecessary episodes once the policy has converged to a usable one. Results
also show that the genetic algorithm is a promising candidate for the selection
of basis functions for function approximation of the action-value function.
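
The policy comparison idea can be illustrated with a minimal sketch; it is not the dissertation's implementation. It assumes a hypothetical deterministic chain of states with a goal at the right end, a tabular action-value function, and systematic sweeps over all state-action pairs in place of exploratory episodes, so that the run is reproducible. The names `greedy_policy`, `learn_until_policy_stable`, and `patience` are illustrative. Learning stops once the greedy policy read from the action-value function has not changed for `patience` consecutive passes.

```python
def greedy_policy(q):
    """Greedy action index per state; ties resolve to the lowest index."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


def learn_until_policy_stable(n_states=4, alpha=1.0, gamma=0.9,
                              patience=3, max_passes=1000):
    """Q-learning updates on a toy chain, stopped by policy comparison.

    Assumptions (not from the source): states 0..n_states-1, the last
    state is a terminal goal, action 0 moves left, action 1 moves right,
    and reaching the goal yields reward 1. Each pass applies one update
    to every non-terminal state-action pair.
    """
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(goal)]  # non-terminal states only
    previous = greedy_policy(q)
    unchanged = 0
    for passes in range(1, max_passes + 1):
        for s in range(goal):
            for a, step in enumerate((-1, +1)):
                s_next = max(s + step, 0)  # left wall at state 0
                if s_next == goal:
                    target = 1.0  # terminal transition: reward only
                else:
                    target = gamma * max(q[s_next])  # zero reward elsewhere
                # Standard Q-learning update toward the bootstrapped target.
                q[s][a] += alpha * (target - q[s][a])
        # Policy comparison: count consecutive passes with no policy change.
        current = greedy_policy(q)
        unchanged = unchanged + 1 if current == previous else 0
        previous = current
        if unchanged >= patience:
            break
    return q, previous, passes


if __name__ == "__main__":
    q, policy, passes = learn_until_policy_stable()
    print(policy, passes)
```

In this toy setting the action values keep being recomputed after the greedy policy has settled, which is the situation the stopping rule is meant to detect: further passes refine values without changing the actions selected.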
Identifier | oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/ETD-TAMU-2009-12-7421 |
Date | 2009 December |
Creators | Lampton, Amanda K. |
Contributors | Valasek, John |
Source Sets | Texas A and M University |
Language | English |
Detected Language | English |
Type | Book, Thesis, Electronic Dissertation, text |
Format | application/pdf |