
Reward-driven Training of Random Boolean Network Reservoirs for Model-Free Environments

Reservoir Computing (RC) is an emerging machine learning paradigm where a fixed kernel, built from a randomly connected "reservoir" with sufficiently rich dynamics, is capable of expanding the problem space in a non-linear fashion to a higher-dimensional feature space. These features can then be interpreted by a linear readout layer that is trained by a gradient descent method. In comparison to traditional neural networks, only the output layer needs to be trained, which leads to a significant computational advantage. In addition, the short-term memory of the reservoir dynamics has the ability to transform a complex temporal input state space to a simple non-temporal representation. Adaptive real-time systems pose multi-stage decision problems in which an agent is trained to achieve a preset goal by performing an optimal action at each timestep. In such problems, the agent learns through continuous interactions with its environment. Conventional techniques for solving such problems become computationally expensive or may not converge if the state space being considered is large or partially observable, or if short-term memory is required for optimal decision making. The objective of this thesis is to use reservoir computers to solve such goal-driven tasks, where no error signal can be readily calculated to apply gradient descent methodologies. To address this challenge, we propose a novel reinforcement learning approach in combination with reservoir computers built from simple Boolean components. Such reservoirs are of interest because they have the potential to be fabricated by self-assembly techniques. We evaluate the performance of our approach in both Markovian and non-Markovian environments, and compare it with the performance of an agent trained through traditional Q-Learning. We find that the reservoir-based agent performs successfully in these problem contexts and even performs marginally better than Q-Learning agents in certain cases.
Our proposed approach retains the advantage of traditional parameterized dynamic systems in successfully modeling embedded state-space representations while eliminating the complexity involved in training traditional neural networks. To the best of our knowledge, our method of training a reservoir readout layer through an on-policy bootstrapping approach is unique in the field of random Boolean network reservoirs.
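
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea: a fixed random Boolean network supplies binary features, and only a linear readout is adjusted from rewards. SARSA(0) is used here as one standard on-policy bootstrapping rule; it is an assumption, not necessarily the thesis's method. All names (RBNReservoir, LinearSarsaReadout) and all parameter values are hypothetical.

```python
import random


class RBNReservoir:
    """Fixed random Boolean network: each node reads K random nodes plus one input bit."""

    def __init__(self, n_nodes, k, n_inputs, seed=0):
        rng = random.Random(seed)
        self.n = n_nodes
        # Random wiring: node i reads the states of K randomly chosen nodes.
        self.wiring = [[rng.randrange(n_nodes) for _ in range(k)]
                       for _ in range(n_nodes)]
        # Random truth table per node, indexed by (input bit, K node states).
        self.tables = [[rng.randint(0, 1) for _ in range(2 ** (k + 1))]
                       for _ in range(n_nodes)]
        # Each node listens to one randomly chosen external input bit.
        self.input_of = [rng.randrange(n_inputs) for _ in range(n_nodes)]
        self.state = [0] * n_nodes

    def step(self, input_bits):
        """Synchronously update all nodes; return the new state as features."""
        new_state = []
        for i in range(self.n):
            idx = input_bits[self.input_of[i]]
            for j in self.wiring[i]:
                idx = (idx << 1) | self.state[j]
            new_state.append(self.tables[i][idx])
        self.state = new_state
        return list(new_state)


class LinearSarsaReadout:
    """Linear action-value readout trained with SARSA(0) (assumed update rule)."""

    def __init__(self, n_features, n_actions, alpha=0.05, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.rng = random.Random(seed)
        # One weight vector per action; the last entry is a bias term.
        self.w = [[0.0] * (n_features + 1) for _ in range(n_actions)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def q(self, feats, action):
        w = self.w[action]
        return w[-1] + sum(wi * f for wi, f in zip(w, feats))

    def act(self, feats):
        """Epsilon-greedy action selection over the readout's value estimates."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.w))
        values = [self.q(feats, a) for a in range(len(self.w))]
        return values.index(max(values))

    def update(self, feats, action, reward, next_feats, next_action, done):
        """Move Q(feats, action) toward reward + gamma * Q(next_feats, next_action)."""
        target = reward
        if not done:
            target += self.gamma * self.q(next_feats, next_action)
        delta = target - self.q(feats, action)
        w = self.w[action]
        for i, f in enumerate(feats):
            w[i] += self.alpha * delta * f
        w[-1] += self.alpha * delta
        return delta
```

In a training loop under these assumptions, each environment observation would be encoded as bits and passed to step(), the returned state used as the feature vector for act(), and update() called once the next action has been chosen. The reservoir's wiring and truth tables are never modified; only the readout weights change.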

Identifier: oai:union.ndltd.org:pdx.edu/oai:pdxscholar.library.pdx.edu:open_access_etds-1668
Date: 27 March 2013
Creators: Gargesa, Padmashri
Publisher: PDXScholar
Source Sets: Portland State University
Detected Language: English
Type: text
Format: application/pdf
Source: Dissertations and Theses
