Large-scale dynamic optimization using teams of reinforcement learning agents

Recent algorithmic and theoretical advances in reinforcement learning (RL) are attracting widespread interest. RL algorithms have appeared that approximate dynamic programming (DP) on an incremental basis. Unlike traditional DP algorithms, these algorithms do not require knowledge of the state transition probabilities or reward structure of a system. This allows them to be trained using real or simulated experience and to focus their computation on the areas of state space that are actually visited during control, which makes them computationally tractable on very large problems. RL algorithms can also be used as components of multi-agent algorithms: if each member of a team of agents employs one of these algorithms, a new collective learning algorithm emerges for the team as a whole. In this dissertation we demonstrate that such collective RL algorithms can be powerful heuristic methods for addressing large-scale control problems. Elevator group control serves as our primary testbed. The elevator domain poses a combination of challenges not seen in most RL research to date. Elevator systems operate in continuous state spaces and in continuous time as discrete event dynamic systems. Their states are not fully observable, and they are non-stationary because passenger arrival rates change over time. As a way of streamlining the search through policy space, we use a team of RL agents, each of which is responsible for controlling one elevator car. The team receives a global reinforcement signal, which appears noisy to each agent because of the actions of the other agents, the random nature of passenger arrivals, and the incomplete observation of the state. In spite of these complications, we show simulation results that surpass the best of the heuristic elevator control algorithms of which we are aware. These results demonstrate the power of RL on a very large-scale stochastic dynamic optimization problem of practical utility.
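The abstract does not name a specific RL algorithm, so the sketch below uses tabular Q-learning only as a familiar example of an incremental, model-free approximation to DP. Each update uses a single sampled transition and never consults transition probabilities or a reward model. It also illustrates the team idea: one learner per car, all receiving the same global reward. Everything here is a hypothetical toy, not the dissertation's method or elevator simulator. The names `QLearner`, `global_reward`, and `train_team`, the "exactly one car should answer each hall call" task, and the parameter values are all assumptions. Continuous time, partial observability, and the passenger arrival process are deliberately left out.

```python
import random
from collections import defaultdict


class QLearner:
    """Tabular Q-learning agent (hypothetical stand-in, not the dissertation's algorithm).

    Each update uses one sampled transition (s, a, r, s'); no transition
    probabilities or reward model are ever needed.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha          # step size
        self.gamma = gamma          # discount factor
        self.epsilon = epsilon      # exploration rate
        self.rng = random.Random(seed)
        self.q = defaultdict(float)  # (state, action) -> estimated return

    def greedy(self, state):
        # Action with the highest current estimate (ties go to the first listed).
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def act(self, state):
        # Epsilon-greedy: usually exploit current estimates, sometimes explore.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.greedy(state)

    def update(self, state, action, reward, next_state):
        # Move Q(s, a) one step toward the sampled target r + gamma * max_a' Q(s', a').
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def global_reward(joint_action):
    # Hypothetical toy task: one hall call per step, rewarded only if exactly
    # one car answers it. Every agent receives this same team-level signal.
    return 1.0 if sum(a == "serve" for a in joint_action) == 1 else 0.0


def train_team(n_agents=2, steps=2000, seed=0):
    # One learner per car; distinct seeds so their exploration is independent.
    team = [QLearner(["wait", "serve"], seed=seed + i) for i in range(n_agents)]
    state = 0  # a single dummy state keeps the sketch small
    rewards = []
    for _ in range(steps):
        joint = [agent.act(state) for agent in team]
        r = global_reward(joint)
        for agent, a in zip(team, joint):
            # Each agent credits the shared reward to its own action only; the
            # teammates' choices make this signal look noisy from its viewpoint.
            agent.update(state, a, r, state)
        rewards.append(r)
    return team, rewards


if __name__ == "__main__":
    team, rewards = train_team()
    print("greedy actions:", [agent.greedy(0) for agent in team])
    print("reward rate over last 500 steps:", sum(rewards[-500:]) / 500)
```

In this toy, the two learners typically settle into a division of labor, with one car answering and the other waiting, even though neither observes the other's action. That is the kind of collective behavior the abstract attributes to teams of independent RL agents. The distinct per-agent seeds matter: with identical random streams, both agents would explore in lockstep and never break the symmetry.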

https://scholarworks.umass.edu/dissertations/AAI9709586

Computer science|Artificial intelligence

Identifier	oai:union.ndltd.org:UMASS/oai:scholarworks.umass.edu:dissertations-2773
Date	01 January 1996
Creators	Crites, Robert Harry
Publisher	ScholarWorks@UMass Amherst
Source Sets	University of Massachusetts, Amherst
Language	English
Detected Language	English
Type	text
Source	Doctoral Dissertations Available from Proquest