Return to search

Autonomous discovery of temporal abstractions from interaction with an environment

The ability to create and to use abstractions in complex environments, that is, to systematically ignore irrelevant details, is a key reason that humans are effective problem solvers. Although the utility of abstraction is commonly accepted, there has been relatively little research on autonomously discovering or creating useful abstractions. A system that can create new abstractions autonomously can learn and plan in situations that its original designer was not able to anticipate. This dissertation introduces two related methods that allow an agent to autonomously discover and create temporal abstractions from its accumulated experience with its environment. A temporal abstraction is an encapsulation of a complex set of actions into a single higher-level action that allows an agent to learn and plan while ignoring details that appear at finer levels of temporal resolution. The main idea of both methods is to search for patterns that occur frequently within an agent's accumulated successful experience and that do not occur in unsuccessful experiences. These patterns are used to create the new temporal abstractions. The two types of temporal abstractions that our methods create are (1) subgoals and closed-loop policies for achieving them, and (2) open-loop policies, or action sequences, that are useful “macros.” We demonstrate the utility of both types of temporal abstractions in several simulated tasks, including two simulated mobile robot tasks. We use these tasks to demonstrate that the autonomously created temporal abstractions can both facilitate the learning of an agent within a task and can enable effective knowledge transfer to related tasks. As a larger task, we focus on the difficult problem of scheduling the assembly instructions for computers with multiple pipelines in such a manner that the reordered instructions will execute as quickly as possible. We demonstrate that the autonomously discovered action sequences can significantly improve performance of the scheduler and can enable effective knowledge transfer across similar processors. Both methods can extract the temporal abstractions from collections of behavioral trajectories generated by different processes. In particular, we demonstrate that the methods can be effective when applied to collections generated by reinforcement learning agents, heuristic searchers, and human tele-operators.

Identiferoai:union.ndltd.org:UMASS/oai:scholarworks.umass.edu:dissertations-3675
Date01 January 2002
CreatorsMcGovern, Elizabeth Amy
PublisherScholarWorks@UMass Amherst
Source SetsUniversity of Massachusetts, Amherst
LanguageEnglish
Detected LanguageEnglish
Typetext
SourceDoctoral Dissertations Available from Proquest

Page generated in 0.0015 seconds