Global ETD Search

Scaling Up Reinforcement Learning without Sacrificing Optimality by Constraining Exploration

The purpose of this dissertation is to understand how algorithms can efficiently learn to solve new tasks based on previous experience, instead of being explicitly programmed with a solution for each task that we want them to solve. Here a task is a series of decisions, such as a robot vacuum deciding which room to clean next or an intelligent car deciding to stop at a traffic light. In such settings, state-of-the-art learning algorithms are difficult to employ in practice because they often make thousands of mistakes before reliably solving a task. Humans, however, learn solutions to novel tasks while often making fewer mistakes, which suggests that efficient learning algorithms may exist. One advantage that humans have over state-of-the-art learning algorithms is that, while learning a new task, humans can apply knowledge gained from previously solved tasks. The central hypothesis investigated by this dissertation is that learning algorithms can solve new tasks more efficiently when they take into consideration knowledge learned from solving previous tasks. Although this hypothesis may appear to be obviously true, what knowledge to use and how to apply that knowledge to new tasks is a challenging, open research problem.

I investigate this hypothesis in three ways. First, I developed a new learning algorithm that is able to use prior knowledge to constrain the exploration space. Second, I extended a powerful theoretical framework in machine learning, called Probably Approximately Correct, so that I can formally compare the efficiency of algorithms that solve only a single task to that of algorithms that consider knowledge from previously solved tasks. With this framework, I found sufficient conditions under which knowledge from previous tasks improves the efficiency of learning to solve new tasks, and I also identified conditions under which transferring knowledge may impede learning. I present situations where transfer learning can be used to intelligently constrain the exploration space so that optimality loss is minimized. Finally, I tested the efficiency of my algorithms in various experimental domains.

These theoretical and empirical results provide support for my central hypothesis. The theory and experiments of this dissertation provide a deeper understanding of what makes a learning algorithm efficient so that it can be widely used in practice. Finally, these results also contribute to the general goal of creating autonomous machines that can be reliably employed to solve complex tasks.

http://hdl.handle.net/1969.1/148402

pruning

scaling

multiarmed bandit

Markov decision process

exploration/exploitation dilemma

exploration

machine learning

transfer learning

reinforcement learning

Identifier	oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/148402
Date	14 March 2013
Creators	Mann, Timothy 1984-
Contributors	Choe, Yoonsuck
Source Sets	Texas A and M University
Detected Language	English
Type	Thesis, text
Format	application/pdf
