
Task Distillation: Transforming Reinforcement Learning into Supervised Learning

Recent work in dataset distillation focuses on distilling supervised classification datasets into smaller, synthetic supervised datasets in order to reduce per-model training costs, provide interpretability, and anonymize data. Distillation and its benefits can be extended to a wider array of tasks. We propose a generalization of dataset distillation, which we call task distillation. Using techniques similar to those used in dataset distillation, any learning task can be distilled into a compressed synthetic task. Task distillation allows for transmodal distillations, in which a task of one modality is distilled into a synthetic task of another modality, so that a more complex learning task, such as a reinforcement learning environment, can be reduced to a simpler learning task, such as supervised classification. To advance task distillation beyond supervised-to-supervised distillation, we explore distilling reinforcement learning environments into supervised learning datasets. We propose a new distillation algorithm that allows PPO to be used to distill a reinforcement learning environment. We demonstrate k-shot learning on distilled cart-pole to show the effectiveness of our distillation algorithm and to explore distillation generalization. We distill multi-dimensional cart-pole environments to their minimum-sized distillations and show that this matches the theoretical minimum number of data instances required to teach each task. We show how a distilled task can serve as an interpretability artifact, as it compactly represents everything needed to learn the task. We demonstrate the feasibility of distillation in more complex Atari environments by fully distilling Centipede, showing that distillation is cheaper than training directly on Centipede once more than 9 models are trained. Finally, we provide a method to "partially" distill more complex environments, demonstrate it on Ms. Pac-Man, Pong, and Space Invaders, and show how it scales distillation difficulty relative to the full distillation of Centipede.
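To make the idea of distilling a reinforcement learning environment into a supervised dataset concrete, the following is a minimal illustrative sketch, not the thesis's actual algorithm: it shows only the evaluation side of such a distillation, where a small, hand-specified synthetic state-to-action dataset (an assumption for illustration; a real distillation would learn these instances, e.g., with a PPO-based outer loop) is used to train a fresh policy in a few supervised steps, which is then scored on the real CartPole-v1 task.

```python
# Illustrative sketch only -- not the method described in the thesis.
# Assumes gymnasium and torch are installed; the distilled instances are hand-picked here.

import gymnasium as gym
import torch
import torch.nn as nn

# Hypothetical "distilled" dataset: a few synthetic states and the actions they teach.
distilled_states = torch.tensor([
    [0.0,  0.0,  0.10,  0.5],   # pole tilting right -> push right
    [0.0,  0.0, -0.10, -0.5],   # pole tilting left  -> push left
    [0.0,  0.5,  0.05,  0.0],   # cart drifting right, slight tilt -> push right
    [0.0, -0.5, -0.05,  0.0],   # cart drifting left, slight tilt  -> push left
])
distilled_actions = torch.tensor([1, 0, 1, 0])

def train_on_distilled(steps=200, lr=0.1):
    """Few-shot supervised training of a fresh linear policy on the distilled data."""
    policy = nn.Linear(4, 2)
    opt = torch.optim.SGD(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(policy(distilled_states), distilled_actions)
        loss.backward()
        opt.step()
    return policy

def evaluate(policy, episodes=10):
    """Average return of the distilled-trained policy on the real environment."""
    env = gym.make("CartPole-v1")
    total = 0.0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action = policy(torch.tensor(obs).float()).argmax().item()
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
    env.close()
    return total / episodes

if __name__ == "__main__":
    policy = train_on_distilled()
    print("mean return on CartPole-v1:", evaluate(policy))
```

In an actual distillation, the synthetic states and labels above would themselves be optimized so that policies trained on them perform well in the environment; this sketch only illustrates why a compact supervised dataset can stand in for the original task.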

Identifier oai:union.ndltd.org:BGMYU2/oai:scholarsarchive.byu.edu:etd-11160
Date 12 October 2023
Creators Wilhelm, Connor
Publisher BYU ScholarsArchive
Source Sets Brigham Young University
Detected Language English
Type text
Format application/pdf
Source Theses and Dissertations
Rights https://lib.byu.edu/about/copyright/
