Open System Neural Networks

Recent advances in self-supervised learning have made it possible to reuse information-rich models, pre-trained on massive amounts of data, for a range of downstream tasks. However, the pre-training process can differ drastically from the fine-tuning process, which can lead to inefficient learning. We address this disconnect in training dynamics by structuring the learning process like an open system in thermodynamics. Open systems can achieve a steady state when low-entropy inputs are converted to high-entropy outputs. We modify the model and the learning process to mimic this behavior, attending more to elements of the input sequence that exhibit greater changes in entropy. We call this architecture the Open System Neural Network (OSNN). We demonstrate the efficacy of the OSNN on multiple classification datasets with a variety of encoder-only Transformers, and find that the OSNN outperforms nearly all model-specific baselines and achieves new state-of-the-art results on two classification datasets.
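The abstract only describes the mechanism at a high level. Below is a minimal, hypothetical PyTorch sketch of one way attention could be biased toward tokens with larger entropy changes; the entropy estimate (softmax-normalized hidden features), the additive bias on the attention logits, and the names `token_entropy` and `entropy_weighted_attention` are all assumptions for illustration, not the thesis's actual formulation.

```python
# Hypothetical sketch: attention biased toward tokens whose entropy
# changes most between a layer's input and output representations.
# This is NOT the OSNN's published method; it illustrates the idea only.
import torch
import torch.nn.functional as F

def token_entropy(x: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of each token's softmax-normalized feature vector.
    x: (batch, seq_len, dim) -> (batch, seq_len).
    (Assumption: entropy is estimated over normalized hidden features.)"""
    p = F.softmax(x, dim=-1)
    return -(p * torch.log(p + 1e-9)).sum(dim=-1)

def entropy_weighted_attention(q, k, v, h_in, h_out):
    """Scaled dot-product attention with an additive logit bias that
    favors key tokens exhibiting larger entropy changes.
    q, k, v, h_in, h_out: (batch, seq_len, dim)."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (batch, seq, seq)
    delta_h = (token_entropy(h_out) - token_entropy(h_in)).abs()  # (batch, seq)
    scores = scores + delta_h.unsqueeze(1)               # bias each key column
    return F.softmax(scores, dim=-1) @ v                 # (batch, seq, dim)
```

Under these assumptions, tokens whose representations gained or lost the most entropy across a layer receive proportionally more attention mass, mimicking the low-entropy-in, high-entropy-out exchange the abstract describes.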

Identifier: oai:union.ndltd.org:BGMYU2/oai:scholarsarchive.byu.edu:etd-11243
Date: 12 January 2024
Creators: Hatch, Bradley
Publisher: BYU ScholarsArchive
Source Sets: Brigham Young University
Detected Language: English
Type: text
Format: application/pdf
Source: Theses and Dissertations
Rights: https://lib.byu.edu/about/copyright/