
Regularized Fine-tuning Strategies for Neural Language Models: Application of entropy regularization on GPT-2

Deep neural language models like GPT-2 are undoubtedly strong at text generation, but they often require special decoding strategies to prevent degenerate output, most notably repetition. Training with the maximum likelihood objective results in a peaked probability distribution, which leads to over-confident neural networks. In this thesis, we explore entropy regularization, which can easily smooth the peaked output distribution of a neural language model during fine-tuning, using GPT-2. We first define three model configurations: (1) the out-of-the-box model without fine-tuning, (2) a fine-tuned model without entropy regularization, and (3) a fine-tuned model with entropy regularization. To investigate the effect of the domain on the model, we also consider three data settings: (1) fine-tuned on a heterogeneous dataset and tested on a heterogeneous dataset, (2) fine-tuned on a homogeneous dataset and tested on a homogeneous dataset, and (3) fine-tuned on a heterogeneous dataset and tested on a homogeneous dataset. For the entropy regularization, we experiment with controlling the entropy strength parameter (β) over the values [0.5, 1.0, 2.0, 4.0, 6.0] and with annealing the parameter during fine-tuning. Our findings show that entropy-based regularization during fine-tuning improves text generation models by significantly reducing the repetition rate without tuning the decoding strategy. By comparing the probabilities assigned to the tokens of human-written sentences, we observe that entropy regularization compensates for the shortcomings of deterministic decoding (beam search), which mostly selects a few high-probability words. Various studies have explored entropy regularization in the cold-start training of neural networks, but few cover its effect in the fine-tuning stage of text generation tasks with large-scale pre-trained language models. Our findings present strong evidence that one can achieve significant improvements in text generation by applying entropy regularization, a highly cost-effective approach, during fine-tuning.
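The regularized objective described in the abstract amounts to the standard cross-entropy (maximum likelihood) loss minus a β-weighted entropy term over the model's next-token distribution. The listing below is a minimal sketch of one such entropy-regularized fine-tuning step for GPT-2, assuming PyTorch and the Hugging Face transformers library; names such as entropy_regularized_loss are illustrative, and batching, padding, and the β annealing schedule studied in the thesis are omitted.

import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

beta = 1.0  # entropy strength parameter; the thesis sweeps [0.5, 1.0, 2.0, 4.0, 6.0]

def entropy_regularized_loss(input_ids, beta):
    # Standard language-modelling forward pass: `outputs.loss` is the
    # cross-entropy (maximum likelihood) objective over shifted targets.
    outputs = model(input_ids, labels=input_ids)
    ce_loss = outputs.loss
    # Entropy of the predicted next-token distribution at every position,
    # averaged over the batch and sequence.
    log_probs = F.log_softmax(outputs.logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    # Subtracting the entropy term penalizes peaked (over-confident) distributions.
    return ce_loss - beta * entropy

batch = tokenizer("An example training sentence.", return_tensors="pt")
loss = entropy_regularized_loss(batch["input_ids"], beta)
loss.backward()
optimizer.step()
optimizer.zero_grad()

Annealing the strength parameter, as explored in the thesis, would replace the constant beta above with a value that is adjusted per training step according to some schedule.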

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-479154
Date: January 2022
Creators: Hong, Jae Eun
Publisher: Uppsala universitet, Institutionen för lingvistik och filologi
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
