
Study of Pretraining Bias and Frequencies

Language models used in an in-context learning setting have been adapted to a wide range of tasks. Recent work has shown that pretraining data affects the in-context performance of language models. In this work, we experiment with numbers that occur with high and low frequency in the pretraining data to understand the impact of term frequency on model performance. We also experiment with random and adversarial demonstrations to probe the pretraining bias present in the model. Through these experiments, we show the importance of the pretraining frequencies of the numbers that appear in the demonstrations and explain how highly frequent terms can be used in demonstrations to achieve better task performance. We also show the impact of pretraining bias on model performance and explain how the model overcomes this bias when given more demonstrations.

Master of Science

Recent work focuses on understanding and improving the arithmetic capabilities of state-of-the-art (SOTA) systems in natural language processing (NLP). This work designs and performs novel experiments to analyze the impact of training data on the performance of such systems. Through these experiments, it showcases interesting properties of SOTA systems that will promote future research toward understanding them better and help in creating better downstream applications.
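The experimental setup sketched in the abstract can be illustrated with a small example. The snippet below is a hypothetical sketch, not the thesis's actual code: it builds few-shot arithmetic prompts whose demonstration operands are drawn from either a "high-frequency" or a "low-frequency" pool, and optionally corrupts the demonstration answers to mimic the random and adversarial demonstration conditions. The operand pools, the multiplication task, and the function names are all assumptions made for illustration.

```python
import random

# Hypothetical operand pools: the "high-frequency" pool uses round or small
# numbers assumed to appear often in pretraining corpora, while the
# "low-frequency" pool uses less common multi-digit values. Both pools and
# the multiplication task are illustrative assumptions, not the thesis setup.
HIGH_FREQ_OPERANDS = [2, 5, 10, 20, 100]
LOW_FREQ_OPERANDS = [3187, 4729, 6113, 7841, 9277]


def build_prompt(query, n_demos=4, freq="high", demo_mode="correct", seed=0):
    """Build a few-shot prompt for a toy multiplication task.

    freq:      "high" or "low" -- which operand pool the demonstrations use.
    demo_mode: "correct", "random" (unrelated answers), or "adversarial"
               (plausible but wrong answers), loosely mirroring the
               demonstration conditions described in the abstract.
    """
    rng = random.Random(seed)
    pool = HIGH_FREQ_OPERANDS if freq == "high" else LOW_FREQ_OPERANDS
    lines = []
    for _ in range(n_demos):
        a, b = rng.choice(pool), rng.choice(pool)
        answer = a * b
        if demo_mode == "random":
            answer = rng.randint(0, 10_000)          # unrelated label
        elif demo_mode == "adversarial":
            answer = a * b + rng.choice([-1, 1])     # close but wrong label
        lines.append(f"Q: What is {a} times {b}?\nA: {answer}")
    lines.append(f"Q: What is {query[0]} times {query[1]}?\nA:")
    return "\n\n".join(lines)


if __name__ == "__main__":
    # Compare a prompt built from high-frequency, correct demonstrations
    # with one built from low-frequency, adversarial demonstrations.
    print(build_prompt((7, 8), freq="high", demo_mode="correct"))
    print("---")
    print(build_prompt((7, 8), freq="low", demo_mode="adversarial"))
```

In such a setup, the model's completion for the final query would be scored under each demonstration condition, which is one plausible way to compare the effect of term frequency and of corrupted demonstrations on task performance.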

Identifier oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/115712
Date 10 July 2023
Creators Taware, Rutuja Murlidhar
Contributors Computer Science and Applications, Ramakrishnan, Narendran, Lourentzou, Ismini, Lu, Chang-Tien
Publisher Virginia Tech
Source Sets Virginia Tech Theses and Dissertations
Language English
Detected Language English
Type Thesis
Format ETD, application/pdf
Rights In Copyright, http://rightsstatements.org/vocab/InC/1.0/
