Global ETD Search

Return to search

Non-linguistic Notions in Language Modeling: Learning, Retention, and Applications

Language modeling, especially through the use of transformer-based large language models (LLMs), has drastically changed how we view and use artificial intelligence (AI) and machine learning (ML) in our daily lives. Although LLMs have showcased remarkable linguistic proficiency in their abilities to write, summarize, and phrase, these model have yet to achieve the same remarkability in their ability to quantitatively reason. This deficiency is specially apparent in smaller models (less than 1 Billion parameters) than can run natively on-device. Between the complementary capabilities of qualitative and quantitative reasoning, this thesis focuses on the latter, where the goal is to devise mechanisms to instill quantitative reasoning capabilities into these models. However, instilling this notion is not as straight forward as traditional end-to-end learning. The learning of quantitative notions include the ability of the model to discern between regular linguistic tokens and magnitude/scale-oriented non-linguistic tokens. The learning of these notions, specially after pre-training, comes at a cost for these models: catastrophic forgetting. Thus, learning needs to be followed with retention - making sure these models do not forget what they have learned. Thus, we first motivate the need for numeracy-enhanced models via their potential applications in field of data-to-text generation (D2T), showcasing how these models behave as quantitative reasoners as-is. Then, we devise both token-level training interventions and information-theoretic training interventions to numerically enhance these models, with the latter specifically focused on combating catastrophic forgetting. Our information-theoretic interventions not only lead to numerically-enhanced models but lend us critical insights into the learning behavior of these models, especially when it comes to adapting these models to the target task distribution from their pretraining distribution. Finally, we extrapolate these insights to devise more effective strategies transfer learning and unlearning for language modeling. / Doctor of Philosophy / Language modeling, especially through the use of transformer-based large language models (LLMs), has drastically changed how we view and use artificial intelligence (AI) and machine learning (ML) in our daily lives. Although LLMs have showcased remarkable linguistic proficiency in their abilities to write, summarize, and phrase, these model have yet to achieve the same remarkability in their ability to quantitatively reason. This deficiency is specially apparent in smaller models than can run natively on-device. This thesis focuses on instilling within these models the ability to perform quantitative reasoning - the ability to differentiate between words and numbers and understand the notions of magnitude tied with said numbers, while retaining their linguistic skills. The learned insights from our experiments are further used to devise models that better adapt to target tasks.

Multitask Learning

Transfer Learning

Finetuning

Catastrophic Forgetting

Fisher

Natural Language Processing

Identifer	oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/121122
Date	11 September 2024
Creators	Sharma, Mandar
Contributors	Computer Science and#38; Applications, Ramakrishnan, Narendran, North, Christopher L., Lu, Chang Tien, Huang, Lifu, Kumar, Srijan
Publisher	Virginia Tech
Source Sets	Virginia Tech Theses and Dissertation
Language	English
Detected Language	English
Type	Dissertation
Format	ETD, application/pdf
Rights	Creative Commons Attribution 4.0 International, http://creativecommons.org/licenses/by/4.0/

Page generated in 0.0017 seconds

Non-linguistic Notions in Language Modeling: Learning, Retention, and Applications

Description

Links & Downloads

Tags

Additional Fields