Language modeling, especially through transformer-based large language models (LLMs), has drastically changed how we view and use artificial intelligence (AI) and machine learning (ML) in our daily lives. Although LLMs have showcased remarkable linguistic proficiency in their abilities to write, summarize, and rephrase, these models have yet to achieve comparable proficiency in quantitative reasoning. This deficiency is especially apparent in smaller models (fewer than 1 billion parameters) that can run natively on-device. Between the complementary capabilities of qualitative and quantitative reasoning, this thesis focuses on the latter: devising mechanisms to instill quantitative reasoning capabilities into these models. However, instilling this notion is not as straightforward as traditional end-to-end learning. Learning quantitative notions requires the model to discern between regular linguistic tokens and magnitude/scale-oriented non-linguistic tokens, and acquiring these notions, especially after pre-training, comes at a cost: catastrophic forgetting. Learning must therefore be paired with retention, ensuring these models do not forget what they have already learned. We first motivate the need for numeracy-enhanced models via their potential applications in the field of data-to-text generation (D2T), showcasing how these models behave as quantitative reasoners as-is. We then devise both token-level and information-theoretic training interventions to numerically enhance these models, with the latter specifically focused on combating catastrophic forgetting. Our information-theoretic interventions not only yield numerically-enhanced models but also lend critical insights into the learning behavior of these models, especially when adapting them from their pre-training distribution to a target task distribution. Finally, we extrapolate these insights to devise more effective strategies for transfer learning and unlearning in language modeling.

Doctor of Philosophy

Language modeling, especially through transformer-based large language models (LLMs), has drastically changed how we view and use artificial intelligence (AI) and machine learning (ML) in our daily lives. Although LLMs have showcased remarkable linguistic proficiency in their abilities to write, summarize, and rephrase, these models have yet to achieve comparable proficiency in quantitative reasoning. This deficiency is especially apparent in smaller models that can run natively on-device. This thesis focuses on instilling within these models the ability to perform quantitative reasoning, that is, the ability to differentiate between words and numbers and to understand the notions of magnitude tied to those numbers, while retaining their linguistic skills. The insights learned from our experiments are further used to devise models that better adapt to target tasks.
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/121122 |
Date | 11 September 2024 |
Creators | Sharma, Mandar |
Contributors | Computer Science & Applications, Ramakrishnan, Narendran, North, Christopher L., Lu, Chang Tien, Huang, Lifu, Kumar, Srijan |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertations |
Language | English |
Detected Language | English |
Type | Dissertation |
Format | ETD, application/pdf |
Rights | Creative Commons Attribution 4.0 International, http://creativecommons.org/licenses/by/4.0/ |