1 |
Non-linguistic Notions in Language Modeling: Learning, Retention, and Applications
Sharma, Mandar, 11 September 2024
Language modeling, especially through the use of transformer-based large language models (LLMs), has drastically changed how we view and use artificial intelligence (AI) and machine learning (ML) in our daily lives. Although LLMs have showcased remarkable linguistic proficiency in their abilities to write, summarize, and phrase, these models have yet to show comparable proficiency in quantitative reasoning. This deficiency is especially apparent in smaller models (fewer than 1 billion parameters) that can run natively on-device. Between the complementary capabilities of qualitative and quantitative reasoning, this thesis focuses on the latter, where the goal is to devise mechanisms to instill quantitative reasoning capabilities into these models. However, instilling this notion is not as straightforward as traditional end-to-end learning. The learning of quantitative notions includes the ability of the model to discern between regular linguistic tokens and magnitude/scale-oriented non-linguistic tokens. The learning of these notions, especially after pre-training, comes at a cost for these models: catastrophic forgetting. Thus, learning needs to be followed by retention - making sure these models do not forget what they have learned. We first motivate the need for numeracy-enhanced models via their potential applications in the field of data-to-text generation (D2T), showcasing how these models behave as quantitative reasoners as-is. Then, we devise both token-level training interventions and information-theoretic training interventions to numerically enhance these models, with the latter specifically focused on combating catastrophic forgetting. Our information-theoretic interventions not only lead to numerically enhanced models but also lend us critical insights into the learning behavior of these models, especially when it comes to adapting them from their pretraining distribution to the target task distribution. Finally, we extrapolate these insights to devise more effective strategies for transfer learning and unlearning in language modeling. / Doctor of Philosophy / Language modeling, especially through the use of transformer-based large language models (LLMs), has drastically changed how we view and use artificial intelligence (AI) and machine learning (ML) in our daily lives. Although LLMs have showcased remarkable linguistic proficiency in their abilities to write, summarize, and phrase, these models have yet to show comparable proficiency in quantitative reasoning. This deficiency is especially apparent in smaller models that can run natively on-device. This thesis focuses on instilling within these models the ability to perform quantitative reasoning - the ability to differentiate between words and numbers and to understand the notions of magnitude tied to those numbers - while retaining their linguistic skills. The insights learned from our experiments are further used to devise models that better adapt to target tasks.
|
2 |
Digital Platform Dynamics: Governance, Market Design and AI Integration
Ilango Guru Muniasamy (19149178), 17 July 2024
In my dissertation, I examine the dynamics of digital platforms, starting with the governance practices of established platforms, then exploring innovative design approaches, and finally the integration of advanced AI technologies in platforms. I structure this exploration into three essays: in the first essay, I discuss moderation processes in online communities; in the second, I propose a novel design for a blockchain-based green bond exchange; and in the third, I examine how AI-based decision-making platforms can be enhanced through synthetic data generation.

In my first essay, I investigate the role of moderation in online communities, focusing on its effect on users' participation in community moderation. Using data from a prominent online forum, I analyze changes in users' moderation actions (upvoting and downvoting of others' content) after they experience a temporary account suspension. While I find no significant change in their upvoting behavior, my results suggest that users downvote more after their suspension. Combined with findings on lower quality and conformity with the community while downvoting, the results suggest an initial increase in hostile moderation after suspension, although these effects dissipate over time. The short-term hostility post-suspension has the potential to negatively affect platform harmony, thus revealing the complexities of disciplinary actions and their unintended consequences.

In the second essay, I shift from established platforms to innovations in platform design, presenting a novel hybrid green bond exchange that integrates blockchain technology with thermodynamic principles to address market volatility and regulatory uncertainty. The green bond market, despite its high growth, faces issues like greenwashing, liquidity constraints, and limited retail investor participation. To tackle these challenges, I propose an exchange framework that uses blockchain for green bond tokenization, enhancing transparency and accessibility. By conceptualizing the exchange as a thermodynamic system, I ensure economic value is conserved and redistributed, promoting stability and efficiency. I include key mechanisms in the design to conserve value in the exchange and deter speculative trading. Through simulations, I demonstrate significant improvements in market stability, liquidity, and efficiency, highlighting the effectiveness of this interdisciplinary approach and offering a robust framework for future financial system development.

In the third essay, I explore the integration of advanced AI technologies, focusing on how large language models (LLMs) like GPT can be adapted for specialized fields such as education policy and decision-making. To address the need for high-quality, domain-specific training data, I develop a methodology that combines agent-based simulation (ABS) with synthetic data generation and GPT fine-tuning. This enhanced model provides accurate, contextually relevant, and interpretable insights for educational policy scenarios. My approach addresses challenges such as data scarcity, privacy concerns, and the need for diverse, representative data. Experiments show significant improvements in model performance and robustness, offering policymakers a powerful tool for exploring complex scenarios and making data-driven decisions. This research advances the literature on synthetic data in AI and agent-based modeling in education, demonstrating the adaptability of large language models to specialized domains.
|
3 |
Charakterizace chodců ve videu / Pedestrian Attribute Analysis
Studená, Zuzana, January 2019
This work deals with obtaining information about pedestrians captured by static external cameras located in public outdoor or indoor spaces. The aim is to obtain as much information as possible. Information such as gender, age, type of clothing, accessories, fashion style, or overall personality is obtained using convolutional neural networks. One part of the work consists of creating a new dataset that captures pedestrians and includes information about the person's sex, age, and fashion style. Another part of the thesis is the design and implementation of convolutional neural networks that classify the mentioned pedestrian characteristics. The networks are evaluated on pedestrian images from the PETA, FashionStyle14, and BUT Pedestrian Attributes datasets. Experiments on the PETA and FashionStyle14 datasets compare my results with those of various convolutional neural networks described in publications. Further experiments are shown on the newly created BUT Pedestrian Attributes dataset.
|