11. Augmenting Large Language Models with Humor Theory To Understand Puns. Ryan Rony Dsilva (18429846), 25 April 2024
This research explores the application of large language models (LLMs) to the comprehension of puns. Leveraging the expansive capabilities of LLMs, this study delves into the domain of pun classification by examining it through the prism of two humor theories: the Computational Model of Humor and the Benign Violation theory, which is an extension of the N+V Theory. The computational model posits that for a phrase to qualify as a pun, it must possess both ambiguity and distinctiveness: it must contain a word that can be interpreted in two plausible ways, with each interpretation supported by at least one unique word. The Benign Violation theory, on the other hand, posits that puns work by breaching one linguistic rule while conforming to another, thereby creating a "benign violation." Using these frameworks, this research scrutinizes a curated collection of English-language puns. Our aim is to assess the validity and effectiveness of these theoretical frameworks in accurately classifying puns. We undertake controlled experiments on the dataset, selectively removing a condition specific to one theory and then evaluating the altered puns against the criteria of the other theory to see how well it classifies the modified inputs. This approach allows us to uncover deeper insights into the processes that facilitate the recognition of puns and to explore the practical implications of applying humor theories. The findings of our experiments, detailed in the subsequent sections, shed light on how altering specific conditions impacts the ability of LLMs to accurately classify puns according to each theory, and show that the components of a theory do not all influence the result to the same extent, thereby contributing to our understanding of humor mechanics through the eyes of LLMs.
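To make the computational model's two conditions concrete, the sketch below checks them lexically with WordNet rather than with an LLM; the sentence splitting, the gloss-overlap rule, and the example pun are illustrative assumptions, not the procedure used in the thesis.

```python
# pip install nltk; then run once: import nltk; nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def computational_model_check(sentence: str, pun_word: str) -> dict:
    """Rough lexical proxy for the two pun conditions: ambiguity and distinctiveness."""
    context = {w.lower().strip(".,!?") for w in sentence.split()} - {pun_word.lower()}
    senses = wn.synsets(pun_word)
    ambiguity = len(senses) >= 2          # the word admits at least two interpretations
    # Distinctiveness proxy: each of (at least) two senses is supported by some context
    # word appearing in that sense's gloss or example sentences.
    supported = []
    for sense in senses:
        evidence = set(sense.definition().lower().split())
        for example in sense.examples():
            evidence |= set(example.lower().split())
        supported.append(context & evidence)
    distinctiveness = sum(1 for s in supported if s) >= 2
    return {"ambiguity": ambiguity,
            "distinctiveness": distinctiveness,
            "pun_candidate": ambiguity and distinctiveness}

print(computational_model_check("I used to be a banker but I lost interest", "interest"))
```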
12. Large Language Models for Unsupervised Keyphrase Extraction and Biomedical Data Analytics. Haoran Ding (18825838), 03 September 2024
Natural Language Processing (NLP), a vital branch of artificial intelligence, is designed to equip computers with the ability to comprehend and manipulate human language, facilitating the extraction and utilization of textual data. NLP plays a crucial role in harnessing the vast quantities of textual data generated daily, enabling meaningful information extraction. Among the various techniques, keyphrase extraction stands out due to its ability to distill concise information from extensive texts, making it invaluable for summarizing and navigating content efficiently. The process of keyphrase extraction usually begins by generating candidates and then ranking them to identify the most relevant phrases. Keyphrase extraction can be categorized into supervised and unsupervised approaches. Supervised methods typically achieve higher accuracy as they are trained on labeled data, which allows them to effectively capture and utilize patterns recognized during training. However, the dependency on extensive, well-annotated datasets limits their applicability in scenarios where such data is scarce or costly to obtain. On the other hand, unsupervised methods, while free from the constraints of labeled data, face challenges in capturing deep semantic relationships within text, which can impact their effectiveness. Despite these challenges, unsupervised keyphrase extraction holds significant promise due to its scalability and lower barriers to entry, as it does not require labeled datasets. This approach is increasingly favored for its potential to aid in building extensive knowledge bases from unstructured data, which can be particularly useful in domains where acquiring labeled data is impractical. As a result, unsupervised keyphrase extraction is not only a valuable tool for information retrieval but also a pivotal technology for the ongoing expansion of knowledge-driven applications in NLP.
In this dissertation, we introduce three innovative unsupervised keyphrase extraction methods: AttentionRank, AGRank, and LLMRank. Additionally, we present a method for constructing knowledge graphs from unsupervised keyphrase extraction, leveraging the self-attention mechanism. The first study discusses the AttentionRank model, which utilizes a pre-trained language model to derive underlying importance rankings of candidate phrases through self-attention. This model employs a cross-attention mechanism to assess the semantic relevance between each candidate phrase and the document, enhancing the phrase ranking process. AGRank, detailed in the second study, is a sophisticated graph-based framework that merges deep learning techniques with graph theory. It constructs a candidate phrase graph using mutual attentions from a pre-trained language model. Both global document information and local phrase details are incorporated as enhanced nodes within the graph, and a graph algorithm is applied to rank the candidate phrases. The third study, LLMRank, leverages the strengths of large language models (LLMs) and graph algorithms. It employs LLMs to generate keyphrase candidates and then integrates global information through the text's graphical structures. This process reranks the candidates, significantly improving keyphrase extraction performance. The fourth study explores how self-attention mechanisms can be used to extract keyphrases from medical literature and generate query-related phrase graphs, improving text retrieval visualization. The mutual attentions of medical entities, extracted using a pre-trained model, form the basis of the knowledge graph. This, coupled with a specialized retrieval algorithm, allows for the visualization of long-range connections between medical entities while simultaneously displaying the supporting literature. In summary, our exploration of unsupervised keyphrase extraction and biomedical data analysis introduces novel methods and insights in NLP, particularly in information extraction. These contributions are crucial for the efficient processing of large text datasets and suggest avenues for future research and applications.
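As an illustration of the attention-based ranking idea shared by AttentionRank and AGRank, the following sketch scores candidate phrases by the self-attention their tokens receive in a pre-trained BERT model; the model choice, candidate list, and scoring rule are simplifying assumptions, not the published implementations.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

def rank_by_attention(document: str, candidates: list[str]) -> list[tuple[str, float]]:
    enc = tokenizer(document, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**enc)
    # Average over layers and heads, then take the attention each token *receives*.
    attn = torch.stack(out.attentions).mean(dim=(0, 2))[0]   # (seq_len, seq_len)
    received = attn.mean(dim=0)                              # (seq_len,)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    ranked = []
    for phrase in candidates:
        pieces = set(tokenizer.tokenize(phrase))
        hits = [received[i].item() for i, t in enumerate(tokens) if t in pieces]
        ranked.append((phrase, sum(hits) / len(hits) if hits else 0.0))
    return sorted(ranked, key=lambda x: x[1], reverse=True)

doc = "Keyphrase extraction distills concise information from extensive texts."
print(rank_by_attention(doc, ["keyphrase extraction", "concise information", "texts"]))
```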
13. Measurement and Development for Automated Secure Coding Solutions. Frantz, Miles Eugene, 09 September 2024
With the rise of development efforts, there has also been a rise in source code vulnerabilities.
Advanced security tools have been created to identify these vulnerabilities throughout the development lifecycle and afterward, before the vulnerabilities are exposed.
One such popular method is Static Code Analysis (SCA), which scans developers' source code to identify potential vulnerabilities. My Ph.D. work aims to help reduce exposed vulnerabilities by YIELDing, ENHANCing, and EVALUATing (EYE) SCA tools that identify vulnerabilities while the developer writes the code. We first look into evaluating tools that support developers with their source code by determining how accurate they are at identifying vulnerability information. Large Language Models (LLMs) have been on the rise recently with the introduction of Chat Generative Pre-trained Transformer (ChatGPT) 3.5, ChatGPT 4.1, Google Gemini, and many more.
Using a common framework, we created a zero-shot prompt instructing the LLM to identify whether there is a vulnerability in the provided source code and which Common Weakness Enumeration (CWE) value represents it. With our Python cryptographic benchmark PyCryptoBench, we sent vulnerable samples to four different LLMs and two different versions of the ChatGPT Application Programming Interface (API). The samples allow us to measure how reliable each LLM is at identifying and describing vulnerabilities. The ChatGPT APIs include multiple reproducibility fields that allowed us to measure how reproducible the responses are. Next, we yield a new SCA tool to apply what we learned to a current gap in increasingly complex source code. Cryptolation, our state-of-the-art (SOA) Python SCA tool, uses constant-propagation-supported variable inference to gain insight into the data-flow state throughout the program's execution. Python source code has ever-increasing complexity and a lack of SCA tools compared to Java. We compare Cryptolation with the other SOA SCA tools Bandit, Semgrep, and Dlint. To verify the Precision of our tool, we created the benchmark PyCryptoBench, which contains 1,836 test cases and encompasses five different language features. Next, we crawled over 1,000 cryptography-related Python projects on GitHub and scanned each with each tool. Finally, we reviewed all PyCryptoBench results and sampled over 10,000 cryptography-related Python projects. The results reveal that Cryptolation achieves 100% Precision on the benchmark and the second-highest Precision on the cryptography-related projects. Finally, we look at enhancing SCA tools. The SOA tools already compete to have the highest Precision, Recall, and Accuracy. However, we examine several developer surveys to determine their reasons for not adopting such tools; these generally come down to better aesthetics, usability, customization, and a low effort cost to use the tools consistently.
To achieve this, we enhance the SOA Java SCA tool CryptoGuard with the following: integrated build tools, modern terminal Command Line Interface (CLI) usage, customizable and vendor-specific output formats, and no-install demos. / Doctor of Philosophy / With the rise of more development efforts and source code, there has also been a rise in source code vulnerabilities. To match this, more advanced security tools have been created to identify these vulnerabilities before they are exposed. SCA tools are a popular method for identifying vulnerable source code since they do not execute any code and can scan the code while the developer is writing it. Despite their popularity, there is still much room for improvement.
My Ph.D. work aims to help reduce exposed vulnerabilities by yielding, enhancing, and evaluating (EYE) SCA tools that identify vulnerabilities while the developer writes the code. First, we look into evaluating tools that support and refine SCA by examining the Accuracy and secureness of generative LLMs. LLMs have been on the rise recently with the introduction of ChatGPT 3.5 and, more recently, ChatGPT 4.1. ChatGPT is a conversation-based program in which you ask the program a question and it answers. It can explain small source code snippets to developers, provide source code examples, or even fix source code. While the developers of the LLMs have restricted certain aspects of the models, one of their main selling points is their source code assistance. With over 1,000 zero-shot prompts, we measure how accurate and reliable LLMs are at identifying the existence and details of vulnerabilities within source code. Next, we yield a new SCA tool to apply what we learned to a current gap in increasingly complex source code. This tool is Cryptolation, a Python SCA tool that uses variable inference to try to determine variable values without execution. Python source code has ever-increasing complexity and a lack of tools compared to Java. We compare Cryptolation with four other SOA tools. To verify the Precision of our tool, we create the benchmark PyCryptoBench, which contains over 1,000 test cases encompassing five different language features.
Next, we crawled over 1,000 cryptography-related Python projects on GitHub and scanned each with each tool. Finally, we reviewed all PyCryptoBench results and samples of the 10,000 cryptography-related Python projects. The results reveal that Cryptolation achieves 100% Precision on the benchmark and the second-highest Precision on the cryptography-related projects. Next, we look at enhancing SCA tools. The SOA tools already compete to have the highest Precision, Recall, and Accuracy. However, we investigated current developer surveys to see what reasons developers give for not adopting such tools; these generally come down to better aesthetics, usability, customization, and a low effort cost to use the tools consistently. To achieve this, we enhance the SOA Java SCA tool CryptoGuard to address these needs.
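The zero-shot setup described above can be pictured with the sketch below; the prompt wording, the JSON response format, and the model name are illustrative assumptions rather than the dissertation's actual framework, though the `seed` parameter is one of the reproducibility fields the OpenAI chat API exposes.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT_HEADER = (
    "You are a security analyst. For the Python code below, respond with JSON of the form\n"
    '{"vulnerable": true or false, "cwe": "CWE-###" or null, "reason": "..."}\n\nCode:\n'
)

def classify_vulnerability(code: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        temperature=0,   # keep responses as deterministic as the API allows
        seed=0,          # reproducibility field used to compare repeated runs
        messages=[{"role": "user", "content": PROMPT_HEADER + code}],
    )
    return response.choices[0].message.content

sample = "from Crypto.Cipher import DES\ncipher = DES.new(b'8bytekey', DES.MODE_ECB)"
print(classify_vulnerability(sample))
```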
14. The Student Becomes The Teacher: Training High-Performance Language Models More Sample-Efficiently From Small Models Via Superstilling. Gundry, Chaz Allen, 14 August 2023
Recent advances including the Transformer architecture have revolutionized the Natural Language Processing community by providing immense performance improvements across many tasks, including the development of Large Language Models (LLMs). LLMs show enormous promise as few-shot learners, common-sense knowledge repositories, conversational agents, writing assistants, and coding tools, and are gaining widespread traction in commercial industry. However, LLMs are expensive and time-consuming to train, requiring many passes over terabytes of data for the largest models. In this paper, we present Superstilling, a method for reducing the sample complexity of language model training by distilling the knowledge from a previously-trained model (the teacher) into a new, larger model (the student). This method does not require conformity between the architectures of the two models, and can be applied even when the weights and training data of the teacher model are not available, for example in federated learning scenarios. We apply Superstilling to train models of various sizes and show this method can decrease sample complexity by more than 10% on models with over 160M parameters. We also show that in certain scenarios, Superstilling can be used to speed up training despite the need to run the teacher and student models simultaneously.
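The abstract does not spell out Superstilling's training objective; as a point of reference, the sketch below shows a standard logit-distillation loss in which a student is trained to match a teacher's temperature-softened next-token distribution alongside the usual cross-entropy term. The temperature and mixing weight are illustrative defaults, and since Superstilling is stated to work even without access to teacher weights, its actual formulation will differ.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend a soft-target KL term (teacher guidance) with hard-target cross-entropy."""
    vocab = student_logits.size(-1)
    student_flat = student_logits.view(-1, vocab)
    teacher_flat = teacher_logits.view(-1, vocab)
    # Soft targets: match the teacher's temperature-scaled distribution.
    soft = F.kl_div(
        F.log_softmax(student_flat / temperature, dim=-1),
        F.softmax(teacher_flat / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary next-token cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_flat, labels.view(-1))
    return alpha * soft + (1.0 - alpha) * hard
```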
15. Automatisering av CPV-klassificering: En studie om Large Language Models i kombination med word embeddings kan lösa CPV-kategorisering av offentliga upphandlingar. Andersson, Niklas; Andersson Sjöberg, Hanna, January 2024
This study explores the use of Large Language Models and word embeddings to automate the categorisation of CPV codes in Swedish public procurements. Earlier studies have not achieved reliable categorisation, but this experiment tests a new method combining the LLM models Mistral and Llama3 with FastText word embeddings. The results show that although the proposed solution can correctly identify some CPV main groups, its overall performance is low, with 12% of procurements classified entirely correctly and 35% classified partially correctly, meaning at least one CPV main group was correctly identified. Improvements are needed in both correctness and accuracy. The study contributes to the research field by demonstrating the challenges of, and potential solutions for, automated categorisation of public procurements. It also proposes future research involving larger and more advanced models to address the identified challenges.
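As one way to picture the combination of an LLM with FastText embeddings, the sketch below maps an LLM-generated summary of a tender to the nearest CPV division by cosine similarity; the vector file, the tiny CPV table, and the matching rule are illustrative assumptions, not the study's pipeline.

```python
import numpy as np
import fasttext

ft = fasttext.load_model("cc.sv.300.bin")   # assumed pre-trained Swedish FastText vectors

# A few CPV divisions with approximate Swedish descriptions, for illustration only.
CPV = {
    "45000000": "Anläggningsarbete",
    "72000000": "IT-tjänster: konsultverksamhet, programvaruutveckling, Internet och stöd",
    "85000000": "Hälso- och sjukvård samt socialvård",
}
CPV_VECS = {code: ft.get_sentence_vector(desc) for code, desc in CPV.items()}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def nearest_cpv(llm_summary: str) -> str:
    """Return the CPV division whose description is closest to the LLM's tender summary."""
    v = ft.get_sentence_vector(llm_summary)
    return max(CPV_VECS, key=lambda code: cosine(v, CPV_VECS[code]))

print(nearest_cpv("Upphandling av drift och support av kommunens IT-system"))
```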
16. Towards On-Premise Hosted Language Models for Generating Documentation in Programming Projects. Hedlund, Ludvig, January 2024
Documentation for programming projects can vary in both quality and availability. Availability can vary even more in a closed working environment, since fewer developers will read the documentation. Documenting programming projects can be demanding on worker hours and unappreciated among developers. It is a common conception that developers would rather invest time in developing a project than in documenting it, so making the documentation process more effective would benefit developers. To move towards a more automated process of writing documentation, this work generated documentation for repositories that attempts to summarize their use cases and functionality. Two different implementations were created to generate documentation using an on-premise hosted large language model (LLM) as a tool. First, the embedded solution processes all available code in a project and creates the documentation from multiple summarizations of files and folders. Second, the RAG solution attempts to use only the most important parts of the code and lets the LLM create the documentation from a smaller subset of the codebase. The results show that generating documentation is possible but unreliable, and the output must be checked by a person with knowledge of the codebase. The embedded solution seems to be more reliable and to produce better results, but is more costly than the RAG solution.
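A minimal sketch of the retrieval step behind a RAG-style documentation generator is shown below: code files are embedded, the chunks most relevant to a documentation question are retrieved, and a prompt is assembled for an on-premise LLM. The embedding model, per-file chunking, and prompt wording are assumptions, not the thesis's implementation.

```python
from pathlib import Path
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def build_documentation_prompt(repo_root: str, question: str, top_k: int = 5) -> str:
    files = sorted(Path(repo_root).rglob("*.py"))
    chunks = [p.read_text(errors="ignore")[:2000] for p in files]  # one crude chunk per file
    corpus = embedder.encode(chunks, convert_to_tensor=True)
    query = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(query, corpus, top_k=top_k)[0]
    context = "\n\n".join(
        f"# File: {files[h['corpus_id']]}\n{chunks[h['corpus_id']]}" for h in hits
    )
    return (
        "Using only the code excerpts below, write a README section describing the "
        "project's use cases and main functionality.\n\n"
        f"{context}\n\nFocus: {question}"
    )

# The returned prompt would then be sent to the locally hosted LLM of choice.
```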
17. Enhancing Software Maintenance with Large Language Models: A comprehensive study. Younes, Youssef; Nassrallah, Tareq, January 2024
This study investigates the potential of Large Language Models (LLMs) to automate and enhance software maintenance tasks, focusing on bug detection and code refactoring. Traditional software maintenance, which includes debugging and code optimization, is time-consuming and prone to human error. With advancements in artificial intelligence, LLMs like ChatGPT and Copilot offer promising capabilities for automating these tasks. Through a series of quasi-experiments, we evaluate the effectiveness of ChatGPT 3.5, ChatGPT 4 (Grimoire GPT), and GitHub Copilot. Each model was tested on various code snippets to measure its ability to identify and correct bugs and to refactor code while maintaining the code's original functionality. The results indicate that ChatGPT 4 (Grimoire GPT) outperforms the other models, demonstrating superior accuracy and effectiveness, with success percentages of 87.5% and 75% in bug detection and code refactoring, respectively. This research highlights the potential of advanced LLMs to significantly reduce the time and cost associated with software maintenance, though human oversight is still necessary to ensure code integrity. The findings contribute to the understanding of LLM capabilities in real-world software engineering tasks and pave the way for more intelligent and efficient software maintenance practices.
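One way to operationalize "refactoring while maintaining original functionality" is to run a snippet's existing unit tests against the model's output, as in the hypothetical check below; the file names and the pytest-based acceptance criterion are assumptions, not the study's actual protocol.

```python
import subprocess
import tempfile
from pathlib import Path

def preserves_behaviour(refactored_code: str, test_code: str) -> bool:
    """Accept an LLM refactoring only if the original unit tests still pass."""
    with tempfile.TemporaryDirectory() as tmp:
        # test_code is assumed to import from module_under_test.
        Path(tmp, "module_under_test.py").write_text(refactored_code)
        Path(tmp, "test_module.py").write_text(test_code)
        result = subprocess.run(
            ["python", "-m", "pytest", "-q", "test_module.py"],
            cwd=tmp, capture_output=True, text=True,
        )
        return result.returncode == 0
```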
18. Citation Evaluation Using Large Language Models (LLMs): Can LLMs evaluate citations in scholarly documents? An experimental study on ChatGPT. Zeeb, Ahmad; Olsson, Philip, January 2024
This study investigates the capacity of Large Language Models (LLMs), specifically ChatGPT 3.5 and 4, to evaluate citations in scholarly papers. Given the importance of accurate citations in academic writing, the goal was to determine how well these models can assist in verifying citations. A series of experiments were conducted using a dataset of our own creation. This dataset includes the three main citation categories: Direct Quotation, Paraphrasing, and Summarising, along with subcategories such as minimal and long source text. In the preliminary experiment, ChatGPT 3.5 demonstrated perfect accuracy, while ChatGPT 4 showed a tendency towards false positives. Further experiments with an extended dataset revealed that ChatGPT 4 excels in correctly identifying valid citations, particularly with longer and more complex texts, but is also more prone to wrong predictions. ChatGPT 3.5, on the other hand, provided a more balanced performance across different text lengths, with both models achieving an accuracy rate of 90.7%. The reliability experiments indicated that ChatGPT 4 is more consistent in its responses compared to ChatGPT 3.5, although it also had a higher rate of consistent wrong predictions. This study highlights the potential of LLMs to assist scholars in citation verification, suggesting a hybrid approach where ChatGPT 4 is used for initial scans and ChatGPT 3.5 for final verification, paving the way for automating this process. Additionally, this study contributes a dataset that can be further expanded and tested on, offering a valuable resource for future research in this domain.
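A citation-verification prompt in the spirit of these experiments might look like the sketch below; the wording, label set, and model name are assumptions, not the prompts used in the study.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TEMPLATE = (
    "You will verify a citation.\n\n"
    "Source passage:\n{source}\n\n"
    "Citing sentence:\n{citation}\n\n"
    "Is the citing sentence faithful to the source (as a direct quotation, paraphrase, "
    "or summary)? Answer VALID or INVALID, then name the citation type."
)

def verify_citation(source: str, citation: str, model: str = "gpt-4") -> str:
    reply = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{"role": "user",
                   "content": TEMPLATE.format(source=source, citation=citation)}],
    )
    return reply.choices[0].message.content
```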
19. Improving Context Awareness of Transformer Networks using Retrieval-Augmented Generation. Do, Anh; Tran, Saga, January 2024
The Thermo-Calc software is a key tool in the research process for many material engineers. However, integrating multiple modules in Thermo-Calc requires the user to write code in a Python-based language, which can be challenging for novice programmers. This project aims to enable the generation of such code from user prompts by using existing generative AI models. In particular, we use a retrieval-augmented generation architecture applied to LLaMA and Mistral models. We use Code LLaMA-Instruct models with 7, 13, and 34 billion parameters, and a Mistral-Instruct model with 7 billion parameters. The Code LLaMA models are based on LLaMA 2. We also use a LLaMA 3-Instruct model with 8 billion parameters. All these models are instruction-tuned, which suggests that they have the capability to interpret natural language and identify appropriate options for a command-line program such as Python. In our testing, the LLaMA 3-Instruct model performed best, achieving 53% on the industry benchmark HumanEval and 49% on our internal adequacy assessment at pass@1, which is the expected probability of getting a correct solution when generating a response. This indicates that the model gets approximately every other answer correct. Due to GPU memory limitations, we had to apply quantisation to process the 13 and 34 billion parameter models. Our results revealed a mismatch between model size and optimal levels of quantisation, indicating that reduced precision adversely affects the performance of these models. Our findings suggest that a properly customised large language model can greatly reduce the coding effort of novice programmers, thereby improving productivity in material research.
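For reference, pass@1 as described here reduces to the fraction of correct generations; the function below is the standard unbiased pass@k estimator (Chen et al., 2021), included only to make the metric concrete.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples is correct, given c correct out of n."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k = 1 this is simply c / n: 53 correct generations out of 100 gives pass@1 = 0.53,
# i.e. roughly every other answer is correct.
print(pass_at_k(100, 53, 1))   # 0.53
```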
20. Large language models and various programming languages: A comparative study on bug detection and correction. Gustafsson, Elias; Flystam, Iris, January 2024
This bachelor’s thesis investigates the efficacy of cutting-edge Large Language Models (LLMs) — GPT-4, Code Llama Instruct (7B parameters), and Gemini 1.0 — in detecting and correcting bugs in Java and Python code. Through a controlled experiment using standardized prompts and the QuixBugs dataset, each model's performance was analyzed and compared. The study highlights significant differences in the ability of these LLMs to correctly identify and fix programming bugs, showcasing a comparative advantage in handling Python over Java. Results suggest that while all these models are capable of identifying bugs, their effectiveness varies significantly between models. The insights gained from this research aim to aid software developers and AI researchers in selecting appropriate LLMs for integration into development workflows, enhancing the efficiency of bug management processes.
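A sketch of the kind of harness such an experiment needs is shown below, assuming a local checkout of the public QuixBugs repository and a placeholder `ask_llm_to_fix` function standing in for the standardized prompt sent to each model; a real evaluation would run the benchmark's test cases rather than the crude textual comparison used here.

```python
import difflib
from pathlib import Path

QUIXBUGS = Path("QuixBugs")   # assumed local clone of the QuixBugs benchmark

def ask_llm_to_fix(buggy_code: str) -> str:
    raise NotImplementedError("send the standardized bug-fixing prompt to the chosen LLM")

def score_fix(program: str) -> float:
    """Compare the model's repaired Python program against the benchmark's reference fix."""
    buggy = (QUIXBUGS / "python_programs" / f"{program}.py").read_text()
    reference = (QUIXBUGS / "correct_python_programs" / f"{program}.py").read_text()
    candidate = ask_llm_to_fix(buggy)
    return difflib.SequenceMatcher(None, candidate, reference).ratio()

# Example: score_fix("gcd") would evaluate the model's repair of the buggy gcd program.
```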