  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Incorporating LLM-based Interactive Learning Environments in CS Education: Learning Data Structures and Algorithms using the Gurukul platform

Rachha, Ashwin Kedari 24 September 2024 (has links)
Large Language Models (LLMs) have emerged as a revolutionary force in Computer Science Education, offering unprecedented opportunities to facilitate learning and comprehension. Their application in the classroom, however, is not without challenges. LLMs are prone to hallucination and contextual inaccuracies. Furthermore, they risk exposing learning processes to cheating and other illicit practices, and to explicit solutions that impede the development of critical thinking skills in students. To address these pitfalls and investigate how specialized LLMs can enhance learner engagement, we present Gurukul, a unique coding platform incorporating two features: Retrieval Augmented Generation and Guardrails. Gurukul's Practice feature provides a hands-on code editor for solving DSA problems with the help of a dynamically guardrailed LLM that withholds explicit code solutions. Gurukul's Study feature, in turn, incorporates a Retrieval Augmented Generation mechanism that uses OpenDSA as its source of truth, allowing the LLM to fetch and present information accurately and relevantly, thereby mitigating the issue of inaccuracies. We present these features to evaluate user perceptions of LLM-assisted educational tools. To evaluate the effectiveness and utility of Gurukul in a real-world educational setting, we conducted a User Study and a User Expert Review with students (n=40) and faculty (n=2), respectively, from a public state university in the US specializing in DSA courses. We examine students' usage patterns and perceptions of the tool and report reflections from instructors along with a series of recommendations for classroom use. Our findings suggest that Gurukul had a positive impact on student learning and engagement with DSA. This feedback, analyzed through qualitative and quantitative methods, indicates the promise of specialized LLMs for enhancing student engagement in DSA learning. 
/ Master of Science / Computer science education is continuously evolving with new technologies enhancing the learning experience. This thesis introduces Gurukul, an innovative platform designed to transform the way students learn Data Structures and Algorithms (DSA). Gurukul integrates large language models (LLMs) with advanced features like Retrieval Augmented Generation (RAG) and Guardrails to create an interactive and adaptive learning environment. Traditional learning methods often struggle with providing accurate information and engaging students actively. Gurukul addresses these issues by offering a live code editor for hands-on practice and a study feature that retrieves accurate information from trusted sources. The platform ensures students receive context-sensitive guidance without bypassing critical thinking skills. A study involving students and faculty from a public university specializing in DSA courses evaluated Gurukul's effectiveness. The feedback, based on qualitative and quantitative evaluations, highlights the platform's potential to enhance student engagement and learning outcomes in computer science education. This research contributes to the ongoing development of educational technologies and provides insights for future improvements.
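The practice-mode guardrail described above, which lets the LLM give hints but blocks explicit code solutions, could be sketched as a simple output filter. All names and patterns here are hypothetical illustrations, not Gurukul's actual implementation:

```python
# A minimal sketch of a practice-mode guardrail: the model may hint,
# but responses containing explicit code are replaced with a nudge.
# Patterns and wording are illustrative only.
import re

BLOCKED_PATTERNS = [
    r"```",           # fenced code blocks
    r"\bdef \w+\(",   # Python function definitions
    r"\bclass \w+\b", # class definitions
]

def guardrail(response: str) -> str:
    """Return the response unchanged if it contains no explicit code,
    otherwise return a hint-only refusal."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, response):
            return ("I can't share a full solution, but here is a hint: "
                    "think about which data structure fits the access pattern.")
    return response
```

In a real deployment the guardrail would more likely be enforced via the LLM's system prompt plus a validation layer, but the filtering idea is the same.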
42

A Framework to Identify Online Communities for Social Media Analysis

Nikhil Mehta (9750842) 16 October 2024 (has links)
<p dir="ltr">Easy access, a variety of content, and fast widespread interactions are some of the reasons that have made social media increasingly popular in our society. This has led many people to use social media every day for a variety of reasons, such as interacting with friends or consuming news content. Thus, understanding content on social media is more important than ever.</p><p dir="ltr">An increased understanding of social media can lead to improvements on a large number of important tasks. In this work, we particularly focus on fake news detection and political bias detection. Fake news, text published by news sources with an intent to spread misinformation and sway beliefs, is ever prevalent in today's society. Detecting it is an important and challenging problem for preventing large-scale misinformation and maintaining a healthy society. Similarly, detecting the political bias of news content can provide insights into the different perspectives on social media.</p><p dir="ltr">In this work, we view the problem of understanding social media as reasoning over the relationships between sources, the articles they publish, and the engaging users. We start by analyzing these relationships in a graph-based framework, and then use Large Language Models to do the same. We hypothesize that the key to understanding social media is understanding these relationships, such as identifying which users have similar perspectives, or which articles are likely to be shared by similar users.</p><p dir="ltr">Throughout this thesis, we propose several frameworks to better capture the relationships on social media. We initially tackle this problem using supervised learning systems, improving them to achieve strong performance. However, we find that automatically modeling the complexities of the social media landscape is challenging. Conversely, having humans analyze and interact with all news content to find relationships is not scalable. 
Thus, we propose to enhance our supervised approaches by treating the social media understanding problem <i>interactively</i>, where humans can interact to help an automated system learn a better social media representation.</p><p dir="ltr">On real-world events, our experiments show performance improvements in detecting the factuality and political bias of news sources, both with and without minimal human interactions. We particularly focus on one of the most challenging setups of this task, where test data is unseen and focuses on new topics compared with the training data. This realistic setting shows the real-world impact of our work in improving social media understanding.</p>
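The relationship-centric view described above — sources publish articles, users engage with them, and similar users share similar articles — can be sketched as a tiny engagement graph. The toy data and the Jaccard similarity measure are illustrative, not the thesis's actual framework:

```python
# A minimal sketch of the graph view: user similarity is read off
# shared article engagements via Jaccard overlap. Toy data only.
engagements = {            # user -> set of articles they engaged with
    "u1": {"a1", "a2"},
    "u2": {"a2", "a3"},
    "u3": {"a4"},
}

def jaccard(u: str, v: str) -> float:
    """Similarity of two users by overlap of engaged articles."""
    a, b = engagements[u], engagements[v]
    return len(a & b) / len(a | b) if a | b else 0.0
```

A graph-based detector would then propagate labels (e.g., factuality of a source) along high-similarity edges.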
43

Analysis of Security Findings and Reduction of False Positives through Large Language Models

Wagner, Jonas 18 October 2024 (has links)
This thesis investigates the integration of State-of-the-Art (SOTA) Large Language Models (LLMs) into the process of reassessing security findings generated by Static Application Security Testing (SAST) tools. The primary objective is to determine whether LLMs are able to detect false positives (FPs) while maintaining a high true positive (TP) rate, thereby enhancing the efficiency and effectiveness of security assessments. Four consecutive experiments were conducted, each addressing specific research questions. The initial experiment, using a dataset of security findings extracted from the OWASP Benchmark, identified the optimal combination of context items provided by the SAST tool SpotBugs, which, when used with GPT-3.5 Turbo, reduced FPs while minimizing the loss of TPs. The second experiment, conducted on the same dataset, demonstrated that advanced prompting techniques, particularly few-shot Chain-of-Thought (CoT) prompting combined with Self-Consistency (SC), further improved the reassessment process. The third experiment compared both proprietary and open-source LLMs on an OWASP Benchmark dataset about one-fourth the size of the previously used dataset. GPT-4o achieved the highest performance, detecting 80 out of 128 FPs without missing any TPs, resulting in a perfect TPR of 100% and a decrease in FPR by 41.27 percentage points. Meanwhile, Llama 3.1 70B detected 112 out of the 128 FPs but missed 10 TPs, resulting in a TPR of 94.94% and a reduction in FPR by 56.62 percentage points. To validate these findings in a real-world context, the approach was applied to a dataset generated from the open-source project Mnestix using multiple SAST tools. 
GPT-4o again emerged as the top performer, detecting 26 out of 68 FPs while missing only one TP, resulting in a TPR decreased by 2.22 percentage points but an FPR decreased by 37.57 percentage points.
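The Self-Consistency step mentioned above can be sketched as a majority vote over several independently sampled Chain-of-Thought verdicts per finding. The LLM sampling itself is stubbed out here; in the thesis each sample would come from an actual model (e.g., GPT-3.5 Turbo) given SAST context items:

```python
# Sketch of few-shot CoT + Self-Consistency reassessment: sample the
# model several times per security finding and keep the majority
# verdict ("TP" or "FP"). Sampling is stubbed; verdicts are toy data.
from collections import Counter

def self_consistent_verdict(samples: list[str]) -> str:
    """Majority vote over independent chain-of-thought verdicts."""
    return Counter(samples).most_common(1)[0][0]

# Three stubbed CoT samples for one finding:
verdicts = ["FP", "FP", "TP"]
final = self_consistent_verdict(verdicts)  # "FP" wins 2-1
```

The vote smooths out individual reasoning-chain errors at the cost of extra inference calls per finding.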
44

Automating CPV Classification: A study of whether Large Language Models combined with word embeddings can solve CPV categorization of public procurements.

Andersson, Niklas, Andersson Sjöberg, Hanna January 2024 (has links)
This study explores the use of Large Language Models and word embeddings to automate the categorization of CPV codes in Swedish public procurements. Previous studies have not achieved reliable categorization, but this experiment tests a new method involving the LLM models Mistral and Llama3 together with FastText word embeddings. The results show that although the study's solution can correctly identify some CPV main groups, its overall performance is low, with 12% of procurements receiving a fully correct classification and 35% a partially correct classification with at least one correctly identified CPV main group. Improvements are needed in both correctness and precision. The study contributes to the research field by demonstrating the challenges of, and potential solutions for, automated categorization of public procurements. It also proposes future research involving larger and more advanced models to address the identified challenges.
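The embedding side of such a pipeline can be sketched as nearest-centroid classification: each CPV main group gets a vector, a procurement description is embedded, and the closest group by cosine similarity wins. The tiny 2-d vectors below stand in for real FastText embeddings and are purely illustrative:

```python
# Sketch of CPV main-group assignment by cosine similarity between a
# description embedding and per-group centroid vectors. Toy 2-d
# vectors stand in for FastText embeddings; CPV labels are examples.
import math

cpv_vectors = {
    "45": [1.0, 0.1],   # e.g., construction work (toy vector)
    "72": [0.1, 1.0],   # e.g., IT services (toy vector)
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def classify(embedding):
    """Return the CPV main group whose centroid is most similar."""
    return max(cpv_vectors, key=lambda k: cosine(embedding, cpv_vectors[k]))
```

In the studied method, an LLM would additionally rerank or validate the candidate groups produced by this embedding step.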
45

Towards On-Premise Hosted Language Models for Generating Documentation in Programming Projects

Hedlund, Ludvig January 2024 (has links)
Documentation for programming projects can vary in both quality and availability. Availability can vary even more in a closed working environment, since fewer developers will read the documentation. Documenting programming projects can be demanding in worker hours and unappreciated among developers. It is a common conception that developers would rather invest time in developing a project than in documenting it, so making the documentation process more effective would benefit developers. To move towards a more automated process of writing documentation, this work generates documentation for repositories that attempts to summarize their use cases and functionalities. Two different implementations are created to generate documentation using an on-premise hosted large language model (LLM) as a tool. First, the embedded solution processes all available code in a project and creates the documentation based on multiple summarizations of files and folders. Second, the RAG solution attempts to use only the most important parts of the code and lets the LLM create the documentation from a smaller subset of the codebase. The results show that generating documentation is possible but unreliable, and the output must be checked by a person with knowledge of the codebase. The embedded solution seems to be more reliable and produce better results, but is more costly than the RAG solution.
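The "embedded" solution described above works bottom-up: summarize each file, merge file summaries per folder, then merge folder summaries into one repository summary. The sketch below stubs the on-premise LLM call with a simple truncation; all names are illustrative, not the thesis's implementation:

```python
# Sketch of bottom-up repository summarization: file -> folder -> repo.
# summarize() stands in for an on-premise LLM summarization call.
from pathlib import PurePosixPath

def summarize(text: str) -> str:
    """Stand-in for an LLM call; here just truncates the input."""
    return text[:40]

def summarize_repo(files: dict[str, str]) -> str:
    by_folder: dict[str, list[str]] = {}
    for path, code in files.items():
        folder = str(PurePosixPath(path).parent)
        by_folder.setdefault(folder, []).append(summarize(code))
    folder_summaries = [summarize(" ".join(s)) for s in by_folder.values()]
    return summarize(" ".join(folder_summaries))
```

The RAG alternative would skip the exhaustive traversal and instead retrieve only the code chunks most relevant to the documentation query, which is cheaper but depends on retrieval quality.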
46

Enhancing Software Maintenance with Large Language Models : A comprehensive study

Younes, Youssef, Nassrallah, Tareq January 2024 (has links)
This study investigates the potential of Large Language Models (LLMs) to automate and enhance software maintenance tasks, focusing on bug detection and code refactoring. Traditional software maintenance, which includes debugging and code optimization, is time-consuming and prone to human error. With advancements in artificial intelligence, LLMs like ChatGPT and Copilot offer promising capabilities for automating these tasks. Through a series of quasi-experiments, we evaluate the effectiveness of ChatGPT 3.5, ChatGPT 4 (Grimoire GPT), and GitHub Copilot. Each model was tested on various code snippets to measure its ability to identify and correct bugs and refactor code while maintaining the original functionality. The results indicate that ChatGPT 4 (Grimoire GPT) outperforms the other models, demonstrating superior accuracy and effectiveness, with success rates of 87.5% and 75% in bug detection and code refactoring, respectively. This research highlights the potential of advanced LLMs to significantly reduce the time and cost associated with software maintenance, though human oversight is still necessary to ensure code integrity. The findings contribute to the understanding of LLM capabilities in real-world software engineering tasks and pave the way for more intelligent and efficient software maintenance practices.
47

Citation Evaluation Using Large Language Models (LLMs) : Can LLMs evaluate citations in scholarly documents? An experimental study on ChatGPT

Zeeb, Ahmad, Olsson, Philip January 2024 (has links)
This study investigates the capacity of Large Language Models (LLMs), specifically ChatGPT 3.5 and 4, to evaluate citations in scholarly papers. Given the importance of accurate citations in academic writing, the goal was to determine how well these models can assist in verifying citations. A series of experiments were conducted using a dataset of our own creation. This dataset includes the three main citation categories: Direct Quotation, Paraphrasing, and Summarising, along with subcategories such as minimal and long source text.  In the preliminary experiment, ChatGPT 3.5 demonstrated perfect accuracy, while ChatGPT 4 showed a tendency towards false positives. Further experiments with an extended dataset revealed that ChatGPT 4 excels in correctly identifying valid citations, particularly with longer and more complex texts, but is also more prone to wrong predictions. ChatGPT 3.5, on the other hand, provided a more balanced performance across different text lengths, with both models achieving an accuracy rate of 90.7%. The reliability experiments indicated that ChatGPT 4 is more consistent in its responses compared to ChatGPT 3.5, although it also had a higher rate of consistent wrong predictions.  This study highlights the potential of LLMs to assist scholars in citation verification, suggesting a hybrid approach where ChatGPT 4 is used for initial scans and ChatGPT 3.5 for final verification, paving the way for automating this process. Additionally, this study contributes a dataset that can be further expanded and tested on, offering a valuable resource for future research in this domain.
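Citation verification of the kind studied here boils down to prompting the model with a source passage and the citing text, then asking for a binary verdict. The prompt template below is a hedged illustration, not the study's actual wording:

```python
# Sketch of assembling a citation-verification prompt for an LLM such
# as ChatGPT: pair the citing text with its source passage and ask for
# a VALID/INVALID verdict. Template wording is illustrative only.
def build_citation_prompt(citation: str, source: str) -> str:
    return (
        "You are checking whether a citation faithfully represents its source.\n"
        f"Source passage:\n{source}\n\n"
        f"Citing text:\n{citation}\n\n"
        "Answer VALID if the citing text is supported by the source, "
        "otherwise answer INVALID."
    )
```

The study's hybrid suggestion would then route each prompt first to one model for an initial scan and to a second model for final verification of flagged cases.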
48

Improving Context Awareness of Transformer Networks using Retrieval-Augmented Generation

Do, Anh, Tran, Saga January 2024 (has links)
The Thermo-Calc software is a key tool in the research process for many material engineers. However, integrating multiple modules in Thermo-Calc requires the user to write code in a Python-based language, which can be challenging for novice programmers. This project aims to enable the generation of such code from user prompts by using existing generative AI models. In particular, we use a retrieval-augmented generation architecture applied to LLaMA and Mistral models. We use Code LLaMA-Instruct models with 7, 13, and 34 billion parameters, and a Mistral-Instruct model with 7 billion parameters. These models are all based on LLaMA 2. We also use a LLaMA 3-Instruct model with 8 billion parameters. All these models are instruction-tuned, which suggests that they have the capability to interpret natural language and identify appropriate options for a command-line program such as Python. In our testing, the LLaMA 3-Instruct model performed best, achieving 53% on the industry benchmark HumanEval and 49% on our internal adequacy assessment at pass@1, which is the expected probability of getting a correct solution when generating a response. This indicates that the model generates approximately every other answer correctly. Due to GPU memory limitations, we had to apply quantisation to process the 13 and 34 billion parameter models. Our results revealed a mismatch between model size and optimal levels of quantisation, indicating that reduced precision adversely affects the performance of these models. Our findings suggest that a properly customised large language model can greatly reduce the coding effort of novice programmers, thereby improving productivity in material research.
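The pass@1 figure quoted above comes from the standard pass@k estimator used with HumanEval: with n generated samples per task of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k), which for k = 1 reduces to c/n:

```python
# The unbiased pass@k estimator commonly used with HumanEval-style
# benchmarks: n samples per task, c of which pass the unit tests.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Expected probability that at least one of k drawn samples passes."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A 49% pass@1 thus literally means roughly every other single generation is correct, matching the abstract's reading.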
49

Large language models and various programming languages : A comparative study on bug detection and correction

Gustafsson, Elias, Flystam, Iris January 2024 (has links)
This bachelor’s thesis investigates the efficacy of cutting-edge Large Language Models (LLMs) — GPT-4, Code Llama Instruct (7B parameters), and Gemini 1.0 — in detecting and correcting bugs in Java and Python code. Through a controlled experiment using standardized prompts and the QuixBugs dataset, each model's performance was analyzed and compared. The study highlights significant differences in the ability of these LLMs to correctly identify and fix programming bugs, showcasing a comparative advantage in handling Python over Java. Results suggest that while all these models are capable of identifying bugs, their effectiveness varies significantly between models. The insights gained from this research aim to aid software developers and AI researchers in selecting appropriate LLMs for integration into development workflows, enhancing the efficiency of bug management processes.
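The controlled experiment described above hinges on an acceptance criterion: a model-proposed fix counts as correct only if it passes all of the dataset's test cases, QuixBugs-style. A minimal sketch of that evaluation loop, with the "model fix" hard-coded rather than generated by GPT-4, Code Llama, or Gemini:

```python
# Sketch of a QuixBugs-style acceptance check: a candidate repaired
# function is run against all reference test cases. The candidate is
# hard-coded here; in the study it would be LLM-generated.
def proposed_fix(nums):
    """Stands in for an LLM-repaired sorting routine."""
    return sorted(nums)

test_cases = [([3, 1, 2], [1, 2, 3]), ([], [])]

def fix_is_correct(fn, cases) -> bool:
    """Accept the fix only if every test case passes."""
    return all(fn(list(inp)) == expected for inp, expected in cases)
```

Per-model accuracy is then just the fraction of dataset programs whose proposed fix satisfies this check.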
50

Generalization and Fairness Optimization in Pretrained Language Models

Ghanbar Zadeh, Somayeh 05 1900 (has links)
This study introduces an effective method to address the generalization challenge in pretrained language models (PLMs), which affects their performance on diverse linguistic data beyond their training scope. Improving PLMs' adaptability to out-of-distribution (OOD) data is essential for their reliability and practical utility in real-world applications. Furthermore, we address the ethical imperative of fairness in PLMs, particularly as they become integral to decision-making in sensitive societal sectors. We introduce gender-tuning to identify and disrupt gender-related biases in training data. This method perturbs gendered terms, replacing them to break their associations with other words. Gender-tuning stands as a practical, ethical intervention against gender bias in PLMs. Finally, we present FairAgent, a novel framework designed to imbue small language models (SLMs) with fairness, drawing on the knowledge of large language models (LLMs) without incurring the latter's computational costs. FairAgent operates by enabling SLMs to consult with LLMs, harnessing their vast knowledge to guide the generation of less biased content. This dynamic system not only detects bias in SLM responses but also generates prompts to correct it, accumulating effective prompts for future use. Over time, SLMs become increasingly adept at producing fair responses, enhancing both computational efficiency and fairness in AI-driven interactions.
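The gender-tuning perturbation described above — replacing gendered terms in training text to break spurious associations — could be sketched as a case-preserving token swap. The term list and replacement scheme below are illustrative only, not the study's actual method:

```python
# Sketch of a gender-term perturbation pass over training text.
# The swap table is a toy example; a real intervention would be far
# more careful with ambiguous forms (e.g., possessive "her").
import re

SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
         "his": "hers", "hers": "his"}

def gender_perturb(text: str) -> str:
    """Swap gendered pronouns, preserving leading capitalization."""
    def repl(m):
        word = m.group(0)
        swapped = SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    pattern = r"\b(" + "|".join(SWAPS) + r")\b"
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)
```

Applying such a pass during fine-tuning is what decouples gendered terms from the contexts they co-occur with in the original corpus.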
