Global ETD Search

1	InjectBench: An Indirect Prompt Injection Benchmarking Framework Kong, Nicholas Ka-Shing 20 August 2024 (has links) The integration of large language models (LLMs) with third party applications has allowed for LLMs to retrieve information from up-to-date or specialized resources. Although this integration offers numerous advantages, it also introduces the risk of indirect prompt injection attacks. In such scenarios, an attacker embeds malicious instructions within the retrieved third party data, which when processed by the LLM, can generate harmful and untruthful outputs for an unsuspecting user. Although previous works have explored how these attacks manifest, there is no benchmarking framework to evaluate indirect prompt injection attacks and defenses at scale, limiting progress in this area. To address this gap, we introduce InjectBench, a framework that empowers the community to create and evaluate custom indirect prompt injection attack samples. Our study demonstrate that InjectBench has the capabilities to produce high quality attack samples that align with specific attack goals, and that our LLM evaluation method aligns with human judgement. Using InjectBench, we investigate the effects of different components of an attack sample on four LLM backends, and subsequently use this newly created dataset to do preliminary testing on defenses against indirect prompt injections. Experiment results suggest that while more capable models are susceptible to attacks, they are better equipped at utilizing defense strategies. To summarize, our work helps the research community to systematically evaluate features of attack samples and defenses by introducing a dataset creation and evaluation framework. / Master of Science / Large language models (LLMs), such as ChatGPT, are now able to retrieve up-to-date information from online resources like Google Flights or Wikipedia. This ultimately allows the LLM to utilize current information to generate truthful, helpful and accurate responses. Despite the numerous advantages, it also exposes a user to a new vector of attacks known as indirect prompt injections. In this attack, an attacker will write a instruction onto an online resource that an LLM will process when retrieved from the online resource. The primary aim of the attacker is to instruct the LLM to say something it is not supposed to, and thus may manifest as a blatant lie or misinformation given to the user. Prior works have studied and showcased the harmfulness of this attack, however not many works have tried to understand which LLMs are more vulnerable to indirect prompt injection attacks and how we may defend from them. We believe that this is mainly due to the non-availability of a benchmarking dataset which allows us to test LLMs and new defenses. To address this gap, we introduce InjectBench, a methodology that allows the automated creation of these benchmarking datasets, and the evaluation of LLMs and defenses. We show that InjectBench can produce a high quality dataset that we can customize to specific attack goals, and that our evaluation process is accurate and agrees with human judgement. Using the benchmarking dataset created from InjectBench, we evaluate four LLMs and investigate defenses for indirect prompt injection attacks. Large Language Models Security Indirect Prompt Injection
2	Transforming SDOH Screening: Towards a General Framework for Transformer-based Prediction of Social Determinants of Health King III, Kenneth Hale 09 September 2024 (has links) Social Determinants of Health (SDOH) play a crucial role in healthcare outcomes, yet identifying them from unstructured patient data remains a challenge. This research explores the potential of Large Language Models (LLMs) for automated SDOH identification from patient notes. We propose a general framework for SDOH screening that is simple and straightforward. We leverage existing SDOH datasets, adapting and combining them to create a more comprehensive benchmark for this task, addressing the research gap of limited datasets. Using the benchmark and proposed framework, we conclude by conducting several preliminary experiments exploring and comparing promising LLM system implementations. Our findings highlight the potential of LLMs for automated SDOH screening while emphasizing the need for more robust datasets and evaluation frameworks. / Master of Science / Social Determinants of Health (SDOH) have been shown to significantly impact health outcomes and are seen as a major contributor to global health inequities. However, their use within the healthcare industry is still significantly under emphasized, largely due to the difficulty of manually identifying SDOH factors. While previous works have explored automated approaches for SDOH identification, they lack standardization, data transparency and robustness, and are largely outdated compared to the latest Artificial Intelligence (AI) approaches. Therefore, in this work we propose a holistic framework for automated SDOH identification. We also present a higher quality SDOH benchmark, merging existing publicly available datasets, standardizing them, and cleaning them for errors. With this benchmark, we then conducted experiments to gain greater insights into the best performance across different state-of-the-art AI approaches. Through this work, we contribute a better way to think about automated SDOH screening systems, the first publicly accessible multi-clinic and multi-annotator benchmark, as well as greater insights into the latest AI approaches for state-of-the-art results. Generative AI Large Language Models Social Determinants of Health
3	“Imagine You’re a Qualitative Researcher”: Exploring the Possibilities and Limitations of Gen-AI for Thematic Analysis Narkiewicz, Nicole 01 January 2024 (has links) (PDF) The integration of technology and technological advancements in qualitative research has transformed the research process over the past several decades. Various tools, applications, and devices have expanded opportunities for multimodal field sites and created possibilities for online observations, focus groups, and interviews. As a result of new technologies and innovations, methods of qualitative data collection and analysis have transformed, and methodological approaches have evolved. As new technologies emerge, it is important to understand their impacts on the research process. With the increasing accessibility of large language models, researchers and institutions must carefully assess the implications for qualitative analysis. In this qualitative methodological dissertation I explore the possibilities and limitations of utilizing generative artificial intelligence (GenAI) to analyze text data. I demonstrate the knowledge required to approach thematic analysis using Copilot Pro in Word (Copilot). I discuss the methodological decisions I encountered while exploring Copilot's features for qualitative analysis and explain my reasoning for choosing to utilize Copilot for this study. Further, I compare the results from a traditionally human-conducted thematic analysis to the codes, categories, and themes generated by Copilot. In the process, I developed the criteria by which I compared the outcomes while also evaluating Copilot’s output against the American Educational Research Association’s Standards for Reporting on Empirical Social Science Research. The findings provide insights into opportunities and limitations in leveraging GenAI tools in the qualitative research process. Through this methodological study I demonstrate Copilot’s capabilities generating codes inductively, grouping codes into categories, and developing themes. I argue that researchers need to balance the capabilities of the tool with an understanding of its limitations, particularly concerning time and efficiency, transparency regarding its analytic process, the reliability of its responses, the presentation of its outcomes, and the level of support provided to substantiate its claims. Qualitative Research GenAI Research Methods Thematic Analysis Large Language Models
4	[en] SUMARIZATION OF HEALTH SCIENCE PAPERS IN PORTUGUESE / [pt] SUMARIZAÇÃO DE ARTIGOS CIENTÍFICOS EM PORTUGUÊS NO DOMÍNIO DA SAÚDE DAYSON NYWTON C R DO NASCIMENTO 30 October 2023 (has links) [pt] Neste trabalho, apresentamos um estudo sobre o fine-tuning de um LLM (Modelo de Linguagem Amplo ou Large Language Model) pré-treinado para a sumarização abstrativa de textos longos em português. Para isso, construímos um corpus contendo uma coleção de 7.450 artigos científicos na área de Ciências da Saúde em português. Utilizamos esse corpus para o fine-tuning do modelo BERT pré-treinado para o português brasileiro (BERTimbau). Em condições semelhantes, também treinamos um segundo modelo baseado em Memória de Longo Prazo e Recorrência (LSTM) do zero, para fins de comparação. Nossa avaliação mostrou que o modelo ajustado obteve pontuações ROUGE mais altas, superando o modelo baseado em LSTM em 30 pontos no F1-score. O fine-tuning do modelo pré-treinado também se destaca em uma avaliação qualitativa feita por avaliadores a ponto de gerar a percepção de que os resumos gerados poderiam ter sido criados por humanos em uma coleção de documentos específicos do domínio das Ciências da Saúde. / [en] In this work, we present a study on the fine-tuning of a pre-trained Large Language Model for abstractive summarization of long texts in Portuguese. To do so, we built a corpus gathering a collection of 7,450 public Health Sciences papers in Portuguese. We fine-tuned a pre-trained BERT model for Brazilian Portuguese (the BERTimbau) with this corpus. In a similar condition, we also trained a second model based on Long Short-Term Memory (LSTM) from scratch for comparison purposes. Our evaluation showed that the fine-tuned model achieved higher ROUGE scores, outperforming the LSTM based by 30 points for F1-score. The fine-tuning of the pre-trained model also stands out in a qualitative evaluation performed by assessors, to the point of generating the perception that the generated summaries could have been created by humans in a specific collection of documents in the Health Sciences domain. [pt] PORTUGUES [pt] ARTIGOS CIENTIFICOS [pt] SUMARIZACAO ABSTRATIVA [pt] FINE-TUNING [pt] LARGE LANGUAGE MODELS [en] PORTUGUESE [en] SCIENTIFIC PAPERS [en] ABSTRACTIVE SUMMARIZATION [en] FINE-TUNING [en] LARGE LANGUAGE MODELS
5	Semantische Transformation von natürlichsprachigen Anfragen in Datenbankabfragesprachen: Design und Implementierung einer sprachgesteuerten Schnittstelle für die semantische Transformation von natürlichsprachigen Anfragen in Datenbankabfragesprachen am Beispiel von OntoChem´s SciWalker Horstkorte, Garlef 17 December 2024 (has links) Diese Bachelorarbeit beschäftigt sich mit der Entwicklung einer Softwarelösung zur semantischen und syntaktischen Umwandlung natürlicher Sprache in Datenbankabfragesprachen. Ziel ist es, eine benutzerfreundliche Schnittstelle zu schaffen, die auch Nicht-Experten ermöglicht, komplexe Datenbankabfragen durchzuführen. Im Rahmen eines Praktikums bei der OntoChem GmbH wurde zunächst ein regelbasierter Prototyp entwickelt, der natürliche Sprachabfragen in maschinenlesbare Datenbank abfragen transformiert. Anschlieÿend wurde dieser Ansatz mit einem auf Large Language Models (LLMs) basierenden Ansatz, wie beispielsweise ChatGPT, verglichen. Dabei wurden unter anderem die Effizienz, Genauigkeit, Zuverlässigkeit und ökonomischen Kosten beider Ansätze untersucht. Die Arbeit beginnt mit einer Einführung in die Grundlagen der natürlichen Sprachver arbeitung (NLP), regelbasierter Systeme und LLMs. Es folgt eine detaillierte Beschrei bung des Praktikumsprojekts, einschlieÿlich der eingesetzten Technologien und Tools. In den darauf folgenden Kapiteln werden der regelbasierte Ansatz und der LLM-Ansatz zur Umwandlung natürlicher Sprache in Datenbankabfragen vorgestellt, implementiert und getestet. Die Vergleichsanalyse zeigt, dass der regelbasierte Ansatz durch hohe Geschwindigkeit und Datenkontrolle besticht, jedoch in seiner Flexibilität und Genauigkeit limitiert ist. Der LLM-Ansatz bietet hingegen eine höhere Genauigkeit und Flexibilität bei der Interpretation natürlicher Sprache, weist jedoch längere Antwortzeiten und höhere Betriebskosten auf. Abschließend werden Empfehlungen für die Praxis gegeben und zukünftige Forschungsrichtungen aufgezeigt, wie etwa die Kombination beider Ansätze oder das Training eines eigenen Modells. Die Ergebnisse dieser Arbeit tragen dazu bei, die Interaktion zwischen natürlicher Sprache und Datenbanksystemen zu verbessern und bieten praktische Lösungen für die semantische Transformation von Benutzeranfragen.:1 Einleitung 2 1.1 Motivation 2 1.2 Zielsetzung der Arbeit 3 1.3 Aufbau der Arbeit 4 2 Hintergrund und theoretische Grundlagen 6 2.1 Natürliche Sprachverarbeitung (NLP) 6 2.1.1 Grundlagen der NLP 6 2.1.2 Modelle und Algorithmen 7 2.1.3 Anwendungsbereiche 8 2.2 Regelbasierte Systeme 9 2.2.1 Definition und Funktionsweise 9 2.2.2 Beispiele und Anwendungen 10 2.3 Large Language Models (LLMs) 10 2.3.1 Funktionsweise und Architektur 10 2.3.2 Entwicklung und Technologien 14 2.3.3 Training und Datenbasis 15 2.3.4 Anwendungsbereiche 15 2.3.5 Limitationen von GPT-Modellen 16 3 Praktikumsprojekt bei OntoChem GmbH 18 3.1 Unternehmensvorstellung 18 3.1.1 Überblick und Geschichte 18 3.1.2 Produkte und Technologien 19 3.2 Projektbeschreibung 21 3.2.1 Ziel des Projekts 21 3.2.2 Aufgabenstellung 26 3.3 Technologie-Stack und Tools 27 3.3.1 Programmiersprache und Umgebung 27 3.3.2 Bibliotheken 28 4 Regelbasierter Ansatz zur Umwandlung natürlicher Sprache in Datenbankabfragen 29 4.1 API-Design 29 4.1.1 Methodik und Konzeption 29 4.1.2 structFromNaturalSearch 29 4.1.3 queryFromSearchStructure 35 4.2 Implementierung 37 4.2.1 Funktion: SearchStructureFromString 37 4.2.2 Integration OC-Technologien 38 4.2.3 Algorithmen und Regeln 40 4.2.4 Herausforderungen 43 5 LLM-Ansatz zur Umwandlung natürlicher Sprache in Datenbankabfragen 45 5.1 Einführung in den LLM-Ansatz 45 5.1.1 Grundlagen 45 5.1.2 Vergleich mit Regelbasierten Systemen 46 5.2 Prompting in LLMs (z.B. ChatGPT) 46 5.2.1 Prinzipien des Promptings 46 5.2.2 Design effektiver Prompts 47 5.3 Tests und Evaluierung 50 5.3.1 Beschreibung der Tests 50 5.3.2 Ergebnisse und Analyse 52 6 Vergleich der Ansätze 58 6.1 Methodik 58 6.2 Ergebnisse 58 6.3 Diskussion 61 7 Evaluation und Ausblick 62 7.1 Kritische Betrachtung 62 7.2 Limitationen und Fehlerquellen 62 7.3 Fazit und Implikationen 63 7.4 Zukünftige Forschung 63 Literaturverzeichnis I Abbildungsverzeichnis IV Daten- und Codeverzeichnis V
6	Comparative Analysis of Language Models: hallucinations in ChatGPT : Prompt Study / Jämförande analys av språkmodeller: hallucinationer i ChatGPT : Prompt Studie Hanna, Elias, Levic, Alija January 2023 (has links) This thesis looks at the percentage of hallucinations in two large language models (LLM), ChatGPT 3.5 and ChatGPT 4 output for a set of prompts. This work was motivated by two factors: the release of ChatGPT 4 and its parent company OpenAI, claiming it to be much more potent than its predecessor ChatGPT 3.5, which raised questions regarding the capabilities of the LLM. Furthermore, the other factor is that ChatGPT 3.5 showcased hallucinations (creating material that is factually wrong, deceptive, or untrue.) in response to different prompts, as shown by other studies. The intended audience was members of the computer science community, such as researchers, software developers, and policymakers. The aim was to highlight large language models' potential capabilities and provide insights into their dependability. This study used a quasi-experimental study design and a systematic literature review.Our hypothesis predicted that the percentage of hallucinations (creating factually wrong, deceptive, or untrue material) would be more prevalent in ChatGPT 3.5 compared to ChatGPT 4. We based our prediction on the fact that OpenAI trained ChatGPT 4 on more material than ChatGPT 3.5. We experimented on both LLMS, and our findings supported The hypothesis. Furthermore, we looked into the literature and found studies that also agree that ChatGPT 4 is better than ChatGPT 3.5. The research concluded with suggestions for future work, like using extensive datasets and comparing the performance of different models, not only ChatGPT 3.5 and ChatGPT 4. Large Language Models Hallucinations ChatGPT Stora Språkmodeller Hallucinationer ChatGPT Computer Sciences Datavetenskap (datalogi)
7	The Influence of Political Media on Large Language Models: Impacts on Information Synthesis, Reasoning, and Demographic Representation Shaw, Alexander Glenn 16 August 2023 (has links) (PDF) This thesis investigates the impact of finetuning the LLaMA 33B language model on partisan news datasets, revealing negligible changes and underscoring the enduring influence of pretraining datasets on model opinions. Training nine models across nine distinct news datasets spanning three topics and two ideologies, the study found consistent demographic representation, predominantly favoring liberal, college-educated, high-income, and non-religious demographics. Interestingly, a depolarizing effect emerged from partisan news finetuning, suggesting that intense exposure to topic-specific information might lead to depolarization, irrespective of ideological alignment. Despite the exposure to contrasting viewpoints, LLaMA 33B maintained its common sense reasoning ability, showing minimal variance on evaluation metrics like Hellaswag accuracy, ARC accuracy, and TruthfulQA MC1 and MC2. These results might indicate robustness in common sense reasoning or a deficiency in synthesizing diverse contextual information. Ultimately, this thesis demonstrates the resilience of high-performing language models like LLaMA 33B against targeted ideological bias, demonstrating their continued functionality and reasoning ability, even when subjected to highly partisan information environments. large language models political news demographic alignment common sense reasoning information synthesis Physical Sciences and Mathematics
8	ChatGPT: A Good Computer Engineering Student? : An Experiment on its Ability to Answer Programming Questions from Exams Loubier, Michael January 2023 (has links) The release of ChatGPT has really set new standards for what an artificial intelligence chatbot should be. It has even shown its potential in answering university-level exam questions from different subjects. This research is focused on evaluating its capabilities in programming subjects. To achieve this, coding questions taken from software engineering exams were posed to the AI (N = 23) through an experiment. Then, statistical analysis was done to find out how good of a student ChatGPT is by analyzing its answer’s correctness, degree of completion, diversity of response, speed of response, extraneity, number of errors, length of response and confidence levels. GPT-3.5 is the version analyzed. The experiment was done using questions from three different programming subjects. Afterwards, results showed a 93% rate of correct answer generation, demonstrating its competence. However, it was found that the AI occasionally produces unnecessary lines of code that were not asked for and thus treated as extraneity. The confidence levels given by ChatGPT, which were always high, also didn't always align with response quality which showed the subjectiveness of the AI’s self-assessment. Answer diversity was also a concern, where most answers were repeatedly written nearly the same way. Moreover, when there was diversity in the answers, it also caused much more extraneous code. If ChatGPT was to be blind tested for a software engineering exam containing a good number of coding questions, unnecessary lines of code and comments could be what gives it away as being an AI. Nonetheless, ChatGPT was found to have great potential as a learning tool. It can offer explanations, debugging help, and coding guidance just as any other tool or person could. It is not perfect though, so it should be used with caution. ChatGPT Chatbot OpenAI Large Language Models Natural Language Processing Computer Sciences Datavetenskap (datalogi)
9	Innovating the Study of Self-Regulated Learning: An Exploration through NLP, Generative AI, and LLMs Gamieldien, Yasir 12 September 2023 (has links) This dissertation explores the use of natural language processing (NLP) and large language models (LLMs) to analyze student self-regulated learning (SRL) strategies in response to exam wrappers. Exam wrappers are structured reflection activities that prompt students to practice SRL after they get their graded exams back. The dissertation consists of three manuscripts that compare traditional qualitative analysis with NLP-assisted approaches using transformer-based models including GPT-3.5, a state-of-the-art LLM. The data set comprises 3,800 student responses from an engineering physics course. The first manuscript develops two NLP-assisted codebooks for identifying learning strategies related to SRL in exam wrapper responses and evaluates the agreement between them and traditional qualitative analysis. The second manuscript applies a novel NLP technique called zero-shot learning (ZSL) to classify student responses into the codes developed in the first manuscript and assesses the accuracy of this method by evaluating a subset of the full dataset. The third manuscript identifies the distribution and differences of learning strategies and SRL constructs among students of different exam performance profiles using the results from the second manuscript. The dissertation demonstrates the potential of NLP and LLMs to enhance qualitative research by providing scalable, robust, and efficient methods for analyzing large corpora of textual data. The dissertation also contributes to the understanding of SRL in engineering education by revealing the common learning strategies, impediments, and SRL constructs that students report they use while preparing for exams in a first-year engineering physics course. The dissertation suggests implications, limitations, and directions for future research on NLP, LLMs, and SRL. / Doctor of Philosophy / This dissertation is about using artificial intelligence (AI) to help researchers and teachers understand how students learn from their exams. Exams are not only a way to measure what students know, but also a chance for students to reflect on how they studied and what they can do better next time. One way that students can reflect is by using exam wrappers, which are short questions that students answer after they get their graded exams back. A type of AI called natural language processing (NLP) is used in this dissertation, which can analyze text and find patterns and meanings in it. This study also uses a powerful AI tool called GPT-3.5, which can generate text and answer questions. The dissertation has three manuscripts that compare the traditional way of analyzing exam wrappers, which is done by hand, with the new way of using NLP and GPT-3.5, evaluate a specific promising NLP method, and use this method to try and gain a deeper understanding in students self-regulated learning (SRL) while preparing for exams. The data comes from 3,800 exam wrappers from a physics course for engineering students. The first manuscript develops a way of using NLP and GPT-3.5 to find out what learning strategies and goals students talk about in their exam wrappers and compares it to more traditional methods of analysis. The second manuscript tests how accurate a specific NLP technique is in finding these strategies and goals. The third manuscript looks at how different students use different strategies and goals depending on how well they did on the exams using the NLP technique in the second manuscript. I found that NLP and GPT-3.5 can aid in analyzing exam wrappers faster and provide nuanced insights when compared with manual approaches. The dissertation also shows what learning strategies and goals are most discussed for engineering students as they prepare for exams. The dissertation gives some suggestions, challenges, and ideas for future research on AI and learning from exams. Natural language processing Large language models self-regulated learning exam wrappers foundational engineering
10	From Bytecode to Safety : Decompiling Smart Contracts for Vulnerability Analysis Darwish, Malek January 2024 (has links) This thesis investigated the use of Large Language Models (LLMs) for vulnerability analysis of decompiled smart contracts. A controlled experiment was conducted in which an automated system was developed to decompile smart contracts using two decompilers: Dedaub and Heimdall-rs, and subsequently analyze them using three LLMs: OpenAI’s GPT-4 and GPT-3.5, as well as Meta’s CodeLlama. The study focuses on assessing the effectiveness of the LLMs at identifying a range of vulnerabilities. The evaluation method included the collection and comparative analysis of performance and evaluative metrics such as the precision, recall and F1-scores. Our results show the LLM-decompiler pairing of Dedaub and GPT-4 to exhibit impressive detection capabilities across a range of vulnerabilities while failing to detect some vulnerabilities at which CodeLlama excelled. We demonstrated the potential of LLMs to improve smart contract security and sets the stage for future research to further expand on this domain. Smart Contracts Decompiling Large Language Models (LLM) Vulnerability Detection Solidity Software Engineering Programvaruteknik

Search results