Global ETD Search

21	Self-Reflection on Chain-of-Thought Reasoning in Large Language Models / Självreflektion över Chain-of-Thought-resonerande i stora språkmodeller Praas, Robert January 2023 (has links) A strong capability of large language models is Chain-of-Thought reasoning. Prompting a model to ‘think step-by-step’ has led to great performance improvements in solving problems such as planning and question answering, and with the extended output it provides some evidence about the rationale behind an answer or decision. In search of better, more robust, and interpretable language model behavior, this work investigates self-reflection in large language models. Here, self-reflection consists of feedback from large language models on medical question answering, and we examine whether this feedback can be used to accurately distinguish between correct and incorrect answers. GPT-3.5-Turbo and GPT-4 provide zero-shot feedback scores to Chain-of-Thought reasoning on the MedQA (medical question-answering) dataset. The question-answering is evaluated on traits such as being structured, relevant and consistent. We test whether the feedback scores are different for questions that were either correctly or incorrectly answered by Chain-of-Thought reasoning. The potential differences in feedback scores are statistically tested with the Mann-Whitney U test. Graphical visualization and logistic regressions are performed to preliminarily determine whether the feedback scores are indicative of whether the Chain-of-Thought reasoning leads to the right answer. The results indicate that among the reasoning objectives, the feedback models assign higher feedback scores to questions that were answered correctly than those that were answered incorrectly. Graphical visualization shows potential for reviewing questions with low feedback scores, although logistic regressions that aimed to predict whether or not questions were answered correctly mostly defaulted to the majority class. 
Nonetheless, there seems to be a possibility for more robust output from self-reflecting language systems. / En stark förmåga hos stora språkmodeller är Chain-of-Thought-resonerande. Att prompta en modell att tänka stegvis har lett till stora prestandaförbättringar vid lösandet av problem som planering och frågebesvarande, och med den utökade outputen ger det en del bevis rörande logiken bakom ett svar eller beslut. I sökandet efter bättre, mer robust och tolkbart beteende hos språkmodeller undersöker detta arbete självreflektion i stora språkmodeller. Forskningsfrågan är: I vilken utsträckning kan feedback från stora språkmodeller, såsom GPT-3.5-Turbo och GPT-4, på ett korrekt sätt skilja mellan korrekta och inkorrekta svar i medicinska frågebesvarande uppgifter genom användningen av Chain-of-Thought-resonerande? Här ger GPT-3.5-Turbo och GPT-4 zero-shot feedback-poäng till Chain-of-Thought-resonerande på datasetet för MedQA (medicinskt frågebesvarande). Frågebesvarandet bör vara strukturerat, relevant och konsekvent. Feedbackpoängen jämförs mellan två grupper av frågor, baserat på om dessa besvarades korrekt eller felaktigt i första hand. Statistisk testning genomförs på skillnaden i feedback-poäng med Mann-Whitney U-testet. Grafisk visualisering och logistiska regressioner utförs för att preliminärt avgöra om feedbackpoängen är indikativa för huruvida Chain-of-Thought-resonerande leder till rätt svar. Resultaten indikerar att bland resonemangsmålen tilldelar feedbackmodellerna fler positiva feedbackpoäng till frågor som besvarats korrekt än de som besvarats felaktigt. Grafisk visualisering visar potential för granskandet av frågor med låga feedbackpoäng, även om logistiska regressioner som syftade till att förutsäga om frågorna besvarades korrekt eller inte för det mesta föll tillbaka på majoritetsklassen. Icke desto mindre verkar det finnas potential för robustare output från självreflekterande språksystem. 
Large language models Chain-of-Thought reasoning Metareasoning Question answering Self-correction Ethical AI Stora språkmodeller Chain-of-Thought-resonemang Metareasoning Frågesvar Självkorrigering Etisk AI Computer and Information Sciences Data- och informationsvetenskap
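The Mann-Whitney U comparison mentioned in entry 21 can be illustrated with a minimal sketch. It is not the thesis's code: the function name and the feedback scores below are made up for illustration, and only the U statistic and a simple effect size are computed (no p-value). Tied scores receive midranks, which matters for discrete feedback scales.

```python
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Return (U statistic for x, estimated probability that a random x exceeds a random y)."""
    # Pool both samples, remembering which group each value came from (0 = x, 1 = y).
    pooled = sorted((value, group) for group, sample in enumerate((x, y)) for value in sample)

    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    return u_x, u_x / (n_x * n_y)


# Hypothetical feedback scores for correctly and incorrectly answered questions.
scores_correct = [4, 5, 5, 3]
scores_incorrect = [2, 3, 1]
u, effect = mann_whitney_u(scores_correct, scores_incorrect)
print(f"U = {u}, P(correct > incorrect) is about {effect:.2f}")
```

An effect size near 0.5 would mean the two groups receive similar scores; values near 1 mean correctly answered questions tend to score higher.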
22	On Semantic Cognition, Inductive Generalization, and Language Models Kanishka Misra (9708551) 05 September 2023 (has links) <p dir="ltr">Our ability to understand language and perform reasoning crucially relies on a robust system of semantic cognition (G. L. Murphy, 2002; Rogers & McClelland, 2004; Rips et al., 2012; Lake & Murphy, 2021): processes that allow us to learn, update, and produce inferences about everyday concepts (e.g., cat, chair), properties (e.g., has fur, can be sat on), categories (e.g., mammals, furniture), and relations (e.g., is-a, taller-than). Meanwhile, recent progress in the field of natural language processing (NLP) has led to the development of language models (LMs): sophisticated neural networks that are trained to predict words in context (Devlin et al., 2019; Radford et al., 2019; Brown et al., 2020), and as a result build representations that encode the knowledge present in the statistics of their training environment. These models have achieved impressive levels of performance on a range of tasks that require sophisticated semantic knowledge (e.g. question answering and natural language inference), often even reaching human parity. To what extent do LMs capture the nuances of human conceptual knowledge and reasoning? Centering around this broad question, this dissertation uses core ideas in human semantic cognition as guiding principles and lays down the groundwork to establish effective evaluation and improvement of conceptual understanding in LMs. 
In particular, I build on prior work that focuses on characterizing what semantic knowledge is made available in the behavior and representations of LMs, and extend it by additionally proposing tests that focus on functional consequences of acquiring basic semantic knowledge.<br><br>I primarily focus on inductive generalization (Hayes & Heit, 2018)—the unique ability of humans to rely on acquired conceptual knowledge to project or generalize novel information—as a context within which we can analyze LMs’ encoding of conceptual knowledge. I do this because the literature surrounding inductive generalization contains a variety of empirical regularities that map to specific conceptual abstractions and shed light on how humans store, organize and use conceptual knowledge. Before explicitly analyzing LMs for these empirical regularities, I test them on two other contexts, which also feature the role of inductive generalization. First, I test the extent to which LMs demonstrate typicality effects—a robust finding in human categorization literature where certain members of a category are considered to be more central to the category than are others. Specifically, I test the behavior of 19 different LMs on two contexts where typicality effects modulate human behavior: 1) verification of sentences expressing taxonomic category membership, and 2) projecting novel properties from individual category members to the entire category. In both tests, LMs achieved positive but modest correlations with human typicality ratings, suggesting that they can to a non-trivial extent capture subtle differences between category members. Next, I propose a new benchmark to test the robustness of LMs in attributing properties to everyday concepts, and in making inductive leaps to endow properties to novel concepts. 
On testing 31 different LMs for these capacities, I find that while they can correctly attribute properties to everyday concepts and even predict the properties of novel concepts in simple settings, they struggle to do so robustly. Combined with the analyses of typicality effects, these results suggest that the ability of LMs to demonstrate impressive conceptual knowledge and reasoning behavior can be explained by their sensitivities to shallow predictive cues. When these cues are carefully controlled for, LMs show critical failures in demonstrating robust conceptual understanding. Finally, I develop a framework that can allow us to characterize the extent to which the distributed representations learned by LMs can encode principles and abstractions that characterize inductive behavior of humans. This framework operationalizes inductive generalization as the behavior of an LM after its representations have been partially exposed (via gradient-based learning) to novel conceptual information. To simulate this behavior, the framework uses LMs that are endowed with human-elicited property knowledge, by training them to evaluate the truth of sentences attributing properties to concepts. I apply this framework to test four different LMs on 13 different inductive phenomena documented for humans (Osherson et al., 1990; Heit & Rubinstein, 1994). Results from these analyses suggest that building representations from word distributions can successfully allow the encoding of many abstract principles that can guide inductive behavior in the models—principles such as sensitivity to conceptual similarity, hierarchical organization of categories, reasoning about category coverage, and sample size. 
At the same time, the tested models also systematically failed at demonstrating certain phenomena, showcasing their inability to demonstrate pragmatic reasoning, preference to rely on shallow statistical cues, and lack of context sensitivity with respect to high-level intuitive theories.</p> Natural language processing Computational linguistics Cognition Language Models Artificial Intelligence Large Language Models Concepts and Categories Inductive Reasoning Machine Learning Natural Language Processing
23	An Empirical Study on Using Codex for Automated Program Repair Zhao, Pengyu January 2023 (has links) This thesis explores the potential of Codex, a pre-trained Large Language Model (LLM), for Automated Program Repair (APR) by assessing its performance on the Defects4J benchmark that includes real-world Java bugs. The study aims to provide a comprehensive understanding of Codex’s capabilities and limitations in generating syntactically and semantically equivalent patches for defects, as well as evaluating its ability to handle defects with different levels of importance and complexity. Additionally, we aim to compare the performance of Codex with other LLMs in the APR domain. To achieve these objectives, we employ a systematic methodology that includes prompt engineering, Codex parameter adjustment, code extraction, patch verification, and Abstract Syntax Tree (AST) comparison. We successfully verified 528 bugs in Defects4J, a higher number than in other studies, and achieved 53.98% plausible and 26.52% correct patches. Furthermore, we introduce the elle-elle-aime framework, which extends RepairThemAll for Codex-based APR and is adaptable for evaluating other LLMs, such as ChatGPT and GPT-4. The findings of this empirical study provide valuable insights into the factors that impact Codex’s performance on APR, helping to create new prompt strategies and techniques that improve research productivity. / Denna avhandling utforskar potentialen hos Codex, en förtränad LLM, för APR genom att utvärdera dess prestanda på Defects4J-benchmarket som inkluderar verkliga Java-buggar. Studien syftar till att ge en omfattande förståelse för Codex förmågor och begränsningar när det gäller att generera syntaktiskt och semantiskt ekvivalenta patchar för defekter samt att utvärdera dess förmåga att hantera defekter med olika nivåer av betydelse och komplexitet. Dessutom är vårt mål att jämföra prestanda hos Codex med andra LLM inom APR-området. 
För att uppnå dessa mål använder vi en systematisk metodik som inkluderar prompt engineering, justering av Codex-parametrar, kodextraktion, patchverifiering och jämförelse av AST. Vi verifierade framgångsrikt 528 buggar i Defects4J, vilket representerar det högsta antalet bland andra studier, och uppnådde 53,98% plausibla och 26,52% korrekta patchar. Vidare introducerar vi elle-elle-aime ramverket, som utvidgar RepairThemAll för Codex-baserad APR och är anpassningsbart för att utvärdera andra LLM, såsom ChatGPT och GPT-4. Resultaten av denna empiriska studie ger värdefulla insikter i de faktorer som påverkar Codex prestanda på APR och hjälper till att skapa nya promptstrategier och tekniker som förbättrar forskningsproduktiviteten. Automated Program Repair Codex Large Language Models Defects4J Patch Generation Prompt Engineering Automatiserad Programreparation Codex Storskaliga Språkmodeller Defects4J Patchgenerering Promptteknik Computer and Information Sciences Data- och informationsvetenskap
24	Prompt-learning and Zero-shot Text Classification with Domain-specific Textual Data Luo, Hengyu January 2023 (has links) The rapid growth of textual data in the digital age presents unique challenges in domain-specific text classification, particularly the scarcity of labeled data for many applications, due to the high cost of manual labeling. In this thesis, we explore the applicability of the prompt-learning method, which is known to suit few-shot scenarios and to require much less data, as an emerging alternative to traditional fine-tuning methods, for domain-specific text classification in the context of customer-agent interactions in the retail sector. Specifically, we implemented the entire prompt-learning pipeline for the classification task. Our investigation encompasses various strategies of prompt-learning, including the fixed-prompt language model tuning strategy and the tuning-free prompting strategy, along with an examination of language model selection, few-shot sampling strategy, prompt template design, and verbalizer design. In this manner, we assessed the overall performance of the prompt-learning method in the classification task. Through a systematic evaluation, we demonstrate that with the fixed-prompt language model tuning strategy, based on relatively smaller language models (e.g. T5-base with around 220M parameters), prompt-learning can achieve competitive performance (close to 75% accuracy) even with limited labeled data (no more than 15% of the full data). In addition, with the tuning-free prompting strategy, based on a regular-size language model (e.g. FLAN-T5-large with around 770M parameters), performance can reach around 30% accuracy with detailed prompt templates in a zero-shot setting (no extra training data involved). These results can offer valuable insights for researchers and practitioners working with domain-specific textual data, prompt-learning and few-shot / zero-shot learning. 
The findings of this thesis highlight the potential of prompt-learning as a practical solution for classification problems across diverse domains and set the stage for future research in this area. prompt-learning zero-shot few-shot text classification domain-specific retail sector domain customer-agent interaction transformer large language models ChatGPT natural language processing machine learning deep learning
25	Cross-Lingual and Genre-Supervised Parsing and Tagging for Low-Resource Spoken Data Fosteri, Iliana January 2023 (has links) Dealing with low-resource languages is a challenging task, because of the absence of sufficient data to train machine-learning models to make predictions on these languages. One way to deal with this problem is to use data from higher-resource languages, which enables the transfer of learning from these languages to the low-resource target ones. The present study focuses on dependency parsing and part-of-speech tagging of low-resource languages belonging to the spoken genre, i.e., languages whose treebank data is transcribed speech. These are the following: Beja, Chukchi, Komi-Zyrian, Frisian-Dutch, and Cantonese. Our approach involves investigating different types of transfer languages, employing MACHAMP, a state-of-the-art parser and tagger that uses contextualized word embeddings, specifically mBERT and XLM-R. The main idea is to explore how the genre, the language similarity, neither of the two, or the combination of both affects model performance in the aforementioned downstream tasks for our selected target treebanks. Our findings suggest that in order to capture speech-specific dependency relations, we need to incorporate at least some genre-matching source data, while language similarity-matching source data are a better candidate when the task at hand is part-of-speech tagging. We also explore the impact of multi-task learning in one of our proposed methods, but we observe only minor differences in model performance. dependency parsing part-of-speech tagging low-resource languages transcribed speech large language models cross-lingual learning transfer learning multi-task learning Universal Dependencies
26	KARTAL: Web Application Vulnerability Hunting Using Large Language Models : Novel method for detecting logical vulnerabilities in web applications with finetuned Large Language Models / KARTAL: Jakt på sårbarheter i webbapplikationer med hjälp av stora språkmodeller : Ny metod för att upptäcka logiska sårbarheter i webbapplikationer med hjälp av finjusterade stora språkmodeller Sakaoglu, Sinan January 2023 (has links) Broken Access Control is the most serious web application security risk according to the Open Worldwide Application Security Project (OWASP). This category has highly complex vulnerabilities such as Broken Object Level Authorization (BOLA) and Exposure of Sensitive Information. Finding such critical vulnerabilities in large software systems requires intelligent and automated tools. State-of-the-art (SOTA) research including hybrid application security testing tools, algorithmic brute forcers, and artificial intelligence has shown great promise in detection. Nevertheless, there exists a gap in research for reliably identifying logical and context-dependent Broken Access Control vulnerabilities. We modeled the problem as text classification and proposed KARTAL, a novel method for web application vulnerability detection using a Large Language Model (LLM). It consists of 3 components: Fuzzer, Prompter, and Detector. The Fuzzer is responsible for methodically collecting application behavior. The Prompter processes the data from the Fuzzer and formulates a prompt. Finally, the Detector uses an LLM which we have finetuned for detecting vulnerabilities. In the study, we investigate the performance, key factors, and limitations of the proposed method. Our research reveals the need for a labeled Broken Access Control vulnerability dataset in the cybersecurity field. Thus, we custom-generate our own dataset using an auto-regressive LLM with SOTA few-shot prompting techniques. 
We experiment with finetuning 3 types of decoder-only pre-trained transformers for detecting 2 sophisticated vulnerabilities. Our best model attained an accuracy of 87.19%, with an F1 score of 0.82. By using hardware acceleration on a consumer-grade laptop, our fastest model can make up to 539 predictions per second. The experiments on varying the training sample size demonstrated the great learning capabilities of our model. Every 400 samples added to training resulted in an average MCC score improvement of 19.58%. Furthermore, the dynamic properties of KARTAL enable inference-time adaptation to the application domain, resulting in reduced false positives. / Brutet åtkomstkontroll är den allvarligaste säkerhetsrisken för webbapplikationer enligt Open Worldwide Application Security Project (OWASP). Denna kategori har mycket komplexa sårbarheter såsom Brutet behörighetskontroll på objektnivå (BOLA) och exponering av känslig information. Att hitta sådana kritiska sårbarheter i stora programvarusystem kräver intelligenta och automatiserade verktyg. Senaste tekniken (SOTA)-forskning, inklusive hybridverktyg för säkerhetstestning av applikationer, algoritmiska bruteforcers och artificiell intelligens, har visat stor potential för upptäckt. Trots detta finns det en lucka i forskningen när det gäller tillförlitlig identifiering av logiska och kontextberoende sårbarheter relaterade till Brutet åtkomstkontroll. Vi modellerade problemet som textklassificering och föreslog KARTAL, en ny metod för att upptäcka sårbarheter i webbapplikationer med hjälp av en stor språkmodell (LLM). Den består av 3 komponenter: Fuzzer, Prompter och Detector. Fuzzer ansvarar för att systematiskt samla in applikationsbeteende. Prompter bearbetar data från Fuzzer och formulerar en förfrågan. Slutligen använder Detector en LLM som vi har finjusterat för att upptäcka sårbarheter. I studien undersöker vi prestanda, nyckelfaktorer och begränsningar hos den föreslagna metoden. 
Vår forskning visar behovet av en märkt dataset för sårbarheter relaterade till Brutet åtkomstkontroll inom cybersäkerhetsområdet. Därför genererar vi anpassade dataset med hjälp av en auto-regressiv LLM med SOTA few-shot-prompting-tekniker. Vi experimenterar med att finjustera 3 typer av endast avkodare transformers som är förtränade för att upptäcka 2 sofistikerade sårbarheter. Vår bästa modell uppnådde en noggrannhet på 87.19% med en F1-poäng på 0.82. Genom att använda hårdvaruacceleration på en bärbar dator för konsumenter kan vår snabbaste modell göra upp till 539 förutsägelser per sekund. Experimenten med varierande storlek på träningsprovet visade på vår modells stora förmåga att lära sig. Varje 400 prover som lades till träningen resulterade i en genomsnittlig förbättring av MCC-poängen med 19.58%. Dessutom möjliggör de dynamiska egenskaperna hos KARTAL anpassning vid inferringstid till applikationsdomänen, vilket resulterar i färre falska positiva resultat. Broken Access Control Vulnerability Large Language Models Web Application API Detection Scanner DAST Application Security Brutet åtkomstkontroll Sårbarhet Stora språkmodeller Webbapplikation API Upptäckt Skanner DAST Applikationssäkerhet Computer and Information Sciences Data- och informationsvetenskap
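Entry 26 reports accuracy, F1, and MCC (Matthews correlation coefficient) for a binary vulnerability detector. As a reminder of how these three metrics relate, here is a minimal sketch that derives them from confusion-matrix counts. The function name and the counts are invented for illustration and do not come from the thesis.

```python
import math


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, F1 and MCC from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # MCC uses all four cells, so it stays informative under class imbalance.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0

    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts: 40 true positives, 10 false positives,
# 10 false negatives, 40 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, so it is not on the same scale as accuracy or F1.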
27	Towards Automatic Generation of Personality-Adapted Speech and Emotions for a Conversational Companion Robot / Mot Automatisk Generering av Personlighets Anpassade Tal och Känslor för en Samtalskunnig Sällskaps Robot Galatolo, Alessio January 2022 (has links) Previous works in Human-Robot Interaction have demonstrated the potential benefit of designing highly anthropomorphic robots. This includes physical appearance but also whether they can express emotions, behave in a congruent manner, etc. This work explores the creation of a robot that is able to express a given personality consistently throughout a dialogue while also manifesting congruent emotional expressions. Personality defines many aspects of the character of a person and it can influence how one speaks, behaves, reacts to events, etc. Here, we only focus our attention on language and on how it changes depending on one particular personality trait, extraversion. To this end, we tested different language models to automate the process of generating language according to a particular personality. We also compared large language models such as GPT-3 to smaller ones, to analyse how size can correlate to performance in this task. We initially evaluated these methods through a fairly small user study in order to confirm the correct manipulation of personality in a text-only context. Results suggest that personality manipulation and how well it is understood highly depend on the context of a dialogue, with a more ‘personal’ dialogue being more successful in manifesting personality. Also, the performance of GPT-3 is comparable to that of smaller, specifically trained models, with the main difference being the perceived fluency of the generations. We then conducted a follow-up study where we chose to use a robot that is capable of showing different facial expressions used to manifest different emotions, the Furhat robot. 
We integrated into the robot the generations from our language models together with an emotion classification method that is used to guide its facial expressions. Whilst the output of our models did trigger different emotional expressions, resulting in robots which differed both in their language and nonverbal behaviour, the resulting perception of these robots’ personality only approached significance (p ∼ 0.08). In this study, GPT-3 performed very similarly to much smaller models, with the difference in fluency also being much smaller than before. We did not see any particular change in the perception of the robots in terms of likeability or uncanniness. / Tidigare arbeten inom Människa-robotinteraktion har visat den positiva potentiella fördelen med att designa mycket antropomorfa robotar. Detta inkluderar fysiskt utseende men också huruvida de kan uttrycka känslor, bete sig på ett kongruent sätt, etc. Detta arbete vill utforska skapandet av en robot som kan uttrycka en given personlighet konsekvent under en dialog samtidigt som den manifesterar kongruenta känslomässiga uttryck. Personlighet definierar många aspekter av en persons karaktär och den kan påverka hur man talar, beter sig, reagerar på händelser etc. Här fokuserar vi vår uppmärksamhet endast på språket och på hur det förändras beroende på ett särskilt personlighetsdrag, extraversion. För detta ändamål testade vi olika språkmodeller för att automatisera processen att skapa språk enligt en viss personlighet. Vi jämförde även stora språkmodeller som GPT-3 med mindre, för att analysera hur storlek kan relatera till prestanda i denna uppgift. Vi utvärderade inledningsvis dessa metoder genom en mindre användarstudie för att bekräfta att personligheten kan manipuleras på rätt sätt i en textbaserad kontext. 
Resultaten tyder på att personlighetsmanipulation och hur väl den förstås i hög grad beror på sammanhanget i en dialog, där en mer ‘personlig’ dialog är mer framgångsrik när det gäller att manifestera personlighet. Prestandan hos GPT-3 är också jämförbar med mindre modeller, specifikt tränade på en uppgift, där den största skillnaden var i den genererade textens upplevda flyt. Vi gjorde sedan en uppföljningsstudie där vi valde att använda en robot som är kapabel att visa olika ansiktsuttryck och därigenom kapabel att manifestera olika känslor, Furhat-roboten. Vi integrerade talet som genererades från våra språkmodeller i roboten tillsammans med en känsloklassificeringsmetod som används för att styra dess ansiktsuttryck. Medan resultatet av våra modeller framkallade olika känslomässiga uttryck, vilket resulterade i robotar som skilde sig åt både i språk och icke-verbal kommunikation, närmade sig endast den resulterande uppfattningen av dessa robotars personlighet signifikans (p ∼ 0.08). I denna studie presterade GPT-3 mycket likartat med mycket mindre modeller, med skillnaden i flyt också mycket mindre än tidigare. Vi såg ingen speciell förändring i uppfattningen av robotarna när det gäller sympati eller obehaglighet. Personality Emotions Human-Robot Interaction Machine Learning Large Language Models Text-style transfer GPT-3 STRAP Personlighet Känslor Människa-robotinteraktion Maskininlärning Stora Språkmodeller Överföring av text GPT-3 STRAP Computer and Information Sciences Data- och informationsvetenskap
28	Round-Trip Translation : A New Path for Automatic Program Repair using Large Language Models / Tur och retur-översättning : En ny väg för automatisk programreparation med stora språkmodeller Vallecillos Ruiz, Fernando January 2023 (has links) Research shows that grammatical mistakes in a sentence can be corrected by machine translating it to another language and back. We investigate whether this correction capability of Large Language Models (LLMs) extends to Automatic Program Repair (APR), a software engineering task. Current generative models for APR are pre-trained on source code and fine-tuned for repair. This paper proposes bypassing fine-tuning and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back. We hypothesize that RTT with LLMs performs a regression toward the mean, which removes bugs as they are a form of noise w.r.t. the more frequent, natural, bug-free code in the training data. To test this hypothesis, we employ eight recent LLMs pre-trained on code, including the latest GPT versions, and four common program repair benchmarks in Java. We find that RTT with English as an intermediate language repaired 101 of 164 bugs with GPT-4 on the HumanEval-Java dataset. Moreover, 46 of these are unique bugs that are not repaired by other LLMs fine-tuned for APR. Our findings highlight the viability of round-trip translation with LLMs as a technique for automated program repair and its potential for research in software engineering. / Forskning visar att grammatiska fel i en mening kan korrigeras genom att maskinöversätta den till ett annat språk och tillbaka. Vi undersöker om denna korrigeringsegenskap hos stora språkmodeller (LLMs) även gäller för Automatisk Programreparation (APR), en uppgift inom mjukvaruteknik. Nuvarande generativa modeller för APR är förtränade på källkod och finjusterade för reparation. 
Denna artikel föreslår att man undviker finjustering och använder Tur och retur-översättning (RTT): översättning av kod från ett programmeringsspråk till ett annat programmerings- eller naturspråk, och tillbaka. Vi antar att RTT med LLMs utför en regression mot medelvärdet, vilket tar bort buggar eftersom de är en form av brus med avseende på den mer frekventa, naturliga, buggfria koden i träningsdatan. För att testa denna hypotes använder vi åtta nyligen förtränade LLMs på kod, inklusive de senaste GPT-versionerna, och fyra vanliga programreparationsstandarder i Java. Vi upptäcker att RTT med engelska som ett mellanspråk reparerade 101 av 164 buggar med GPT-4 på HumanEval-Java-datasetet. Dessutom är 46 av dessa unika buggar som inte repareras av andra LLMs finjusterade för APR. Våra resultat belyser genomförbarheten av tur och retur-översättning med LLMs som en teknik för automatiserad programreparation och dess potential för forskning inom mjukvaruteknik. Automatic Program Repair Software Engineering Large Language Models Round-Trip Translation Neural Machine Translation Automatisk programreparation Mjukvaruutveckling Stora språkmodeller Tur och retur-översättning Neural maskinöversättning Computer and Information Sciences Data- och informationsvetenskap
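The round-trip translation idea in entry 28 (translate code into an intermediate language, translate it back, and keep a candidate that passes the tests) can be sketched as a small control loop. This is an illustrative sketch, not the thesis's implementation: `round_trip_repair`, the `Translate` callable, and the toy stand-ins are all hypothetical, and a real setup would replace `toy_translate` with calls to an LLM and `toy_passes_tests` with the benchmark's test suite.

```python
from typing import Callable, Optional

# (text, source_language, target_language) -> translated text.
# Hypothetical interface: in practice this would wrap a call to an LLM.
Translate = Callable[[str, str, str], str]


def round_trip_repair(
    buggy_code: str,
    translate: Translate,
    passes_tests: Callable[[str], bool],
    language: str = "Java",
    pivot: str = "English",
    attempts: int = 3,
) -> Optional[str]:
    """Translate code to a pivot language and back; return the first candidate that passes."""
    for _ in range(attempts):
        description = translate(buggy_code, language, pivot)
        candidate = translate(description, pivot, language)
        if passes_tests(candidate):
            return candidate
    return None  # no plausible patch found within the attempt budget


# Toy stand-ins so the sketch runs without any model or test suite.
def toy_translate(text: str, source: str, target: str) -> str:
    if target == "English":
        return "Return the sum of a and b."
    return "int add(int a, int b) { return a + b; }"


def toy_passes_tests(code: str) -> bool:
    return "a + b" in code


buggy = "int add(int a, int b) { return a - b; }"
patch = round_trip_repair(buggy, toy_translate, toy_passes_tests)
print(patch)
```

Repeating the loop only helps when the translator is sampled non-deterministically; with a deterministic translator a single attempt is enough.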
29	WEAKLY SUPERVISED CHARACTERIZATION OF DISCOURSES ON SOCIAL AND POLITICAL MOVEMENTS ON ONLINE MEDIA Shamik Roy (16317636) 14 June 2023 (has links) <p>Nowadays an increasing number of people consume, share, and interact with information online. This results in posting and counter-posting on online media by different ideological groups on various polarized topics. Consequently, online media has become the primary platform for political and social influencers to directly interact with the citizens and share their perspectives, views, and stances with the goal of gaining support for their actions, bills, and legislation. Hence, understanding the perspectives and the influencing strategies in online media texts is important for an individual to avoid misinformation and to improve trust between the general public, influencers, and authoritative figures such as the government.</p> <p><br></p> <p>Automatically understanding the perspectives in online media is difficult because of two major challenges. First, no proper grammar or mechanism for characterizing the perspectives is available. Recent studies in Natural Language Processing (NLP) have leveraged resources from social science to explain perspectives. For example, Policy Framing and Moral Foundation Theory are used for understanding how issues are framed and the moral appeal expressed in texts to gain support. However, these theories often fail to capture the nuances in perspectives and cannot generalize over all topics and events. Our research in this dissertation is one of the first studies that adapt social science theories in Natural Language Processing for understanding perspectives to the extent that they can capture differences in ideologies or stances. 
The second key challenge in understanding perspectives in online media texts is that annotated data is difficult to obtain, yet it is needed to build automatic perspective-detection methods that can generalize over the large corpus of online media text on different topics. To tackle this problem, in this dissertation, we used weak sources of supervision such as social network interaction of users who produce and interact with the messages, weak human interaction, or artificial few-shot data using Large Language Models. </p> <p><br></p> <p>Our insight is that various tasks such as identifying perspectives, stances, and sentiments toward entities are interdependent when characterizing online media messages. As a result, we proposed approaches that model these interdependent problems together and perform structured prediction to solve them jointly. Our research findings showed that the messaging choices and perspectives on online media in response to various real-life events and their prominence and contrast in different ideological camps can be efficiently captured using our developed methods.</p> Natural language processing Perspective Analysis Moral Foundation Theory Policy Framing Discourse Analysis Social Media Text Analysis News Media Text Analysis Structured Prediction Graph Neural Network Contrastive Learning Weak Supervision Linguistic Homophily Social Network Analysis Relational Learning Large Language Models
30	An initial investigation of Automatic Program Repair for Solidity Smart Contracts with Large Language Models / En första undersökning av automatisk lagning av solidity smarta kontrakt med stora språkmodeller Cruz, Erik January 2023 (has links) This thesis investigates how Large Language Models can be used to repair Solidity Smart Contracts automatically through the main contribution of this thesis, the Transformative Repair Tool. The Transformative Repair Tool achieves similar results to current state-of-the-art tools on the Smartbugs Curated Dataset and is the first published tool that uses Large Language Models to repair Solidity Smart Contracts. Moreover, the thesis explores different prompt strategies to repair Smart Contracts and assesses their performance. / Detta masterexamensarbete undersöker hur stora språkmodeller kan användas för att automatiskt laga solidity smarta kontrakt genom verktyget Transformative Repair Tool, som är detta masterexamensarbetes huvudsakliga bidrag. Transformative Repair Tool presterar liknande som dagens bästa verktyg inom automatisk lagning av smarta kontrakt på Smartbugs Curated datasettet och är det första publicerade verktyget som just använder stora språkmodeller för att reparera solidity smarta kontrakt. Dessutom så utforskar denna rapport olika textprompts och dess prestanda för att laga smarta kontrakt. Automatic Program Repair APR Large Language Models LLM Smart Contracts Smart Contract Audit Chat GPT Cybersecurity Automatisk Lagning av Kod Stora språkmodeller Smarta Kontrakt Granskning av Smarta Kontrakt Chat GPT Cybersäkerhet Computer and Information Sciences Data- och informationsvetenskap
