Global ETD Search

11	Exploring the Usage of Neural Networks for Repairing Static Analysis Warnings / Utforsking av användningen av neurala nätverk för att reparera varningar för statisk analys Lohse, Vincent Paul January 2021 (has links) C# provides static analysis libraries for template-based code analysis and code fixing. These libraries have been used by the open-source community to generate numerous NuGet packages for different use-cases. However, due to the unstructured vastness of these packages, it is difficult to find the ones required for a project and creating new analyzers and fixers take time and effort to create. Therefore, this thesis proposes a neural network, which firstly imitates existing fixers and secondly extrapolates to fixes of unseen diagnostics. To do so, the state-of-the-art of static analysis NuGet packages is examined and further used to generate a dataset with diagnostics and corresponding code fixes for 24,622 data points. Since many C# fixers apply formatting changes, all formatting is preserved in the dataset. Furthermore, since the fixers also apply identifier changes, the tokenization of the dataset is varied between splitting identifiers by camelcase and preserving them. The neural network uses a sequence-to-sequence learning approach with the Transformer model and takes file context, diagnostic message and location as input and predicts a diff as output. It is capable of imitating 46.3% of the fixes, normalized by diagnostic type, and for data points with unseen diagnostics, it is able to extrapolate to 11.9% of normalized data points. For both experiments, splitting identifiers by camelcase produces the best results. Lastly, it is found that a higher proportion of formatting tokens in input has minimal positive impact on prediction success rates, whereas the proportion of formatting in output has no impact on success rates. / C# tillhandahåller statiska analysbibliotek för mallbaserad kodanalys och kodfixering. Dessa bibliotek har använts av open source-gemenskapen för att generera många NuGet-paket för olika användningsfall. Men på grund av mängden av dessa paket är det svårt att hitta de som krävs för ett projekt och att skapa nya analysatorer och fixare tar tid och ansträngning att skapa. Därför föreslår denna avhandling ett neuralt nätverk, som för det första imiterar befintliga korrigeringar och för det andra extrapolerar till korrigeringar av osynlig diagnostik. För att göra det har det senaste inom statisk analys NuGetpaketen undersökts och vidare använts för att generera en datauppsättning med diagnostik och motsvarande kodfixar för 24 622 datapunkter. Eftersom många C# fixers tillämpar formateringsändringar, bevaras all formatering i datasetet. Dessutom, eftersom fixarna också tillämpar identifieringsändringar, varieras tokeniseringen av datamängden mellan att dela upp identifierare efter camelcase och att bevara dem. Det neurala nätverket använder en sekvenstill- sekvens-inlärningsmetod med Transformer-modellen och tar filkontext, diagnostiskt meddelande och plats som indata och förutsäger en skillnad som utdata. Den kan imitera 46,3% av korrigeringarna, normaliserade efter diagnostisk typ, och för datapunkter med osynlig diagnostik kan den extrapolera till 11,9% av normaliserade datapunkter. För båda experimenten ger uppdelning av identifierare efter camelcase de bästa resultaten. Slutligen har det visat sig att en högre andel formateringstokens i indata har minimal positiv inverkan på åndelen korrekta förutsägelser, medan andelen formatering i utdata inte har någon inverkan på åndelen korrekta förutsägelser. Automatic Program Repair Neural Machine Translation Static Analysis Transformer Model Formatting Automatisk programreparation neural maskinöversättning statisk analys transformatormodell formatering Software Engineering Programvaruteknik
12	An Initial Investigation of Neural Decompilation for WebAssembly / En Första Undersökning av Neural Dekompilering för WebAssembly Benali, Adam January 2022 (has links) WebAssembly is a new standard of the World Wide Web that is used as a compilation target and which is meant to enable high-performance applications. As it becomes more popular, the need for corresponding decompilers increases, for security reasons for instance. However, building an accurate decompiler capable of restoring the original source code is a complicated task. Recently, Neural Machine Translation (NMT) has been proposed as an alternative to traditional decompilers which involve a lot of manual and laborious work. We investigate the viability of Neural Machine Translation for decompiling WebAssembly binaries to C source code. The state-of-the-art transformer and LSTM sequence-to-sequence (Seq2Seq) models with attention are experimented with. We build a custom randomly-generated dataset ofWebAssembly to C pairs of source code and use different metrics to quantitatively evaluate the performance of the models. Our implementation consists of several processing steps that have the WebAssembly input and the C output as endpoints. The results show that the transformer outperforms the LSTM based neural model. Besides, while the model restores the syntax and control-flow structure with up to 95% of accuracy, it is incapable of recovering the data-flow. The different benchmarks on which we run our evaluation indicate a drop of decompilation accuracy as the cyclomatic complexity and the nesting of the programs increase. Nevertheless, our approach has a lot of potential, encouraging its usage in future works. / WebAssembly est un nouveau standard du World Wide Web utilisé comme cible de compilation et qui est principalement destiné à exécuter des applications dans un navigateur Web avec des performances supérieures. À mesure que le langage devient populaire, le besoin en rétro-ingénierie des fichiers WebAssembly binaires se ressent. Toutefois, la construction d’un bon décompilateur capable de restaurer du code source plus aussi proche que possible de l’original est une tâche compliquée. Récemment, la traduction automatique neuronale a été proposée comme alternative aux décompilateurs traditionnels qui impliquent du travail fastidieux, coûteux et difficilement adaptable à d’autres langages. Nous investiguons les chances de succès de la traduction automatique neuronale pour décompiler des fichiers binaires WebAssembly en code source C. Les modèles du transformeur et du LSTM séquence-à-séquence (Seq2Seq) sont utilisés. Nous construisons un jeu de données généré de manière aléatoire constitué de paires de code source WebAssembly et C et nous utilisons différentes métriques pour évaluer quantitativement les performances des deux modèles. Notre implémentation consiste en plusieurs phases de traitement qui reçoivent en entrée le code WebAssembly et produisent en sortie le code source C. Les résultats montrent que le transformeur est plus performant que le modèle basé sur les LSTMs. De plus, bien que le modèle puisse restaurer la syntaxe ainsi que la structure de contrôle du programme avec jusqu’à 95% de précision, il est incapable de produire un flux de données équivalent. Les différents jeux de données produits indiquent une chute de performance à mesure que la complexité cyclomatique ainsi que le niveau d’imbrication augmentent. Nous estimons, toutefois, que cette approche possède du potentiel. / WebAssembluy är en ny standard för World Wide Web som används som ett kompileringsmål och som är tänkt att möjliggöra högpresterande applikationer i webbläsaren. När det blir mer populärt ökar behovet av motsvarande dekompilatorer. Att bygga en exakt dekompilator som kan återställa den ursprungliga källkoden är dock en komplicerad uppgift. Nyligen har Neural Maskinöversättning (NMT) föreslagits som ett alternativ till traditionella dekompilatorer som innebär mycket manuellt och mödosamt arbete. Vi undersöker genomförbarheten hos Neural Maskinöversättning för dekompilering av WebAssembly -binärer till C -källkod. De toppmoderna transformer och LSTM sequence-to-sequence (Seq2Seq) modellerna med attention experimenteras med. Vi bygger en anpassad slumpmässigt genererad dataset för WebAssembly till C-källkodspar och använder olika mätvärden för att kvantitativt utvärdera modellernas prestanda. Vår implementering består av flera bearbetningssteg som har WebAssembly -ingången och C -utgången som slutpunkter. Resultaten visar att transformer överträffar den LSTM -baserade neuralmodellen. Även om modellen återställer syntaxen och kontrollflödesstrukturen med upp till 95 % noggrannhet, är den oförmögen att återställa dataflödet. De olika benchmarks som vi använder vår utvärdering på indikerar en minskning av dekompilationsnoggrannheten när den cyklomatiska komplexiteten och häckningen av programmen ökar. Vi tror dock att detta tillvägagångssätt har stor potential. Decompilation WebAssembly Reverse Engineering Neural Machine Translation Décompilation WebAssembly Rétro-Ingénierie Traduction Automatique Neuronale Dekompilering WebAssembly Reverse Engineering Neural Maskinöversättning Computer Sciences Datavetenskap (datalogi)
13	Translation of keywords between English and Swedish / Översättning av nyckelord mellan engelska och svenska Ahmady, Tobias, Klein Rosmar, Sander January 2014 (has links) In this project, we have investigated how to perform rule-based machine translation of sets of keywords between two languages. The goal was to translate an input set, which contains one or more keywords in a source language, to a corresponding set of keywords, with the same number of elements, in the target language. However, some words in the source language may have several senses and may be translated to several, or no, words in the target language. If ambiguous translations occur, the best translation of the keyword should be chosen with respect to the context. In traditional machine translation, a word's context is determined by a phrase or sentences where the word occurs. In this project, the set of keywords represents the context. By investigating traditional approaches to machine translation (MT), we designed and described models for the specific purpose of keyword- translation. We have proposed a solution, based on direct translation for translating keywords between English and Swedish. In the proposed solu- tion, we also introduced a simple graph-based model for solving ambigu- ous translations. / I detta projekt har vi undersökt hur man utför regelbaserad maskinöver- sättning av nyckelord mellan två språk. Målet var att översätta en given mängd med ett eller flera nyckelord på ett källspråk till en motsvarande, lika stor mängd nyckelord på målspråket. Vissa ord i källspråket kan dock ha flera betydelser och kan översättas till flera, eller inga, ord på målsprå- ket. Om tvetydiga översättningar uppstår ska nyckelordets bästa över- sättning väljas med hänsyn till sammanhanget. I traditionell maskinö- versättning bestäms ett ords sammanhang av frasen eller meningen som det befinner sig i. I det här projektet representerar den givna mängden nyckelord sammanhanget. Genom att undersöka traditionella tillvägagångssätt för maskinöversätt- ning har vi designat och beskrivit modeller specifikt för översättning av nyckelord. Vi har presenterat en direkt maskinöversättningslösning av nyckelord mellan engelska och svenska där vi introducerat en enkel graf- baserad modell för tvetydiga översättningar. machine translation MT rule-based machine translation RBMT word sense disambiguation WSD translation disambiguation translation keyword translation maskinöversättning översättning tvetydiga översättningar disambiguering
14	Round-Trip Translation : A New Path for Automatic Program Repair using Large Language Models / Tur och retur-översättning : En ny väg för automatisk programreparation med stora språkmodeller Vallecillos Ruiz, Fernando January 2023 (has links) Research shows that grammatical mistakes in a sentence can be corrected by machine translating it to another language and back. We investigate whether this correction capability of Large Language Models (LLMs) extends to Automatic Program Repair (APR), a software engineering task. Current generative models for APR are pre-trained on source code and fine-tuned for repair. This paper proposes bypassing fine-tuning and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back. We hypothesize that RTT with LLMs performs a regression toward the mean, which removes bugs as they are a form of noise w.r.t. the more frequent, natural, bug-free code in the training data. To test this hypothesis, we employ eight recent LLMs pre-trained on code, including the latest GPT versions, and four common program repair benchmarks in Java. We find that RTT with English as an intermediate language repaired 101 of 164 bugs with GPT-4 on the HumanEval-Java dataset. Moreover, 46 of these are unique bugs that are not repaired by other LLMs fine-tuned for APR. Our findings highlight the viability of round-trip translation with LLMs as a technique for automated program repair and its potential for research in software engineering. / Forskning visar att grammatiska fel i en mening kan korrigeras genom att maskinöversätta den till ett annat språk och tillbaka. Vi undersöker om denna korrigeringsegenskap hos stora språkmodeller (LLMs) även gäller för Automatisk Programreparation (APR), en uppgift inom mjukvaruteknik. Nuvarande generativa modeller för APR är förtränade på källkod och finjusterade för reparation. Denna artikel föreslår att man undviker finjustering och använder Tur och retur-översättning (RTT): översättning av kod från ett programmeringsspråk till ett annat programmerings- eller naturspråk, och tillbaka. Vi antar att RTT med LLMs utför en regression mot medelvärdet, vilket tar bort buggar eftersom de är en form av brus med avseende på den mer frekventa, naturliga, buggfria koden i träningsdatan. För att testa denna hypotes använder vi åtta nyligen förtränade LLMs på kod, inklusive de senaste GPT-versionerna, och fyra vanliga programreparationsstandarder i Java. Vi upptäcker att RTT med engelska som ett mellanspråk reparerade 101 av 164 buggar med GPT-4 på HumanEval-Java-datasetet. Dessutom är 46 av dessa unika buggar som inte repareras av andra LLMs finjusterade för APR. Våra resultat belyser genomförbarheten av tur och retur-översättning med LLMs som en teknik för automatiserad programreparation och dess potential för forskning inom mjukvaruteknik. Automatic Program Repair Software Engineering Large Language Models Round-Trip Translation Neural Machine Translation Automatisk programreparation Mjukvaruutveckling Stora språkmodeller Tur och retur-översättning Neural maskinöversättning Computer and Information Sciences Data- och informationsvetenskap

Page generated in 0.06 seconds