Global ETD Search

11	Addressing the Rare Word Problem in Source Code Modelling Ivstam, Linn January 2020 (has links) The field of automatic program repair has adapteddeep learning techniques. Sequence to sequence neural networkshave successfully been applied in neural machine translation(NMT). This can also be applied to automatic program repair,attempting to translate buggy source code into fixed sourcecode. However, the frequent occurrence of user-defined variablesmakes the rare word problem a significant issue. Techniquesused in NMT to handle the rare word problem specifically bytepairing encoding (BPE) and the copy mechanism were appliedand evaluated on source code. The results showed that whenobserving the exact sequence match of the predicted output andtarget output, techniques were not an improvement. However,when observing correct syntax techniques outperformed theoriginal model without any techniques applied. To be able tosee an improvement in exact sequence match there should be agreater variety of sequence length and vocabulary size also, moreextensive hyperparameter tuning should be performed. / Inom området för automatisk mjukvarureparation har djupinlärningstekniker implementerats. Neurala nätverk av typen sekvens till sekvens har blivit framgångsrikt applicerade inom neural maskinöversättning av mänskliga språk. Dessa neurala nätverk kan också appliceras inom automatisk mjukvarureparation genom att översätta källkod innehållande buggar till en lagad kod utan buggar. Den frekventa användningen av användardefinierade variabler gör att ”the rare word problem” är en signifikant svaghet. Tekniker som används i neural maskinöversättning, ”byte pairing encoding” (BPE) och ”the copy mechanism” har applicerats och utvärderats på källkod. Resultaten visar att då modellens förutsagda utdata jämförs med det förväntat utdata visar teknikerna ingen förbättring. Dock hanterar nätverk med tekniker applicerade syntax för programmeringsspråket c avsevärt bättre. / Kandidatexjobb i elektroteknik 2020, KTH, Stockholm utomatic program repair neural machinetranslation sequence to sequence byte pairing encoding copymechanism Elektroteknik och elektronik
12	Uma proposta de representação e operadores genéticos para algoritmos evolucionários aplicados no reparo automatizado de software / A proposed representation and genetic operators for evolutionary algorithms applied in automated software repair Oliveira, Vinícius Paulo Lopes de 14 August 2017 (has links) Submitted by JÚLIO HEBER SILVA (julioheber@yahoo.com.br) on 2017-09-13T17:19:44Z No. of bitstreams: 2 Dissertação - Vinícius Paulo Lopes de Oliveira - 2017.pdf: 2066886 bytes, checksum: c610d8e21e23795d1cea6eeca17b5e5e (MD5) license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) / Approved for entry into archive by Luciana Ferreira (lucgeral@gmail.com) on 2017-09-19T13:58:48Z (GMT) No. of bitstreams: 2 Dissertação - Vinícius Paulo Lopes de Oliveira - 2017.pdf: 2066886 bytes, checksum: c610d8e21e23795d1cea6eeca17b5e5e (MD5) license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) / Made available in DSpace on 2017-09-19T13:58:48Z (GMT). No. of bitstreams: 2 Dissertação - Vinícius Paulo Lopes de Oliveira - 2017.pdf: 2066886 bytes, checksum: c610d8e21e23795d1cea6eeca17b5e5e (MD5) license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) Previous issue date: 2017-08-14 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Maintenance and software repair are responsible for most of the cost of a software in the course of its life. Software repair through genetic evolution may repair errors and improve software, reducing its high cost. GenProg is a technique that uses this approach and through patches evolution it is capable to fix errors in large and small softwares. A patch composed by low-granularity operations compromise the manipulation of these operations. These operations consist of three subspaces: operation, location of application of the operation and what the operation will apply at the location of the fault (operator, fault and fix, respectively). The recombination and mutation operators applied to a low granulation representation limits the ability of the technique to navigate in search space efficiently. It is proposed the reformulation of the representation, in order to allow greater search capability. Theoretical analysis of the representation showed that the new representation has a greater locality than the original one. Through experimentation, validation and genotypic analysis it is shown that the proposed changes have led to a better performance with respect to the original operators and parameters in terms of efficiency, in the first experiments the operator UnifSingle with memorization was 48.88% more effective than the Original operator and then the operator OPSingle_V2 was 26% more effective than the operator UnifSingle with memorization. Some characteristics of these cross-operators were observed through a genotype distance analysis and their influence on the automatic software reapair problem. The proposed mutation operator shown superior results if compared to original. Combination between operator UniSingle with memorization showed the best efficacy among all combinations of operators and parameters (28.29% superior to the best result of the original GenProg). / Manutenção e reparo de software é responsável pela maior parte do custo de um software no decorrer de sua vida. O reparo de software por meio de evolução genética pode reparar erros e/ou melhorar softwares, diminuindo seu alto custo. GenProg é uma técnica em desenvolvimento que utiliza esta abordagem e por meio de evolução de patches é capaz de reparar erros em grandes e pequenos softwares. Um patch é composto por operações de edições de baixa granularidade o que compromete a separação e edição dessas operações. Essas operações são formadas por três subespaços: operação, local da aplicação da operação e o que a operação irá aplicar no local da falha (operator, fault, fix, respectivamente). Os operadores de recombinação e mutação aplicados às representações de baixa granularidade limita a habilidade da técnica de navegar no espaço de busca de forma eficiente. É proposto neste estudo, a reformulação da representação, do operador de cruzamento e mutação a fim de permitir uma maior capacidade de busca. Análises teóricas da representação demonstraram que a nova representação possui localidade maior que a original. Por meio de experimentações, validações e análises genotípicas é mostrado que as mudanças propostas levaram a uma melhoria em relação aos operadores e parâmetros originais em termos de eficácia, sendo que nos experimentos iniciais o operador UnifSingle com memorização apresentou eficácia 45,88% superior ao melhor caso do operador Original e em seguida o operador posteriormente proposto OPSingle_V2 apresentou eficácia 26% superior ao UnifSingle com memorização. Foram observadas algumas características desses operadores de cruzamento por meio de uma análise por distância genotípica e suas influências no problema de reparo automatizado de software. O operador de mutação proposto apresentou resultados superiores ao operador de mutação original e combinado com operador UnifSingle com memorização, apresentou a melhor eficácia entre todas as combinações de operadores e parâmetros. Reparo automatizado de software Evolução genética Programação genética Representação Operadores genéticos Automatic software repair Automated program repair Genetic improvement Genetic programming Representation
13	Exploring the Usage of Neural Networks for Repairing Static Analysis Warnings / Utforsking av användningen av neurala nätverk för att reparera varningar för statisk analys Lohse, Vincent Paul January 2021 (has links) C# provides static analysis libraries for template-based code analysis and code fixing. These libraries have been used by the open-source community to generate numerous NuGet packages for different use-cases. However, due to the unstructured vastness of these packages, it is difficult to find the ones required for a project and creating new analyzers and fixers take time and effort to create. Therefore, this thesis proposes a neural network, which firstly imitates existing fixers and secondly extrapolates to fixes of unseen diagnostics. To do so, the state-of-the-art of static analysis NuGet packages is examined and further used to generate a dataset with diagnostics and corresponding code fixes for 24,622 data points. Since many C# fixers apply formatting changes, all formatting is preserved in the dataset. Furthermore, since the fixers also apply identifier changes, the tokenization of the dataset is varied between splitting identifiers by camelcase and preserving them. The neural network uses a sequence-to-sequence learning approach with the Transformer model and takes file context, diagnostic message and location as input and predicts a diff as output. It is capable of imitating 46.3% of the fixes, normalized by diagnostic type, and for data points with unseen diagnostics, it is able to extrapolate to 11.9% of normalized data points. For both experiments, splitting identifiers by camelcase produces the best results. Lastly, it is found that a higher proportion of formatting tokens in input has minimal positive impact on prediction success rates, whereas the proportion of formatting in output has no impact on success rates. / C# tillhandahåller statiska analysbibliotek för mallbaserad kodanalys och kodfixering. Dessa bibliotek har använts av open source-gemenskapen för att generera många NuGet-paket för olika användningsfall. Men på grund av mängden av dessa paket är det svårt att hitta de som krävs för ett projekt och att skapa nya analysatorer och fixare tar tid och ansträngning att skapa. Därför föreslår denna avhandling ett neuralt nätverk, som för det första imiterar befintliga korrigeringar och för det andra extrapolerar till korrigeringar av osynlig diagnostik. För att göra det har det senaste inom statisk analys NuGetpaketen undersökts och vidare använts för att generera en datauppsättning med diagnostik och motsvarande kodfixar för 24 622 datapunkter. Eftersom många C# fixers tillämpar formateringsändringar, bevaras all formatering i datasetet. Dessutom, eftersom fixarna också tillämpar identifieringsändringar, varieras tokeniseringen av datamängden mellan att dela upp identifierare efter camelcase och att bevara dem. Det neurala nätverket använder en sekvenstill- sekvens-inlärningsmetod med Transformer-modellen och tar filkontext, diagnostiskt meddelande och plats som indata och förutsäger en skillnad som utdata. Den kan imitera 46,3% av korrigeringarna, normaliserade efter diagnostisk typ, och för datapunkter med osynlig diagnostik kan den extrapolera till 11,9% av normaliserade datapunkter. För båda experimenten ger uppdelning av identifierare efter camelcase de bästa resultaten. Slutligen har det visat sig att en högre andel formateringstokens i indata har minimal positiv inverkan på åndelen korrekta förutsägelser, medan andelen formatering i utdata inte har någon inverkan på åndelen korrekta förutsägelser. Automatic Program Repair Neural Machine Translation Static Analysis Transformer Model Formatting Automatisk programreparation neural maskinöversättning statisk analys transformatormodell formatering Software Engineering Programvaruteknik
14	An Empirical Study on Using Codex for Automated Program Repair Zhao, Pengyu January 2023 (has links) This thesis explores the potential of Codex, a pre-trained Large Language Model (LLM), for Automated Program Repair (APR) by assessing its performance on the Defects4J benchmark that includes real-world Java bugs. The study aims to provide a comprehensive understanding of Codex’s capabilities and limitations in generating syntactically and semantically equivalent patches for defects, as well as evaluating its ability to handle defects with different levels of importance and complexity. Additionally, we aim to compare the performance of Codex with other LLMs in the APR domain. To achieve these objectives, we employ a systematic methodology that includes prompt engineering, Codex parameter adjustment, code extraction, patch verification, and Abstract Syntax Tree (AST) comparison. We successfully verified 528 bugs in Defects4J, which represents the highest number among other studies, and achieved 53.98% of plausible and 26.52% correct patches. Furthermore, we introduce the elle-elle-aime framework, which extends the RepairThemAll for Codex-based APR and is adaptable for evaluating other LLMs, such as ChatGPT and GPT-4. The findings of this empirical study provide valuable insights into the factors that impact Codex’s performance on APR, helping to create new prompt strategies and techniques that improve research productivity. / Denna avhandling utforskar potentialen hos Codex, en förtränad LLM, för APR genom att utvärdera dess prestanda på Defects4J-benchmarket som inkluderar verkliga Java-buggar. Studien syftar till att ge en omfattande förståelse för Codex förmågor och begränsningar när det gäller att generera syntaktiskt och semantiskt ekvivalenta patchar för defekter samt att utvärdera dess förmåga att hantera defekter med olika nivåer av betydelse och komplexitet. Dessutom är vårt mål att jämföra prestanda hos Codex med andra LLM inom APR-området. För att uppnå dessa mål använder vi en systematisk metodik som inkluderar prompt engineering, justering av Codex-parametrar, kodextraktion, patchverifiering och jämförelse av AST. Vi verifierade framgångsrikt 528 buggar i Defects4J, vilket representerar det högsta antalet bland andra studier, och uppnådde 53,98% plausibla och 26,52% korrekta patchar. Vidare introducerar vi elle-elle-aime ramverket, som utvidgar RepairThemAll för Codex-baserad APR och är anpassningsbart för att utvärdera andra LLM, såsom ChatGPT och GPT-4. Resultaten av denna empiriska studie ger värdefulla insikter i de faktorer som påverkar Codex prestanda på APR och hjälper till att skapa nya promptstrategier och tekniker som förbättrar forskningsproduktiviteten. Automated Program Repair Codex Large Language Models Defects4J Patch Generation Prompt Engineering Automatiserad Programreparation Codex Storskaliga Språkmodeller Defects4J Patchgenerering Promptteknik Computer and Information Sciences Data- och informationsvetenskap
15	Round-Trip Translation : A New Path for Automatic Program Repair using Large Language Models / Tur och retur-översättning : En ny väg för automatisk programreparation med stora språkmodeller Vallecillos Ruiz, Fernando January 2023 (has links) Research shows that grammatical mistakes in a sentence can be corrected by machine translating it to another language and back. We investigate whether this correction capability of Large Language Models (LLMs) extends to Automatic Program Repair (APR), a software engineering task. Current generative models for APR are pre-trained on source code and fine-tuned for repair. This paper proposes bypassing fine-tuning and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back. We hypothesize that RTT with LLMs performs a regression toward the mean, which removes bugs as they are a form of noise w.r.t. the more frequent, natural, bug-free code in the training data. To test this hypothesis, we employ eight recent LLMs pre-trained on code, including the latest GPT versions, and four common program repair benchmarks in Java. We find that RTT with English as an intermediate language repaired 101 of 164 bugs with GPT-4 on the HumanEval-Java dataset. Moreover, 46 of these are unique bugs that are not repaired by other LLMs fine-tuned for APR. Our findings highlight the viability of round-trip translation with LLMs as a technique for automated program repair and its potential for research in software engineering. / Forskning visar att grammatiska fel i en mening kan korrigeras genom att maskinöversätta den till ett annat språk och tillbaka. Vi undersöker om denna korrigeringsegenskap hos stora språkmodeller (LLMs) även gäller för Automatisk Programreparation (APR), en uppgift inom mjukvaruteknik. Nuvarande generativa modeller för APR är förtränade på källkod och finjusterade för reparation. Denna artikel föreslår att man undviker finjustering och använder Tur och retur-översättning (RTT): översättning av kod från ett programmeringsspråk till ett annat programmerings- eller naturspråk, och tillbaka. Vi antar att RTT med LLMs utför en regression mot medelvärdet, vilket tar bort buggar eftersom de är en form av brus med avseende på den mer frekventa, naturliga, buggfria koden i träningsdatan. För att testa denna hypotes använder vi åtta nyligen förtränade LLMs på kod, inklusive de senaste GPT-versionerna, och fyra vanliga programreparationsstandarder i Java. Vi upptäcker att RTT med engelska som ett mellanspråk reparerade 101 av 164 buggar med GPT-4 på HumanEval-Java-datasetet. Dessutom är 46 av dessa unika buggar som inte repareras av andra LLMs finjusterade för APR. Våra resultat belyser genomförbarheten av tur och retur-översättning med LLMs som en teknik för automatiserad programreparation och dess potential för forskning inom mjukvaruteknik. Automatic Program Repair Software Engineering Large Language Models Round-Trip Translation Neural Machine Translation Automatisk programreparation Mjukvaruutveckling Stora språkmodeller Tur och retur-översättning Neural maskinöversättning Computer and Information Sciences Data- och informationsvetenskap
16	Towards Understanding and Securing the OSS Supply Chain Vu Duc, Ly 14 March 2022 (has links) Free and Open-Source Software (FOSS) has become an integral part of the software supply chain in the past decade. Various entities (automated tools and humans) are involved at different stages of the software supply chain. Some actions that occur in the chain may result in vulnerabilities or malicious code injected in a published artifact distributed in a package repository. At the end of the software supply chain, developers or end-users may consume the resulting artifacts altered in transit, including benign and malicious injection. This dissertation starts from the first link in the software supply chain, ‘developers’. Since many developers do not update their vulnerable software libraries, thus exposing the user of their code to security risks. To understand how they choose, manage and update the libraries, packages, and other Open-Source Software (OSS) that become the building blocks of companies’ completed products consumed by end-users, twenty-five semi-structured interviews were conducted with developers of both large and small-medium enterprises in nine countries. All interviews were transcribed, coded, and analyzed according to applied thematic analysis. Although there are many observations about developers’ attitudes on selecting dependencies for their projects, additional quantitative work is needed to validate whether behavior matches or whether there is a gap. Therefore, we provide an extensive empirical analysis of twelve quality and popularity factors that should explain the corresponding popularity (adoption) of PyPI packages was conducted using our tool called py2src. At the end of the software supply chain, software libraries (or packages) are usually downloaded directly from the package registries via package dependency management systems under the comfortable assumption that no discrepancies are introduced in the last mile between the source code and their respective packages. However, such discrepancies might be introduced by manual or automated build tools (e.g., metadata, Python bytecode files) or for evil purposes (malicious code injects). To identify differences between the published Python packages in PyPI and the source code stored on Github, we developed a new approach called LastPyMile . Our approach has been shown to be promising to integrate within the current package dependency management systems or company workflow for vetting packages at a minimal cost. With the ever-increasing numbers of software bugs and security vulnerabilities, the burden of secure software supply chain management on developers and project owners increases. Although automated program repair approaches promise to reduce the burden of bug-fixing tasks by suggesting likely correct patches for software bugs, little is known about the practical aspects of using APR tools, such as how long one should wait for a tool to generate a bug fix. To provide a realistic evaluation of five state-of-the-art APR tools, 221 bugs from 44 open-source Java projects were run within a reasonable developers’ time and effort.
17	An initial investigation of Automatic Program Repair for Solidity Smart Contracts with Large Language Models / En första undersökning av automatisk lagning av solidity smarta kontrakt med stora språkmodeller Cruz, Erik January 2023 (has links) This thesis investigates how Large Language Models can be used to repair Solidity Smart Contracts automatically through the main contribution of this thesis, the Transformative Repair Tool. The Transformative Repair Tool achieves similar results to current state-of-the-art tools on the Smartbugs Curated Dataset and is the first published tool that uses Large Language Models to repair Solidity Smart Contracts. Moreover, the thesis explores different prompt strategies to repair Smart Contracts and assess their performance. / Detta masterexamensarbete undersöker hur stora språkmodeller kan användas för att automatisk laga solidity smarta kontrakt genom verktyget Transformative Repair Tool, som är detta masterexamensarbete huvudsakliga bidrag. Transformative Repair Tool presterar liknande som dagens bästa verktyg inom automatisk lagning av smarta kontrakt på Smartbugs Curated datasettet och är det första publicerade verktyget som just använder stora språkmodeller för att reparera solidity smarta kontrakt. Dessutom så utforskar denna rapport olika textprompts och dess prestanda för att laga smarta kontrakt Automatic Program Repair APR Large Language Models LLM Smart Contracts Smart Contract Audit Chat GPT Cybersecurity Automatisk Lagning av Kod Stora språkmodeller Smarta Kontrakt Granskning av Smarta Kontrakt Chat GPT Cybersäkerhet Computer and Information Sciences Data- och informationsvetenskap

Page generated in 0.6311 seconds