Global ETD Search

1	Detecting code duplications in the NPM community Liu, Hanwen 09 September 2021 (has links) In the modern software development process, it has become a very mainstream practice to build software projects on top of third-party packages to simplify the development process. In this development method, it is quite common to copy existing code or files in other libraries instead of making regular calls. Although this approach can reduce the project's dependence on other libraries and make the project more streamlined, it also causes difficulties in maintenance and understanding. The ignorance of code duplication by third-party library community can even be exploited for malicious purpose, such as typo-squatting attack. This paper serves as a starting point to analyze the growing code duplication issues surrounding third-party open source packages, and what is the root cause of code duplication. In this paper, I conducted code duplication-related research based on some popular packages in the third-party open source packages community, the NPM community, by using the tokenizer tool and the code comparison tool to compute the code similarity, quantitatively analyzed the prevalence of code duplication in the NPM community, and did some related experiments based on this similarity. In the experiments, I found that code duplication is very common in NPM community: 17.1% of all the files have 1-93 similar file in other package when the threshold of similar file is set to 0.5. 29.3% of all the packages has at least one "similar package" when the threshold of similar package is set to 0.5. In all the 951 similar package pairs, 33.9% of them, 323 package pairs comes from the same domain. The ultimate goal of this paper is to promote the awareness of the commonness and the importance of code duplication in the third-party package community and the reasonable use of code duplication by developers in the project development. / In the modern software development process, developers often call other people's completed code to build their own programs. There are generally two ways to do this: indirectly call other people's code through "import" or similar instructions in the program, or directly copy and paste other people's code and make slight modifications. The second method can make the program more independent and easy to use, but the code duplication problem caused by this method also has great security risks.This paper serves as a starting point to analyze the growing code duplication issues, and what is the root cause of code duplication. In this paper, I conducted code duplication-related research based on some popular code packages in the NPM community.I used some tools to compute a value to define how different codes are similar to each other, quantitatively analyzed the prevalence of code duplication in the NPM community, and did some related experiments based on this similarity. In the experiments, I found that code duplication is very common in the NPM community: 17.1% of all the files have 1-93 similar file in other package, and 29.3% of all the package have at least one "similar package", when the definition of similar files and packages are not that "strict".In all the 951 similar package pairs, 33.9% of them, 323 package pairs comes from the same domain. The ultimate goal of this paper is to promote the awareness of the commonness and the importance of code duplication in the third-party package community and the reasonable use of code duplication by developers in the project development. Code duplication NPM Clustering Code Similarity
2	Toward an Understanding of Software Code Cloning as a Development Practice Kapser, Cory 18 September 2009 (has links) Code cloning is the practice of duplicating existing source code for use elsewhere within a software system. Within the research community, conventional wisdom has asserted that code cloning is generally a bad practice, and that code clones should be removed or refactored where possible. While there is significant anecdotal evidence that code cloning can lead to a variety of maintenance headaches --- such as code bloat, duplication of bugs, and inconsistent bug fixing --- there has been little empirical study on the frequency, severity, and costs of code cloning with respect to software maintenance. This dissertation seeks to improve our understanding of code cloning as a common development practice through the study of several widely adopted, medium-sized open source software systems. We have explored the motivations behind the use of code cloning as a development practice by addressing several fundamental questions: For what reasons do developers choose to clone code? Are there distinct identifiable patterns of cloning? What are the possible short- and long-term term risks of cloning? What management strategies are appropriate for the maintenance and evolution of clones? When is the ``cure'' (refactoring) likely to cause more harm than the ``disease'' (cloning)? There are three major research contributions of this dissertation. First, we propose a set of requirements for an effective clone analysis tool based on our experiences in clone analysis of large software systems. These requirements are demonstrated in an example implementation which we used to perform the case studies prior to and included in this thesis. Second, we present an annotated catalogue of common code cloning patterns that we observed in our studies. Third, we present an empirical study of the relative frequencies and likely harmfulness of instances of these cloning patterns as observed in two medium-sized open source software systems, the Apache web server and the Gnumeric spreadsheet application. In summary, it appears that code cloning is often used as a principled engineering technique for a variety of reasons, and that as many as 71% of the clones in our study could be considered to have a positive impact on the maintainability of the software system. These results suggest that the conventional wisdom that code clones are generally harmful to the quality of a software system has been proven wrong. code clone clone detection clone analysis code duplication Computer Science
3	Toward an Understanding of Software Code Cloning as a Development Practice Kapser, Cory 18 September 2009 (has links) Code cloning is the practice of duplicating existing source code for use elsewhere within a software system. Within the research community, conventional wisdom has asserted that code cloning is generally a bad practice, and that code clones should be removed or refactored where possible. While there is significant anecdotal evidence that code cloning can lead to a variety of maintenance headaches --- such as code bloat, duplication of bugs, and inconsistent bug fixing --- there has been little empirical study on the frequency, severity, and costs of code cloning with respect to software maintenance. This dissertation seeks to improve our understanding of code cloning as a common development practice through the study of several widely adopted, medium-sized open source software systems. We have explored the motivations behind the use of code cloning as a development practice by addressing several fundamental questions: For what reasons do developers choose to clone code? Are there distinct identifiable patterns of cloning? What are the possible short- and long-term term risks of cloning? What management strategies are appropriate for the maintenance and evolution of clones? When is the ``cure'' (refactoring) likely to cause more harm than the ``disease'' (cloning)? There are three major research contributions of this dissertation. First, we propose a set of requirements for an effective clone analysis tool based on our experiences in clone analysis of large software systems. These requirements are demonstrated in an example implementation which we used to perform the case studies prior to and included in this thesis. Second, we present an annotated catalogue of common code cloning patterns that we observed in our studies. Third, we present an empirical study of the relative frequencies and likely harmfulness of instances of these cloning patterns as observed in two medium-sized open source software systems, the Apache web server and the Gnumeric spreadsheet application. In summary, it appears that code cloning is often used as a principled engineering technique for a variety of reasons, and that as many as 71% of the clones in our study could be considered to have a positive impact on the maintainability of the software system. These results suggest that the conventional wisdom that code clones are generally harmful to the quality of a software system has been proven wrong. code clone clone detection clone analysis code duplication Computer Science
4	Contributions à l’usage des détecteurs de clones pour des tâches de maintenance logicielle / Contributions to the use of code clone detectors in software maintenance tasks Charpentier, Alan 17 October 2016 (has links) L’existence de plusieurs copies d’un même fragment de code (nommées des clones dans lalittérature) dans un logiciel peut compliquer sa maintenance et son évolution. La duplication decode peut poser des problèmes de consistance, notamment lors de la propagation de correction debogues. La détection de clones est par conséquent un enjeu important pour préserver et améliorerla qualité logicielle, propriété primordiale pour le succès d’un logiciel.L’objectif général de cette thèse est de contribuer à l’usage des détecteurs de clones dans destâches de maintenance logicielle. Nous avons centré nos contributions sur deux axes de recherche.Premièrement, la méthodologie pour comparer et évaluer les détecteurs de clones, i.e. les benchmarksde clones. Nous avons empiriquement évalué un benchmark de clones et avons montré queles résultats dérivés de ce dernier n’étaient pas fiables. Nous avons également identifié des recommandationspour fiabiliser la construction de benchmarks de clones. Deuxièmement, la spécialisationdes détecteurs de clones dans des tâches de maintenance logicielle.Nous avons développé uneapproche spécialisée dans un langage et une tâche (la réingénierie) qui permet aux développeursd’identifier et de supprimer la duplication de code de leurs logiciels. Nous avons mené des étudesde cas avec des experts du domaine pour évaluer notre approche. / The existence of several copies of a same code fragment—called code clones in the literature—in a software can complicate its maintenance and evolution. Code duplication can lead to consistencyproblems, especially during bug fixes propagation. Code clone detection is therefore a majorconcern to maintain and improve software quality, which is an essential property for a software’ssuccess.The general objective of this thesis is to contribute to the use of code clone detection in softwaremaintenance tasks. We chose to focus our contributions on two research topics. Firstly, themethodology to compare and assess code clone detectors, i.e. clone benchmarks. We perform anempirical assessment of a clone benchmark and we found that results derived from this latter arenot reliable. We also identified recommendations to construct more reliable clone benchmarks.Secondly, the adaptation of code clone detectors in software maintenance tasks. We developed aspecialized approach in one language and one task—refactoring—allowing developers to identifyand remove code duplication in their softwares. We conducted case studies with domain experts toevaluate our approach. Maintenance logicielle Duplication de code Études empiriques Software maintenance Code duplication Empirical studies
5	Code duplication and reuse in Jupyter notebooks Koenzen, Andreas Peter 21 September 2020 (has links) Reusing code can expedite software creation, analysis and exploration of data. Expediency can be particularly valuable for users of computational notebooks, where duplication allows them to quickly test hypotheses and iterate over data, without creating code from scratch. In this thesis, I’ll explore the topic of code duplication and the behaviour of code reuse for Jupyter notebooks; quantifying and describing snippets of code and explore potential barriers for reuse. As part of this thesis I conducted two studies into Jupyter notebooks use. In my first study, I mined GitHub repositories, quantifying and describing code duplicates contained within repositories that contained at least one Jupyter notebook. For my second study, I conducted an observational user study using a contextual inquiry, where my participants solved specific tasks using notebooks, while I observed and took notes. The work in this thesis can be categorized as exploratory, since both my studies were aimed at generating hypotheses for which further studies can build upon. My contributions with this thesis is two-fold: a thorough description of code duplicates contained within GitHub repositories and an exploration of the behaviour behind code reuse in Jupyter notebooks. It is my desire that others can build upon this work to provide new tools, addressing some of the issues outlined in this thesis. / Graduate Jupyter computational notebooks code duplication code clones code reuse data analysis data exploration exploratory programming
6	Detecção interprocedimental de clones semânticos / Interprocedural semantic clone detection Felipe de Alencar Albuquerque 08 November 2013 (has links) Fragmentos de código duplicado, ou clones, são inseridos em aplicativos por serem uma maneira simples de reúso, dentre outros motivos. Clones são, portanto, comuns em programas. No entanto, a atividade de manutenção pode ficar custosa se o código do programa analisado possuir muitos clones, principalmente os semânticos, os quais podem possuir códigos distintos, mas realizam tarefas similares. Nesse sentido, a utilização de ferramentas que automatizam a tarefa de detectar clones é desejável. Ferramentas atuais de detecção de clones semânticos são capazes de identificar esses clones com altas taxas de acerto. No entanto, elas não são capazes de identificar clones semânticos considerando também os fluxos dos procedimentos ou funções que são invocados dentro dos fragmentos de código comparados. Essa limitação pode levar as ferramentas a indicarem clones semânticos falso positivos. Este trabalho apresenta uma técnica de detecção de clones semânticos que considera as chamadas de procedimentos presentes nos programas. Essa técnica apresentou uma taxa de acertos 2,5% maior do que técnicas convencionais de acordo com um benchmark, também desenvolvido neste trabalho. Esse benchmark foi criado com base nas classificações de clones fornecidas por programadores da indústria e da academia. A técnica interprocedimental de detecção de clones semânticos pode ser utilizada para evolução, manutenção, refatoração e entendimento de programas. / Fragments of duplicated code, or clones, are embedded in applications as they are a simple way to reuse code, among other reasons. Clones are therefore common in programs. However, the maintenance activity may be costly if the program code has many clones to analyze, specially semantic clones, which are semantically similar but may have different syntax. In this regard, the use of tools that automate the task of detecting clones is desirable. Current tools for detecting semantic clones are able to identify those clones with high hit rates. However, they are not able to detect semantic clones also considering the flow of procedures or functions that are invoked within the compared code fragments. This limitation can lead the tools to indicate false positive semantic clones. This paper presents a technique that takes into account the procedure calls in programs to detect semantic clones. This technique showed a 2.5% higher hit rate than conventional techniques according to a benchmark also developed in this work. This benchmark was created based on evaluations provided by programmers from academic and industrial settings. The interprocedural semantic clone detection technique can be used for evolution, maintenance, refactoring and understanding of programs. Clones semânticos Detecção de clones Duplicação de código Grafos de dependência de programas Clone detection Code duplication Program dependence graphs Semantic clones
7	Detecção interprocedimental de clones semânticos / Interprocedural semantic clone detection Albuquerque, Felipe de Alencar 08 November 2013 (has links) Fragmentos de código duplicado, ou clones, são inseridos em aplicativos por serem uma maneira simples de reúso, dentre outros motivos. Clones são, portanto, comuns em programas. No entanto, a atividade de manutenção pode ficar custosa se o código do programa analisado possuir muitos clones, principalmente os semânticos, os quais podem possuir códigos distintos, mas realizam tarefas similares. Nesse sentido, a utilização de ferramentas que automatizam a tarefa de detectar clones é desejável. Ferramentas atuais de detecção de clones semânticos são capazes de identificar esses clones com altas taxas de acerto. No entanto, elas não são capazes de identificar clones semânticos considerando também os fluxos dos procedimentos ou funções que são invocados dentro dos fragmentos de código comparados. Essa limitação pode levar as ferramentas a indicarem clones semânticos falso positivos. Este trabalho apresenta uma técnica de detecção de clones semânticos que considera as chamadas de procedimentos presentes nos programas. Essa técnica apresentou uma taxa de acertos 2,5% maior do que técnicas convencionais de acordo com um benchmark, também desenvolvido neste trabalho. Esse benchmark foi criado com base nas classificações de clones fornecidas por programadores da indústria e da academia. A técnica interprocedimental de detecção de clones semânticos pode ser utilizada para evolução, manutenção, refatoração e entendimento de programas. / Fragments of duplicated code, or clones, are embedded in applications as they are a simple way to reuse code, among other reasons. Clones are therefore common in programs. However, the maintenance activity may be costly if the program code has many clones to analyze, specially semantic clones, which are semantically similar but may have different syntax. In this regard, the use of tools that automate the task of detecting clones is desirable. Current tools for detecting semantic clones are able to identify those clones with high hit rates. However, they are not able to detect semantic clones also considering the flow of procedures or functions that are invoked within the compared code fragments. This limitation can lead the tools to indicate false positive semantic clones. This paper presents a technique that takes into account the procedure calls in programs to detect semantic clones. This technique showed a 2.5% higher hit rate than conventional techniques according to a benchmark also developed in this work. This benchmark was created based on evaluations provided by programmers from academic and industrial settings. The interprocedural semantic clone detection technique can be used for evolution, maintenance, refactoring and understanding of programs. Clone detection Clones semânticos Code duplication Detecção de clones Duplicação de código Grafos de dependência de programas Program dependence graphs Semantic clones

1

Page generated in 0.1183 seconds