Global ETD Search

1	Code Clone Discovery Based on Functional Behavior Krawitz, Ronald Michael 01 January 2012 Legacy programs are used for many years and experience many cycles of use-maintenance-use-maintenance-use-etc. Source code or source code functionality is frequently replicated within these programs when it is written, as well as when it is maintained. Over time many different developers with greater or lesser understanding of the source code maintain the source code. Maintenance developers, when they have limited time or lack understanding of the program, frequently resort to short cuts that include cutting and pasting existing code and re-implementing functionality instead of refactoring. This means a specific functionality is often repeated several times, sometimes using different source code. Blocks of replicated source code or source code functionality are called code clones. Removing code clones improves extensibility, maintainability, and reusability of a program in addition to making the program more easily understood. It is generally accepted that four types of code clones exist. Type-1 and Type-2 code clones are comparatively straightforward to locate and tools exist to locate them. However, Type-3 and Type-4 code clones are very difficult to locate; only a few specialized tools can locate them, and with a lower level of precision. This dissertation presents a new methodology that discovered code clones by studying the functional behavior of blocks of code. Code Clone Discovery based on Functional Behavior (FCD) located code clones by comparing how the blocks of code reacted to various inputs. FCD stimulated the code blocks with the same input patterns and compared the resulting outputs. When a significant portion of the outputs matched, those blocks were declared to be a code clone candidate. Manual analysis confirmed that those blocks of code were code clones. 
Since FCD discovered code clones based on their black-box behavior, the actual source code syntax was irrelevant. Manual inspection further confirmed that FCD located Type-3 and Type-4 code clones, which code clone detection tools frequently miss. FCD recognized the code clones regardless of whether they used identical code, similar code, or totally dissimilar code. This new technique allows for an improvement in software quality and has the potential to significantly reduce the cost of software over its lifetime. black-box testing code clone functional testing Computer Sciences
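The black-box comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the helper names, the example functions, the random input generator, and the 0.9 threshold are all assumptions made for illustration.

```python
import random

def behavior_signature(func, inputs):
    """Run func on every input; record the output or the exception type."""
    signature = []
    for args in inputs:
        try:
            signature.append(("ok", func(*args)))
        except Exception as exc:  # a raised exception is also observable behavior
            signature.append(("error", type(exc).__name__))
    return signature

def match_ratio(f, g, inputs):
    """Fraction of inputs on which f and g produce the same observable result."""
    sig_f = behavior_signature(f, inputs)
    sig_g = behavior_signature(g, inputs)
    same = sum(1 for a, b in zip(sig_f, sig_g) if a == b)
    return same / len(inputs)

# Two syntactically unrelated blocks with the same behavior (hypothetical examples).
def sum_loop(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_formula(n):
    return n * (n + 1) // 2 if n > 0 else 0

# A block with different behavior, used as a negative example.
def square(n):
    return n * n

rng = random.Random(0)  # fixed seed so the same input patterns are reused
inputs = [(rng.randint(-5, 50),) for _ in range(200)]
THRESHOLD = 0.9  # hypothetical cut-off for "a significant portion of the outputs"

is_candidate = match_ratio(sum_loop, sum_formula, inputs) >= THRESHOLD
```

Here `sum_loop` and `sum_formula` share no syntax but agree on every input, so they are flagged as a clone candidate, while `square` agrees with them only on a couple of input values. A candidate would still need the manual confirmation step the abstract mentions.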
2	Toward an Understanding of Software Code Cloning as a Development Practice Kapser, Cory 18 September 2009 Code cloning is the practice of duplicating existing source code for use elsewhere within a software system. Within the research community, conventional wisdom has asserted that code cloning is generally a bad practice, and that code clones should be removed or refactored where possible. While there is significant anecdotal evidence that code cloning can lead to a variety of maintenance headaches (such as code bloat, duplication of bugs, and inconsistent bug fixing), there has been little empirical study on the frequency, severity, and costs of code cloning with respect to software maintenance. This dissertation seeks to improve our understanding of code cloning as a common development practice through the study of several widely adopted, medium-sized open source software systems. We have explored the motivations behind the use of code cloning as a development practice by addressing several fundamental questions: For what reasons do developers choose to clone code? Are there distinct identifiable patterns of cloning? What are the possible short- and long-term risks of cloning? What management strategies are appropriate for the maintenance and evolution of clones? When is the "cure" (refactoring) likely to cause more harm than the "disease" (cloning)? There are three major research contributions of this dissertation. First, we propose a set of requirements for an effective clone analysis tool based on our experiences in clone analysis of large software systems. These requirements are demonstrated in an example implementation which we used to perform the case studies prior to and included in this thesis. Second, we present an annotated catalogue of common code cloning patterns that we observed in our studies. 
Third, we present an empirical study of the relative frequencies and likely harmfulness of instances of these cloning patterns as observed in two medium-sized open source software systems, the Apache web server and the Gnumeric spreadsheet application. In summary, it appears that code cloning is often used as a principled engineering technique for a variety of reasons, and that as many as 71% of the clones in our study could be considered to have a positive impact on the maintainability of the software system. These results suggest that the conventional wisdom that code clones are generally harmful to the quality of a software system has been proven wrong. code clone clone detection clone analysis code duplication Computer Science
4	CloneCompass: visualizations for code clone analysis Wang, Ying 05 May 2020 Code clones are identical or similar code fragments in a single software system or across multiple systems. Frequent copy-paste-modify activities and reuse of existing systems result in maintenance difficulties and security issues. Addressing these problems requires analysts to undertake code clone analysis, which is an intensive process to discover problematic clones in existing software. To improve the efficiency of this process, tools for code clone detection and analysis, such as Kam1n0 and CCFinder, were created. Kam1n0 is an efficient code clone search engine that facilitates assembly code analysis. However, Kam1n0 search results can contain millions of function-clone pairs, and efficiently exploring and comprehensively understanding the resulting data can be challenging. This thesis presents a design study whereby we collaborated with analyst stakeholders to identify requirements for a tool that visualizes and scales to millions of function-clone pairs. These requirements led to the design of an interactive visual tool, CloneCompass, consisting of novel TreeMap Matrix and Adjacency Matrix visualizations to aid in the exploration of assembly code clones extracted from Kam1n0. We conducted a preliminary evaluation with the analyst stakeholders, and we show how CloneCompass enables these users to visually and interactively explore assembly code clones detected by Kam1n0 with suspected vulnerabilities. To further validate our tool and extend its usability to source code clones, we carried out a Linux case study, where we explored the clones in the Linux kernel detected by CCFinder and gained a number of insights about the cloning activities that may have occurred in the development of the Linux kernel. / Graduate data visualisation Adjacency Matrix visualizations code clone analysis
5	Reduktion von Quellcoderedundanz als Motivator der Evolution von Programmiersprachen am Beispiel von Java 8 [Reduction of Source Code Redundancy as a Motivator for the Evolution of Programming Languages, Using Java 8 as an Example] Triebel, Anna Juliane 23 April 2018 Is the reduction of source code redundancy a motivator for the evolution of programming languages? This is the starting question of the study, which is examined using language features of Java 8 as examples. Code clones and boilerplate code are treated as forms of source code redundancy, described, and defined. Source code redundancy is defined and operationalized as the ratio of the complexity of an expression to the information it conveys. To measure the change in source code redundancy brought by Java 8, code segments are migrated from Java 7 to Java 8. With the information content held constant, expression complexity is compared using static code analysis metrics. The study shows a decrease in source code redundancy for all language features considered, resulting from a reduction in boilerplate code or the elimination of code clones. The results indicate that reducing source code redundancy is at least a necessary property of the new features introduced into the language with Java 8. To survive in the ecosystem of programming languages, languages must keep evolving, because their technological environment is constantly changing. To enable their users to write high-quality source code, languages must provide constructs that allow complex matters to be expressed elegantly. 
Low source code redundancy can therefore be regarded as a quality attribute of source code, and enabling it as an evolutionary advantage for programming languages. Contents: 1 Introduction 2 Programming Languages and Source Code Redundancy 2.1 Code Clones 2.2 Boilerplate Code 2.3 Source Code Redundancy 3 Paradigm Shift with Java 8 3.1 Streams API 3.2 Lambda Expressions 3.3 The Optional<T> Class 3.4 default Methods in Interfaces 4 Conclusion info:eu-repo/classification/ddc/330 ddc:330
6	Management Aspects of Software Clone Detection and Analysis 2014 June 1900 Copying a code fragment and reusing it by pasting with or without minor modifications is a common practice in software development for improved productivity. As a result, software systems often have similar segments of code, called software clones or code clones. For many reasons, unintentional clones may also appear in the source code without the developer's awareness. Studies report that significant fractions (5% to 50%) of the code in typical software systems are cloned. Although code cloning may increase initial productivity, it may cause fault propagation, inflate the code base and increase maintenance overhead. Thus, it is believed that code clones should be identified and carefully managed. This Ph.D. thesis contributes to clone management with techniques realized as tools and with large-scale, in-depth analyses of clones that inform the design of effective clone management techniques and strategies. To support proactive clone management, we have developed a clone detector as a plug-in to the Eclipse IDE. For clone detection, we used a hybrid approach that combines the strengths of both parser-based and text-based techniques. To capture clones that are similar but not exact duplicates, we adopted a novel approach that applies a suffix-tree-based k-difference hybrid algorithm, borrowed from the area of computational biology. Instead of targeting all clones from the entire code base, our tool aids clone-aware development by allowing focused search for clones of any code fragment of the developer's interest. A good understanding of the code cloning phenomenon is a prerequisite for devising efficient clone management strategies. The second phase of the thesis includes large-scale empirical studies on the characteristics (e.g., proportion, types of similarity, change patterns) of code clones in evolving software systems. 
Applying statistical techniques, we also made fairly accurate forecasts of the proportion of code clones in future versions of software projects. The outcomes of these studies expose useful insights into the characteristics of evolving clones and their management implications. Upon identification of the code clones, their management often necessitates careful refactoring, which is dealt with in the third phase of the thesis. Given a large number of clones, it is difficult to optimally decide what to refactor and what not, especially when there are dependencies among clones and the objective remains the minimization of refactoring efforts and risks while maximizing benefits. In this regard, we developed a novel clone refactoring scheduler that applies a constraint programming approach. We also introduced a novel effort model for the estimation of efforts needed to refactor clones in source code. We evaluated our clone detector, scheduler and effort model through comparative empirical studies and user studies. Finally, based on our experience and in-depth analysis of the present state of the art, we expose avenues for further research and development towards a versatile clone management system that we envision. Code Clone Clone Management Software Maintenance Reengineering Software Evolution Software Engineering
7	Cloneless: Code Clone Detection via Program Dependence Graphs with Relaxed Constraints Simko, Thomas J 01 June 2019 Code clones are pieces of code that have the same functionality. While some clones may structurally match one another, others may look drastically different. The inclusion of code clones clutters a code base, leading to increased costs through maintenance. Duplicate code is introduced through a variety of means, such as copy-pasting, code generated by tools, or developers unintentionally writing similar pieces of code. While manual clone identification may be more accurate than automated detection, it is infeasible due to the extensive size of many code bases. Software code clone detection methods have differing degrees of success based on the analysis performed. This thesis outlines a method of detecting clones using a program dependence graph and subgraph isomorphism to identify similar subgraphs, ultimately illuminating clones. The project imposes few constraints when comparing code segments to potentially reveal more clones. Code Clone Program Dependence Graph PDG Duplicate Code Clone Detection Computational Engineering
8	Code Clone Detection for Equivalence Assurance Ersson, Sara January 2020 To support multiple programming languages, the concept of offering application programming interfaces (APIs) in multiple programming languages has become commonplace. However, this also brings the challenge of ensuring that the APIs are equivalent regarding their interface. To achieve this, code clone detection techniques were adapted to match similar function declarations in the APIs. Firstly, existing code clone detection tools were investigated. As they did not perform well, a tree-based syntactic approach was used, where all header files were compiled with Clang. The abstract syntax trees, which were obtained during the compilation, were then traversed to locate the function declaration nodes, and to store function names and parameter variable names. When matching the function names, a textual approach was used, transforming the function names according to a set of implemented rules. A strict rule compares transformations of full function names in a precise way, whereas a loose rule only compares transformations of parts of function names, and matches anything for the remainder. The rules were applied both by themselves, and in different combinations, starting with the strictest rule, followed by the second strictest rule, and so forth. The best-matching rules proved to be the ones which are strict, and are not affected by the order of the functions in which they are matched. These rules proved to be very robust to API evolution, meaning an increase in the number of public functions. Rules which are less strict and stable, and not robust to API evolution, can still be used, such as matching functions on the first or last word in the function names, but preferably as a complement to the stricter and more stable rules, when most of the functions already have been matched. The tool has been evaluated on the two APIs in King’s software development kit, and covered 94% of the 124 available function matches. 
APIs Code Clone Detection API Mapping Electrical Engineering and Electronics
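The strict-then-loose rule cascade described in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the word-splitting transformation, the two rules, and all function names in the example APIs are hypothetical and are not taken from the thesis or from King's software development kit.

```python
import re

def words(name):
    """Split a snake_case or camelCase identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]

def strict_rule(name):
    """Strict rule: compare the transformed full function name."""
    return tuple(words(name))

def last_word_rule(name):
    """Loose rule: compare only the last word and ignore the remainder."""
    return tuple(words(name)[-1:])

def match_apis(api_a, api_b, rules):
    """Apply rules from strictest to loosest; each function is matched at most once."""
    matches = {}
    remaining_b = list(api_b)
    for rule in rules:
        for name_a in api_a:
            if name_a in matches:
                continue
            for name_b in remaining_b:
                if rule(name_a) == rule(name_b):
                    matches[name_a] = name_b
                    remaining_b.remove(name_b)
                    break  # stop scanning after the first match for name_a
    return matches

# Hypothetical function names from two language bindings of the same API.
c_api = ["get_user_name", "session_close", "init"]
other_api = ["getUserName", "connectionClose", "initialize"]

matched = match_apis(c_api, other_api, [strict_rule, last_word_rule])
```

In this example the strict rule pairs `get_user_name` with `getUserName`, the loose rule then pairs `session_close` with `connectionClose` on the shared last word, and `init` stays unmatched. Running the loose rule only on what the strict rule left over mirrors the abstract's advice to use looser rules as a complement.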
9	A Topic Modeling approach for Code Clone Detection Khan, Mohammed Salman 01 January 2019 This thesis describes the potential benefits of Latent Dirichlet Allocation (LDA) as a technique for code clone detection. The objective is to propose a language-independent, effective, and scalable approach for identifying similar code fragments in relatively large software systems. The main assumption is that the latent topic structure of software artifacts gives an indication of the presence of code clones. It can be hypothesized that artifacts with similar topic distributions contain duplicated code fragments. To test this hypothesis, an experimental investigation using multiple datasets from various application domains was conducted. In addition, CloneTM, an LDA-based working prototype for code clone detection, was developed. Results showed that, if calibrated properly, topic modeling can deliver a satisfactory performance in capturing different types of code clones, showing particularly good performance in detecting Type III clones. CloneTM also achieved levels of performance comparable to already existing practical tools that adopt different clone detection strategies. Thesis University of North Florida UNF code clone clone detection topic modeling machine learning software refactoring software engineering Latent Dirichlet Allocation -- Testing Topic models -- Testing Generative statistical models -- Testing Code clone detection -- Software CloneTM -- Testing Computer and Systems Architecture Software Engineering
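The hypothesis that artifacts with similar topic distributions contain duplicated code can be made concrete with a small Python sketch. It covers only the comparison step and assumes the per-file topic distributions have already been produced by an LDA model. The file names, the distribution values, the choice of cosine similarity, and the threshold are illustrative assumptions, not details of CloneTM.

```python
import math
from itertools import combinations

def cosine_similarity(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

# Hypothetical topic distributions (three topics) for three source files.
topic_distributions = {
    "parser_a.c": [0.70, 0.20, 0.10],
    "parser_b.c": [0.65, 0.25, 0.10],
    "network.c": [0.05, 0.05, 0.90],
}

SIMILARITY_THRESHOLD = 0.95  # hypothetical calibration value

# Report every pair of files whose topic distributions are close enough.
clone_candidates = [
    (a, b)
    for a, b in combinations(sorted(topic_distributions), 2)
    if cosine_similarity(topic_distributions[a], topic_distributions[b])
    >= SIMILARITY_THRESHOLD
]
```

With these made-up values only the two parser files are reported as a candidate pair. The abstract's remark that results depend on proper calibration corresponds here to the choice of the number of topics and the threshold.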
10	Scaling Software Security Analysis to Millions of Malicious Programs and Billions of Lines of Code Jang, Jiyong 01 August 2013 Software security is a big data problem. The volume of new software artifacts created far outpaces the current capacity of software analysis. This gap has brought an urgent challenge to our security community: scalability. If our techniques cannot cope with an ever increasing volume of software, we will always be one step behind attackers. Thus developing scalable analysis to bridge the gap is essential. In this dissertation, we argue that automatic code reuse detection enables an efficient data reduction of a high volume of incoming malware for downstream analysis and enhances software security by efficiently finding known vulnerabilities across large code bases. In order to demonstrate the benefits of automatic software similarity detection, we discuss two representative problems that are remedied by scalable analysis: malware triage and unpatched code clone detection. First, we tackle the onslaught of malware. Although over one million new malware are reported each day, existing research shows that most malware are not written from scratch; instead, they are automatically generated variants of existing malware. When groups of highly similar variants are clustered together, new malware more easily stands out. Unfortunately, current systems struggle with handling this high volume of malware. We scale clustering using feature hashing and perform semantic analysis using co-clustering. Our evaluation demonstrates that these techniques are an order of magnitude faster than previous systems and automatically discover highly correlated features and malware groups. Furthermore, we design algorithms to infer evolutionary relationships among malware, which helps analysts understand trends over time and make informed decisions about which malware to analyze first. 
Second, we address the problem of detecting unpatched code clones at scale. When buggy code gets copied from project to project, eventually all projects will need to be patched. We call clones of buggy code that have been fixed in only a subset of projects unpatched code clones. Unfortunately, code copying is usually ad-hoc and is often not tracked, which makes it challenging to identify all unpatched vulnerabilities in code bases at the scale of entire OS distributions. We scale unpatched code clone detection to spot over 15,000 latent security vulnerabilities in 2.1 billion lines of code from the Linux kernel, all Debian and Ubuntu packages, and all C/C++ projects in SourceForge in three hours on a single machine. To the best of our knowledge, this is the largest set of bugs ever reported in a single paper. Malware Triage Feature Hashing Co-clustering Hadoop Unpatched Code Clone Bloom Filter Lineage Binary Analysis Code Reuse Big Data Electrical and Computer Engineering
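Feature hashing, which this abstract names as the way clustering is scaled, can be sketched in a few lines of Python. The sketch below is a generic illustration, not the dissertation's system: the token n-gram features, the 1024-bit vector size, the use of SHA-1 to pick a bit, and the example token sequences are all assumptions.

```python
import hashlib

NUM_BITS = 1024  # hypothetical size of the fixed-length feature vector

def ngrams(tokens, n=3):
    """All contiguous n-token windows of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def hashed_features(tokens, n=3):
    """Map each n-gram to one bit of a fixed-size bit vector (stored as an int)."""
    bits = 0
    for gram in ngrams(tokens, n):
        digest = hashlib.sha1(" ".join(gram).encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % NUM_BITS
        bits |= 1 << index
    return bits

def jaccard(a, b):
    """Jaccard similarity of two bit vectors: shared set bits over all set bits."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0

# Hypothetical normalized token streams from three code fragments.
original = "if ( len > size ) return err ; copy ( dst".split()
variant = "if ( len > size ) return err ; copy ( src".split()
unrelated = "for item in queue : log . write item now please".split()

sim_variant = jaccard(hashed_features(original), hashed_features(variant))
sim_unrelated = jaccard(hashed_features(original), hashed_features(unrelated))
```

Because every fragment is reduced to a fixed-size bit vector, comparing two fragments costs a couple of bitwise operations regardless of fragment length, which is the property that makes this kind of representation attractive at large scale. Distinct n-grams can collide on the same bit, so the similarity is an approximation.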
