Global ETD Search

1	Empirical Studies of Code Clone Genealogies BARBOUR, LILIANE JEANNE 31 January 2012 Two identical or similar code fragments form a clone pair. Previous studies have identified cloning as a risky practice; therefore, a developer needs to be aware of any clone pairs in order to propagate changes correctly between clones. A clone pair undergoes many changes during the creation and maintenance of a software system. A change can either maintain or remove the similarity between the clones in a clone pair. If a change maintains the similarity, the clone pair is left in a consistent state; if a change makes the clones no longer similar, the clone pair is left in an inconsistent state. The sequence of states and changes that a clone pair experiences over time forms an evolution history known as a clone genealogy. In this thesis, we provide a formal definition of clone genealogies and perform two case studies to examine them. In the first study, we examine clone genealogies to identify fault-prone “patterns” of states and changes. We also build prediction models using clone metrics from a single snapshot and compare them to models that include historical evolutionary information about code clones. We examine three long-lived software systems and identify clones using the Simian and CCFinder clone detection tools. The results show that the size of a clone and the time interval between changes are related to the fault-proneness of a clone pair. Additionally, we show that adding evolutionary information increases the precision, recall, and F-Measure of fault prediction models by up to 26%. In our second study, we define 8 types of late propagation and compare them to other forms of clone evolution. Our results not only confirm that late propagation is more harmful to software systems than other forms of clone evolution, but also establish that some specific cases of late propagation are more harmful than others.
Specifically, two cases are most risky: (1) when a clone experiences inconsistent changes and then a re-synchronizing change without any modification to the other clone in a clone pair; and (2) when two clones undergo an inconsistent modification followed by a re-synchronizing change that modifies both the clones in a clone pair. / Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2012-01-31 11:39:10.503 clone genealogies code clones software engineering
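The consistent and inconsistent states described in this abstract can be modeled as a small state machine. The following Python sketch is a simplified illustration of our own, not Barbour's formal definition or her 8-type taxonomy: the `State` and `Change` types and the `late_propagations` function are hypothetical names, and a late propagation is approximated as any change that returns an inconsistent pair to a consistent state. Recording which clones each change modified lets a caller separate re-synchronizations that touch one clone from those that touch both, which is the distinction behind the two riskiest cases.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Tuple


class State(Enum):
    CONSISTENT = "consistent"      # clones in the pair are still similar
    INCONSISTENT = "inconsistent"  # a change broke the similarity


@dataclass(frozen=True)
class Change:
    """One change in a (hypothetical) clone-pair genealogy."""
    modified: FrozenSet[str]  # which clones were edited, e.g. {"A"} or {"A", "B"}
    result: State             # state of the pair after this change


def late_propagations(changes: List[Change],
                      initial: State = State.CONSISTENT) -> List[Tuple[int, FrozenSet[str]]]:
    """Return (index, modified clones) for each change that re-synchronizes the pair.

    Simplification: any INCONSISTENT -> CONSISTENT transition counts as a
    late propagation.
    """
    hits = []
    state = initial
    for i, change in enumerate(changes):
        if state is State.INCONSISTENT and change.result is State.CONSISTENT:
            hits.append((i, change.modified))
        state = change.result
    return hits


# Clone A diverges, then only clone B is edited to catch up.
history = [
    Change(frozenset({"A"}), State.INCONSISTENT),
    Change(frozenset({"B"}), State.CONSISTENT),
]
print(late_propagations(history))  # [(1, frozenset({'B'}))]
```

Passing the full change history rather than a list of states keeps the "which clone was edited" information available for finer-grained classification.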
2	Identifying malware similarity through token-based and semantic code clones Lanclos, Christopher I. G. 08 December 2023 Malware is the source of, or a catalyst for, many of the attacks on our cyberspace. Malware analysts and other cybersecurity professionals are responsible for understanding and responding to these attacks in order to mount a defense against them. The sheer amount of malware alone makes this a difficult task, and malware is also increasing in complexity. This research provides empirical evidence that a hybrid approach using token-based and semantic code clones can identify similarities between malware samples. In addition, different normalization techniques and the use of undirected versus directed matrices were studied. Lastly, the impact of using inexact code clones was evaluated. Our results showed that our approach to determining the similarity between malware outperforms two methods currently used in malware analysis. We also showed that overly generalized normalization of code sections hinders the performance of the proposed method, while there is no significant difference between using directed and undirected matrices. This research also confirmed the positive impact of using inexact code clones when determining similarity. Malware Code Clones Assembly Code Matrices
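As a rough illustration of token-based similarity over normalized assembly, the following Python sketch abstracts instruction operands into coarse classes and compares functions by the Jaccard similarity of their instruction trigrams. It is an assumption-laden stand-in, not Lanclos's hybrid method: the thesis also uses semantic clones and similarity matrices, which are not modeled here, and the `normalize`, `ngrams`, and `jaccard` helpers, the REG/IMM/MEM operand classes, and the trigram size are all hypothetical choices. The operand classes also hint at the trade-off the abstract reports: the more aggressively operands are abstracted, the more unrelated code starts to look alike.

```python
import re
from typing import List, Set, Tuple


def normalize(instr: str) -> str:
    """Keep the mnemonic; replace each operand with a coarse class (REG/IMM/MEM)."""
    parts = instr.split(None, 1)
    mnemonic = parts[0].lower()
    if len(parts) == 1:
        return mnemonic
    classes = []
    for op in (o.strip() for o in parts[1].split(",")):
        if re.fullmatch(r"0x[0-9a-fA-F]+|\d+", op):
            classes.append("IMM")
        elif op.startswith("["):
            classes.append("MEM")
        else:
            classes.append("REG")  # anything else is treated as a register
    return mnemonic + " " + ",".join(classes)


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """All contiguous n-grams of a token list, as a set."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: List[str], b: List[str], n: int = 3) -> float:
    """Jaccard similarity of the n-gram sets of two normalized instruction lists."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


f1 = ["push ebp", "mov ebp, esp", "sub esp, 0x10", "xor eax, eax", "ret"]
f2 = ["push ebp", "mov ebp, esp", "sub esp, 0x20", "xor ecx, ecx", "ret"]
print(jaccard([normalize(i) for i in f1], [normalize(i) for i in f2]))  # 1.0
```

The two functions differ only in a register and an immediate value, so after normalization they are treated as identical clones.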
3	Normalizer: Augmenting Code Clone Detectors Using Source Code Normalization Ly, Kevin 01 March 2017 Code clones are duplicate fragments of code that perform the same task. As software code bases grow, the number of code clones also tends to increase. These code clones, whether created by copy and paste or by unintentional duplication of effort, increase maintenance costs over the lifespan of the software. Code clone detection tools can identify clones where a manual search would be infeasible; however, the quality of the clones they find may vary. I demonstrate that the performance of such tools can be improved by normalizing the source code before analysis. I developed Normalizer, a tool that transforms C source code into normalized source code written as consistently as possible. By preserving the code's function while enforcing a strict format, Normalizer removes the variability of the programmer's style, so code clones may be easier for tools to detect regardless of how the code was written. Normalization is achieved by reordering statements, removing useless code, and renaming identifiers. Using a small variety of code clone detection tools, Normalizer was used to show that more clones can be found in Introduction to Computer Networks assignments by normalizing the source code than by analyzing the original source code. code clones normalization static analysis Other Computer Engineering
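Of the three normalizations listed in this abstract, identifier renaming is the simplest to illustrate. The following Python sketch renames every non-keyword identifier in C-like source to a placeholder in order of first use, so two fragments that differ only in naming produce identical token streams. It is a hypothetical illustration rather than Normalizer itself: it uses a crude regex tokenizer instead of a real C parser, its keyword list is deliberately partial, it also renames function names, and it does not reorder statements or remove useless code.

```python
import re
from typing import Dict, List

# Partial list of C keywords; a real tool would use the full set and a proper parser.
C_KEYWORDS = {
    "int", "char", "float", "double", "void", "long", "short", "unsigned",
    "signed", "const", "static", "struct", "return", "if", "else", "for",
    "while", "do", "switch", "case", "default", "break", "continue", "sizeof",
}

# Identifiers, integer literals, or any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize_identifiers(source: str) -> List[str]:
    """Tokenize C-like source and rename identifiers to v0, v1, ... by first use."""
    mapping: Dict[str, str] = {}
    tokens = []
    for tok in TOKEN_RE.findall(source):
        is_identifier = tok[0].isalpha() or tok[0] == "_"
        if is_identifier and tok not in C_KEYWORDS:
            # First occurrence gets the next placeholder; later ones reuse it.
            mapping.setdefault(tok, f"v{len(mapping)}")
            tokens.append(mapping[tok])
        else:
            tokens.append(tok)
    return tokens


a = "int sum(int n) { int total = 0; for (int i = 0; i < n; i++) total += i; return total; }"
b = "int add_up(int count) { int acc = 0; for (int k = 0; k < count; k++) acc += k; return acc; }"
print(normalize_identifiers(a) == normalize_identifiers(b))  # True
```

Renaming by first use, rather than mapping every identifier to one generic token, keeps distinct variables distinct, so the normalized stream still reflects how values flow through the fragment.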
4	Understanding the Evolution of Code Clones in Software Systems 2013 August 1900 Code cloning is a common practice in software development, with both positive aspects, such as accelerating the development process, and negative aspects, such as causing code bloat. After a decade of active research, it is clear that removing all of the clones from a software system is not desirable; it is better to manage clones than to remove them. A software system can contain thousands of clones, which may serve multiple purposes. However, some clones may cause unwanted management difficulties, and such clones should be refactored. Failure to manage clones may lead to inconsistencies in the code, which are error-prone. Managing thousands of clones manually would be a difficult task. A clone management system can help manage clones and find patterns in how clones evolve as a software system evolves. In this research, we propose a framework for constructing and visualizing clone genealogies with change patterns (e.g., inconsistent changes), bug information, developer information, and several other important metrics in a software system. Based on the framework, we design and build an interactive prototype for a multi-touch surface (e.g., an iPad). The prototype uses a variety of techniques to support understanding clone genealogies, including: identifying and providing a compact overview of the clone genealogies along with their key characteristics; providing interactive navigation of genealogies, cloned source code, and the differences between clone fragments; providing the ability to filter and organize genealogies based on their properties; providing a feature for annotating clone fragments with comments to aid future review; and providing the ability to contact developers from within the system to find out more about specific clones.
To investigate the suitability of the framework and prototype for investigating and managing cloned code, we elicit feedback from practicing researchers and developers, and we conduct two empirical studies: a detailed investigation into the evolution of function clones and a detailed investigation into how clones contribute to bugs. In both empirical studies we are able to use the prototype to quickly investigate the cloned source code to gain insights into clone use. We believe that the clone management system and the findings will play an important role in future studies and in managing code clones in software systems.
5	Empirical Investigation of the Suitability of Code Clones for Demonstrating Redundancy as a Driver of the Evolution of Programming Concepts (Empirische Untersuchung der Eignung von Code-Clones für den Nachweis der Redundanz als Treiber für die Evolution von Programmierkonzepten) Harnisch, Björn Ole 12 February 2018 During program development, developers regularly create code clones by copying source code. This thesis presents an approach for automatically measuring this duplicated code with clone detection tools across multiple versions of various software products. Using the histories of the code clones, factors influencing the redundancy of this software are measured empirically. This lays a foundation for proving that the development of programming languages is driven to a dominant extent by the reduction of redundancy.
Contents: Abstract; Table of Contents; 1 Introduction; 1.1 Problem Statement; 1.2 Objectives; 1.3 Approach; 2 Preliminary Considerations; 2.1 Programming Concepts; 2.1.1 Definition; 2.1.2 Programming Concepts in Java; 2.2 Drivers of the Development of Programming Concepts; 2.2.1 Types of Drivers of Programming Concepts; 2.2.2 Reduction of Redundancy in Software; 2.2.2.1 Types of Redundancy in Software; 2.2.2.2 Code Clones; 2.2.2.3 Consequences of Redundancy in Software; 2.2.3 Approaches for Demonstrating Redundancy Reduction as a Driver; 2.3 Selection of Software Repositories for the Analyses; 2.3.1 Types of Software Repositories; 2.3.2 Requirements for Software Repositories; 3 Data Collection Process for Analyzing Software for Clones; 3.1 Structure of the Data Collection Process; 3.1.1 Solution Approach; 3.1.2 Process Control; 3.2 Handling Versioning; 3.2.1 General; 3.2.2 Commit Filters; 3.3 Clone Detection; 3.3.1 Types and Representative Tools; 3.3.2 Tools Used in This Thesis; 3.3.2.1 Simian; 3.3.2.2 CCFinderX; 3.3.3 Runtime Problem and Possible Solutions; 3.4 Data Aggregation; 4 Evaluation of the Measurements; 4.1 Evaluation Procedure; 4.2 Examination of Code Clone Histories; 4.3 Comparison of Different Configurations; 4.3.1 Comparison of Different Clone Detection Tools; 4.3.2 Comparison of Different Commit Filters; 4.3.3 Comparison of Different Detection Thresholds; 4.4 Investigation of Various Points of Interest; 5 Retrospective; 5.1 Error Analysis; 5.2 Possible Extensions; 5.3 Concluding Remarks; Appendix: Literature Search Procedure; Computer Configuration Used; Example Files; Example of Detailed Simian Output; Example of Detailed CCFinderX Output; Example of Aggregated Data; List of Figures; List of Tables; List of Program Listings; List of Abbreviations; Bibliography; Statutory Declaration info:eu-repo/classification/ddc/330 ddc:330
6	Types of Redundancy in Relation to Code Clones (Arten der Redundanz im Zusammenhang mit Code-Clones) Willert, Nico 19 November 2018 Redundancy in source code impairs important qualities such as the readability and maintainability of the code. Along with this, faulty program behavior can arise when code fragments are deliberately duplicated instead of being reused. Detecting such problems early therefore requires breaking redundancy down into its various forms. The goal of this thesis was to investigate how these forms, or types, of redundancy differ, how they are related, and how the concept of redundancy can be connected to the term code clone. To this end, a literature study was conducted to capture the current state of research; in addition to redundancy, it also covered the topics of code clones and similarity. The results of the literature study were organized by type of redundancy and illustrated with code clone examples. The literature study found that redundancy arises predominantly through the duplication of code fragments, so a large part of redundancy can be represented by code clones. Furthermore, the types of redundancy are not disjoint, so a complete, fully separated classification is not possible. Outline: List of Figures; Source Code Listings; 1. Introduction; 1.1 Motivation; 1.2 Objective; 1.3 Structure of the Thesis; 2. Definitions; 3. Method; 3.1 Methodological Approach; 3.2 Planning; 3.3 Selection; 3.4 Extraction; 3.5 Execution; 4. Results; 4.1 Negative Software Redundancy; 4.2 Textual Redundancy; 4.3 Functional Redundancy; 4.4 Boilerplate Code; 4.5 Origin-Based Redundancy; 4.5.1 Forced Redundancy; 4.5.2 Accidental Redundancy; 4.6 Distinguishing the Types of Redundancy; 5. Conclusion; 6. Outlook; References Redundancy, Code Clones, Similarity info:eu-repo/classification/ddc/330 ddc:330
7	Code duplication and reuse in Jupyter notebooks Koenzen, Andreas Peter 21 September 2020 Reusing code can expedite software creation and the analysis and exploration of data. Speed can be particularly valuable for users of computational notebooks, where duplication lets them quickly test hypotheses and iterate over data without writing code from scratch. In this thesis, I explore code duplication and code reuse behaviour in Jupyter notebooks, quantifying and describing duplicated snippets of code and exploring potential barriers to reuse. As part of this thesis, I conducted two studies of Jupyter notebook use. In my first study, I mined GitHub repositories, quantifying and describing the code duplicates in repositories that contain at least one Jupyter notebook. In my second study, I conducted an observational user study using contextual inquiry, in which participants solved specific tasks in notebooks while I observed and took notes. The work in this thesis is exploratory, since both studies aimed to generate hypotheses on which further studies can build. The contributions of this thesis are twofold: a thorough description of the code duplicates in GitHub repositories and an exploration of the behaviour behind code reuse in Jupyter notebooks. I hope that others can build on this work to provide new tools that address some of the issues outlined in this thesis. / Graduate Jupyter computational notebooks code duplication code clones code reuse data analysis data exploration exploratory programming
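A first step in mining repositories for notebook duplicates is finding code cells that recur verbatim. The following Python sketch groups code cells that are identical after whitespace is collapsed, reading notebooks in the nbformat 4 JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`). It is an illustration under stated assumptions, not Koenzen's method: the `code_cells` and `duplicate_groups` functions are hypothetical, the whitespace collapsing is coarse (it also erases indentation differences), and the sketch finds only exact duplicates, ignoring near-miss clones.

```python
import hashlib
import json
from collections import defaultdict
from typing import Dict, List


def code_cells(notebook_json: str) -> List[str]:
    """Return the source of every code cell in an nbformat-4 notebook."""
    nb = json.loads(notebook_json)
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # nbformat allows "source" to be a single string or a list of lines.
            sources.append(src if isinstance(src, str) else "".join(src))
    return sources


def duplicate_groups(notebooks: Dict[str, str]) -> List[List[str]]:
    """Group code cells that are identical after collapsing whitespace.

    Locations are reported as "<notebook>#code<k>", where k counts code cells only.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, text in notebooks.items():
        for k, src in enumerate(code_cells(text)):
            if not src.strip():
                continue  # empty cells would trivially "duplicate" each other
            key = hashlib.sha1(" ".join(src.split()).encode("utf-8")).hexdigest()
            groups[key].append(f"{name}#code{k}")
    return [locs for locs in groups.values() if len(locs) > 1]


nb_a = json.dumps({"cells": [
    {"cell_type": "code", "source": ["import pandas as pd\n", "df = pd.read_csv('a.csv')"]},
    {"cell_type": "markdown", "source": "# Notes"},
]})
nb_b = json.dumps({"cells": [
    {"cell_type": "code", "source": "import pandas as pd\ndf = pd.read_csv('a.csv')  "},
]})
print(duplicate_groups({"a.ipynb": nb_a, "b.ipynb": nb_b}))
# [['a.ipynb#code0', 'b.ipynb#code0']]
```

Hashing the normalized text keeps memory proportional to the number of distinct cells, which matters when scanning many repositories.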
8	Frequent Subgraph Analysis and its Software Engineering Applications Henderson, Tim A. D. 06 September 2017 No description available. Computer Science
9	Analysis of cross-system porting and porting errors in software projects Ray, Baishakhi 11 November 2013 Software forking (creating a variant product by copying and modifying an existing project) is often considered an ad hoc, low-cost alternative to principled product line development. To maintain forked projects, developers need to manually port existing features or bug fixes from one project to another. Such manual porting is not only tedious but also error-prone. When the contexts of the ported code vary, developers often have to adapt the ported code to fit its surroundings. Faulty adaptations or inconsistent updates of the ported code could introduce subtle inconsistencies in the codebase. To build a deeper understanding of cross-system porting and porting-related errors, this dissertation investigates: (1) How can we identify ported code from software version histories? (2) What is the overhead of cross-system porting required to maintain forked projects? (3) What are the extent and characteristics of porting errors that occur in practice? and (4) How can we detect and characterize potential porting errors? As a first step towards assessing the overhead of cross-system porting, we implement REPERTOIRE, a tool to analyze repeated work of cross-system porting across peer projects. REPERTOIRE detects ported edits between program patches with high accuracy (94% precision and 84% recall). Using REPERTOIRE, we study the temporal, spatial, and developer dimensions of cross-system porting using 18 years of parallel evolution history of the BSD product family. Our study finds that cross-system porting happens periodically and that the porting rate does not necessarily decrease over time. The upkeep work of porting changes from peer projects is significant, and current porting practice seems to depend heavily on developers porting changes in a timely manner.
Analyzing version histories of Linux and FreeBSD, we derive five categories of porting errors, including incorrect control- and data-flow, code redundancy, and inconsistent identifier and token renamings. Leveraging this categorization, we design a static control- and data-dependence analysis technique, SPA, to detect and characterize porting inconsistencies. SPA detects porting inconsistencies with 65% to 73% precision and 90% recall, and identifies inconsistency types with 58% to 63% precision and 92% recall on average. Compared with two existing error detection tools, SPA achieves 14% to 17% better precision. / text Software evolution Forking Porting Repetitive changes Code clones Static analysis Subgraph isomorphism Bug Error detection Copy-paste error
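One of the porting-error categories named in this abstract, inconsistent identifier renaming, can be illustrated with a much simpler check than SPA's control- and data-dependence analysis. The following Python sketch assumes that the original and ported fragments have already been aligned token by token (a hypothetical precondition; real porting edits need an alignment step first) and reports any original identifier that was renamed to more than one name. The `inconsistent_renamings` function is our own illustration, not part of SPA or REPERTOIRE.

```python
from typing import Dict, List, Set


def inconsistent_renamings(original: List[str], ported: List[str]) -> Dict[str, Set[str]]:
    """Report original identifiers that map to more than one name in the port.

    Assumes the token lists are aligned one-to-one. Keywords such as "return"
    also pass str.isidentifier(), but they normally map to themselves and so
    are never reported.
    """
    if len(original) != len(ported):
        raise ValueError("token sequences must be aligned to the same length")
    renamed_to: Dict[str, Set[str]] = {}
    for old, new in zip(original, ported):
        if old.isidentifier():
            renamed_to.setdefault(old, set()).add(new)
    return {old: names for old, names in renamed_to.items() if len(names) > 1}


# "n" was renamed to "count" in two places but left as "n" in the third.
orig = ["n", "=", "n", "+", "1", ";", "return", "n", ";"]
port = ["count", "=", "count", "+", "1", ";", "return", "n", ";"]
print({k: sorted(v) for k, v in inconsistent_renamings(orig, port).items()})
# {'n': ['count', 'n']}
```

A consistent rename (every `n` becoming `count`) produces a single target name and is not reported; only a mixed mapping signals a likely missed update.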
