Global ETD Search

1	AN EMPIRICAL STUDY FOR THE IMPACT OF MAINTENANCE ACTIVITIES IN CLONE EVOLUTION. MARKS, LIONEL. 26 November 2009. Code clones are duplicated code fragments that are copied to reuse functionality and speed up development. However, because code clones are duplicates, inconsistent updates to them can lead to bugs in the software system. Existing research investigates inconsistent updates by analyzing the updates made to code clones and the bug fixes used to correct those inconsistencies. We extend this work by investigating other factors that affect clone evolution, such as the number of developers. We conduct an empirical study of clone evolution at two levels of analysis: the method level and the clone class level. Using our new metrics, we analyze the factors affecting bug fixes and co-change (i.e., updating cloned methods at the same time). Our metrics relate to the developers, code complexity, and stages of development. We use these metrics to find ways to improve the maintenance of cloned code. We find that one way to improve the maintenance of code clones is to decrease code complexity: increased code complexity leads to a decrease in co-change, which can lead to bugs in the software. We perform our study on 6 applications. To maximize the number of clones detected, we use two existing code clone detection tools: SimScan and Simian. SimScan was used to find clones in 5 of the applications due to its versatility in finding code clones. Simian was used due to its reliability in finding code clones regardless of language or compilation problems. To analyze and determine the significance of the metrics, we use the R Statistical Toolkit. / Thesis (Master, Computing) -- Queen's University, 2009-11-25 14:18:05.884 Keywords: Clone Detection; Clone Evolution
2	DETECTING PDF JAVASCRIPT MALWARE USING CLONE DETECTION. Karademir, SARUHAN. 02 October 2013. One common vector of malware is JavaScript embedded in Adobe Acrobat (PDF) files. In this thesis, we investigate using near-miss clone detectors to find this malware. We start by collecting, from the VirusTotal repository, a set of PDF files containing JavaScript malware and a set containing clean JavaScript. We use the NiCad clone detector to find the clone classes in a small subset of the malicious PDF files. We evaluate how these clone classes can be used to find similar malicious files in the rest of the malicious collection while avoiding files in the benign collection. Our results show that training on a 10% subset produced 75% detection of previously known malware with 0% false positives. We also used NiCad as a pattern matcher for reflexive calls common in JavaScript malware, which detected 57% of the malicious collection with no false positives. When the results of the two experiments are combined, total malware coverage rises to 85% while precision remains at 100%. The results are heavily affected by the third-party PDF-to-JavaScript extractor used: when only successfully extracted PDFs are considered, recall increases to 99% and precision remains at 100%. / Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2013-09-30 11:50:15.156 Keywords: NiCad; clone detection; PDF; security; Acrobat; malware
3	An Approach to Clone Detection in Behavioral Models. ANTONY, ELIZABETH. 04 March 2014. In this thesis, we present an approach for identifying near-miss interaction clones in reverse-engineered UML behavioural models. Our goal is to identify patterns of interaction ("conversations") that can be used to characterize and abstract the run-time behaviour of web applications and other interactive systems. To leverage robust near-miss code clone technology, our approach is text-based, working at the level of XMI, the standard interchange serialization for UML. Behavioural model clone detection presents several challenges. First, it is not clear how to break a continuous stream of interaction between lifelines (which represent the objects or actors in the system) into meaningful conversational units. Second, unlike programming languages, the XMI text representation of UML is highly non-local, using attributes to reference information located elsewhere in the model file. In this work, we apply a set of contextualizing source transformations to the XMI text representation to reveal the hidden hierarchical structure of the model and to granularize behavioural interactions into conversational units. We then adapt NiCad, a near-miss code clone detection tool, to identify conversational clones in reverse-engineered behavioural models. These conversational clones are then analysed to find worrisome patterns of security access violations. / Thesis (Master, Computing) -- Queen's University, 2014-03-03 19:36:25.776 Keywords: access violations; clone detection; behavioral models
4	Detection and Analysis of Near-Miss Software Clones. Roy, CHANCHAL. 31 August 2009. Software clones are considered harmful in software maintenance and evolution. However, despite a decade of active research, there is a marked lack of work on the detection and analysis of near-miss software clones, that is, clones in which minor to extensive modifications have been made to the copied fragments. In this thesis, we advance the state of the art in clone detection and analysis in several ways. First, we develop a hybrid clone detection method, called NICAD, that can detect both exact and near-miss clones with high precision and recall and with reasonable performance. Second, to address a decade of vagueness in the definition of clones, we propose an editing taxonomy for clone creation that models, in a top-down fashion, developers' editing activities in copy-pasted code. NICAD is designed to address the different clone types in this editing taxonomy. Third, we have conducted a scenario-based qualitative comparison and evaluation of all currently available clone detection techniques and tools within a unified conceptual framework. Using the results of this study, one can more easily choose the right tools to meet the requirements and constraints of a particular application, and identify opportunities for hybridizing different techniques. The hybrid architecture of NICAD was derived from this study. Fourth, to evaluate and compare the available tools in a realistic setting while avoiding the huge manual effort of validating candidate clones, we have developed a mutation-based framework that automatically and efficiently measures and compares the recall and precision of clone detection tools for the different fine-grained clone types of the proposed editing taxonomy. 
We have evaluated NICAD using this framework and found that it detects the different types of clones with high precision and recall. Finally, we have conducted a large-scale empirical study of cloning in open-source systems, both to evaluate NICAD and to study the cloning characteristics of these systems along several dimensions. The study demonstrates that NICAD can accurately find both exact and near-miss function clones, even in large systems and across different languages, and that these systems appear to contain a large number of clones. / Thesis (Ph.D, Computing) -- Queen's University, 2009-08-31 14:05:30.233 Keywords: Software Clone; Clone Detection and Analysis; Software Maintenance
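The abstract does not spell out how NICAD decides that two fragments are near-miss clones. As a rough illustration only, the sketch below assumes a line-based comparison in the spirit of NiCad's published design: each fragment is normalized into a list of lines, the longest common subsequence (LCS) of lines is computed, and a pair counts as a clone when the proportion of lines outside the LCS stays under a threshold. The function names, the whitespace-only normalization, and the 0.3 default threshold are illustrative assumptions, not NICAD's actual implementation.

```python
def normalize(fragment: str) -> list[str]:
    """Collapse whitespace in each line and drop blank lines (a stand-in for pretty-printing)."""
    lines = (" ".join(line.split()) for line in fragment.splitlines())
    return [line for line in lines if line]


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two line lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def unique_ratio(a: list[str], b: list[str]) -> float:
    """Largest fraction of either fragment's lines that falls outside the common subsequence."""
    common = lcs_length(a, b)

    def outside(lines: list[str]) -> float:
        return (len(lines) - common) / len(lines) if lines else 0.0

    return max(outside(a), outside(b))


def is_near_miss_clone(f1: str, f2: str, threshold: float = 0.3) -> bool:
    """Report a clone pair when both fragments differ by at most `threshold` of their lines."""
    return unique_ratio(normalize(f1), normalize(f2)) <= threshold


original = """int sum(int[] a) {
    int s = 0;
    for (int x : a) s += x;
    return s;
}"""
modified = """int sum(int[] a) {
    int s  =  0;
    for (int x : a) s += x;
    log(s);
    return s;
}"""
unrelated = """void greet() {
    print("hi");
}"""

print(is_near_miss_clone(original, modified))   # True: one added line out of six
print(is_near_miss_clone(original, unrelated))  # False: only the closing brace is shared
```

The whitespace normalization lets the reformatted `int s  =  0;` line match its original, while the LCS tolerates the inserted `log(s);` statement, which is the kind of near-miss (Type 3) difference the abstract describes.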
6	Towards Web Service Tagging By Similarity Detection. Martin, Douglas. 04 October 2011. The web of the future will require automated tagging of equivalent or similar services to support service discovery and the selection of appropriate alternatives in case of failure. Code similarity detection tools, or clone detectors, provide a mature and scalable method of identifying these kinds of similarities and can be used to assist with this problem. However, they require a set of units to be compared, something to which the most popular description language, WSDL (Web Service Description Language), does not lend itself. First, each WSDL description can contain more than one operation description, so a whole file does not provide the granularity we need to compare services at the operation level. Second, these operation descriptions are interleaved throughout the file, often sharing common elements. This thesis describes a technique for extracting the elements of each operation description and consolidating them into a self-contained unit using TXL, a source transformation language. These units, referred to as Web Service Cells or WSCells (pronounced “wizzles”), can then be used by similarity detectors to search for similarities. We describe a modified architecture for the NICAD clone detector that supports the creation of WSCells, and the implementation of a special WSDL extractor we used to emulate this modification in its absence. / Thesis (Master, Computing) -- Queen's University, 2011-10-04 09:33:36.932 Keywords: computer science; clone detection; web services
7	NeCO: Ontology Alignment using Near-miss Clone Detection. Geesaman, Paul Louis. 29 January 2014. The Semantic Web is an endeavour to enhance the web with the ability to represent knowledge. This knowledge is expressed through what are called ontologies. To make ontologies useful, it is important to be able to match the knowledge represented in different ontologies, a task commonly known as ontology alignment. Ontology alignment has been studied, but it remains an open problem, with an annual competition dedicated to measuring the performance of alignment tools. Many alignment tools are computationally heavy, require training, or are useful only in a specific field of study. We propose NeCO, an ontology alignment method that builds on clone detection techniques. NeCO inherits the features of clone detection, and it is lightweight, does not require training, and is useful for any ontology. / Thesis (Master, Computing) -- Queen's University, 2014-01-29 14:38:52.873 Keywords: Ontology; Clone Detection; Near-miss; Alignment
8	How Do Java Developers Reuse StackOverflow Answers in Their GitHub Projects? Chen, Juntong. 09 September 2022. StackOverflow (SO) is a widely used question-and-answer (Q&A) website for software developers and computer scientists. GitHub is a code hosting platform for collaboration and version control. Popular software libraries are open-source and published in repositories on GitHub. Preliminary observations show that developers cite SO questions in their GitHub repositories. This observation inspired us to explore the relationship between SO posts and GitHub repositories, in order to help software developers better understand the characteristics of SO answers that are reused by GitHub projects. We conducted an empirical study of the SO answers reused by Java code in public GitHub projects. To ensure precise results, we used a hybrid approach combining code clone detection, keyword-based search, and manual inspection. This approach helped us identify the answers that developers leveraged. Based on the identified answers, we further investigated the topics of the discussion threads, answer characteristics (e.g., scores, ages, code lengths, and text lengths), and developers' reuse practices. We observed both reused and unused answers. Compared with unused answers, we found that reused answers mostly have higher scores, longer code, and longer plain-text explanations. Most reused answers were related to implementing specific coding tasks. In 9% (40/430) of the scenarios, developers entirely copied code from one or multiple answers of an SO discussion thread. In the other 91% (390/430) of scenarios, developers only partially reused code or created brand-new code from scratch. We investigated 130 SO discussion threads referred to by Java developers in 356 GitHub projects, and arranged these threads into five different categories.
Our findings can help the SO community better distribute programming knowledge and skills, as well as inspire future research related to SO and GitHub. / Master of Science / StackOverflow (SO) is a widely used question-and-answer (Q&A) website for software developers and computer scientists. GitHub is a code hosting platform for collaboration and version control. Popular software libraries are open-source and published in repositories on GitHub. Preliminary observations show that developers cite SO questions in their GitHub repositories. This observation inspired us to explore the relationship between SO posts and GitHub repositories, in order to help software developers better understand the characteristics of SO answers that are reused by GitHub projects. Our objectives are to guide SO answerers in helping developers more effectively and to help tool builders understand how SO answers shape software products. We therefore conducted an empirical study of the SO answers reused by Java code in public GitHub projects. We used a hybrid approach, consisting of three steps, to refine our dataset and ensure precise results. The first step is code clone detection: we compared code snippets pairwise with a code clone detection tool to measure their similarity. The second step is a keyword-based search: we created multiple keywords to search GitHub code for referenced answers missed by the first step. Lastly, we manually inspected the outputs of both steps to ensure zero false positives in our data. This approach helped us identify the answers that developers leveraged. Based on the identified answers, we further investigated the topics of the discussion threads, answer characteristics, and developers' reuse practices. We observed both reused and unused answers. Compared with unused answers, we found that reused answers mostly have higher scores, longer code, and longer plain-text explanations.
Most reused answers were related to implementing specific coding tasks. In 9% of the scenarios, developers entirely copied code from one or multiple answers of an SO discussion thread. In the other 91% of scenarios, developers only partially reused code or created brand-new code from scratch. Our findings can help the SO community better distribute programming knowledge and skills, as well as inspire future research related to SO and GitHub. Keywords: Empirical; StackOverflow; GitHub; answer reuse; clone detection
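The second step of the hybrid approach, the keyword-based search, can be pictured as scanning Java source for references to SO posts. The thesis does not list its actual keywords, so the sketch below assumes a single hypothetical pattern that matches StackOverflow question and answer URLs (`/questions/<id>`, `/q/<id>`, `/a/<id>`) and extracts the post IDs; the sample source and the IDs in it are invented for illustration. Any hits would still go through the manual inspection step.

```python
import re

# Hypothetical keyword pattern: StackOverflow question/answer links, capturing the post ID.
SO_LINK = re.compile(
    r"https?://(?:www\.)?stackoverflow\.com/(?:questions|q|a)/(\d+)",
    re.IGNORECASE,
)


def find_so_references(java_source: str) -> list[str]:
    """Return the SO post IDs referenced anywhere in a Java source file, in order of appearance."""
    return SO_LINK.findall(java_source)


sample = """
// Adapted from https://stackoverflow.com/a/1234567
public class Util {
    /* see http://stackoverflow.com/questions/7654321/how-to-sort-a-map */
    public static void noop() {}
}
"""

print(find_so_references(sample))  # ['1234567', '7654321']
```

Matching on links rather than on copied code is what lets this step catch answers that a clone detector misses because the developer rewrote the code heavily.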
9	Detecting Semantic Method Clones In Java Code Using Method IOE-Behavior. Elva, Rochelle. 01 January 2013. The determination of semantic equivalence is an undecidable problem; however, this dissertation shows that a reasonable approximation can be obtained using a combination of static and dynamic analysis. This study investigates the detection of functional duplicates, referred to as semantic method clones (SMCs), in Java code. My algorithm extends the input-output notion of observable behavior, used in related work [1, 2], to include the effects of the method. The effects are the persistent changes to the heap brought about by the execution of the method. To differentiate this from the typical input-output behavior used by other researchers, I have coined the term method IOE-Behavior, meaning its input-output and effects behavior [3]. Two methods are defined as semantic method clones if they have identical IOE-Behavior; that is, for the same inputs (actual parameters and initial heap state), they produce the same output (the result, for non-void methods, and the final heap state). The detection process consists of two static pre-filters that identify candidate clone sets, followed by dynamic tests that actually run the candidate methods to determine semantic equivalence. The first filter groups the methods by type. The second filter refines the output of the first, grouping methods by their effects. This algorithm is implemented in my tool JSCTracker, which automates the SMC detection process. The algorithm and tool are validated using a case study comprising 12 open source Java projects from different application domains, ranging in size from 2 KLOC (thousand lines of code) to 300 KLOC. The objectives of the case study are posed as 4 research questions: 1. Can method IOE-Behavior be used in SMC detection? 2. What is the impact of the pre-filters on the efficiency of the algorithm? 3.
How does the performance of method IOE-Behavior compare to using only input-output behavior for identifying SMCs? 4. How reliable are the results obtained when method IOE-Behavior is used in SMC detection? Answers to these questions are obtained by checking each software sample with JSCTracker and analyzing the results. The number of SMCs detected ranges from 0 to 45, with an average execution time of 8.5 seconds. The two pre-filters reduce the number of methods that reach the dynamic test phase by an average of 34%. The IOE-Behavior approach takes an average of 0.010 seconds per method, while the input-output approach takes an average of 0.015 seconds. The IOE-Behavior approach also identifies an average of 32% false positives, while the SMCs identified using input-output have an average of 92% false positives. In terms of reliability, the IOE-Behavior method achieves an average precision of 68% and an average recall of 76%. These values represent an improvement of over 37% (for precision) and 30% (for recall) over the values in related work [4, 5]. I therefore conclude that IOE-Behavior can be used to detect SMCs in Java code with reasonable reliability. Keywords: Semantic clone detection; semantic method clones; method IOE-Behavior; clone detection; program analysis; Computer Sciences; Engineering
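As a rough Python analogue of the IOE-Behavior comparison (not JSCTracker, and without the two static pre-filters), the sketch below runs two candidate methods on deep copies of the same inputs and compares both the returned value and the final state of the argument objects, which here stand in for the Java heap. The method names and test inputs are hypothetical. The third method shows why effects matter: it returns the same values as the first, so an input-output-only check accepts it as a clone, but it never modifies its list argument.

```python
import copy
from typing import Any, Callable, Sequence


def ioe_behavior(method: Callable[..., Any], args: tuple) -> tuple[Any, tuple]:
    """Run `method` on a private deep copy of `args`; return its output and the arguments' final state."""
    state = copy.deepcopy(args)  # identical initial "heap" for every method under test
    result = method(*state)
    return result, state  # output plus effects (mutations of the argument objects)


def same_ioe_behavior(m1, m2, test_inputs: Sequence[tuple]) -> bool:
    """IOE check: same result AND same final argument state on every test input."""
    return all(ioe_behavior(m1, args) == ioe_behavior(m2, args) for args in test_inputs)


def same_io_behavior(m1, m2, test_inputs: Sequence[tuple]) -> bool:
    """Input-output-only check: compares results and ignores effects."""
    return all(ioe_behavior(m1, args)[0] == ioe_behavior(m2, args)[0] for args in test_inputs)


# Three hypothetical candidate methods.
def add_all_loop(dst: list, items: list) -> int:
    for x in items:
        dst.append(x)
    return len(dst)


def add_all_extend(dst: list, items: list) -> int:
    dst.extend(items)
    return len(dst)


def count_without_adding(dst: list, items: list) -> int:
    return len(dst) + len(items)  # same result, but leaves `dst` unchanged


inputs = [([], [1, 2]), ([0], [3])]
print(same_ioe_behavior(add_all_loop, add_all_extend, inputs))        # True
print(same_ioe_behavior(add_all_loop, count_without_adding, inputs))  # False
print(same_io_behavior(add_all_loop, count_without_adding, inputs))   # True
```

The deep copy matters: without it, the first method's mutations would leak into the second method's starting state and the comparison would no longer use "the same inputs".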
10	Stable marriage problem based adaptation for clone detection and service selection. Al Hakami, Hosam Hasan. January 2015. Current software engineering tasks such as clone detection and service selection need improved detection and selection processes. Clone detection is the process of finding duplicated code throughout a system, for purposes such as removing repeated portions as part of legacy-system maintenance. Service selection is the process of finding the appropriate web service that meets a consumer’s request. Both problems can be converted into a matching problem, and matching forms an essential part of software engineering activities. In this research, the well-known Stable Marriage Problem (SMP) and its variations are investigated to fulfil the purposes of matching processes in software engineering. We aim to provide a competitive matching algorithm that can help detect cloned software accurately while ensuring high scalability, precision, and recall. We also aim to apply the matching algorithm to incoming requests and service profiles so as to treat each web service as a clever, independent object, allowing services to accept or decline requests (equal opportunity) rather than following the current search-based approach to service selection, in which a service cannot interact as an independent candidate. To meet these aims, the traditional SMP algorithm has been extended to support many-to-many cardinality. This adaptation is achieved by defining a selective strategy, which is the main engine of the new adaptations. Two adaptations, Dual-Proposed and Dual-Multi-Allocation, have been proposed for both the service selection and clone detection processes.
The proposed SMP-based approach shows very competitive results compared to existing software clone approaches, especially in identifying type 3 clones (copies with further modifications, such as updated, added, and deleted statements). It performs the detection process with relatively high precision and recall compared to the CloneDR tool and shows good scalability on a mid-sized program. For service selection, the proposed approach has several advantages, such as service protection and service quality. Services gain equal opportunity with respect to incoming requests. Therefore, intelligent service interaction is achieved, and both the stability and satisfaction of the candidates are ensured. This dissertation makes several contributions. First, it extends the SMP algorithm with a selective strategy to accommodate many-to-many matching problems and improve overall features. Second, it provides a new SMP-based clone detection approach that detects cloned software accurately with high precision and recall. Finally, it provides a new SMP-based service selection approach that allows equal opportunity between services and requests, which improves service protection and service quality. Case studies carried out with the proposed approach show that the new adaptations can be applied effectively to clone detection and service selection with several benefits (e.g., accuracy). It can be concluded that the matching-based approach is feasible and promising in the software engineering domain. 005.1
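The thesis extends the traditional SMP algorithm to many-to-many matching through a selective strategy that the abstract does not specify. For orientation only, the sketch below shows the classic one-to-one Gale-Shapley proposal algorithm that solves the standard Stable Marriage Problem. The framing, with code fragments from one file proposing to fragments in another according to hypothetical similarity rankings, is an illustrative assumption and is not the thesis's Dual-Proposed or Dual-Multi-Allocation adaptation.

```python
def stable_match(proposer_prefs: dict[str, list[str]],
                 acceptor_prefs: dict[str, list[str]]) -> dict[str, str]:
    """Classic one-to-one Gale-Shapley: returns a stable proposer -> acceptor matching."""
    # rank[a][p] = position of proposer p in acceptor a's list (lower is better).
    rank = {a: {p: i for i, p in enumerate(prefs)} for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next acceptor p will propose to
    engaged: dict[str, str] = {}                  # acceptor -> currently held proposer
    free = list(proposer_prefs)                   # proposers still looking for a partner

    while free:
        p = free.pop()
        prefs = proposer_prefs[p]
        if next_choice[p] >= len(prefs):
            continue  # p has proposed to everyone on its list and stays unmatched
        a = prefs[next_choice[p]]
        next_choice[p] += 1
        if p not in rank.get(a, {}):
            free.append(p)  # a finds p unacceptable; p will try its next choice
            continue
        current = engaged.get(a)
        if current is None:
            engaged[a] = p
        elif rank[a][p] < rank[a][current]:
            engaged[a] = p        # a trades up to the preferred proposer
            free.append(current)  # the displaced proposer becomes free again
        else:
            free.append(p)        # a rejects p
    return {p: a for a, p in engaged.items()}


# Hypothetical similarity rankings between fragments f1, f2 (file A) and g1, g2 (file B).
fragments_a = {"f1": ["g1", "g2"], "f2": ["g1", "g2"]}
fragments_b = {"g1": ["f2", "f1"], "g2": ["f1", "f2"]}
print(stable_match(fragments_a, fragments_b))  # {'f2': 'g1', 'f1': 'g2'}
```

Here both f1 and f2 rank g1 first, but g1 prefers f2, so f1 settles for g2; no fragment pair would both rather be matched to each other, which is the stability property the SMP guarantees. Supporting many-to-many matching, as the thesis does, would require letting each side hold more than one partner.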
