About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Abstract Syntax Tree Analysis for Plagiarism Detection / Analys av abstrakta syntaxträd för detektion av plagiat

Nilsson, Erik January 2012 (has links)
Today, universities rely heavily on systems for detecting plagiarism in students' essays and reports. Code submissions, however, require specific tools. A number of approaches to finding plagiarism in code have already been tried, including techniques based on comparing textual transformations of code, token strings, parse trees, and graph representations. In this master's thesis, a new system, cojac, is presented which combines textual, tree, and graph techniques to detect a broad spectrum of plagiarism attempts. The system finds plagiarism in C, C++, and Ada source files. This thesis discusses the method used for obtaining parse trees from the source code and the abstract syntax tree analysis. For comparison of syntax trees, we generate sets of fingerprints, digest forms of trees, which makes the comparison algorithm more scalable. To evaluate the method, a set of benchmark files was constructed containing plagiarism scenarios, which was analyzed both by our system and by Moss, another available system for plagiarism detection in code. The results show that our abstract syntax tree analysis can effectively detect plagiarism attempts such as reformatting the code and renaming identifiers, and that it is at least as effective as Moss at detecting plagiarism of these kinds.
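The fingerprinting idea in the abstract above — hashing digest forms of subtrees so that tree comparison scales — can be illustrated in a few lines. The following is a minimal Python sketch of the general technique; the thesis's cojac system targets C, C++, and Ada, so the `min_size` threshold, the hash choice, and the use of Python's own `ast` module here are illustrative assumptions, not the thesis's implementation:

```python
import ast
import hashlib

def subtree_fingerprints(source, min_size=3):
    """Collect structural fingerprints for every subtree with at least
    `min_size` nodes. Only node-type names enter the hash, so renaming
    identifiers or changing literals does not change the fingerprints."""
    fingerprints = set()

    def walk(node):
        # Serialize the subtree as a nested list of node-type names.
        shape = [type(node).__name__]
        size = 1
        for child in ast.iter_child_nodes(node):
            child_shape, child_size = walk(child)
            shape.append(child_shape)
            size += child_size
        if size >= min_size:
            digest = hashlib.sha1(repr(shape).encode()).hexdigest()
            fingerprints.add(digest)
        return shape, size

    walk(ast.parse(source))
    return fingerprints

def similarity(a, b):
    """Jaccard similarity of the two fingerprint sets."""
    fa, fb = subtree_fingerprints(a), subtree_fingerprints(b)
    return len(fa & fb) / len(fa | fb)

original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed  = "def acc(values):\n    r = 0\n    for v in values:\n        r += v\n    return r\n"

print(similarity(original, renamed))  # identifier renaming alone -> 1.0
```

Because only node types enter the hash, renaming identifiers or reformatting leaves the fingerprint set unchanged, which is exactly the class of disguise the abstract says the analysis catches.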

Code Clone Detection for Equivalence Assurance

Ersson, Sara January 2020 (has links)
To support multiple programming languages, the concept of offering application programming interfaces (APIs) in multiple programming languages has become commonplace. However, this also brings the challenge of ensuring that the APIs are equivalent with regard to their interface. To achieve this, code clone detection techniques were adapted to match similar function declarations in the APIs. Firstly, existing code clone detection tools were investigated. As they did not perform well, a tree-based syntactic approach was used, where all header files were compiled with Clang. The abstract syntax trees obtained during compilation were then traversed to locate the function declaration nodes and to store function names and parameter variable names. When matching the function names, a textual approach was used, transforming the function names according to a set of implemented rules. A strict rule compares transformations of full function names in a precise way, whereas a loose rule only compares transformations of parts of function names and matches anything for the remainder. The rules were applied both by themselves and in different combinations, starting with the strictest rule, followed by the second strictest rule, and so forth. The best-matching rules proved to be those which are strict and are not affected by the order in which the functions are matched. These rules also proved very robust to API evolution, meaning an increase in the number of public functions. Rules which are less strict and stable, and not robust to API evolution, can still be used, such as matching functions on the first or last word in the function names, but preferably as a complement to the stricter and more stable rules, once most of the functions have already been matched. The tool has been evaluated on the two APIs in King's software development kit, and covered 94% of the 124 available function matches.
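The strict-versus-loose rule cascade described above can be sketched as follows. This is a hypothetical illustration of the general idea, not the tool's actual rules: the rule names, the word-splitting normalization, and the greedy strictest-first matching strategy are all assumptions.

```python
import re

def words(name):
    """Split a function name into lowercase words, handling both
    camelCase and snake_case ('getScore' / 'get_score' -> ['get', 'score'])."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")
    return [w.lower() for w in spaced.split()]

# A strict rule compares the full normalized name; a loose rule compares
# only part of it (here, the first word) and matches anything for the rest.
def strict_rule(a, b):
    return words(a) == words(b)

def loose_first_word(a, b):
    return words(a)[:1] == words(b)[:1]

def match(api_a, api_b, rules):
    """Greedily match functions across two APIs, applying the strictest
    rule first and falling back to looser ones for what remains."""
    matches, remaining = {}, list(api_b)
    for rule in rules:
        for fa in api_a:
            if fa in matches:
                continue
            for fb in remaining:
                if rule(fa, fb):
                    matches[fa] = fb
                    remaining.remove(fb)
                    break
    return matches

cpp_api  = ["getPlayerScore", "resetGame"]
java_api = ["get_player_score", "reset"]
print(match(cpp_api, java_api, [strict_rule, loose_first_word]))
# {'getPlayerScore': 'get_player_score', 'resetGame': 'reset'}
```

The ordering matters as the abstract notes: running the loose rule first could pair `resetGame` with `get_player_score`'s neighbor prematurely, which is why looser rules serve best as a complement once the strict rules have consumed most matches.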

Detecção interprocedimental de clones semânticos / Interprocedural semantic clone detection

Felipe de Alencar Albuquerque 08 November 2013 (has links)
Fragments of duplicated code, or clones, are embedded in applications because, among other reasons, they are a simple way to reuse code. Clones are therefore common in programs. However, maintenance can become costly if the program under analysis contains many clones, especially semantic clones, which are semantically similar but may have different syntax. In this regard, the use of tools that automate the task of detecting clones is desirable. Current tools for detecting semantic clones are able to identify those clones with high hit rates. However, they are not able to detect semantic clones while also considering the flow of the procedures or functions that are invoked within the compared code fragments. This limitation can lead the tools to report false positive semantic clones. This work presents a technique that takes the procedure calls in programs into account to detect semantic clones. This technique showed a 2.5% higher hit rate than conventional techniques according to a benchmark also developed in this work. The benchmark was created from clone classifications provided by programmers from industry and academia. The interprocedural semantic clone detection technique can be used for the evolution, maintenance, refactoring, and understanding of programs.
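The interprocedural idea — letting the bodies of invoked procedures take part in the comparison — can be sketched with a toy token-based comparison. This is an illustrative reconstruction of the general principle, not the thesis's technique: the token normalization, the naive call detection, and the similarity ratio are all assumptions.

```python
import io
import tokenize
from collections import Counter

def tokens(code):
    """Multiset of normalized tokens: identifiers and literals are
    collapsed to their token type so renamings matter less."""
    out = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(code + "\n").readline):
        if tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.STRING):
            out[tokenize.tok_name[tok.type]] += 1
        elif tok.type == tokenize.OP:
            out[tok.string] += 1
    return out

def expand(fragment, definitions):
    """Interprocedural view: append the body of every function the
    fragment calls, so the callees' code takes part in the comparison."""
    code = fragment
    for name, body in definitions.items():
        if name + "(" in fragment:
            code += "\n" + body
    return code

def token_similarity(frag_a, frag_b, defs_a, defs_b):
    ta = tokens(expand(frag_a, defs_a))
    tb = tokens(expand(frag_b, defs_b))
    overlap = sum((ta & tb).values())
    return overlap / max(sum(ta.values()), sum(tb.values()))

frag = "total = combine(a, b)"
defs_add = {"combine": "def combine(a, b):\n    return a + b"}
defs_mul = {"combine": "def combine(a, b):\n    return a * b"}

# Intraprocedurally the two fragments are literally identical, a false
# positive; expanding the callees reveals they compute different things.
print(token_similarity(frag, frag, {}, {}))                    # 1.0
print(token_similarity(frag, frag, defs_add, defs_mul) < 1.0)  # True
```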

On Occurrence Of Plagiarism In Published Computer Science Thesis Reports At Swedish Universities

Anbalagan, Sindhuja January 2010 (has links)
In recent years, it has been observed that software clones and plagiarism are becoming an increasing threat to creativity. Clones are the result of copying and reusing others' work. According to the Merriam-Webster dictionary, "a clone is one that appears to be a copy of an original form"; it is a synonym for duplicate. Clones lead to redundant code, but not all redundant code is a clone. On the basis of this background, and in order to safeguard ideas and discourage intentional code duplication that passes off others' work as one's own, software clone detection should be emphasized more. The objective of this paper is to review methods for clone detection, to apply those methods to measure the extent of plagiarism in master's-level computer science theses at Swedish universities, and to analyze the results. The rest of the paper discusses software plagiarism detection employing a data analysis technique, followed by statistical analysis of the results. Plagiarism is the act of stealing the ideas and words of another person and passing them off as one's own. Using a data analysis technique, samples (master's-level computer science thesis reports) were taken from various Swedish universities and processed with the Ephorus anti-plagiarism detection software. Ephorus gives a plagiarism percentage for each thesis document, and from these results a statistical analysis was carried out using Minitab. The results show a very low degree of plagiarism among the Swedish universities, which suggests that plagiarism is not a threat to Sweden's standard of education in computer science. This paper is based on data analysis techniques, the Ephorus plagiarism detection tool, and statistical analysis in Minitab.

Efficient Authentication, Node Clone Detection, and Secure Data Aggregation for Sensor Networks

Li, Zhijun January 2010 (has links)
Sensor networks are innovative wireless networks consisting of a large number of low-cost, resource-constrained sensor nodes that collect, process, and transmit data in a distributed and collaborative way. There are numerous applications for wireless sensor networks, and security is vital for many of them. However, sensor nodes suffer from many constraints, including low computation capability, small memory, limited energy resources, susceptibility to physical capture, and the lack of infrastructure, all of which impose formidable security challenges and call for innovative approaches. In this thesis, we present our research results on three important aspects of securing sensor networks: lightweight entity authentication, distributed node clone detection, and secure data aggregation. As the technical core of our lightweight authentication proposals, a special type of circulant matrix named circulant-P2 matrix is introduced. We prove the linear independence of matrix vectors, present efficient algorithms on matrix operations, and explore other important properties. By combining circulant-P2 matrix with the learning parity with noise problem, we develop two one-way authentication protocols: the innovative LCMQ protocol, which is provably secure against all probabilistic polynomial-time attacks and provides remarkable performance on almost all metrics except one mild requirement for the verifier's computational capacity, and the HB^C protocol, which utilizes the conventional HB-like authentication structure to preserve the bit-operation-only computation requirement for both participants and consumes less key storage than previous HB-like protocols without sacrificing other performance. Moreover, two enhancement mechanisms are provided to protect the HB-like protocols from known attacks and to improve performance. For both protocols, practical parameters for different security levels are recommended. 
In addition, we build a framework to extend enhanced HB-like protocols to mutual authentication in a communication-efficient fashion. The node clone attack, that is, the attempt by adversaries to add one or more nodes to the network by cloning captured nodes, poses a severe threat to wireless sensor networks. To cope with it, we propose two distributed detection protocols with different tradeoffs between network conditions and performance. The first is based on a distributed hash table, by which a fully decentralized, key-based caching and checking system is constructed to deterministically catch cloned nodes in general sensor networks. The protocol's efficient storage consumption and high security level are theoretically deduced through a probability model, and the resulting equations, with necessary adjustments for real application, are supported by simulations. The other is the randomly directed exploration protocol, which achieves notable communication performance and minimal storage consumption through an elegant probabilistic directed forwarding technique along with random initial direction and border determination. Extensive experimental results uphold the protocol design and show its efficiency in communication overhead and satisfactory detection probability. Data aggregation is an inherent requirement for many sensor network applications, but designing secure mechanisms for data aggregation is very challenging because the nature of aggregation, which requires intermediate nodes to process and modify messages, and the security objective of preventing malicious manipulation conflict with each other to a great extent. To meet the different challenges of secure data aggregation, we present two types of approaches. The first is to provide cryptographic integrity mechanisms for general data aggregation. 
Based on recent developments of homomorphic primitives, we propose three integrity schemes: a concrete homomorphic MAC construction, homomorphic hash plus aggregate MAC, and homomorphic hash with identity-based aggregate signature, which provide different tradeoffs on security assumption, communication payload, and computation cost. The other is a substantial data aggregation scheme that is suitable for a specific and popular class of aggregation applications, embedded with built-in security techniques that effectively defeat outside and inside attacks. Its foundation is a new data structure---secure Bloom filter, which combines HMAC with Bloom filter. The secure Bloom filter is naturally compatible with aggregation and has reliable security properties. We systematically analyze the scheme's performance and run extensive simulations on different network scenarios for evaluation. The simulation results demonstrate that the scheme presents good performance on security, communication cost, and balance.
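The "secure Bloom filter" mentioned above — combining HMAC with a Bloom filter so that the structure stays aggregation-friendly while resisting forgery — can be sketched as follows. This is a minimal illustration of the general construction, not the thesis's exact scheme; the filter size `m`, hash count `k`, and key handling here are arbitrary assumptions.

```python
import hmac
import hashlib

class SecureBloomFilter:
    """Bloom filter whose bit positions are derived from HMACs of the
    item under a shared secret key, so parties without the key cannot
    predict or forge which bits an insertion sets."""

    def __init__(self, key, m=1024, k=4):
        self.key, self.m, self.k = key, m, k
        self.bits = bytearray(m)

    def _positions(self, item):
        # One keyed hash per index i yields k pseudorandom positions.
        for i in range(self.k):
            mac = hmac.new(self.key, bytes([i]) + item, hashlib.sha256).digest()
            yield int.from_bytes(mac[:4], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

    def merge(self, other):
        """Aggregation step: bitwise OR, which is why the structure is
        naturally compatible with in-network aggregation."""
        for i, b in enumerate(other.bits):
            self.bits[i] |= b

key = b"shared-secret"
bf1, bf2 = SecureBloomFilter(key), SecureBloomFilter(key)
bf1.add(b"reading:17")
bf2.add(b"reading:42")
bf1.merge(bf2)
print(b"reading:42" in bf1)  # True: the merged filter answers for both nodes
```

As with any Bloom filter, membership answers admit a small false-positive rate governed by `m`, `k`, and the number of insertions; the keyed hashing adds integrity, not exactness.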

Intelligent Simulink Modeling Assistance via Model Clones and Machine Learning

Adhikari, Bhisma 26 July 2021 (has links)
No description available.

A Topic Modeling approach for Code Clone Detection

Khan, Mohammed Salman 01 January 2019 (has links)
In this thesis work, the potential benefits of Latent Dirichlet Allocation (LDA) as a technique for code clone detection are described. The objective is to propose a language-independent, effective, and scalable approach for identifying similar code fragments in relatively large software systems. The main assumption is that the latent topic structure of software artifacts gives an indication of the presence of code clones. It can be hypothesized that artifacts with similar topic distributions contain duplicated code fragments; to test this hypothesis, an experimental investigation using multiple datasets from various application domains was conducted. In addition, CloneTM, an LDA-based working prototype for code clone detection, was developed. Results showed that, if calibrated properly, topic modeling can deliver satisfactory performance in capturing different types of code clones, showing particularly good performance in detecting Type III clones. CloneTM also achieved levels of performance comparable to existing practical tools that adopt different clone detection strategies.
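The core hypothesis above — artifacts with similar topic distributions are clone candidates — reduces to comparing probability distributions. The following is a minimal sketch of that comparison step only; the topic distributions are invented, standing in for what an LDA model such as CloneTM would actually infer, and the Jensen-Shannon threshold is an illustrative assumption.

```python
from math import log2

def js_divergence(p, q):
    """Jensen-Shannon divergence between two topic distributions
    (0 = identical, 1 = maximally different, log base 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return (kl(p, m) + kl(q, m)) / 2

def clone_candidates(docs, threshold=0.1):
    """Pair up artifacts whose topic distributions are nearly identical;
    `docs` maps artifact name -> topic distribution."""
    names = list(docs)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if js_divergence(docs[a], docs[b]) < threshold:
                pairs.append((a, b))
    return pairs

# Hypothetical 3-topic distributions an LDA model might assign:
docs = {
    "parser.c":      [0.70, 0.20, 0.10],
    "parser_copy.c": [0.68, 0.22, 0.10],
    "network.c":     [0.05, 0.15, 0.80],
}
print(clone_candidates(docs))  # [('parser.c', 'parser_copy.c')]
```

Because the comparison operates on topic distributions rather than tokens or syntax, the approach is language-independent, which is the property the abstract emphasizes.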

Sur l'élaboration de meilleures techniques pour l'apprentissage auto-supervisé des représentations du code / On developing better techniques for self-supervised learning of code representations

Maes, Lucas 07 1900 (has links)
Code representations learned by deep learning models are a crucial component for certain software engineering applications such as code search or clone detection. The performance of these applications depends on the quality of the representations learned by the models. In fact, low-noise representations containing highly abstract information, such as functional semantics, facilitate these tasks. Indeed, code search requires understanding the objectives of code snippets in order to compare them with a natural language query, while clone detection requires determining whether two code snippets have the same functional semantics. The ability of models to learn representations containing such abstract information is therefore crucial to solving these tasks well. However, it is still difficult for code models to learn abstract representations that are independent of syntax, such as functional semantics. This thesis is therefore dedicated to developing better techniques for learning code representations via self-supervised learning. More specifically, we focus on two central tasks in software engineering automation requiring a minimum understanding of functional semantics, namely code search and Type 4 clone detection. This work proposes different approaches at different stages of training. The first, pre-training, consists in learning generic code representations that can be adapted to any problem. The second, fine-tuning, modifies the learned representations for a specific problem. First, we propose a new pre-training algorithm for code models using a regularized non-contrastive method adapted from VICReg [14], enabling the learning of generic representations. Second, we propose a new fine-tuning objective for code models using knowledge distillation from a set of already fine-tuned models, called teachers, to a student model, allowing it to learn more abstract representations. The aim of these contributions is not only to improve code representations and maximize the performance of machine learning models for code, but also to determine the best stage of training at which to intervene. The experimental results and analyses carried out in this thesis are preliminary and do not allow definitive conclusions to be drawn. Nevertheless, it is important to underline that the second contribution outperforms the classical fine-tuning method for code search. Moreover, the approaches described suggest innovative and unconventional research directions.
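The VICReg-style objective mentioned above combines three terms: an invariance term pulling two views of the same sample together, a variance hinge keeping each feature spread out, and a covariance penalty decorrelating features. The following NumPy sketch uses the commonly cited default weights; it illustrates the general method, not the thesis's regularized adaptation.

```python
import numpy as np

def vicreg_loss(za, zb, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg-style loss on two batches of embeddings
    (rows = samples, columns = features)."""
    n, d = za.shape
    # Invariance: mean squared distance between paired embeddings.
    sim = np.mean((za - zb) ** 2)
    # Variance: hinge pushing each feature's std above 1.
    def var_term(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))
    var = var_term(za) + var_term(zb)
    # Covariance: penalize off-diagonal covariance entries.
    def cov_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off = cov - np.diag(np.diag(cov))
        return np.sum(off ** 2) / d
    cov = cov_term(za) + cov_term(zb)
    return sim_w * sim + var_w * var + cov_w * cov

rng = np.random.default_rng(0)
za = rng.normal(size=(8, 4))
print(vicreg_loss(za, za))  # identical views: the invariance term is 0
```

Because the method is non-contrastive, no negative pairs are needed: the variance and covariance terms alone prevent the collapse that the invariance term would otherwise cause.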