About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Abstract Syntax Tree Analysis for Plagiarism Detection / Analys av abstrakta syntaxträd för detektion av plagiat

Nilsson, Erik January 2012 (has links)
Today, universities rely heavily on systems for detecting plagiarism in students' essays and reports. Code submissions, however, require specific tools. A number of approaches to finding plagiarism in code have already been tried, including techniques based on comparing textual transformations of code, token strings, parse trees, and graph representations. In this master's thesis, a new system, cojac, is presented which combines textual, tree, and graph techniques to detect a broad spectrum of plagiarism attempts. The system finds plagiarism in C, C++, and Ada source files. This thesis discusses the method used for obtaining parse trees from the source code and the abstract syntax tree analysis. For comparison of syntax trees, we generate sets of fingerprints, digest forms of trees, which makes the comparison algorithm more scalable. To evaluate the method, a set of benchmark files was constructed containing plagiarism scenarios, which was analyzed both by our system and by Moss, another available system for plagiarism detection in code. The results show that our abstract syntax tree analysis can effectively detect plagiarism attempts such as reformatting the code and renaming identifiers, and that it is at least as effective as Moss at detecting plagiarism of these kinds.
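The fingerprinting idea in the abstract above — hashing digest forms of subtrees so that tree comparison scales — can be illustrated in a few lines. The following is a minimal Python sketch of the general technique; the thesis's cojac system targets C, C++, and Ada, so the `min_size` threshold, the hash choice, and the use of Python's own `ast` module here are illustrative assumptions, not the thesis's implementation:

```python
import ast
import hashlib

def subtree_fingerprints(source, min_size=3):
    """Collect structural fingerprints for every subtree with at least
    `min_size` nodes. Only node-type names enter the hash, so renaming
    identifiers or changing literals does not change the fingerprints."""
    fingerprints = set()

    def walk(node):
        # Serialize the subtree as a nested list of node-type names.
        shape = [type(node).__name__]
        size = 1
        for child in ast.iter_child_nodes(node):
            child_shape, child_size = walk(child)
            shape.append(child_shape)
            size += child_size
        if size >= min_size:
            digest = hashlib.sha1(repr(shape).encode()).hexdigest()
            fingerprints.add(digest)
        return shape, size

    walk(ast.parse(source))
    return fingerprints

def similarity(a, b):
    """Jaccard similarity of the two fingerprint sets."""
    fa, fb = subtree_fingerprints(a), subtree_fingerprints(b)
    return len(fa & fb) / len(fa | fb)

original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed  = "def acc(values):\n    r = 0\n    for v in values:\n        r += v\n    return r\n"

print(similarity(original, renamed))  # identifier renaming alone -> 1.0
```

Because only node types enter the hash, renaming identifiers or reformatting leaves the fingerprint set unchanged, which is exactly the class of disguise the abstract says the analysis catches.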

Code Clone Detection for Equivalence Assurance

Ersson, Sara January 2020 (has links)
To support multiple programming languages, the concept of offering application programming interfaces (APIs) in multiple programming languages has become commonplace. However, this also brings the challenge of ensuring that the APIs are equivalent with regard to their interface. To achieve this, code clone detection techniques were adapted to match similar function declarations in the APIs. Firstly, existing code clone detection tools were investigated. As they did not perform well, a tree-based syntactic approach was used, where all header files were compiled with Clang. The abstract syntax trees obtained during compilation were then traversed to locate the function declaration nodes and to store function names and parameter variable names. When matching the function names, a textual approach was used, transforming the function names according to a set of implemented rules. A strict rule compares transformations of full function names in a precise way, whereas a loose rule only compares transformations of parts of function names and matches anything for the remainder. The rules were applied both by themselves and in different combinations, starting with the strictest rule, followed by the second strictest rule, and so forth. The best-matching rules proved to be those which are strict and are not affected by the order in which the functions are matched. These rules also proved very robust to API evolution, meaning an increase in the number of public functions. Rules which are less strict and stable, and not robust to API evolution, can still be used, such as matching functions on the first or last word in the function names, but preferably as a complement to the stricter and more stable rules, once most of the functions have already been matched. The tool has been evaluated on the two APIs in King's software development kit, and covered 94% of the 124 available function matches.
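The strict-versus-loose rule cascade described above can be sketched as follows. This is a hypothetical illustration of the general idea, not the tool's actual rules: the rule names, the word-splitting normalization, and the greedy strictest-first matching strategy are all assumptions.

```python
import re

def words(name):
    """Split a function name into lowercase words, handling both
    camelCase and snake_case ('getScore' / 'get_score' -> ['get', 'score'])."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")
    return [w.lower() for w in spaced.split()]

# A strict rule compares the full normalized name; a loose rule compares
# only part of it (here, the first word) and matches anything for the rest.
def strict_rule(a, b):
    return words(a) == words(b)

def loose_first_word(a, b):
    return words(a)[:1] == words(b)[:1]

def match(api_a, api_b, rules):
    """Greedily match functions across two APIs, applying the strictest
    rule first and falling back to looser ones for what remains."""
    matches, remaining = {}, list(api_b)
    for rule in rules:
        for fa in api_a:
            if fa in matches:
                continue
            for fb in remaining:
                if rule(fa, fb):
                    matches[fa] = fb
                    remaining.remove(fb)
                    break
    return matches

cpp_api  = ["getPlayerScore", "resetGame"]
java_api = ["get_player_score", "reset"]
print(match(cpp_api, java_api, [strict_rule, loose_first_word]))
# {'getPlayerScore': 'get_player_score', 'resetGame': 'reset'}
```

The ordering matters as the abstract notes: running the loose rule first could pair `resetGame` with `get_player_score`'s neighbor prematurely, which is why looser rules serve best as a complement once the strict rules have consumed most matches.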

Detecção interprocedimental de clones semânticos / Interprocedural semantic clone detection

Felipe de Alencar Albuquerque 08 November 2013 (has links)
Fragments of duplicated code, or clones, are embedded in applications because, among other reasons, they are a simple way to reuse code. Clones are therefore common in programs. However, maintenance can become costly if the program under analysis contains many clones, especially semantic clones, which are semantically similar but may have different syntax. In this regard, the use of tools that automate the task of detecting clones is desirable. Current tools for detecting semantic clones are able to identify those clones with high hit rates. However, they are not able to detect semantic clones while also considering the flow of the procedures or functions that are invoked within the compared code fragments. This limitation can lead the tools to report false positive semantic clones. This work presents a technique that takes the procedure calls in programs into account to detect semantic clones. This technique showed a 2.5% higher hit rate than conventional techniques according to a benchmark also developed in this work. The benchmark was created from clone classifications provided by programmers from industry and academia. The interprocedural semantic clone detection technique can be used for the evolution, maintenance, refactoring, and understanding of programs.
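The interprocedural idea — letting the bodies of invoked procedures take part in the comparison — can be sketched with a toy token-based comparison. This is an illustrative reconstruction of the general principle, not the thesis's technique: the token normalization, the naive call detection, and the similarity ratio are all assumptions.

```python
import io
import tokenize
from collections import Counter

def tokens(code):
    """Multiset of normalized tokens: identifiers and literals are
    collapsed to their token type so renamings matter less."""
    out = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(code + "\n").readline):
        if tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.STRING):
            out[tokenize.tok_name[tok.type]] += 1
        elif tok.type == tokenize.OP:
            out[tok.string] += 1
    return out

def expand(fragment, definitions):
    """Interprocedural view: append the body of every function the
    fragment calls, so the callees' code takes part in the comparison."""
    code = fragment
    for name, body in definitions.items():
        if name + "(" in fragment:
            code += "\n" + body
    return code

def token_similarity(frag_a, frag_b, defs_a, defs_b):
    ta = tokens(expand(frag_a, defs_a))
    tb = tokens(expand(frag_b, defs_b))
    overlap = sum((ta & tb).values())
    return overlap / max(sum(ta.values()), sum(tb.values()))

frag = "total = combine(a, b)"
defs_add = {"combine": "def combine(a, b):\n    return a + b"}
defs_mul = {"combine": "def combine(a, b):\n    return a * b"}

# Intraprocedurally the two fragments are literally identical, a false
# positive; expanding the callees reveals they compute different things.
print(token_similarity(frag, frag, {}, {}))                    # 1.0
print(token_similarity(frag, frag, defs_add, defs_mul) < 1.0)  # True
```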

On Occurrence Of Plagiarism In Published Computer Science Thesis Reports At Swedish Universities

Anbalagan, Sindhuja January 2010 (has links)
In recent years, it has been observed that software clones and plagiarism are becoming an increasing threat to creativity. Clones are the result of copying and reusing others' work. According to the Merriam-Webster dictionary, "a clone is one that appears to be a copy of an original form"; it is a synonym for duplicate. Clones lead to redundant code, but not all redundant code is a clone. On the basis of this background, and in order to safeguard ideas and discourage intentional code duplication that passes off others' work as one's own, software clone detection should be emphasized more. The objective of this paper is to review methods for clone detection, to apply those methods to measure the extent of plagiarism in master's-level computer science theses at Swedish universities, and to analyze the results. The rest of the paper discusses software plagiarism detection employing a data analysis technique, followed by statistical analysis of the results. Plagiarism is the act of stealing the ideas and words of another person and passing them off as one's own. Using a data analysis technique, samples (master's-level computer science thesis reports) were taken from various Swedish universities and processed with the Ephorus anti-plagiarism detection software. Ephorus gives a plagiarism percentage for each thesis document, and from these results a statistical analysis was carried out using Minitab. The results show a very low degree of plagiarism among the Swedish universities, which suggests that plagiarism is not a threat to Sweden's standard of education in computer science. This paper is based on data analysis techniques, the Ephorus plagiarism detection tool, and statistical analysis in Minitab.

Efficient Authentication, Node Clone Detection, and Secure Data Aggregation for Sensor Networks

Li, Zhijun January 2010 (has links)
Sensor networks are innovative wireless networks consisting of a large number of low-cost, resource-constrained sensor nodes that collect, process, and transmit data in a distributed and collaborative way. There are numerous applications for wireless sensor networks, and security is vital for many of them. However, sensor nodes suffer from many constraints, including low computation capability, small memory, limited energy resources, susceptibility to physical capture, and the lack of infrastructure, all of which impose formidable security challenges and call for innovative approaches. In this thesis, we present our research results on three important aspects of securing sensor networks: lightweight entity authentication, distributed node clone detection, and secure data aggregation. As the technical core of our lightweight authentication proposals, a special type of circulant matrix named circulant-P2 matrix is introduced. We prove the linear independence of matrix vectors, present efficient algorithms on matrix operations, and explore other important properties. By combining circulant-P2 matrix with the learning parity with noise problem, we develop two one-way authentication protocols: the innovative LCMQ protocol, which is provably secure against all probabilistic polynomial-time attacks and provides remarkable performance on almost all metrics except one mild requirement for the verifier's computational capacity, and the HB^C protocol, which utilizes the conventional HB-like authentication structure to preserve the bit-operation-only computation requirement for both participants and consumes less key storage than previous HB-like protocols without sacrificing other performance. Moreover, two enhancement mechanisms are provided to protect the HB-like protocols from known attacks and to improve performance. For both protocols, practical parameters for different security levels are recommended. 
In addition, we build a framework to extend enhanced HB-like protocols to mutual authentication in a communication-efficient fashion. The node clone attack, that is, the attempt by adversaries to add one or more nodes to the network by cloning captured nodes, poses a severe threat to wireless sensor networks. To cope with it, we propose two distributed detection protocols with different tradeoffs between network conditions and performance. The first is based on a distributed hash table, by which a fully decentralized, key-based caching and checking system is constructed to deterministically catch cloned nodes in general sensor networks. The protocol's efficient storage consumption and high security level are theoretically deduced through a probability model, and the resulting equations, with necessary adjustments for real application, are supported by simulations. The other is the randomly directed exploration protocol, which achieves notable communication performance and minimal storage consumption through an elegant probabilistic directed forwarding technique along with random initial direction and border determination. Extensive experimental results uphold the protocol design and show its efficiency in communication overhead and satisfactory detection probability. Data aggregation is an inherent requirement for many sensor network applications, but designing secure mechanisms for data aggregation is very challenging because the nature of aggregation, which requires intermediate nodes to process and modify messages, and the security objective of preventing malicious manipulation conflict with each other to a great extent. To meet the different challenges of secure data aggregation, we present two types of approaches. The first is to provide cryptographic integrity mechanisms for general data aggregation. 
Based on recent developments of homomorphic primitives, we propose three integrity schemes: a concrete homomorphic MAC construction, homomorphic hash plus aggregate MAC, and homomorphic hash with identity-based aggregate signature, which provide different tradeoffs on security assumption, communication payload, and computation cost. The other is a substantial data aggregation scheme that is suitable for a specific and popular class of aggregation applications, embedded with built-in security techniques that effectively defeat outside and inside attacks. Its foundation is a new data structure---secure Bloom filter, which combines HMAC with Bloom filter. The secure Bloom filter is naturally compatible with aggregation and has reliable security properties. We systematically analyze the scheme's performance and run extensive simulations on different network scenarios for evaluation. The simulation results demonstrate that the scheme presents good performance on security, communication cost, and balance.
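The "secure Bloom filter" mentioned above — combining HMAC with a Bloom filter so that the structure stays aggregation-friendly while resisting forgery — can be sketched as follows. This is a minimal illustration of the general construction, not the thesis's exact scheme; the filter size `m`, hash count `k`, and key handling here are arbitrary assumptions.

```python
import hmac
import hashlib

class SecureBloomFilter:
    """Bloom filter whose bit positions are derived from HMACs of the
    item under a shared secret key, so parties without the key cannot
    predict or forge which bits an insertion sets."""

    def __init__(self, key, m=1024, k=4):
        self.key, self.m, self.k = key, m, k
        self.bits = bytearray(m)

    def _positions(self, item):
        # One keyed hash per index i yields k pseudorandom positions.
        for i in range(self.k):
            mac = hmac.new(self.key, bytes([i]) + item, hashlib.sha256).digest()
            yield int.from_bytes(mac[:4], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

    def merge(self, other):
        """Aggregation step: bitwise OR, which is why the structure is
        naturally compatible with in-network aggregation."""
        for i, b in enumerate(other.bits):
            self.bits[i] |= b

key = b"shared-secret"
bf1, bf2 = SecureBloomFilter(key), SecureBloomFilter(key)
bf1.add(b"reading:17")
bf2.add(b"reading:42")
bf1.merge(bf2)
print(b"reading:42" in bf1)  # True: the merged filter answers for both nodes
```

As with any Bloom filter, membership answers admit a small false-positive rate governed by `m`, `k`, and the number of insertions; the keyed hashing adds integrity, not exactness.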

Intelligent Simulink Modeling Assistance via Model Clones and Machine Learning

Adhikari, Bhisma 26 July 2021 (has links)
No description available.

A Topic Modeling approach for Code Clone Detection

Khan, Mohammed Salman 01 January 2019 (has links)
In this thesis work, the potential benefits of Latent Dirichlet Allocation (LDA) as a technique for code clone detection are described. The objective is to propose a language-independent, effective, and scalable approach for identifying similar code fragments in relatively large software systems. The main assumption is that the latent topic structure of software artifacts gives an indication of the presence of code clones. It can be hypothesized that artifacts with similar topic distributions contain duplicated code fragments; to test this hypothesis, an experimental investigation using multiple datasets from various application domains was conducted. In addition, CloneTM, an LDA-based working prototype for code clone detection, was developed. Results showed that, if calibrated properly, topic modeling can deliver satisfactory performance in capturing different types of code clones, showing particularly good performance in detecting Type III clones. CloneTM also achieved levels of performance comparable to existing practical tools that adopt different clone detection strategies.
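The core hypothesis above — artifacts with similar topic distributions are clone candidates — reduces to comparing probability distributions. The following is a minimal sketch of that comparison step only; the topic distributions are invented, standing in for what an LDA model such as CloneTM would actually infer, and the Jensen-Shannon threshold is an illustrative assumption.

```python
from math import log2

def js_divergence(p, q):
    """Jensen-Shannon divergence between two topic distributions
    (0 = identical, 1 = maximally different, log base 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return (kl(p, m) + kl(q, m)) / 2

def clone_candidates(docs, threshold=0.1):
    """Pair up artifacts whose topic distributions are nearly identical;
    `docs` maps artifact name -> topic distribution."""
    names = list(docs)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if js_divergence(docs[a], docs[b]) < threshold:
                pairs.append((a, b))
    return pairs

# Hypothetical 3-topic distributions an LDA model might assign:
docs = {
    "parser.c":      [0.70, 0.20, 0.10],
    "parser_copy.c": [0.68, 0.22, 0.10],
    "network.c":     [0.05, 0.15, 0.80],
}
print(clone_candidates(docs))  # [('parser.c', 'parser_copy.c')]
```

Because the comparison operates on topic distributions rather than tokens or syntax, the approach is language-independent, which is the property the abstract emphasizes.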

Sur l'élaboration de meilleures techniques pour l'apprentissage auto-supervisé des représentations du code / On developing better techniques for self-supervised learning of code representations

Maes, Lucas 07 1900 (has links)
Code representations learned by deep learning models are a crucial component for certain software engineering applications such as code search or clone detection. The performance of these applications depends on the quality of the representations learned by the models. In fact, low-noise representations containing highly abstract information, such as functional semantics, facilitate these tasks. Indeed, code search requires understanding the objectives of code snippets in order to compare them with a natural language query, while clone detection requires determining whether two code snippets have the same functional semantics. The ability of models to learn representations containing such abstract information is therefore crucial to solving these tasks well. However, it is still difficult for code models to learn abstract representations that are independent of syntax, such as functional semantics. This thesis is therefore dedicated to developing better techniques for learning code representations via self-supervised learning. More specifically, we focus on two central tasks in software engineering automation requiring a minimum understanding of functional semantics, namely code search and Type 4 clone detection. This work proposes different approaches at different stages of training. The first, pre-training, consists in learning generic code representations that can be adapted to any problem. The second, fine-tuning, modifies the learned representations for a specific problem. First, we propose a new pre-training algorithm for code models using a regularized non-contrastive method adapted from VICReg [14], enabling the learning of generic representations. Second, we propose a new fine-tuning objective for code models using knowledge distillation from a set of already fine-tuned models, called teachers, to a student model, allowing it to learn more abstract representations. The aim of these contributions is not only to improve code representations and maximize the performance of machine learning models for code, but also to determine the best stage of training at which to intervene. The experimental results and analyses carried out in this thesis are preliminary and do not allow definitive conclusions to be drawn. Nevertheless, it is important to underline that the second contribution outperforms the classical fine-tuning method for code search. Moreover, the approaches described suggest innovative and unconventional research directions.
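The VICReg-style objective mentioned above combines three terms: an invariance term pulling two views of the same sample together, a variance hinge keeping each feature spread out, and a covariance penalty decorrelating features. The following NumPy sketch uses the commonly cited default weights; it illustrates the general method, not the thesis's regularized adaptation.

```python
import numpy as np

def vicreg_loss(za, zb, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg-style loss on two batches of embeddings
    (rows = samples, columns = features)."""
    n, d = za.shape
    # Invariance: mean squared distance between paired embeddings.
    sim = np.mean((za - zb) ** 2)
    # Variance: hinge pushing each feature's std above 1.
    def var_term(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))
    var = var_term(za) + var_term(zb)
    # Covariance: penalize off-diagonal covariance entries.
    def cov_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off = cov - np.diag(np.diag(cov))
        return np.sum(off ** 2) / d
    cov = cov_term(za) + cov_term(zb)
    return sim_w * sim + var_w * var + cov_w * cov

rng = np.random.default_rng(0)
za = rng.normal(size=(8, 4))
print(vicreg_loss(za, za))  # identical views: the invariance term is 0
```

Because the method is non-contrastive, no negative pairs are needed: the variance and covariance terms alone prevent the collapse that the invariance term would otherwise cause.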