11

Sherlock N-Overlap: invasive normalization and overlap coefficient for similarity analysis between source codes in programming courses / Sherlock N-Overlap: normalização invasiva e coeficiente de sobreposição para análise de similaridade entre códigos-fonte em disciplinas de programação

Danilo Leal Maciel 07 July 2014
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / This work addresses the problem of plagiarism detection among source codes in programming classes. Despite the wide range of tools available for plagiarism detection, few are able to effectively identify all lexical and semantic similarities between pairs of codes, because of the complexity inherent in this type of analysis. For this problem and scenario, a study was therefore made of the main approaches discussed in the literature on source-code plagiarism detection and, as the main contribution, a tool applicable to laboratory practice classes was conceived. The tool is based on the Sherlock algorithm, which has been enhanced from two perspectives: first, with changes to the similarity coefficient used by the algorithm, in order to improve its sensitivity when comparing signatures; second, by proposing invasive preprocessing techniques that, besides eliminating irrelevant information, are also able to emphasize structural aspects of the programming language, gathering or separating character sequences whose meaning is more significant for the comparison, or eliminating less relevant sequences to highlight others that allow better inference about the degree of similarity. The tool, called Sherlock N-Overlap, was subjected to a rigorous evaluation methodology, both in simulated scenarios and in programming classes, with results exceeding those of tools currently highlighted in the plagiarism detection literature.
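The coefficient named in this abstract is easy to state concretely. Below is a minimal sketch (not the thesis' implementation) of the overlap coefficient applied to n-gram signatures of two tokenized source fragments; the whitespace tokenization stands in for the invasive normalization the abstract describes.

```python
# Sketch of the overlap coefficient on n-gram signatures; illustrative
# only -- Sherlock N-Overlap's real normalization is far more invasive
# (identifier renaming, comment stripping, token regrouping).

def ngrams(tokens, n=3):
    """All contiguous n-grams of a token stream, as a signature set."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_coefficient(a, b):
    """|A ∩ B| / min(|A|, |B|): reaches 1.0 when one signature set is
    contained in the other, so partial copies padded with extra code
    still score high -- the sensitivity the abstract refers to."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Two fragments that differ only in an identifier name:
src1 = "for i in range ( 10 ) : total += values [ i ]".split()
src2 = "for j in range ( 10 ) : total += values [ j ]".split()
print(overlap_coefficient(ngrams(src1), ngrams(src2)))  # 8/12 ≈ 0.67
```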
12

Cross-language plagiarism detection / Detecção de plágio multilíngue

Pereira, Rafael Corezola January 2010
Plagiarism is one of the most serious forms of academic misconduct. It is defined as "the use of another person's written work without acknowledging the source". As a countermeasure to this problem, several methods attempt to automatically detect plagiarism between documents. In this context, this work proposes a new method for Cross-Language Plagiarism Analysis. The method aims at detecting external plagiarism cases, i.e., it tries to detect the plagiarized passages in the suspicious documents (the documents to be investigated) and their corresponding text fragments in the source documents (the original documents). To accomplish this task, we propose a plagiarism detection method composed of five main phases: language normalization, retrieval of candidate documents, classifier training, plagiarism analysis, and postprocessing. Since the method is designed to detect cross-language plagiarism, we used a language guesser to identify the language of the documents and an automatic translation tool to translate all the documents in the collection into a common language (so they can be analyzed in a uniform way). After language normalization, we applied a classification algorithm in order to build a model that is able to differentiate a plagiarized text passage from a non-plagiarized one. Once the classifier is trained, the suspicious documents can be analyzed. An information retrieval system is used to retrieve, based on passages extracted from each suspicious document, the passages from the original documents that are most likely to be the source of plagiarism. Only after the candidate passages are retrieved is the plagiarism analysis performed. Finally, a postprocessing technique is applied to the reported results in order to join contiguous plagiarized passages. We evaluated our method using three freely available test collections. Two of them were created for the PAN competitions (PAN'09 and PAN'10), which are international competitions on plagiarism detection. Since only a small percentage of these two collections contained cross-language plagiarism cases, we also created an artificial test collection especially designed to contain this kind of offense. We named this test collection ECLaPA (Europarl Cross-Language Plagiarism Analysis). The results achieved while analyzing these collections showed that the proposed method is a viable approach to the task of cross-language plagiarism analysis.
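Of the five phases, candidate retrieval is the most self-contained. The sketch below is an illustration under simplifying assumptions, not the author's system: after language normalization, passages are compared by cosine similarity over plain term-frequency vectors, and the top-k source passages are passed on for the trained classifier to judge.

```python
# Toy candidate-retrieval phase for the pipeline described above.
# Assumes all passages have already been translated into a common
# language; a real system would use a full IR engine, not this loop.
from collections import Counter
import math

def tf_vector(passage):
    """Bag-of-words term-frequency vector."""
    return Counter(passage.lower().split())

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in set(u) & set(v))
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def retrieve_candidates(suspicious_passage, source_passages, k=3):
    """Return the k source passages most likely to be the plagiarism
    source; the plagiarism-analysis phase then classifies each pair."""
    q = tf_vector(suspicious_passage)
    return sorted(source_passages,
                  key=lambda p: cosine(q, tf_vector(p)), reverse=True)[:k]
```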
13

Combinando métricas baseadas em conteúdo e em referências para a detecção de plágio em artigos científicos / Combining content- and citation-based metrics for plagiarism detection in scientific papers

Pertile, Solange de Lurdes January 2015
The large amount of scientific documents available online makes it easier for students and researchers to reuse text from other authors, and makes it harder to verify the originality of a given text. Reusing text without crediting the source is considered plagiarism. A number of studies have reported a high prevalence of plagiarism in academia. As a result, many institutions and researchers have developed systems that automate the plagiarism detection process. Most of the existing work is based on analyzing the similarity of the textual content of documents to assess the existence of plagiarism. More recently, similarity metrics that ignore the text and analyze only the citations and/or references shared between documents have been proposed. However, cases in which the author does not reference the original source may go unnoticed by metrics based solely on reference/citation analysis. In this context, the proposed solution is based on the hypothesis that combining content similarity metrics with reference/citation metrics can improve the quality of plagiarism detection. Two forms of combination are proposed: (i) the scores produced by the similarity metrics are used to rank pairs of documents, and (ii) the scores are used to construct feature vectors that machine learning algorithms use to classify the documents. The experiments were performed on real data sets of scientific papers. The experimental evaluation shows that the hypothesis was confirmed when the combination of similarity metrics using machine learning is compared with the simple combination. Moreover, both combinations showed gains when compared with the metrics applied individually.
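The second combination strategy can be pictured with a short sketch. Everything below is a stand-in: the scores, the labels, and the choice of logistic regression are illustrative assumptions, not the thesis' setup. Each document pair becomes a feature vector of similarity scores, and a standard classifier learns to separate plagiarism from coincidence.

```python
# Hypothetical feature vectors [content_similarity, citation_similarity]
# for five document pairs, with made-up plagiarism labels.
from sklearn.linear_model import LogisticRegression

X = [[0.92, 0.80], [0.15, 0.05], [0.70, 0.10], [0.12, 0.75], [0.45, 0.08]]
y = [1, 0, 1, 1, 0]  # 1 = plagiarism pair, 0 = unrelated pair

clf = LogisticRegression().fit(X, y)   # combination (ii): learned
print(clf.predict([[0.85, 0.60]]))    # classify an unseen pair

# Combination (i), by contrast, simply ranks pairs by the raw scores,
# e.g. by their mean, with no training step:
ranked = sorted(X, key=lambda s: -(s[0] + s[1]) / 2)
```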
14

Detekce plagiátů programových kódů / Plagiarism detection of program codes

Nečadová, Anežka January 2015
This semester thesis presents a definition of plagiarism and focuses primarily on this problem in the academic world. The main topic is plagiarism detection. The various steps of the detection process are discussed, with special attention given to plagiarism detection in program codes. The work raises the question of the reliability of detection tools and divides plagiarism detection methods into basic groups. One chapter is devoted to metrics for comparing files, and two available plagiarism detection tools are mentioned. The last chapter analyzes the author's own draft program for plagiarism detection in program codes. The detector was applied to a database of student works, and the results were plotted.
15

Uncharted Territory: Receptions of Philosophy in Apollonius Rhodius’ Argonautica

Marshall, Laura Ann January 2017
No description available.
16

A study on plagiarism detection and plagiarism direction identification using natural language processing techniques

Chong, Man Yan Miranda January 2013
Ever since we entered the digital communication era, the ease of information sharing through the internet has encouraged online literature searching. With this comes the potential risk of a rise in academic misconduct and intellectual property theft. As concerns over plagiarism grow, more attention has been directed towards automatic plagiarism detection. This is a computational approach which assists humans in judging whether pieces of texts are plagiarised. However, most existing plagiarism detection approaches are limited to superficial, brute-force string-matching techniques. If the text has undergone substantial semantic and syntactic changes, string-matching approaches do not perform well. In order to identify such changes, linguistic techniques which are able to perform a deeper analysis of the text are needed. To date, very limited research has been conducted on the topic of utilising linguistic techniques in plagiarism detection. This thesis provides novel perspectives on plagiarism detection and plagiarism direction identification tasks. The hypothesis is that original texts and rewritten texts exhibit significant but measurable differences, and that these differences can be captured through statistical and linguistic indicators. To investigate this hypothesis, four main research objectives are defined. First, a novel framework for plagiarism detection is proposed. It involves the use of Natural Language Processing techniques, rather than only relying on the traditional string-matching approaches. The objective is to investigate and evaluate the influence of text pre-processing, and statistical, shallow and deep linguistic techniques using a corpus-based approach. This is achieved by evaluating the techniques in two main experimental settings. Second, the role of machine learning in this novel framework is investigated. The objective is to determine whether the application of machine learning in the plagiarism detection task is helpful. This is achieved by comparing a threshold-setting approach against a supervised machine learning classifier. Third, the prospect of applying the proposed framework in a large-scale scenario is explored. The objective is to investigate the scalability of the proposed framework and algorithms. This is achieved by experimenting with a large-scale corpus in three stages. The first two stages are based on longer text lengths and the final stage is based on segments of texts. Finally, the plagiarism direction identification problem is explored as supervised machine learning classification and ranking tasks. Statistical and linguistic features are investigated individually or in various combinations. The objective is to introduce a new perspective on the traditional brute-force pair-wise comparison of texts. Instead of comparing original texts against rewritten texts, features are drawn based on traits of texts to build a pattern for original and rewritten texts. Thus, the classification or ranking task is to fit a piece of text into a pattern. The framework is tested by empirical experiments, and the results from initial experiments show that deep linguistic analysis contributes to solving the problems we address in this thesis. Further experiments show that combining shallow and deep techniques helps improve the classification of plagiarised texts by reducing the number of false negatives. In addition, the experiment on plagiarism direction detection shows that rewritten texts can be identified by statistical and linguistic traits. The conclusions of this study offer ideas for further research directions and potential applications to tackle the challenges that lie ahead in detecting text reuse.
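The second research objective, comparing a threshold-setting approach against a supervised classifier, can be made concrete with a toy sketch. The two shallow features below are illustrative stand-ins for the statistical and linguistic indicators the thesis actually studies.

```python
# Both approaches consume the same features of a (source, suspicious)
# text pair; only the decision rule differs.

def features(source, suspicious):
    s, t = set(source.lower().split()), set(suspicious.lower().split())
    jaccard = len(s & t) / len(s | t) if s | t else 0.0
    length_ratio = min(len(s), len(t)) / max(len(s), len(t)) if s and t else 0.0
    return [jaccard, length_ratio]

def threshold_detector(feats, cutoff=0.5):
    """Flags plagiarism whenever lexical overlap alone crosses a fixed
    cutoff -- cheap, but blind to heavy rewriting."""
    return feats[0] >= cutoff

print(threshold_detector(features("the cat sat on the mat",
                                  "a cat sat upon the mat")))  # True

# A supervised classifier would instead be fit on labelled pairs and
# weigh all features jointly, which the thesis finds reduces false
# negatives on substantially rewritten texts.
```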
17

On Occurrence Of Plagiarism In Published Computer Science Thesis Reports At Swedish Universities

Anbalagan, Sindhuja January 2010
In recent years, it has been observed that software clones and plagiarism are becoming an increasing threat to creativity. Clones are the result of copying and using others' work. According to the Merriam-Webster dictionary, "A clone is one that appears to be a copy of an original form"; it is a synonym for duplicate. Clones lead to redundancy of code, but not all redundant code is a clone. Against this background, and in order to safeguard ideas and discourage passing off intentionally duplicated code as one's own work, software clone detection should receive more emphasis. The objective of this paper is to review methods for clone detection, to apply those methods to measure the extent of plagiarism in Master-level computer science theses at Swedish universities, and to analyze the results. The remainder of the paper discusses software plagiarism detection using a data analysis technique, followed by statistical analysis of the results. Plagiarism is an act of stealing and passing off the ideas and words of another person as one's own. Using the data analysis technique, samples (Master-level computer science thesis reports) were taken from various Swedish universities and processed with the Ephorus anti-plagiarism detection software. Ephorus gives the percentage of plagiarism for each thesis document, and from these results a statistical analysis was carried out using Minitab. The results show a very low percentage of plagiarism among the Swedish universities, which suggests that plagiarism is not a threat to the standard of computer science education in Sweden. This paper is based on data analysis and intelligence techniques, the Ephorus plagiarism detection tool, and statistical analysis in Minitab.
18

Efektivní metody detekce plagiátů v rozsáhlých dokumentových skladech / Effective methods of plagiarism detection in large document repositories

Přibil, Jiří January 2009
The work focuses on plagiarism detection in large document repositories. It takes into account the real situation that currently needs to be addressed in the university environment in the Czech Republic, and proposes a system able to carry out this analysis in real time while still capturing the widest possible range of plagiarism methods. The main contribution of this work is the definition of so-called unordered n-grams ({n}-grams), which can be used to detect some forms of advanced plagiarism. All recommendations relating to the various components of a plagiarism detection system (preprocessing a document before its insertion into the corpus, the representation of documents in storage, identification of potential plagiarism sources for calculating similarity scores, and visualization of the plagiarism analysis) are discussed and appropriately quantified. The result is a set of design parameters for a system that can detect plagiarism in Czech-language documents quickly and accurately, across most forms of plagiarism.
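The unordered n-gram ({n}-gram) idea invites a short sketch. Storing each n-gram as a sorted tuple, so that local word reordering no longer breaks a match, is one plausible reading of the definition; it is an illustration, not code from the thesis.

```python
# Ordered vs unordered trigram fingerprints of two passages that differ
# only by a word swap inside one window -- a common disguise.

def ordered_ngrams(tokens, n=3):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def unordered_ngrams(tokens, n=3):
    # sorting inside the window makes the fingerprint order-insensitive
    return {tuple(sorted(tokens[i:i + n])) for i in range(len(tokens) - n + 1)}

a = "the quick brown fox jumps".split()
b = "the brown quick fox jumps".split()   # one local swap

print(len(ordered_ngrams(a) & ordered_ngrams(b)))      # 0: every match broken
print(len(unordered_ngrams(a) & unordered_ngrams(b)))  # 2: reordering detected
```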
19

(Ur)kunden har alltid rätt : En studie om plagiat och användandet av plagiatkontrollsystem vid ett svenskt lärosäte / The customer (Urkund) is always right

Hedlund, Elin January 2019
The purpose of this study was to better understand how universities in Sweden work to prevent plagiarism, how they use plagiarism detection tools, and what knowledge students have, or wish to have, about plagiarism. This was studied through interviews with teachers and administrative staff at one smaller university in Sweden, together with a questionnaire for students attending any university in Sweden. The study shows that, according to the students, information about plagiarism from teachers could be clearer in order to prevent plagiarism as far as possible. It also shows that the university is already working to improve this information so that students can understand it better, but it is hard to know how best to deliver the information and what information the students need. The examined university uses the Urkund system as its plagiarism detection tool, and the study shows that teachers are generally satisfied with the system and how they use it. However, some teachers see an opportunity to use the detection tool in their teaching practice but need more knowledge of the system to do so efficiently. Students report knowing what plagiarism means, but they lack the deeper knowledge they want from their universities, especially about how the school handles plagiarism cases and how the plagiarism detection tool works. The study concludes with recommendations for universities to improve their work against plagiarism.
20

Similarités textuelles sémantiques translingues : vers la détection automatique du plagiat par traduction / Cross-lingual semantic textual similarity : towards automatic cross-language plagiarism detection

Ferrero, Jérémy 08 December 2017
The massive availability of documents through the Internet (e.g. web pages, data warehouses, and digital or transcribed texts) makes the recycling of ideas ever easier. Unfortunately, this phenomenon is accompanied by an increase in plagiarism cases. Indeed, claiming ownership of content, whatever its form, without the consent of its author and without crediting its source, and presenting it as new and original, is considered plagiarism. In addition, the expansion of the Internet, which facilitates access to documents throughout the world (written in foreign languages) as well as to increasingly efficient (and freely available) machine translation tools, contributes to the spread of a new kind of plagiarism: cross-language plagiarism. Cross-language plagiarism means plagiarism by translation, i.e. a text has been plagiarized while being translated (manually or automatically) from its original language into the language of the document in which the plagiarist wishes to include it. While prevention of plagiarism is an active field of research and development, it covers mostly monolingual comparison techniques. This thesis is a joint work between an academic laboratory (LIG) and Compilatio (a software publishing company of solutions for plagiarism detection), and proposes cross-lingual semantic textual similarity measures, an important sub-task of cross-language plagiarism detection. After defining plagiarism and the different concepts discussed during this thesis, we present a state of the art of cross-language plagiarism detection approaches. We also present the preexisting corpora for cross-language plagiarism detection and show their limits. Then we describe how we gathered and built a new dataset, which does not suffer from most of the limits encountered by the preexisting corpora. Using this new dataset, we conduct a rigorous evaluation of several state-of-the-art methods and discover that they behave differently according to certain characteristics of the texts on which they operate. We next present new methods for measuring cross-lingual semantic textual similarity based on word embeddings. We also propose a notion of morphosyntactic and frequency weighting of words, which can be used both within a vector and within a bag of words, and we show that its introduction in the new methods increases their respective performance. Then we test different fusion systems (mostly based on linear regression). Our experiments show that we obtain better results than the state of the art on all the sub-corpora studied. We conclude by presenting and discussing the results of these methods obtained during our participation in the cross-lingual Semantic Textual Similarity (STS) task of SemEval-2017, where we ranked 1st on the sub-task that best corresponds to Compilatio's use-case scenario.
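The core measure described here admits a compact sketch. Everything numeric below is a toy assumption: the vectors stand in for real bilingual word embeddings, and the crude function-word down-weighting stands in for the thesis' morphosyntactic and frequency weighting.

```python
# Cross-lingual similarity as the cosine between weighted averages of
# word embeddings; toy 3-d vectors in a shared English/French space.
import numpy as np

def sentence_vector(words, emb, weight):
    """Weighted average of the embeddings of the in-vocabulary words."""
    pairs = [(weight(w), emb[w]) for w in words if w in emb]
    if not pairs:
        return np.zeros(3)
    total = sum(w for w, _ in pairs)
    return sum(w * v for w, v in pairs) / total

def cosine(u, v):
    n = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / n) if n else 0.0

emb = {"cat":    np.array([1.0, 0.1, 0.0]), "chat": np.array([0.9, 0.2, 0.0]),
       "sleeps": np.array([0.0, 1.0, 0.3]), "dort": np.array([0.1, 0.9, 0.4]),
       "the":    np.array([0.2, 0.2, 0.2]), "le":   np.array([0.2, 0.2, 0.2])}
weight = lambda w: 0.3 if w in {"the", "le"} else 1.0  # down-weight function words

en = sentence_vector("the cat sleeps".split(), emb, weight)
fr = sentence_vector("le chat dort".split(), emb, weight)
print(cosine(en, fr))  # close to 1.0 for this parallel pair
```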
