Using Style Markers for Detecting Plagiarism in Natural Language Documents

Most of the existing plagiarism detection systems compare a text to a database of other texts. These external approaches, however, are vulnerable because texts not contained in the database cannot be detected as source texts. This paper examines an internal plagiarism detection method that uses style markers from authorship attribution studies in order to find stylistic changes in a text. These changes might pinpoint plagiarized passages. Additionally, a new style marker called specific words is introduced. A pre-study tests if the style markers can fingerprint an author's style and if they are constant with sample size. It is shown that vocabulary richness measures do not fulfil these prerequisites. The other style markers - simple ratio measures, readability scores, frequency lists, and entropy measures - have these characteristics and are, together with the new specific words measure, used in a main study with an unsupervised approach for detecting stylistic changes in plagiarized texts at sentence and paragraph levels. It is shown that at these small levels the style markers generally cannot detect plagiarized sections because of intra-authorial stylistic variations (i.e. noise), and that at bigger levels the results are strongly affected by the sliding window approach. The specific words measure, however, can pinpoint single sentences written by another author.
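
As a rough illustration of the general idea described in the abstract (style markers computed over a sliding window, with unusual windows flagged), the following Python sketch may help. It is not the thesis's implementation: the sentence splitter, the two markers chosen (average word length as a simple ratio measure, and word entropy), the z-score rule, and all function names and parameters (`size`, `threshold`) are assumptions made only for this example.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(text):
    # Lowercased alphabetic tokens; apostrophes are kept inside words.
    return re.findall(r"[A-Za-z']+", text.lower())


def avg_word_length(tokens):
    # A simple ratio measure: total characters divided by number of words.
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0


def word_entropy(tokens):
    # Shannon entropy (in bits) of the word frequency distribution.
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def sliding_windows(sentences, size, step=1):
    # Yield (start_index, window) pairs; a short text yields one partial window.
    for start in range(0, max(len(sentences) - size + 1, 1), step):
        yield start, sentences[start:start + size]


def flag_outliers(text, marker=avg_word_length, size=3, threshold=2.0):
    # Score every window with the marker, then flag windows whose score lies
    # more than `threshold` standard deviations from the mean over all windows.
    # The z-score rule is a hypothetical stand-in for an unsupervised detector.
    sentences = split_sentences(text)
    scored = [(start, marker(words(" ".join(win))))
              for start, win in sliding_windows(sentences, size)]
    values = [v for _, v in scored]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [(start, v) for start, v in scored if abs(v - mu) / sigma > threshold]
```

Calling `flag_outliers(text, marker=word_entropy, size=5)` would score five-sentence windows by entropy instead. Note that the abstract reports such markers to be unreliable at sentence and paragraph level because of intra-authorial variation, and sensitive to the windowing at larger sizes, so any flagged window from a sketch like this should be read as a hint at most.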

http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-824

plagiarism detection

stylometry

authorship attribution

Computer Sciences

Datavetenskap (datalogi)

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:his-824
Date	January 2003
Creators	Kimler, Marco
Publisher	Högskolan i Skövde, Institutionen för datavetenskap, Skövde : Institutionen för datavetenskap
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/postscript, application/pdf
Rights	info:eu-repo/semantics/openAccess
