Visualization of Text Duplicates in Documents

In this thesis, a tool is developed to visualize duplicate parts in a given set of documents. Text duplication is very common nowadays in all fields. Copying severely harms the rights of the original authors, even though it makes the work of those who copy from them easier. Effective legal measures have been taken to address copyright issues, and an increasingly large number of people pay serious attention to what they write when they refer to other people's work. Although many who admire and respect others' achievements cite them properly, plagiarism still happens all the time. An intuitive way of visualizing duplicate parts is therefore needed, so that people can easily grasp the purpose of those duplicates and judge their legality. In computer science, software clones are a very common phenomenon, both across different development groups and within a single group. Since a piece of software usually has a hierarchical structure, that hierarchy is also of interest to group members when they perform clone detection on their own or others' software. For example, if a good overview of the hierarchies is provided as a tree representation, one can easily locate the clones of a particular node in the other trees. Further interaction techniques can give access to the concrete code, for example by double-clicking a highlighted node.

To visualize duplicate parts in a clear and intuitive way, a visualization tool is developed for this thesis project. The finished tool is intended to provide the following features. First, it can visualize similar or identical parts in a given data set. Second, the hierarchies of those files are displayed with an appropriate layout. Third, the user can manipulate the data items on the screen to gain better insight into the data set and to support analysis tasks. Fourth, different levels of abstraction are provided, so that the user can either get an overview of all the files or examine the duplicate parts in specific documents of interest.
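The idea of locating the clones of a particular node in other trees can be illustrated with a small lookup structure. The sketch below is hypothetical and is not the thesis's implementation: it assumes that clone pairs have already been reported by some external clone detector, and that each node is identified by a `(tree, path)` tuple. The names `build_clone_index` and `clones_in_other_trees` are invented for illustration. The sketch only organizes the pairs so that a tree view could highlight the matching nodes in every other tree.

```python
from collections import defaultdict

# Hypothetical node identifier: (tree_id, path), e.g. ("projectA", "src/parse").
NodeId = tuple[str, str]


def build_clone_index(pairs: list[tuple[NodeId, NodeId]]) -> dict[NodeId, set[NodeId]]:
    """Map every node to the nodes reported as its clones.

    The clone relation is treated as symmetric; the pairs are assumed to
    come from an external clone detector, which this sketch does not model.
    """
    index: dict[NodeId, set[NodeId]] = defaultdict(set)
    for a, b in pairs:
        index[a].add(b)
        index[b].add(a)
    return index


def clones_in_other_trees(index: dict[NodeId, set[NodeId]], node: NodeId) -> dict[str, list[str]]:
    """Group a node's clones by the tree they belong to, skipping its own tree."""
    grouped: dict[str, list[str]] = defaultdict(list)
    # dict.get does not trigger the defaultdict factory, so unknown nodes yield nothing.
    for tree, path in index.get(node, ()):
        if tree != node[0]:
            grouped[tree].append(path)
    # Sort for a stable order, e.g. when laying out highlighted nodes.
    return {tree: sorted(paths) for tree, paths in sorted(grouped.items())}


if __name__ == "__main__":
    # Made-up clone pairs for illustration only.
    pairs = [
        (("projectA", "src/parse"), ("projectB", "lib/parser")),
        (("projectA", "src/parse"), ("projectC", "core/parse")),
        (("projectA", "src/parse"), ("projectA", "test/parse_copy")),
    ]
    index = build_clone_index(pairs)
    print(clones_in_other_trees(index, ("projectA", "src/parse")))
    # prints {'projectB': ['lib/parser'], 'projectC': ['core/parse']}
```

Clones within the same tree (here `test/parse_copy`) are filtered out because the lookup targets the other trees. A tool could show intra-tree clones separately instead.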

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:vxu-5408
Date: January 2009
Creators: Wang, Chao; Pan, Han
Publisher: Växjö universitet, Matematiska och systemtekniska institutionen
Source Sets: DiVA Archive at Upsalla University
Language: English
Type: Student thesis, info:eu-repo/semantics/masterThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
Relation: Rapporter från MSI, 1650-2647