Using explicit semantic analysis to detect similarities in source code (Využití explicitní sémantické analýzy pro detekci podobností ve zdrojových kódech)

This diploma thesis deals with the use of explicit semantic analysis for detecting similarities in source code in the context of plagiarism. A semantic interpreter was built from 40,829 Wikipedia articles, and the analysis was tested on 25 documents specially created using plagiarism techniques and 5 downloaded documents. The dataset covered five languages: Java, JavaScript, PHP, C++ and Python. Another dataset of 15 documents was used to test for random matches. The analysis was shown to be capable, for the given dataset, of detecting similarities across different languages. The Greedy String Tiling algorithm was used to refine the results and, together with the explicit semantic analysis, is implemented in the system Anton.
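The general idea behind explicit semantic analysis can be illustrated with a minimal sketch. It is not the implementation used in Anton: the three toy "articles" in `CONCEPTS`, the tokenizer, the TF-IDF weighting and all function names are assumptions made for illustration, standing in for the Wikipedia-based interpreter described in the abstract. Each term is mapped to a weighted vector of concepts, a document is interpreted as the sum of its terms' vectors, and two documents are compared by cosine similarity in concept space.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base; the thesis used Wikipedia articles instead.
CONCEPTS = {
    "Sorting": "sort array swap compare order element",
    "Networking": "socket connect send receive packet port",
    "Recursion": "recursive call base case stack factorial",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_interpreter(concepts):
    """Build an inverted index: term -> {concept name: TF-IDF weight}."""
    n = len(concepts)
    term_counts = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    index = {}
    for name, counts in term_counts.items():
        for term, tf in counts.items():
            idf = math.log(n / doc_freq[term])
            if idf > 0:  # terms present in every concept carry no signal
                index.setdefault(term, {})[name] = tf * idf
    return index


def interpret(text, index):
    """Map a text to a concept vector by summing its terms' concept weights."""
    vector = Counter()
    for term, tf in Counter(tokenize(text)).items():
        for concept, weight in index.get(term, {}).items():
            vector[concept] += tf * weight
    return vector


def cosine(a, b):
    """Cosine similarity of two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


index = build_interpreter(CONCEPTS)
doc_a = interpret("swap array element compare", index)
doc_b = interpret("sort the array in order", index)
doc_c = interpret("open socket and send packet", index)
print(cosine(doc_a, doc_b), cosine(doc_a, doc_c))
```

Because the comparison happens in concept space rather than on surface tokens, two texts that share few literal words can still score as similar, which is the property that makes the approach a candidate for comparing code across languages.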

Identifier oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:428881
Date January 2019
Creators Všianský, Richard
Source Sets Czech ETDs
Language Czech
Detected Language English
Type info:eu-repo/semantics/masterThesis
Rights info:eu-repo/semantics/restrictedAccess
