Global ETD Search

Využití explicitní sémantické analýzy pro detekci podobností ve zdrojových kódech (Using Explicit Semantic Analysis to Detect Similarities in Source Code)

This diploma thesis applies explicit semantic analysis (ESA) to detecting similarities in source code in the context of plagiarism. The semantic interpreter was built from 40,829 Wikipedia articles. The analysis was tested on 25 documents created specifically with plagiarism techniques and on 5 downloaded documents, covering five languages: Java, JavaScript, PHP, C++ and Python. A further dataset of 15 documents was used to test for random matches. The results show that, on the given dataset, the analysis can detect similarities across different languages. The Greedy String Tiling algorithm was used to refine the results; together with explicit semantic analysis, it is implemented in the Anton system.
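
The abstract names two techniques. The following is a minimal Python sketch of each under stated assumptions, not the implementation used in Anton. ESA represents a text as a weighted vector over "concepts" (in the thesis, Wikipedia articles) and compares two texts by the cosine of their vectors; Greedy String Tiling repeatedly marks the longest unmarked common runs of at least `min_match` tokens. The tokenizer, the TF-IDF weighting `(1 + log tf) * log(N / df)`, the three toy concept texts standing in for Wikipedia articles, and all function names are hypothetical. The tiling loop is the plain brute-force form, without the Karp-Rabin speed-up. The abstract does not say how Anton combines the two scores, so the sketch computes them separately.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase alphabetic runs only; a real system would also
    # split identifiers such as camelCase and treat language keywords specially.
    return re.findall(r"[a-z]+", text.lower())

def build_interpreter(concepts):
    """Inverted index term -> {concept: weight} using an assumed TF-IDF scheme.

    `concepts` maps a concept name (a hypothetical stand-in for a Wikipedia
    article title) to its text.
    """
    n = len(concepts)
    tfs = {name: Counter(tokenize(text)) for name, text in concepts.items()}
    df = Counter(term for tf in tfs.values() for term in tf)
    index = {}
    for name, tf in tfs.items():
        for term, count in tf.items():
            weight = (1 + math.log(count)) * math.log(n / df[term])
            if weight > 0:  # drops terms that occur in every concept
                index.setdefault(term, {})[name] = weight
    return index

def interpret(text, index):
    """Interpretation vector: the text's term counts times each term's concept weights."""
    vec = Counter()
    for term, count in Counter(tokenize(text)).items():
        for concept, weight in index.get(term, {}).items():
            vec[concept] += count * weight
    return vec

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def greedy_string_tiling(a, b, min_match=3):
    """Return tiles (i, j, length): non-overlapping common runs of tokens."""
    marked_a, marked_b = [False] * len(a), [False] * len(b)
    tiles = []
    while True:
        max_len, matches = min_match, []
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and not marked_a[i + k] and not marked_b[j + k]
                       and a[i + k] == b[j + k]):
                    k += 1
                if k > max_len:
                    max_len, matches = k, [(i, j, k)]
                elif k == max_len:
                    matches.append((i, j, k))
        for i, j, k in matches:
            # Skip matches occluded by a tile marked earlier in this pass.
            if not any(marked_a[i:i + k]) and not any(marked_b[j:j + k]):
                for t in range(k):
                    marked_a[i + t] = marked_b[j + t] = True
                tiles.append((i, j, k))
        if max_len == min_match:  # longest remaining run was min_match (or none found)
            return tiles

def gst_similarity(a, b, min_match=3):
    covered = sum(k for _, _, k in greedy_string_tiling(a, b, min_match))
    return 2 * covered / (len(a) + len(b)) if a or b else 0.0

# Toy usage; the concept texts and code snippets are invented for illustration.
concepts = {
    "Sorting": "sort array order compare swap element",
    "Networking": "socket connect server client request",
    "Recursion": "function call itself base case",
}
index = build_interpreter(concepts)
v1 = interpret("def bubble(arr): compare swap element order", index)  # Python-like
v2 = interpret("function sortArray(a) { compare; swap; order }", index)  # JavaScript-like
v3 = interpret("socket.connect(server)", index)
print(round(cosine(v1, v2), 3))  # 0.949
print(cosine(v1, v3))            # 0.0
print(greedy_string_tiling("a b c d e f".split(), "x a b c y d e f".split()))
# [(0, 1, 3), (3, 5, 3)]
```

In this toy example the two snippets share no syntax, yet their ESA vectors are close because their words map to the same concept. Greedy String Tiling, by contrast, rewards long runs of identical tokens, which is why it suits refining candidate pairs rather than finding cross-language matches on its own.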

http://www.nusl.cz/ntk/nusl-428881

Identifier	oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:428881
Date	January 2019
Creators	Všianský, Richard
Source Sets	Czech ETDs
Language	Czech
Detected Language	English
Type	info:eu-repo/semantics/masterThesis
Rights	info:eu-repo/semantics/restrictedAccess
