The CNN-based steganalysis model can capture complex statistical dependencies and learn feature representations directly from text. The proposed model uses a word embedding layer to map words into dense vectors, yielding more accurate word representations, and it extracts both syntactic and semantic features. Files containing fewer than 200 words are treated as short texts. Short texts are preprocessed by segmenting them into words and encoding each word as an index according to its position in the dictionary; the resulting index sequences are then fed to the CNN to learn feature representations. Files containing more than 200 words are considered long texts. Because long texts vary widely in length, the model tokenizes them into sentences of relatively consistent length before preprocessing. Finally, a decision strategy combines the results to determine whether the text file contains stego text.
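To make the pipeline concrete, the sketch below mirrors the steps described above in PyTorch: dictionary-index encoding, a word embedding layer, parallel 1-D convolutions over the index sequence, and a decision rule for long texts. All names, layer sizes, the padding scheme, and the majority-vote aggregation are illustrative assumptions, not the thesis's exact configuration.

    # Minimal sketch of the described steganalysis pipeline (assumptions noted above).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    SHORT_TEXT_LIMIT = 200  # word-count threshold from the abstract

    def encode(words, vocab, min_len=5):
        """Map words to dictionary indexes (0 = padding/unknown);
        pad so the sequence is at least as wide as the largest kernel."""
        ids = [vocab.get(w, 0) for w in words]
        ids += [0] * max(0, min_len - len(ids))
        return torch.tensor([ids])

    class StegoTextCNN(nn.Module):
        """Embedding layer followed by parallel 1-D convolutions
        (a TextCNN-style classifier; sizes here are assumptions)."""
        def __init__(self, vocab_size, embed_dim=128, num_filters=100,
                     kernel_sizes=(3, 4, 5), num_classes=2):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)
            self.convs = nn.ModuleList(
                [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
            self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

        def forward(self, x):                      # x: (batch, seq_len)
            e = self.embedding(x).transpose(1, 2)  # (batch, embed_dim, seq_len)
            # Max-pool each convolution's feature maps over the sequence axis.
            pooled = [F.relu(c(e)).max(dim=2).values for c in self.convs]
            return self.fc(torch.cat(pooled, dim=1))  # class logits

    def classify_file(model, words, vocab, sentences=None):
        """Short texts are classified whole; long texts sentence by sentence,
        aggregated here by a majority vote (an assumed decision strategy)."""
        model.eval()
        with torch.no_grad():
            if len(words) < SHORT_TEXT_LIMIT:
                return model(encode(words, vocab)).argmax(dim=1).item()
            votes = [model(encode(s, vocab)).argmax(dim=1).item()
                     for s in sentences]
            return int(sum(votes) > len(votes) / 2)  # 1 = stego, 0 = cover

Here vocab is assumed to be a word-to-index dictionary built from the training corpus, and sentence tokenization for long texts is left to a standard tokenizer. Whether the thesis aggregates sentence-level predictions by voting or by averaging logits is not stated in the abstract; the majority vote above is only a placeholder.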
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:bth-23405
Date | January 2022
Creators | Akula, Tejasvi, Pamisetty, Varshitha
Publisher | Blekinge Tekniska Högskola
Source Sets | DiVA Archive at Upsalla University
Language | English
Detected Language | English
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format | application/pdf
Rights | info:eu-repo/semantics/openAccess