Global ETD Search

Cleartext detection and language identification in ciphers

In historical cryptology, cleartext is text written in a known language within a cipher (a handwritten manuscript that aims to hide the content of a message). Cleartext can provide historical interpretation and contextualisation of the manuscript and could help researchers in cryptanalysis, but to date there has been no research on how to automatically detect cleartext and identify its language. In this paper, we investigate to what extent we can automatically distinguish cleartext from ciphertext in transcribed historical ciphers and to what extent we can identify its language. We took a rule-based approach and ran seven different models using historical language models on ciphertexts provided by the DECRYPT project. Our results show that using unigrams and bigrams at the word level combined with 3-grams, 4-grams and 5-grams at the character level is the best approach to cleartext detection.
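
To make the combination of word-level and character-level n-grams concrete, the following is a minimal sketch and not the thesis's actual implementation. It assumes a simple overlap score: the share of a segment's word unigrams, word bigrams and character 3-, 4- and 5-grams that also occur in a reference text for each language. The function names, the two tiny reference texts and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
def char_ngrams(text, n):
    """Return all character n-grams of length n in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_model(corpus, sizes=(3, 4, 5)):
    """Collect word unigrams, word bigrams and character n-grams from a reference text."""
    words = corpus.lower().split()
    model = {
        "words": set(words),
        "bigrams": set(zip(words, words[1:])),
        "chars": {n: set() for n in sizes},
    }
    for word in words:
        for n in sizes:
            model["chars"][n].update(char_ngrams(word, n))
    return model


def score(segment, model):
    """Fraction of the segment's features that also occur in the model (0.0 to 1.0)."""
    words = segment.lower().split()
    features = hits = 0
    for w in words:                          # word unigrams
        features += 1
        hits += w in model["words"]
    for bg in zip(words, words[1:]):         # word bigrams
        features += 1
        hits += bg in model["bigrams"]
    for n, known in model["chars"].items():  # character 3-, 4-, 5-grams
        for w in words:
            for g in char_ngrams(w, n):
                features += 1
                hits += g in known
    return hits / features if features else 0.0


def identify(segment, models, threshold=0.5):
    """Return the best-matching language, or None if the segment looks like ciphertext."""
    best = max(models, key=lambda lang: score(segment, models[lang]))
    return best if score(segment, models[best]) >= threshold else None


# Toy reference texts (hypothetical; real historical language models would be far larger).
models = {
    "latin": build_model("in nomine domini amen anno domini"),
    "italian": build_model("il signore manda la lettera al signore duca"),
}
```

With these toy models, `identify("anno domini", models)` picks the Latin model, while a run of numeric cipher symbols such as `"12 45 7 88 3"` matches neither model and is treated as ciphertext. The threshold would need tuning on real transcriptions.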

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446439

historical cryptology

digital humanities

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-446439
Date	January 2021
Creators	Gambardella, Maria-Elena
Publisher	Uppsala universitet, Institutionen för lingvistik och filologi
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
