Return to search

The Influence of Language Models on Decryption of German Historical Ciphers

This thesis assesses the influence of language models on decryption of historical German ciphers. Previous research on language identification and cleartext detection indicates that it is beneficial to use historical language models (LM) while dealing with historical ciphers as they can outperform models trained on present-day data. To date, no systematic investigation has considered the impact of choosing different LMs for the decryption of ciphers. Therefore, we conducted a series of experiments with the aim of exploring this assumption. Using historical data from the HistCorp collection and Project Gutenberg, we have created 3-gram, 4-gram and 5-gram models, as well as constructed substitution ciphers for testing of the models. The results show that in most cases language models trained on historical data perform better than the larger modern models, while the most consistent results for the tested ciphers gave the 4-gram models.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-485679
Date January 2022
CreatorsSikora, Justyna
PublisherUppsala universitet, Institutionen för lingvistik och filologi
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0031 seconds