Return to search

Using Attention-based Sequence-to-Sequence Neural Networks for Transcription of Historical Cipher Documents

Encrypted historical manuscripts (also called ciphers), containing encoded information, provides a useful resource for giving new insight into our history. Transcribing these manuscripts from image format to computer readable format is a necessary step for decrypting them. In this thesis project, we explore automatic approaches of Hand Written Text Recognition (HTR) for cipher image transcription line by line.In this thesis project, We applied an attention-based Sequence-to-Sequence (Seq2Seq) model for the automatic transcription of ciphers with three different writing systems. We tested/developed algorithms for the recognition of cipher symbols, and their location. To evaluate our method on different levels, the model is trained and tested on ciphers with various symbol sets, from digits to graphical signs. To find out the useful approaches for improving the transcription performance, we conducted ablation study regarding attention mechanism and other deep learning tricks. The results show an accuracy lower than 50% and indicate a big room for improvements and plenty of future work.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-420322
Date January 2020
CreatorsRenfei, Han
PublisherUppsala universitet, Institutionen för lingvistik och filologi
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0015 seconds