Global ETD Search

Semi-automatic Segmentation & Alignment of Handwritten Historical Text Images with the use of Bayesian Optimisation

Effortless digitisation of historical documents has long been of great interest. Part of digitisation is annotating the data. Such annotations are obtained through a process called alignment, which links the words in an image to their transcription. Annotated data have many uses, such as training handwritten text recognition models. With this application in mind, this project aimed to develop an interactive algorithm for the segmentation and alignment of historical document images. Two methods (referred to as method 1 and method 2) were developed, evaluated and compared on two data sets, Labour’s Memory and IAM. A method for incorporating self-learning was also developed and evaluated, using Bayesian optimisation to set the algorithm's parameters automatically. The results showed that the algorithms perform better on the IAM data set, which could partly be explained by differences in the quality of the ground truth used to calculate the performance metrics. Moreover, method 2 slightly outperformed method 1 on both data sets. Bayesian optimisation proved to be a reasonable and more time-efficient way of setting parameters than finding them manually for each document. The work done in this project could serve as the basis for the future development of a useful, interactive tool for aligning text documents.
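The abstract does not state which surrogate model or acquisition function was used, so the following is only a minimal, generic sketch of Bayesian optimisation for parameter setting. It assumes a Gaussian-process surrogate with an RBF kernel, the expected-improvement acquisition, and a single algorithm parameter normalised to [0, 1]. All function names are illustrative, and `alignment_score` is a hypothetical stand-in for whatever per-document quality score the algorithm is tuned against; in practice each evaluation would mean running segmentation and alignment with that parameter value.

```python
import math
import random

def rbf(a, b, length_scale=0.2):
    """Squared-exponential (RBF) kernel between two scalar parameter values."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(K):
    """Lower-triangular Cholesky factor L with K = L L^T."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, noise=1e-6):
    """Fit a zero-mean GP to centred scores; the noise term doubles as jitter."""
    y_mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [y - y_mean for y in ys]))
    return xs, L, alpha, y_mean

def predict(model, x):
    """Posterior mean and standard deviation of the score at parameter x."""
    xs, L, alpha, y_mean = model
    k_star = [rbf(x, a) for a in xs]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement over the best score seen so far (maximisation)."""
    z = (mean - best - xi) / std
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf

def bayes_opt(objective, n_init=3, n_iter=10, seed=0):
    """Maximise objective(t) for t in [0, 1] using a GP surrogate and EI."""
    rng = random.Random(seed)
    grid = [i / 200 for i in range(201)]  # candidate parameter values
    xs = [rng.random() for _ in range(n_init)]  # random initial evaluations
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        model = fit_gp(xs, ys)
        best = max(ys)
        candidates = [t for t in grid if t not in xs]
        # Evaluate next wherever the surrogate expects the largest improvement.
        x_next = max(candidates,
                     key=lambda t: expected_improvement(*predict(model, t), best))
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]

def alignment_score(t):
    """Hypothetical stand-in for a per-document alignment quality score."""
    return 1.0 - (t - 0.37) ** 2

if __name__ == "__main__":
    best_t, best_score = bayes_opt(alignment_score)
    print(f"best parameter {best_t:.3f}, score {best_score:.4f}")
```

The surrogate is written in plain Python only to keep the sketch dependency-free; a real tool would more likely rely on an established library. The point it illustrates is the one the abstract makes: a handful of guided evaluations replaces a manual parameter search for each document.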

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-506000

handwritten text recognition

machine learning

bayesian optimisation

image analysis

segmentation

alignment

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-506000
Date	January 2023
Creators	MacCormack, Philip
Publisher	Uppsala universitet, Avdelningen Vi3
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	UPTEC X ; 23003
