
Data extraction of digitized old newspaper content to streamline the search process for users with a genealogy perspective

This thesis presents the extraction of data from digitized old newspapers and the implementation of a search function that makes it easier for users to find relevant content. It was developed as a master's degree project at Linköping University. The application lets the user search a database of articles for interesting content and is intended for genealogists, local historians and novices alike. The database is populated with data from OCR-scanned newspapers, and the user can search it either on their own or with the help of their family tree. The family tree is incorporated by reading the user's GEDCOM file and extracting useful information, which is then used to improve the search results. Results are returned to the user as digital articles. The work concludes that information from GEDCOM files can be used to find new, interesting facts, and that users should be allowed to influence how the data is reduced, through article categorization and filtering.
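The abstract does not describe how the GEDCOM file is read or how the extracted facts feed the search. The Python sketch below is one possible approach, not the thesis's implementation. It works in four steps:

- It parses the standard GEDCOM line structure: a level number, an optional `@XREF@` pointer, a tag, and an optional value.
- It collects each individual's name and birth/death year and place.
- It turns those facts into search parameters.
- It filters articles by name, lifetime window and user-selected categories.

All function and class names (`parse_gedcom`, `search_query`, `match_articles`, `Person`, `Article`), the category labels and the sample data are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# GEDCOM line: level, optional @XREF@ pointer, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s+(.*))?$")
YEAR_RE = re.compile(r"\b(\d{4})\b")


@dataclass
class Person:
    xref: str
    given: str = ""
    surname: str = ""
    birth_year: Optional[int] = None
    birth_place: str = ""
    death_year: Optional[int] = None
    death_place: str = ""


@dataclass
class Article:  # hypothetical stand-in for one OCR-extracted article
    year: int
    category: str
    text: str


def parse_gedcom(text: str) -> Dict[str, Person]:
    """Collect names and birth/death years and places for each INDI record."""
    people: Dict[str, Person] = {}
    current: Optional[Person] = None
    event: Optional[str] = None  # "BIRT" or "DEAT" while inside that event
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if not m:
            continue
        level, xref, tag, value = int(m[1]), m[2], m[3], (m[4] or "").strip()
        if level == 0:
            # A new top-level record begins; only individuals are kept.
            current = Person(xref) if tag == "INDI" and xref else None
            if current is not None:
                people[xref] = current
            event = None
        elif current is None:
            continue
        elif level == 1:
            event = tag if tag in ("BIRT", "DEAT") else None
            if tag == "NAME":
                # The surname is delimited by slashes: "Anna /Andersson/".
                given, _, rest = value.partition("/")
                current.given, current.surname = given.strip(), rest.split("/")[0].strip()
        elif level == 2 and event is not None:
            prefix = "birth" if event == "BIRT" else "death"
            year = YEAR_RE.search(value)  # first 4-digit number, e.g. "ABT 1920" -> 1920
            if tag == "DATE" and year:
                setattr(current, prefix + "_year", int(year.group(1)))
            elif tag == "PLAC":
                setattr(current, prefix + "_place", value)
    return people


def search_query(p: Person) -> dict:
    """Search parameters for one person: full name, places, lifetime window."""
    name = " ".join(part for part in (p.given, p.surname) if part)
    places = [pl for pl in (p.birth_place, p.death_place) if pl]
    return {"name": name, "places": places, "from_year": p.birth_year, "to_year": p.death_year}


def match_articles(query: dict, articles: List[Article],
                   categories: Optional[Set[str]] = None) -> List[Article]:
    """Keep articles naming the person, dated within the lifetime and, optionally, in chosen categories."""
    lo, hi = query["from_year"] or 0, query["to_year"] or 9999
    name = query["name"].lower()
    return [a for a in articles
            if lo <= a.year <= hi and name in a.text.lower()
            and (categories is None or a.category in categories)]


# Hypothetical sample data for illustration only.
SAMPLE = """0 HEAD
0 @I1@ INDI
1 NAME Anna /Andersson/
1 BIRT
2 DATE 12 MAR 1850
2 PLAC Linköping
1 DEAT
2 DATE ABT 1920
0 TRLR"""

people = parse_gedcom(SAMPLE)
query = search_query(people["@I1@"])
articles = [
    Article(1875, "wedding", "Married: Anna Andersson and Karl Berg."),
    Article(1930, "death notice", "Anna Andersson has passed away."),
    Article(1880, "auction", "Auction at the farm of Per Olsson."),
]
hits = match_articles(query, articles, categories={"wedding", "death notice"})
print([a.year for a in hits])  # [1875]
```

Letting the caller pass `categories` mirrors the abstract's conclusion that users should influence how the result set is reduced. The plain substring match on the full name is deliberately naive. OCR errors and spelling variants in old newspapers would likely call for fuzzier matching in practice.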

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-160533
Date: January 2019
Creators: Pettersson, Sandra
Publisher: Linköpings universitet, Medie- och Informationsteknik, Linköpings universitet, Tekniska fakulteten
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
