Extracting Textual Data from Historical Newspaper Scans and its Challenges for 'Guerilla-Projects'

In 2022, it is a commonplace that digital historical newspapers (DHN) have become
increasingly available. Despite the undeniable progress in the supply of DHN and the methods to
perform rigorous quantitative analysis, however, working with DHN still poses various pitfalls,
especially when scholars use data provided by third parties, such as libraries or commercial
providers. Reporting from a current project, we want to share our experiences and communicate
the various problems we faced while working with DHN. After a short project summary, we
present the main problems that we faced in our project and that we think might also be relevant
for other scholars, particularly those who work in small research groups. We arrange these
problems according to an archetypal workflow, which is divided into the three steps of corpus
acquisition, corpus evaluation, and corpus preparation. By raising some red flags, we want to call
attention to what we think are common DHN-related problems, to raise awareness of potential
pitfalls, and, in this way, to provide some guidelines for scholars who consider using DHN for their
research.

info:eu-repo/classification/ddc/000

ddc:000

Identifier	oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:92579
Date	11 July 2024
Creators	Wehrheim, Lino; Liebl, Bernhard; Burghardt, Manuel
Publisher	Universitätsbibliothek Regensburg
Source Sets	Hochschulschriftenserver (HSSS) der SLUB Dresden
Language	English
Detected Language	English
Type	info:eu-repo/semantics/submittedVersion, doc-type:workingPaper, info:eu-repo/semantics/workingPaper, doc-type:Text
Rights	info:eu-repo/semantics/openAccess
