
A Computational Expedition into the Undiscovered Country - Evaluating Neural Networks for the Identification of Hamlet Text Reuse

In this article, we describe a two-step processing pipeline for identifying text reuse of Shakespeare’s
Hamlet in a corpus of postmodern fiction by comparing n-grams from both sources. A key feature of
our approach is a pre-filtering step, in which we select target sentences in the fiction corpus that
are potential candidates for Hamlet text reuse. Without pre-filtering, the number of candidate text
reuse pairs that are not actual quotes would be extremely high. In a second filtering step, we compare
potential text reuse pairs by their vector representations, which are produced by a neural network
trained in an unsupervised manner. We found that filtering by vector similarity produces a problematic
number of false positives: because the vector representations are learned without supervision, they
capture aspects of similarity that are unfavorable for our use case.
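
The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the trigram size, and the tokenization are assumptions made for illustration, and a simple bag-of-words count vector stands in for the neural sentence representation, which the abstract does not specify.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into word tokens (assumed tokenization)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(toks, n):
    """Return the set of word n-grams in a token list."""
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def prefilter(source_sentences, target_sentences, n=3):
    """Step 1: keep only target sentences sharing at least one n-gram with the source."""
    source_ngrams = set()
    for sentence in source_sentences:
        source_ngrams |= ngrams(tokens(sentence), n)
    return [t for t in target_sentences if ngrams(tokens(t), n) & source_ngrams]


def embed(sentence):
    """Stand-in for the neural representation: a bag-of-words count vector."""
    return Counter(tokens(sentence))


def cosine(a, b):
    """Step 2: cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    hamlet = ["To be, or not to be, that is the question."]
    fiction = [
        "He muttered to be or not to be and left.",
        "The rain fell on the empty street.",
    ]
    # Only sentences that pass the n-gram pre-filter are scored in step 2.
    for candidate in prefilter(hamlet, fiction):
        score = cosine(embed(hamlet[0]), embed(candidate))
        print(f"{score:.2f}  {candidate}")
```

In a sketch like this, the pre-filter keeps the number of pairs that reach the similarity step small. The similarity score itself depends entirely on what the chosen representation encodes, which is where the abstract reports the false positives arising.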

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:92167
Date: 20 June 2024
Creators: Bryan, Maximilian; Burghardt, Manuel; Molz, Johannes
Publisher: CEUR-WS.org
Source Sets: Hochschulschriftenserver (HSSS) der SLUB Dresden
Language: English
Detected Language: English
Type: info:eu-repo/semantics/publishedVersion, doc-type:conferenceObject, info:eu-repo/semantics/conferenceObject, doc-type:Text
Rights: info:eu-repo/semantics/openAccess
