Global ETD Search

“Embed, embed! There’s knocking at the gate.”

The detection of intertextual references in text corpora is a digital humanities topic that has attracted considerable attention in recent years. From a literary studies perspective, intertextuality describes the phenomenon of one text being present in another; the computational problem at hand is text similarity detection and, more concretely, semantic similarity detection. In this notebook, we introduce the Vectorian as a framework for building queries through word embeddings such as fastText and GloVe. We evaluate the influence of computing document similarity through alignments such as Waterman-Smith-Beyer and two variants of Word Mover’s Distance. We also investigate the performance of state-of-the-art sentence embeddings such as Siamese BERT networks for the task, both as document embeddings and as contextual token embeddings. Overall, we find that Waterman-Smith-Beyer with fastText offers highly competitive performance. The notebook can also be used to upload new data for performing custom search queries.
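To make the idea of alignment over word embeddings concrete, the following is a minimal sketch and not the Vectorian's own API. It assumes a simplified Smith-Waterman-style recurrence with a constant gap penalty, whereas Waterman-Smith-Beyer permits more general gap costs. The function names, the `gap` and `threshold` parameters, and the two-dimensional toy vectors are all invented for illustration and merely stand in for real fastText or GloVe embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def local_alignment_score(query, doc, vectors, gap=0.5, threshold=0.5):
    """Best local alignment score between two token lists.

    Token pairs are scored by cosine similarity minus `threshold`, so weakly
    related tokens count against the alignment. `gap` is a constant penalty
    per skipped token (a simplifying assumption, see the lead-in).
    """
    prev = [0.0] * (len(doc) + 1)
    best = 0.0
    for q in query:
        curr = [0.0]
        for j, d in enumerate(doc, start=1):
            sim = cosine(vectors[q], vectors[d]) - threshold
            score = max(
                0.0,                # start a new local alignment
                prev[j - 1] + sim,  # align the two tokens
                prev[j] - gap,      # skip a query token
                curr[j - 1] - gap,  # skip a document token
            )
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Hypothetical toy embeddings: near-synonyms point in similar directions.
TOY_VECTORS = {
    "knocking": (1.0, 0.1),
    "pounding": (0.9, 0.2),
    "gate": (0.1, 1.0),
    "door": (0.2, 0.9),
    "embed": (-1.0, 0.5),
}

query = ["knocking", "gate"]
exact = local_alignment_score(query, ["knocking", "gate"], TOY_VECTORS)
paraphrase = local_alignment_score(query, ["pounding", "door"], TOY_VECTORS)
unrelated = local_alignment_score(query, ["embed", "embed"], TOY_VECTORS)
print(exact, paraphrase, unrelated)
```

In this toy setting the paraphrased phrase scores almost as high as the exact match, while the unrelated phrase scores zero. That is the behaviour a semantic, rather than purely lexical, similarity search relies on.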

Intertextuality, Embeddings, Vectorian

info:eu-repo/classification/ddc/410

ddc:410

Identifier	oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:91774
Date	30 May 2024
Creators	Burghardt, Manuel; Liebl, Bernhard
Publisher	Melusina Press, Zeitschrift für digitale Geisteswissenschaften
Source Sets	Hochschulschriftenserver (HSSS) der SLUB Dresden
Language	English
Detected Language	English
Type	info:eu-repo/semantics/publishedVersion, doc-type:article, info:eu-repo/semantics/article, doc-type:Text
Rights	info:eu-repo/semantics/openAccess
Relation	2510-1366, 978-2-919815-25-8
