
“Embed, embed! There’s knocking at the gate.”

Burghardt, Manuel; Liebl, Bernhard. 30 May 2024.
The detection of intertextual references in text corpora is a digital humanities topic that has received considerable attention in recent years. From a literary studies perspective, intertextuality describes the phenomenon of one text being present in another. The corresponding computational problem is text similarity detection, and more concretely, semantic similarity detection. In this notebook, we introduce the Vectorian, a framework for building queries with word embeddings such as fastText and GloVe. We evaluate the influence of computing document similarity through alignments such as Waterman-Smith-Beyer and two variants of Word Mover's Distance. We also investigate the performance of state-of-the-art sentence embeddings such as Siamese BERT networks for this task, both as document embeddings and as contextual token embeddings. Overall, we find that Waterman-Smith-Beyer with fastText offers highly competitive performance. The notebook can also be used to upload new data and run custom search queries.
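The abstract reports that Waterman-Smith-Beyer alignment over fastText embeddings performs well. The sketch below is only a rough illustration of that idea. It scores a local alignment of two token lists, taking token similarity from the cosine similarity of their embeddings and charging a gap of length k an arbitrary cost w(k). General gap costs like this are the defining feature of the Waterman-Smith-Beyer recurrence. This is not the Vectorian's implementation. The toy three-dimensional vectors stand in for real fastText embeddings, and the names `wsb_local_align`, `min_sim`, `TOY_EMB` and `toy_gap_cost` are hypothetical choices made for this example.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def wsb_local_align(
    query: List[str],
    doc: List[str],
    emb: Dict[str, Vector],
    gap_cost: Callable[[int], float],
    min_sim: float = 0.5,  # hypothetical threshold, not taken from the source
) -> float:
    """Best local alignment score of `query` against `doc`.

    Aligning two tokens scores cos(u, v) - min_sim, so pairs less similar
    than min_sim lower the score. Out-of-vocabulary tokens count as
    similarity 0. A gap of any length k costs gap_cost(k). Every gap length
    is tried, so this runs in O(n * m * (n + m)) time.
    """
    n, m = len(query), len(doc)
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            u, v = emb.get(query[i - 1]), emb.get(doc[j - 1])
            sim = cosine(u, v) if u is not None and v is not None else 0.0
            diag = h[i - 1][j - 1] + sim - min_sim
            # Leave query tokens i-k+1..i unaligned (a gap of length k).
            up = max(h[i - k][j] - gap_cost(k) for k in range(1, i + 1))
            # Leave doc tokens j-l+1..j unaligned (a gap of length l).
            left = max(h[i][j - l] - gap_cost(l) for l in range(1, j + 1))
            # Local alignment: clamp at zero so an alignment can restart anywhere.
            h[i][j] = max(0.0, diag, up, left)
            best = max(best, h[i][j])
    return best


# Hypothetical toy 3-d vectors standing in for real fastText embeddings.
TOY_EMB: Dict[str, Vector] = {
    "knocking": (1.0, 0.0, 0.0),
    "knock": (0.9, 0.1, 0.0),
    "gate": (0.0, 1.0, 0.0),
    "door": (0.1, 0.9, 0.1),
    "the": (0.0, 0.0, 1.0),
    "at": (0.0, 0.2, 0.9),
}


def toy_gap_cost(k: int) -> float:
    """Hypothetical gap cost: 0.5 to open, plus 0.25 per extra skipped token."""
    return 0.5 + 0.25 * (k - 1)


query = ["knocking", "at", "the", "gate"]
related = wsb_local_align(query, ["a", "knock", "at", "the", "door"], TOY_EMB, toy_gap_cost)
unrelated = wsb_local_align(query, ["the", "weather", "is", "fine"], TOY_EMB, toy_gap_cost)
print(f"related={related:.3f} unrelated={unrelated:.3f}")
```

With these toy vectors, the paraphrase "a knock at the door" scores well above the unrelated sentence. This happens because every query token finds a close neighbour in order. An arbitrary gap function makes the cost of each gap length tunable, but it is also slower than the affine-gap (Gotoh) variant, which handles the special case w(k) = a + b·k in O(n·m) time.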
