
SubRosa: Determining Movie Similarities based on Subtitles

Movie recommendation systems have become an important technology for streaming websites, media shopping platforms and movie databases.
Hybrid methods that combine collaborative and content-based filtering on the basis of user ratings and user-generated content have mostly proven
to be effective. However, these methods can lead to popularity-biased results in which movies with little user-generated data are underrepresented. In this paper we
discuss the possibility of generating movie recommendations that are based not on user-generated data
or metadata, but solely on the content of the movies themselves, confining ourselves to movie dialog.
We extract low-level features from movie subtitles using methods from Information Retrieval,
Natural Language Processing and Stylometry, and examine whether the similarity of these features
correlates with overall movie similarity. In addition, we present a novel web application called
SubRosa (http://ch01.informatik.uni-leipzig.de:5001/), which can be used to interactively
compare the results of different feature combinations.
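
The abstract does not name the individual features, so the following is only a minimal sketch of one plausible Information Retrieval feature: TF-IDF term weights compared by cosine similarity. The function names and the three toy "subtitle" strings are hypothetical and are not taken from the paper or from SubRosa.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a sparse TF-IDF vector (term -> weight)."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(counts)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values())
        vectors[name] = {
            # Relative term frequency times inverse document frequency.
            term: (freq / total) * math.log(n_docs / df[term])
            for term, freq in c.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy dialog snippets standing in for full subtitle files.
docs = {
    "space_a": "the ship leaves orbit and the crew checks the engine",
    "space_b": "the crew repairs the engine before the ship leaves orbit",
    "romance": "she reads the letter and remembers the summer in paris",
}
vectors = tfidf_vectors(docs)
sim_space = cosine(vectors["space_a"], vectors["space_b"])
sim_mixed = cosine(vectors["space_a"], vectors["romance"])
print(f"space_a vs space_b: {sim_space:.3f}")
print(f"space_a vs romance: {sim_mixed:.3f}")
```

In this sketch, terms that occur in every document (such as "the") receive a weight of zero, so the similarity is driven by the vocabulary that distinguishes one dialog from another. A real pipeline would first strip timestamps and markup from the subtitle files and would likely combine several such feature similarities.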

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:92312
Date: 26 June 2024
Creators: Luhmann, Jan; Burghardt, Manuel; Tiepmar, Jochen
Publisher: Gesellschaft für Informatik e.V.
Source Sets: Hochschulschriftenserver (HSSS) der SLUB Dresden
Language: English
Detected Language: English
Type: info:eu-repo/semantics/publishedVersion, doc-type:conferenceObject, info:eu-repo/semantics/conferenceObject, doc-type:Text
Rights: info:eu-repo/semantics/openAccess
Relation: https://doi.org/10.18420/inf2020_119
