Global ETD Search

Return to search

FREDDY

Word embeddings are useful in many tasks in Natural Language Processing and Information Retrieval, such as text mining and classification, sentiment analysis, sentence completion, or dictionary construction. Word2vec and its predecessor fastText, both well-known models to produce word embeddings, are powerful techniques to study the syntactic and semantic relations between words by representing them in a low-dimensional vector. By applying algebraic operations on these vectors semantic relationships such as word analogies, gender-inflections, or geographical relationships can be easily recovered. The aim of this work is to investigate how word embeddings could be utilized to augment and enrich queries in DBMSs, e.g. to compare text values according to their semantic relation or to group rows according to the similarity of their text values. For this purpose, we use pre-trained word embedding models of large text corpora such as Wikipedia. By exploiting this external knowledge during query processing we are able to apply inductive reasoning on text values. Thereby, we reduce the demand for explicit knowledge in database systems. In the context of the IMDB database schema, this allows for example to query movies that are semantically close to genres such as historical fiction or road movie without maintaining this information. Another example query is sketched in Listing 1, that returns the top-3 nearest neighbors (NN) of each movie in IMDB. Given the movie “Godfather” as input this results in “Scarface”, “Goodfellas” and “Untouchables”.

info:eu-repo/classification/ddc/004

ddc:004

Identifer	oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:38451
Date	25 February 2020
Creators	Günther, Michael
Publisher	Association for Computing Machinery
Source Sets	Hochschulschriftenserver (HSSS) der SLUB Dresden
Language	English
Detected Language	English
Type	info:eu-repo/semantics/acceptedVersion, doc-type:conferenceObject, info:eu-repo/semantics/conferenceObject, doc-type:Text
Rights	info:eu-repo/semantics/openAccess
Relation	978-1-4503-4703-7, 10.1145/3183713.3183717

Page generated in 0.0097 seconds

FREDDY

Description

Links & Downloads

Tags

Additional Fields