Return to search

Lexeme Extraction for Wikidata : A proof of concept study for Swedish lexeme extraction

Wikipedia has a problem with organizing and managing data as well as references. As a solution, they created Wikidata to make it possible for machines to interpret these data, with the help of lexemes. A lexeme is an abstract lexical unit which consists of a word’s lemmas and its word class. The object of this paper is to present one possible way to provide Swedish lexeme data to Wikidata. This was implemented in two phases, namely, the first phase was to identify the lemmas and their word classes; the second phase was to process these words to create coherent lexemes. The developed model was able to process large amounts of words from the data source but barely succeeded to generate coherent lexemes. Although the lexemes was supposed to provide an efficient way of data understanding for machines, the obtained results lead to the conclusion that the developed model did not achieve the anticipated results. This is due to the amount of words found in correlation to the words processed. It is needed to find a way to import lexeme data to Wikidata from another data source.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:miun-40023
Date January 2020
CreatorsSamzelius, Simon
PublisherMittuniversitetet, Institutionen för informationssystem och –teknologi
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0129 seconds