Lexeme Extraction for Wikidata : A proof of concept study for Swedish lexeme extraction

Wikipedia has a problem with organizing and managing data as well as references. As a solution, Wikidata was created to make it possible for machines to interpret these data with the help of lexemes. A lexeme is an abstract lexical unit that consists of a word’s lemmas and its word class. The objective of this paper is to present one possible way to provide Swedish lexeme data to Wikidata. This was implemented in two phases: the first phase identified the lemmas and their word classes; the second phase processed these words to create coherent lexemes. The developed model was able to process large numbers of words from the data source but barely succeeded in generating coherent lexemes. Although the lexemes were supposed to provide an efficient way for machines to understand data, the results lead to the conclusion that the developed model did not achieve the anticipated results. This is due to the number of words found relative to the number of words processed. A way to import lexeme data to Wikidata from another data source needs to be found.
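
As a rough illustration of the two phases described in the abstract, the following Python sketch first filters annotated words and then groups them into lexemes keyed by lemma and word class. It is not the thesis's implementation: the names Lexeme, identify, and build_lexemes, the word-class labels, and the sample data are all hypothetical, and the input is assumed to be already tagged with a lemma and a word class.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    """A lexical unit: a lemma, its word class, and the word forms seen for it."""
    lemma: str
    word_class: str
    forms: set = field(default_factory=set)


def identify(tagged_tokens):
    """Phase 1 (assumed): keep only tokens that carry both a lemma and a word class."""
    for form, lemma, word_class in tagged_tokens:
        if lemma and word_class:
            yield form.lower(), lemma.lower(), word_class


def build_lexemes(triples):
    """Phase 2 (assumed): group tokens by (lemma, word class) into lexemes."""
    lexemes = {}
    for form, lemma, word_class in triples:
        key = (lemma, word_class)
        lexeme = lexemes.setdefault(key, Lexeme(lemma, word_class))
        lexeme.forms.add(form)
    return lexemes


# Hypothetical pre-tagged input; a real run would read this from a tagger or corpus.
SAMPLE = [
    ("Hundar", "hund", "noun"),
    ("hunden", "hund", "noun"),
    ("springer", "springa", "verb"),
    ("sprang", "springa", "verb"),
    ("xyz", "", ""),  # unannotated token, dropped in phase 1
]

lexemes = build_lexemes(identify(SAMPLE))
```

Comparing the number of tokens read with the number of tokens that end up in a lexeme gives a simple measure of the kind the abstract refers to when it compares words found with words processed.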

Computer Science

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:miun-40023
Date	January 2020
Creators	Samzelius, Simon
Publisher	Mittuniversitetet, Institutionen för informationssystem och –teknologi
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
