Wikipedia has a problem with organizing and managing its data and references. As a solution, Wikidata was created to make this data interpretable by machines, with the help of lexemes. A lexeme is an abstract lexical unit consisting of a word's lemmas and its word class. The objective of this thesis is to present one possible way to provide Swedish lexeme data to Wikidata. The work was implemented in two phases: the first phase identified the lemmas and their word classes, and the second phase processed these words into coherent lexemes. The developed model was able to process large amounts of words from the data source but barely succeeded in generating coherent lexemes. Although the lexemes were supposed to give machines an efficient way to understand the data, the obtained results lead to the conclusion that the developed model did not achieve the anticipated outcome, owing to the number of words found relative to the number of words actually processed. A way to import lexeme data to Wikidata from another data source therefore still needs to be found.
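The abstract does not include implementation details, but the two-phase pipeline it describes maps naturally onto Wikidata's documented lexeme data model, where a lexeme is a JSON entity with lemmas, a language item, and a lexical-category item. The following is a minimal illustrative sketch, not the thesis's actual code: the function name and the word-class mapping are assumptions, while the Q-identifiers (Q9027 for Swedish, Q1084 for noun, etc.) come from Wikidata itself.

```python
# Illustrative sketch (not the thesis's implementation): turning a
# (lemma, word class) pair from phase one into the JSON shape that
# Wikidata uses for lexemes in phase two.

# Wikidata items for common lexical categories (real Q-identifiers).
WORD_CLASS_ITEMS = {
    "noun": "Q1084",
    "verb": "Q24905",
    "adjective": "Q34698",
}

def build_lexeme(lemma: str, word_class: str) -> dict:
    """Phase two: combine a lemma and its word class into a
    Wikidata-style lexeme structure."""
    return {
        "type": "lexeme",
        "lemmas": {"sv": {"language": "sv", "value": lemma}},
        "language": "Q9027",  # the Wikidata item for Swedish
        "lexicalCategory": WORD_CLASS_ITEMS[word_class],
    }

if __name__ == "__main__":
    # Phase one would emit (lemma, word class) pairs from the data
    # source; a single hard-coded example stands in for that here.
    print(build_lexeme("hund", "noun"))
```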
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:miun-40023
Date | January 2020 |
Creators | Samzelius, Simon |
Publisher | Mittuniversitetet, Institutionen för informationssystem och -teknologi
Source Sets | DiVA Archive at Uppsala University
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |