This thesis contributes to the generation of epithets that aid the comprehension of Swedish texts. A reader may find certain words in a text difficult to understand for any number of reasons. For example, they may never have come across the term moussaka before. If the text assigns an explanatory epithet (in this case, “the dish” moussaka), they can hopefully continue reading uninterrupted. To do this, obscure phrases are identified and extracted based on word class, shallow token features and the Pareto principle. An algorithm then extracts an appropriate epithet for each word using the Wikipedia categorisation system. The algorithm developed for the study achieved underwhelming results when extracting obscure phrases. However, it proved excellent at assigning appropriate epithets to nouns and proper nouns. With further research, this process could hopefully be used as a tool for improving the readability of any text.
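The abstract does not say exactly how word class, shallow token features and the Pareto principle are combined. The following is a minimal sketch of one plausible reading, not the thesis's implementation. It treats the most frequent word types covering 80% of a corpus's running tokens as a "core" vocabulary, and flags nouns and proper nouns outside that core as obscure. Several details are assumptions. The 80% cutoff, the Universal Dependencies tags `NOUN`/`PROPN`, the 4-character minimum length (the only shallow feature used here), and the function names `pareto_vocabulary` and `obscure_candidates` are all hypothetical.

```python
from collections import Counter

# Hypothetical: word classes treated as candidates (Universal Dependencies UPOS tags).
CANDIDATE_CLASSES = {"NOUN", "PROPN"}


def pareto_vocabulary(corpus_tokens, mass=0.8):
    """Return the most frequent word types that together cover `mass` of all tokens.

    Assumed reading of the Pareto principle: most running text comes from a small
    core vocabulary, so words outside that core are potentially obscure.
    """
    counts = Counter(token.lower() for token in corpus_tokens)
    total = sum(counts.values())
    core, covered = set(), 0
    for word, freq in counts.most_common():
        if covered >= mass * total:
            break  # the core already covers the requested share of tokens
        core.add(word)
        covered += freq
    return core


def obscure_candidates(tagged_tokens, core, min_length=4):
    """Return tokens that pass the word-class, rarity and length filters."""
    return [
        token
        for token, tag in tagged_tokens
        if tag in CANDIDATE_CLASSES        # word-class filter
        and token.lower() not in core      # rarity filter: outside the Pareto core
        and len(token) >= min_length       # hypothetical shallow feature: token length
    ]


# Toy corpus: "och" and "är" cover 80 of 100 tokens, so they form the core.
corpus = ["och"] * 50 + ["är"] * 30 + ["mat"] * 15 + ["moussaka"] * 5
core = pareto_vocabulary(corpus)
sentence = [("Hon", "PRON"), ("åt", "VERB"), ("moussaka", "NOUN"),
            ("och", "CCONJ"), ("mat", "NOUN")]
print(obscure_candidates(sentence, core))  # ['moussaka']
```

In this toy example "mat" is also outside the core, but the length filter drops it. This shows how a shallow feature can prune candidates that rarity alone would let through. Each flagged word would then be passed to the epithet step, which the thesis bases on Wikipedia categories.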
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-193074 |
Date | January 2023 |
Creators | Ragnarsson, Sebastian |
Publisher | Linköpings universitet, Institutionen för datavetenskap |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |