Complex Word Identification for Swedish

Complex Word Identification (CWI) is the task of identifying complex words in text. It is often viewed as a subtask of Automatic Text Simplification (ATS), whose main goal is to make a complex text simpler. How a text should be simplified depends on the target readers, such as second language learners or people with reading disabilities. In this thesis, we focus on Complex Word Identification for Swedish. First, in addition to exploring existing resources, we collect a new dataset for Swedish CWI. We then build several classifiers of Swedish simple and complex words and use the findings to analyze the characteristics of lexical complexity in Swedish and English. Our method for collecting training data from second language learning material achieved positive evaluation scores and resulted in a new dataset for Swedish CWI. Additionally, the complex word classifiers we built reach an accuracy at least as good as that of similar systems for English. Finally, the analysis of the selected features confirms the findings of previous studies and reveals some interesting characteristics of lexical complexity.

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-352349

complex word identification

lexical complexity

natural language processing

automatic text simplification

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-352349
Date	January 2018
Creators	Smolenska, Greta
Publisher	Uppsala universitet, Institutionen för lingvistik och filologi
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
