This thesis investigates Swedish lexical blends. A lexical blend is defined as the concatenation of two words, where at least one word has been reduced. Lexical blends are approached from two perspectives. First, the thesis investigates lexical blends as they appear in the Swedish language. It is found that there is a significant statistical relationship between the two source words in terms of orthographic, phonemic and syllabic length and frequency in a reference corpus. Furthermore, some uncommon lexical blends created from pronouns and interjections are described. A description of lexical blends through semantic construction and similarity to other word formation processes are also described. Secondly, the thesis develops a model which predicts source words of lexical blends. To predict the source words a logistic regression model is used. The evaluation shows that using a ranking approach, the correct source words are the highest ranking word pair in 32.2% of the cases. In the top 10 ranking word pairs, the correct word pair is found in 60.6% of the cases. The results are lower than in previous studies, but the number of blends used is also smaller. It is shown that lexical blends which overlap are easier to predict than lexical blends which do not overlap. Using feature ablation, it is shown that semantic and frequency related features have the most important for the prediction of source words.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:su-160763 |
Date | January 2018 |
Creators | Ek, Adam |
Publisher | Stockholms universitet, Avdelningen för datorlingvistik |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Page generated in 0.0024 seconds