This bachelor's thesis studied the difference in sentiment between different homographic or polysemous senses of individual words. It did this by training a linear regression model on a version of the British National corpus that had been disambiguated along WordNet word senses (synsets) and analysing sentiment data from SentiWordNet. Results were partial, but indicated that word senses differ somewhat in sentiment. In the process of this study, a new and improved version of the Lesk disambiguation algorithm was also developed, named Nomalised Lesk. The validation of that algorithm compared to the regular Lesk algorithm is presented here as well.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:su-230389 |
Date | January 2024 |
Creators | Ljung, Oskar |
Publisher | Stockholms universitet, Avdelningen för datorlingvistik |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Page generated in 0.0016 seconds