Sentiment analysis is a challenging task today for computational linguistics. Because of the rise of the social Web, both the research and the industry are interested in automatic processing of opinions in text. In this work, we assume a multilingual and multidomain environment and aim at automatic and adaptive polarity classification.We propose a method for automatic construction of multilingual affective lexicons from microblogging to cover the lack of lexical resources. To test our method, we have collected over 2 million messages from Twitter, the largest microblogging platform, and have constructed affective resources in English, French, Spanish, and Chinese.We propose a text representation model based on dependency parse trees to replace a traditional n-grams model. In our model, we use dependency triples to form n-gram like features. We believe this representation covers the loss of information when assuming independence of words in the bag-of-words approach.Finally, we investigate the impact of entity-specific features on classification of minor opinions and propose normalization schemes for improving polarity classification. The proposed normalization schemes gives more weight to terms expressing sentiments and lower the importance of noisy features.The effectiveness of our approach has been proved in experimental evaluations that we have performed across multiple domains (movies, product reviews, news, blog posts) and multiple languages (English, French, Russian, Spanish, Chinese) including official participation in several international evaluation campaigns (SemEval'10, ROMIP'11, I2B2'11).
Identifer | oai:union.ndltd.org:CCSD/oai:tel.archives-ouvertes.fr:tel-00717329 |
Date | 13 June 2012 |
Creators | Pak, Alexander |
Publisher | Université Paris Sud - Paris XI |
Source Sets | CCSD theses-EN-ligne, France |
Language | English |
Detected Language | English |
Type | PhD thesis |
Page generated in 0.002 seconds