There is high demand for computational tools that can automatically label tweets (Twitter messages) as having positive or negative sentiment, but great effort and expense would be required to build a large enough hand-labeled training corpus on which to apply standard machine learning techniques. Going beyond current keyword-based heuristic techniques, this paper uses emoticons (e.g. ':)' and ':(') to collect a large training set with noisy labels using little human intervention and trains a Maximum Entropy classifier on that training set. Results on two hand-labeled test corpora are compared to various baselines and a keyword-based heuristic approach, with the machine learned classifier significantly outperforming both. / text
Identifer | oai:union.ndltd.org:UTEXAS/oai:repositories.lib.utexas.edu:2152/ETD-UT-2011-08-3823 |
Date | 02 February 2012 |
Creators | Speriosu, Michael Adrian |
Source Sets | University of Texas |
Language | English |
Detected Language | English |
Type | thesis |
Format | application/pdf |
Page generated in 0.0028 seconds