Return to search

Sentiment analysis: Quantitative evaluation of subjective opinions using natural language processing

Sentiment Analysis consists of recognizing sentiment orientation towards specific subjects within natural language texts. Most research in this area focuses on classifying documents as positive or negative. The purpose of this thesis is to quantitatively evaluate subjective opinions of customer reviews using a five star rating system, which is widely used on on-line review web sites, and to try to make the predicted score as accurate as possible.
Firstly, this thesis presents two methods for rating reviews: classifying reviews by supervised learning methods as multi-class classification does, or rating reviews by using association scores of sentiment terms with a set of seed words extracted from the corpus, i.e. the unsupervised learning method. We extend the feature selection approach used in Turney's PMI-IR estimation by introducing semantic relatedness measures based up on the content of WordNet. This thesis reports on experiments using the two methods mentioned above for rating reviews using the combined feature set enriched with WordNet-selected sentiment terms. The results of these experiments suggest ways in which incorporating WordNet relatedness measures into feature selection may yield improvement over classification and unsupervised learning methods which do not use it.
Furthermore, via ordinal meta-classifiers, we utilize the ordering information contained in the scores of bank reviews to improve the performance, we explore the effectiveness of re-sampling for reducing the problem of skewed data, and we check whether discretization benefits the ordinal meta-learning process.
Finally, we combine the unsupervised and supervised meta-learning methods to optimize performance on our sentiment prediction task.

Identiferoai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/28000
Date January 2008
CreatorsLi, Wenhui
PublisherUniversity of Ottawa (Canada)
Source SetsUniversité d’Ottawa
LanguageEnglish
Detected LanguageEnglish
TypeThesis
Format181 p.

Page generated in 0.0124 seconds