
Aspect discovery and sentiment classification for online reviews

Buying products and services online is becoming increasingly popular, and as a result there is a vast number of online reviews. Automatically classifying this growing body of data has become a popular area of recent research, because the information contained in these reviews is valuable both to potential customers and for marketing intelligence. The work in this thesis focuses on discovering the aspects and sentiment of online reviews using a topic-modelling-based approach. Sentiment analysis aims to discover opinions automatically, whereas topic modelling discovers latent topics; here, topic modelling is combined with sentiment analysis techniques to create an effective approach to sentiment analysis. Three problems are addressed in this work. Firstly, the classes of real-world product reviews tend to be highly imbalanced. When dealing with imbalanced data, data miners usually pre-process it so that the classes are balanced. This work therefore compares balanced and imbalanced datasets and aims to answer the question: should imbalanced datasets be artificially balanced, or kept imbalanced as they are? A series of experiments is performed to investigate the datasets in different scenarios. The experimental results provide evidence that, within the product review domain, there is no need to artificially balance a dataset, as sentiment analysis on an imbalanced dataset performs better than on a balanced one. Secondly, the LDA (Latent Dirichlet Allocation) model is a popular choice for topic modelling; however, it has some shortcomings, including identifying topics that could be considered too broad and requiring manual work to label all the topics it produces.
This work proposes a novel method, the Twofold-LDA model, to identify aspects and quantify sentiment. The model incorporates domain knowledge, removes the one-aspect-per-sentence assumption, and extracts information that allows the sentiment analysis results to be presented in a user-friendly way. Finally, there has been no known work focusing on ways to improve topic modelling for sentiment classification. Since past studies show that sentiment analysis techniques perform well at identifying sentiment, this work examines how to incorporate such techniques into the topic modelling process. The Enhanced Twofold-LDA model is proposed, which incorporates part-of-speech tagging into the topic modelling process by altering the Gibbs sampling process. A case study demonstrates the ability of the Enhanced Twofold-LDA model to solve practical problems, in particular through the creation of an end-user application aimed at hotel customers.
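
The first problem asks whether imbalanced review data should be artificially balanced before training. As one concrete illustration of what "artificially balancing" commonly means, the sketch below applies random undersampling, which discards majority-class examples until every class matches the size of the smallest one. The abstract does not say which balancing method the thesis used, so undersampling, the function name `undersample` and the toy data are assumptions for illustration only.

```python
import random
from collections import Counter, defaultdict


def undersample(examples, labels, seed=0):
    """Hypothetical helper: randomly undersample every class to the size of the smallest class.

    Returns a shuffled list of (example, label) pairs.
    """
    # Group examples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(examples, labels):
        by_class[y].append(x)

    # The smallest class determines how many examples each class keeps.
    n_min = min(len(xs) for xs in by_class.values())

    rng = random.Random(seed)
    balanced = []
    for y, xs in by_class.items():
        # Draw n_min examples without replacement from this class.
        for x in rng.sample(xs, n_min):
            balanced.append((x, y))
    rng.shuffle(balanced)
    return balanced


# Toy, made-up data: 8 positive reviews and 2 negative reviews.
reviews = [f"review {i}" for i in range(10)]
labels = ["pos"] * 8 + ["neg"] * 2
balanced = undersample(reviews, labels)
print(Counter(y for _, y in balanced))
```

Because undersampling discards majority-class reviews, it also shrinks the training set; oversampling the minority class is the usual alternative way to balance a dataset.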

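The Enhanced Twofold-LDA model is described as altering the Gibbs sampling process, but the abstract does not give the modified update. For orientation only, the sketch below shows the standard collapsed Gibbs sampler for plain LDA, i.e. the sampling step that such a modification would act on; it is not the Twofold-LDA or Enhanced Twofold-LDA algorithm. Each token's topic is resampled from p(z = k | rest) ∝ (n_dk + α)(n_kw + β) / (n_k + Vβ). The function name `lda_gibbs`, the hyperparameter values and the toy corpus are hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic_counts, topic_word_counts, assignments).
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]                # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]   # word counts per topic
    n_k = [0] * n_topics                                 # total tokens per topic
    z = []                                               # topic assignment of every token

    # Initialise each token with a uniformly random topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional for each candidate topic t.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                # Sample a new topic and add the token back into the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw, z


# Made-up corpus: words 0-2 and words 3-5 tend to co-occur.
docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
n_dk, n_kw, z = lda_gibbs(docs, n_topics=2, vocab_size=6, n_iters=50)
print(n_dk)
```

A part-of-speech-aware variant of the kind the abstract describes would act on the `weights` computed in the inner loop; the exact form of that change in the thesis is not stated in this summary.
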
Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:625498
Date: January 2013
Creators: Burns, Nicola
Publisher: University of Ulster
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
