Twitter is a new source of information for data mining techniques. Messages posted on Twitter provide a major source of information for gauging public sentiment on topics ranging from politics to fashion trends. The purpose of this paper is to analyze tweets to discern users' opinions regarding Genetically Modified Organisms (GMOs). We examine the effectiveness of several classifiers, namely Multinomial Naïve Bayes, Bernoulli Naïve Bayes, Logistic Regression, and Linear Support Vector Classifier (SVC), in assigning each tweet in a corpus to a positive, negative, or neutral category. Additionally, we use three datasets in this experiment to examine which dataset yields the best score. Comparing the results, we found that GMO_NDSU had the highest score of the three datasets under every classifier in our experiment, and that Linear SVC had the highest consistent accuracy when using bigrams for feature extraction and Term Frequency and Chi Square for feature selection.
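As a rough illustration of the feature steps named in the abstract (bigram extraction, term frequency, and chi-square scoring), the following is a minimal sketch in plain Python. It is not the thesis's implementation: the function names, the whitespace tokenization, and the four toy tweets are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter

def bigrams(text):
    """Return adjacent word pairs from a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))

def term_frequency(text):
    """Count how often each bigram occurs in one text."""
    return Counter(bigrams(text))

def chi_square(docs, labels, target):
    """Score each bigram with the 2x2 chi-square statistic of
    (bigram present / absent) versus (label == target / label != target)."""
    n = len(docs)
    n_target = sum(1 for y in labels if y == target)
    df = Counter()         # documents containing the bigram
    df_target = Counter()  # target-class documents containing the bigram
    for doc, y in zip(docs, labels):
        for f in set(bigrams(doc)):
            df[f] += 1
            if y == target:
                df_target[f] += 1
    scores = {}
    for f, total in df.items():
        a = df_target[f]      # present, target class
        b = total - a         # present, other classes
        c = n_target - a      # absent, target class
        d = n - n_target - b  # absent, other classes
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A bigram that appears in every document carries no signal.
        scores[f] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

# Hypothetical toy corpus, not data from the thesis.
tweets = [
    "gmo food is safe",
    "gmo food is safe to eat",
    "gmo food is dangerous",
    "ban gmo food now",
]
labels = ["positive", "positive", "negative", "negative"]

scores = chi_square(tweets, labels, target="positive")
top = sorted(scores, key=scores.get, reverse=True)[:3]
```

On this toy corpus the bigram ("is", "safe") receives the highest score because it appears only in the positive tweets, while ("gmo", "food") scores zero because it appears in every tweet. In a full pipeline, the highest-scoring bigrams would be kept as features and passed to a classifier such as Linear SVC.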
Identifier | oai:union.ndltd.org:ndsu.edu/oai:library.ndsu.edu:10365/25787 |
Date | January 2016 |
Creators | Li, Hanzhe |
Publisher | North Dakota State University |
Source Sets | North Dakota State University |
Detected Language | English |
Type | text/thesis |
Format | application/pdf |
Rights | NDSU Policy 190.6.2, https://www.ndsu.edu/fileadmin/policy/190.pdf |