Global ETD Search

Return to search

Sentiment analysis of Swedish reviews and transfer learning using Convolutional Neural Networks

Sentiment analysis is a field within machine learning that focus on determine the contextual polarity of subjective information. It is a technique that can be used to analyze the "voice of the customer" and has been applied with success for the English language for opinionated information such as customer reviews, political opinions and social media data. A major problem regarding machine learning models is that they are domain dependent and will therefore not perform well for other domains. Transfer learning or domain adaption is a research field that study a model's ability of transferring knowledge across domains. In the extreme case a model will train on data from one domain, the source domain, and try to make accurate predictions on data from another domain, the target domain. The deep machine learning model Convolutional Neural Network (CNN) has in recent years gained much attention due to its performance in computer vision both for in-domain classification and transfer learning. It has also performed well for natural language processing problems but has not been investigated to the same extent for transfer learning within this area. The purpose of this thesis has been to investigate how well suited the CNN is for cross-domain sentiment analysis of Swedish reviews. The research has been conducted by investigating how the model perform when trained with data from different domains with varying amount of source and target data. Additionally, the impact on the model’s transferability when using different text representation has also been studied. This study has shown that a CNN without pre-trained word embedding is not that well suited for transfer learning since it performs worse than a traditional logistic regression model. Substituting 20% of source training data with target data can in many of the test cases boost the performance with 7-8% both for the logistic regression and the CNN model. Using pre-trained word embedding produced by a word2vec model increases the CNN's transferability as well as the in-domain performance and outperform the logistic regression model and the CNN model without pre-trained word embedding in the majority of test cases.

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-339066

Sentiment analysis

transfer learning

domain adaption

convolutional neural networks

cnn

machine learning

word2vec

Computer and Information Sciences

Data- och informationsvetenskap

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-339066
Date	January 2018
Creators	Sundström, Johan
Publisher	Uppsala universitet, Avdelningen för systemteknik
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	UPTEC STS, 1650-8319 ; 18001

Page generated in 0.002 seconds

Sentiment analysis of Swedish reviews and transfer learning using Convolutional Neural Networks

Description

Links & Downloads

Tags

Additional Fields