Natural language processing, a subfield of artificial intelligence and computer science, has recently attracted great research interest due to the vast amount of information created on the internet in the modern era. One of the main areas of natural language processing is sentiment analysis, a field that studies the polarity of human natural language and generally tries to categorize it as positive, negative or neutral. In this thesis, sentiment analysis has been applied to research reports written by equity analysts. The objective has been to investigate whether there exists a distinct distribution of the reports and whether sentiment in these reports can be classified. The thesis consists of two parts: first, investigating how to divide the reports into different sentiment labelling regimes, and second, categorizing the sentiment using machine learning techniques. Logistic regression as well as several convolutional neural network structures have been used to classify the sentiment. Working with textual data requires mapping text to real-valued quantities called features. Several feature extraction methods have been investigated, including Bag of Words, term frequency-inverse document frequency and Word2vec. Of the tested labelling regimes, classifying the documents using upgrades and downgrades of the report recommendation shows the most promising potential. For this regime, the convolutional neural network architectures outperform logistic regression by a significant margin. Of the networks tested, a double input channel utilizing two different Word2vec representations performs best. The two representations originate from different sources: one is trained on the set of equity research reports and the other was trained by the Google Brain team on an extensive Google News data set. This suggests that combining one representation that captures topic-specific words with one that is better at representing more common words enhances classification performance.
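As an illustration of what mapping text to real-valued features can look like, the following is a minimal sketch of Bag of Words counts and a term frequency-inverse document frequency weighting in plain Python. The two sample sentences, the whitespace tokenizer, the function names and the unsmoothed log(N/df) weighting are all assumptions made for this example; the thesis may well use a different tokenizer, weighting variant or library.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def bag_of_words(docs):
    # Build a sorted vocabulary and one count vector per document.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    counts = [Counter(tokenize(doc)) for doc in docs]
    vectors = [[c[term] for term in vocab] for c in counts]
    return vocab, vectors


def tf_idf(docs):
    # Weight each count by term frequency times log(N / document frequency).
    # This unsmoothed variant is an assumption, not the thesis's stated formula.
    vocab, bow = bag_of_words(docs)
    n_docs = len(docs)
    df = [sum(1 for vec in bow if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = []
    for vec in bow:
        total = sum(vec) or 1  # guard against empty documents
        weighted.append([(count / total) * idf[j] for j, count in enumerate(vec)])
    return vocab, weighted


# Invented example sentences, not taken from the equity research reports.
docs = [
    "we upgrade the stock to buy",
    "we downgrade the stock to sell",
]
vocab, bow = bag_of_words(docs)
_, weighted = tf_idf(docs)
```

In this toy example, words that occur in both sentences receive a weight of zero under the weighting above, while distinguishing words such as "upgrade" and "downgrade" keep a positive weight.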
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-388586 |
Date | January 2019 |
Creators | Olof, Löfving |
Publisher | Uppsala universitet, Avdelningen för beräkningsvetenskap |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | UPTEC F, 1401-5757 ; 19044 |