
The Use of Distributional Semantics in Text Classification Models: Comparative performance analysis of popular word embeddings

In the field of Natural Language Processing, supervised machine learning is commonly used to solve classification tasks such as sentiment analysis and text categorization. The classical way of representing text has been the well-known Bag-of-Words representation; lately, however, low-dimensional dense word vectors have come to dominate the input to state-of-the-art models. Few studies have made a fair comparison of the models' sensitivity to the text representation, and this thesis tries to fill that gap. In particular, we seek insight into the impact various unsupervised pre-trained vectors have on classification performance. In addition, we take a closer look at the Random Indexing representation and try to optimize it jointly with the classification task. The results show that while low-dimensional pre-trained representations often have computational benefits and have been reported to achieve state-of-the-art performance, they do not necessarily outperform the classical representations in all cases.
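As a point of reference, the following is a minimal Python sketch of the Random Indexing representation in its standard formulation (sparse ternary index vectors accumulated into context vectors over a co-occurrence window). It is illustrative only; all names and parameter values are assumptions, not taken from the thesis.

    # Minimal Random Indexing sketch (illustrative, not the thesis code).
    # Each word gets a fixed sparse ternary "index vector"; a word's context
    # vector is the sum of the index vectors of its window co-occurrences.
    import numpy as np

    def index_vector(dim, nonzeros, rng):
        # Sparse ternary vector: a few randomly placed +1/-1 entries.
        v = np.zeros(dim)
        pos = rng.choice(dim, size=nonzeros, replace=False)
        v[pos] = rng.choice([-1.0, 1.0], size=nonzeros)
        return v

    def random_indexing(corpus, dim=512, nonzeros=8, window=2, seed=0):
        rng = np.random.default_rng(seed)
        index, context = {}, {}
        for sentence in corpus:
            for w in sentence:
                if w not in index:
                    index[w] = index_vector(dim, nonzeros, rng)
                    context[w] = np.zeros(dim)
            for i, w in enumerate(sentence):
                lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        context[w] += index[sentence[j]]
        return context  # word -> dense dim-dimensional context vector

    corpus = [["the", "movie", "was", "great"],
              ["the", "film", "was", "great"]]
    vectors = random_indexing(corpus)

Under this formulation, a document could then be represented by, for example, summing or averaging the context vectors of its tokens, yielding a dense low-dimensional input in place of a vocabulary-sized Bag-of-Words count vector.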

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-127991
Date: January 2016
Creators: Norlund, Tobias
Publisher: Linköpings universitet, Datorseende
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess