
Multi-label Classification and Sentiment Analysis on Textual Records

In this thesis we present effective approaches to two classic Natural Language Processing tasks, Multi-label Text Classification (MLTC) and Sentiment Analysis (SA), each using its own data set.
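The two tasks differ mainly in their targets. In MLTC a single record may carry several labels at once, while in SA each record receives exactly one sentiment grade. The minimal Python sketch below contrasts the two target encodings. The label names are hypothetical, and the multi-hot and one-hot encodings are common conventions assumed here, not details taken from the thesis.

```python
from typing import List, Sequence


def multi_hot(labels: Sequence[str], label_space: Sequence[str]) -> List[int]:
    """Encode a record's label set as a multi-hot vector (MLTC-style target).

    Any number of positions may be 1, because labels are not mutually exclusive.
    """
    index = {name: i for i, name in enumerate(label_space)}
    vec = [0] * len(label_space)
    for name in labels:
        vec[index[name]] = 1
    return vec


def one_hot(label: str, label_space: Sequence[str]) -> List[int]:
    """Encode a single sentiment grade as a one-hot vector (SA-style target).

    Exactly one position is 1, because the grades are mutually exclusive.
    """
    return [1 if name == label else 0 for name in label_space]


# Hypothetical label spaces, for illustration only.
TOPICS = ["billing", "delivery", "quality", "support"]
GRADES = ["negative", "neutral", "positive"]

print(multi_hot(["billing", "support"], TOPICS))  # [1, 0, 0, 1]
print(one_hot("positive", GRADES))                # [0, 0, 1]
```

Because MLTC targets can contain several 1s, a multi-label model typically scores each label independently (for example with one sigmoid output per label) rather than choosing a single class with a softmax.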
For MLTC, we introduce a robust deep learning approach based on a convolutional neural network (CNN). We applied it to almost one million records with an associated label list of 20 labels, and divided the data set into three parts: a training set, a validation set and a test set. Our CNN-based model achieved strong results as measured by F1 score.

For SA, the data set was more informative and better structured than the MLTC data set. A traditional word embedding method, Word2Vec, was used to generate word vectors for each text record. We then employed several classic deep learning models, namely Bi-LSTM, RCNN, an attention mechanism and CNN, to extract sentiment features. Next, a classification framework was designed to grade the sentiment of each record. Lastly, we employed BERT, a state-of-the-art pre-trained language model that relies on transfer learning.
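Since the MLTC results are reported as an F1 score, the sketch below shows one common way such a score is computed for multi-label output: each label's logit is passed through a sigmoid and thresholded independently, and true positives, false positives and false negatives are then pooled over all labels (micro-averaging). The 0.5 threshold, the micro-averaging choice and the example logits are assumptions for illustration; the thesis record does not state which averaging or threshold was used.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits: Sequence[Sequence[float]],
                   threshold: float = 0.5) -> List[List[int]]:
    """Turn per-label logits into multi-hot predictions.

    Each label is decided on its own, so a record can receive zero,
    one or several labels.
    """
    return [[1 if sigmoid(z) >= threshold else 0 for z in row] for row in logits]


def micro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN across every record and label."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical model outputs for 2 records and 3 labels.
logits = [[2.0, -1.0, 0.5], [-3.0, 1.5, -0.2]]
y_true = [[1, 0, 0], [0, 1, 1]]

y_pred = predict_labels(logits)
print(y_pred)                              # [[1, 0, 1], [0, 1, 0]]
print(round(micro_f1(y_true, y_pred), 4))  # 0.6667
```

Micro-averaging weights frequent labels more heavily; with a 20-label list where some labels are rare, macro-averaging (computing F1 per label and then averaging) would give a different picture, so the averaging mode is worth stating alongside any reported F1.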
In conclusion, we compare the performance of RNN-based models, CNN-based models and a pre-trained language model on the classification tasks and discuss their applicability.

Thesis / Master of Science in Electrical and Computer Engineering (MSECE) / This thesis proposes two deep learning solutions, one for the multi-label classification problem and one for the sentiment analysis problem.

Identifier: oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/24627
Date: January 2019
Creators: Guo, Xintong
Contributors: Chen, Jun, Electrical and Computer Engineering
Source Sets: McMaster University
Language: English
Detected Language: English
Type: Thesis
