Return to search

Named Entity Recognition for Social Media Text

This thesis aims to perform named entity recognition for English social media texts. Named Entity Recognition (NER) is applied in many NLP tasks as an important preprocessing procedure. Social media texts contain lots of real-time data and therefore serve as a valuable source for information extraction. Nevertheless, NER for social media texts is a rather challenging task due to the noisy context. Traditional approaches to deal with this task use hand-crafted features but prove to be both time-consuming and very task-specific. As a result, they fail to deliver satisfactory performance. The goal of this thesis is to tackle this task by automatically identifying and annotating the named entities with multiple types with the help of neural network methods. In this thesis, we experiment with three different word embeddings and character embedding neural network architectures that combine long short- term memory (LSTM), bidirectional LSTM (BI-LSTM) and conditional random field (CRF) to get the best result. The data and evaluation tool comes from the previous shared tasks on Noisy User-generated Text (W- NUT) in 2017. We achieve the best F1 score 42.44 using BI-LSTM-CRF with character-level representation extracted by a BI-LSTM, and pre-trained word embeddings trained by GloVe. We also find out that the results could be improved with larger training data sets.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-395978
Date January 2019
CreatorsZhang, Yaxi
PublisherUppsala universitet, Institutionen för lingvistik och filologi
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0021 seconds