Global ETD Search

Return to search

Coping with Missing and Incomplete Information in Natural Language Processing with Applications in Sentiment Analysis and Entity Matching

Much work in Natural Language Processing (NLP) is broadly concerned with extracting useful information from unstructured text passages. In recent years there has been an increased focus on informal writing as is found in online venues such as Twitter and Yelp. Processing this text introduces additional difficulties for NLP techniques, for example, many of the terms may be unknown due to rapidly changing vocabulary usage. A straightforward NLP approach will not have any capability of using the information these terms provide. In such \emph{information poor} environments of missing and incomplete information, it is necessary to develop novel, clever methods for leveraging the information we have explicitly available to unlock key nuggets of implicitly available information. In this work we explore several such methods and how they can collectively help to improve NLP techniques in general, with a focus on Sentiment Analysis (SA) and Entity Matching (EM). The problem of SA is that of identifying the polarity (positive, negative, neutral) of a speaker or author towards the topic of a given piece of text. SA can focus on various levels of granularity. These include finding the overall sentiment of a long text document, finding the sentiment of individual sentences or phrases, or finding the sentiment directed toward specific entities and their aspects (attributes). The problem of EM, also known as Record Linkage, is the problem of determining records from independent and uncooperative data sources that refer to the same real-world entities. Traditional approaches to EM have used the record representation of entities to accomplish this task. With the nascence of social media, entities on the Web are now accompanied by user generated content, which allows us to apply NLP solutions to the problem. We investigate specifically the following aspects of NLP for missing and incomplete information: (1) Inferring a sentiment polarity (i.e., the positive, negative, and neutral composition) of new terms. (2) Inferring a representation of new vocabulary terms that allows us to compare these terms with known terms in regards to their meaning and sentiment orientation. This idea can be further expanded to derive the representation of larger chunks of text, such as multi-word phrases. (3) Identifying key attributes of highly salient sentiment bearing passages that allow us to identify such sections of a document, even when the complete text is not analyzable. (4) Using text based methods to match corresponding entities (e.g., restaurants or hotels) from independent data sources that may miss key identifying attributes such as names or addresses. / Computer and Information Science

http://hdl.handle.net/20.500.12613/3535

Computer Science

Natural Language Processing

Identifer	oai:union.ndltd.org:TEMPLE/oai:scholarshare.temple.edu:20.500.12613/3535
Date	January 2020
Creators	Schneider, Andrew Thomas
Contributors	Dragut, Eduard Constantin, Obradovic, Zoran, Vucetic, Slobodan, Meng, Weiyi
Publisher	Temple University. Libraries
Source Sets	Temple University
Language	English
Detected Language	English
Type	Thesis/Dissertation, Text
Format	142 pages
Rights	IN COPYRIGHT- This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available., http://rightsstatements.org/vocab/InC/1.0/
Relation	http://dx.doi.org/10.34944/dspace/3517, Theses and Dissertations

Page generated in 0.0023 seconds

Coping with Missing and Incomplete Information in Natural Language Processing with Applications in Sentiment Analysis and Entity Matching

Description

Links & Downloads

Tags

Additional Fields