Global ETD Search

Specificity Prediction for Sentences in Press Releases

Specificity is an important factor in text analysis. While much research on sentence specificity has been conducted on news text, very little is known about press releases. Our study is devoted to specificity in press releases, which are journalistic documents that companies share with the press and other media outlets. In this research, we analyze press releases about digital transformation written by pump companies, and develop tools for the automatic measurement of sentence specificity. The goals of the research are to 1) explore the effects of data combination, 2) analyze features for specificity prediction, and 3) compare the effectiveness of classification and probability estimation. Through our experiments on various combinations of training data, we find that adding news data to the model effectively improves probability estimation, but the effects on classification are not noticeable. In terms of features, we find that sentence length plays an essential role in specificity prediction. We remove twelve insignificant features, and this modification results in a model that runs faster while achieving comparable scores. We also find that both classification and probability estimation have drawbacks. With regard to probability estimation, models can score well by only making predictions around the threshold. Binary classification depends on the threshold, and setting the threshold requires careful consideration. In addition, classification scores cannot filter out models that make unreliable judgments about high- and low-specificity sentences.
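
The drawbacks named at the end of the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the gold scores and model outputs are invented, and accuracy and mean absolute error stand in for the evaluation scores, which the abstract does not name. It is not the thesis's method or data.

```python
from statistics import mean

def classify(probs, threshold=0.5):
    """Label a sentence specific (1) when its score reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]

def accuracy(pred, gold):
    """Fraction of predicted labels that match the gold labels."""
    return mean(1 if p == g else 0 for p, g in zip(pred, gold))

def mean_abs_error(probs, gold_scores):
    """Average distance between predicted and gold specificity scores."""
    return mean(abs(p - g) for p, g in zip(probs, gold_scores))

# Hypothetical gold specificity scores for six sentences (not thesis data).
gold_scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
gold_labels = classify(gold_scores, threshold=0.5)

# A hypothetical model that only ever predicts values close to 0.5.
hedging = [0.55, 0.55, 0.55, 0.45, 0.45, 0.45]

# Classification score at threshold 0.5: every label matches, even though
# the model never separates a 0.9 sentence from a 0.6 sentence.
acc_at_half = accuracy(classify(hedging, threshold=0.5), gold_labels)

# Moving the threshold changes every positive label for the same outputs.
labels_at_06 = classify(hedging, threshold=0.6)

# The estimation error stays moderate despite the hedged predictions.
mae = mean_abs_error(hedging, gold_scores)

print(acc_at_half, labels_at_06, round(mae, 3))
```

In this toy case the hedging model looks perfect under classification at one threshold, collapses to a single class at another, and still keeps a moderate estimation error, which is the kind of behavior the abstract warns about.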

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413515

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-413515
Date	January 2020
Creators	He, Tiantian
Publisher	Uppsala universitet, Institutionen för lingvistik och filologi
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
