
Specificity Prediction for Sentences in Press Releases

Specificity is an important factor in text analysis. While much research on sentence specificity has focused on news, very little is known about press releases. This study examines specificity in press releases, which are journalistic documents that companies share with the press and other media outlets. We analyze press releases about digital transformation written by pump companies and develop tools for automatically measuring sentence specificity. The goals of the research are to 1) explore the effects of combining training data, 2) analyze features for specificity prediction, and 3) compare the effectiveness of classification and probability estimation. Through experiments on various combinations of training data, we find that adding news data to the training data effectively improves probability estimation, but has no noticeable effect on classification. In terms of features, we find that sentence length plays an essential role in specificity prediction. Removing twelve insignificant features yields a model that runs faster while achieving comparable scores. We also find that both classification and probability estimation have drawbacks. In probability estimation, a model can score well merely by making predictions close to the threshold. Binary classification depends on the threshold, which must be chosen with care. Moreover, classification scores cannot filter out models that make unreliable judgements about sentences of high and low specificity.
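
The drawbacks described for the two evaluation styles can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the gold scores and predictions are invented toy values, mean absolute error stands in for whatever probability-estimation score the thesis actually uses, and the helper names `mean_absolute_error` and `accuracy` are hypothetical.

```python
# Toy illustration only: none of these numbers come from the thesis.
# Hypothetical gold specificity scores in [0, 1] for eight sentences.
gold = [0.05, 0.2, 0.35, 0.45, 0.55, 0.65, 0.8, 0.95]

# A model that always predicts the threshold value and so says nothing
# about which sentences are specific.
constant = [0.5] * len(gold)

# A model that commits to clear scores but misjudges two sentences.
noisy = [0.0, 0.0, 1.0, 0.3, 0.7, 0.0, 1.0, 1.0]


def mean_absolute_error(gold_scores, predictions):
    """Average absolute gap between gold scores and predictions
    (an assumed stand-in for a probability-estimation score)."""
    gaps = [abs(g - p) for g, p in zip(gold_scores, predictions)]
    return sum(gaps) / len(gaps)


def accuracy(gold_scores, predictions, threshold=0.5):
    """Share of sentences whose binarized prediction matches the
    binarized gold score at the given threshold."""
    hits = sum(
        (g >= threshold) == (p >= threshold)
        for g, p in zip(gold_scores, predictions)
    )
    return hits / len(gold_scores)


if __name__ == "__main__":
    print("MAE constant:", mean_absolute_error(gold, constant))
    print("MAE noisy:   ", mean_absolute_error(gold, noisy))
    for t in (0.4, 0.5):
        print("accuracy of noisy at threshold", t, "=", accuracy(gold, noisy, t))
```

On this toy data the constant predictor obtains a lower mean absolute error than the noisy one even though it carries no information, and the accuracy of the noisy model changes when the threshold moves from 0.5 to 0.4. Accuracy also treats a prediction of 0.51 and a prediction of 0.95 identically for a highly specific sentence, which is one way a classification score can hide unreliable judgements at the extremes.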

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-413515
Date: January 2020
Creators: He, Tiantian
Publisher: Uppsala universitet, Institutionen för lingvistik och filologi
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
