Global ETD Search

Return to search

Extractive Multi-document Summarization of News Articles

Publicly available data grows exponentially through web services and technological advancements. To comprehend large data-streams multi-document summarization (MDS) can be used. In this research, the area of multi-document summarization is investigated. Multiple systems for extractive multi-document summarization are implemented using modern techniques, in the form of the pre-trained BERT language model for word embeddings and sentence classification. This is combined with well proven techniques, in the form of the TextRank ranking algorithm, the Waterfall architecture and anti-redundancy filtering. The systems are evaluated on the DUC-2002, 2006 and 2007 datasets using the ROUGE metric. Where the results show that the BM25 sentence representation implemented in the TextRank model using the Waterfall architecture and an anti-redundancy technique outperforms the other implementations, providing competitive results with other state-of-the-art systems. A cohesive model is derived from the leading system and tried in a user study using a real-world application. The user study is conducted using a real-time news detection application with users from the news-domain. The study shows a clear favour for cohesive summaries in the case of extractive multi-document summarization. Where the cohesive summary is preferred in the majority of cases.

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-158275

NLP

extractive summarization

multi-document

neural embeddings

information extraction

text-to-text generation

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-158275
Date	January 2019
Creators	Grant, Harald
Publisher	Linköpings universitet, Institutionen för datavetenskap
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess

Page generated in 0.0016 seconds

Extractive Multi-document Summarization of News Articles

Description

Links & Downloads

Tags

Additional Fields