Methods for increasing cohesion in automatically extracted summaries of Swedish news articles : Using and extending multilingual sentence transformers in the data-processing stage of training BERT models for extractive text summarization / Metoder för att öka kohesionen i automatiskt extraherade sammanfattningar av svenska nyhetsartiklar

Developments in deep learning and machine learning overall have created many opportunities for easier training of automatic text summarization (ATS) models that produce higher-quality summaries. ATS can be split into extractive and abstractive tasks: extractive models extract sentences from the original text to create summaries, whereas abstractive models generate novel sentences. While extractive summaries are often preferred over abstractive ones, summaries created by extractive models trained on Swedish texts often lack cohesion, which affects the readability and overall quality of the summary. Therefore, there is a need to improve the process of training ATS models in terms of cohesion while maintaining other text qualities such as content coverage. This thesis explores and implements methods at the data-processing stage aimed at improving the cohesion of generated summaries. The methods are based on Sentence-BERT, which creates sentence embeddings that can be used to rank the sentences in a text by whether they should be included in the extractive summary. Three models are trained using different methods and evaluated using ROUGE and BERTScore for measuring content coverage and Coh-Metrix for measuring cohesion. The results of the evaluation suggest that the methods can indeed be used to create more cohesive summaries, although content coverage was reduced, which leaves room for further exploration in future work.
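The abstract does not spell out how the embedding-based ranking works, so the following is only a minimal sketch of one plausible reading: each article sentence is embedded, scored by cosine similarity against a reference summary, and the top-ranked sentences are kept in their original order. The `embed` function is a toy bag-of-words stand-in for a Sentence-BERT encoder, and the function names, the use of a reference summary, and the cut-off `k` are all assumptions made for illustration, not the thesis's actual procedure.

```python
import math
from collections import Counter


def embed(sentence):
    """Toy stand-in for a sentence encoder (assumption): bag-of-words counts.

    A real pipeline would call a Sentence-BERT model here instead.
    """
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, reference):
    """Return sentence indices ordered from most to least similar to the reference."""
    ref_vec = embed(reference)
    scored = [(cosine(embed(s), ref_vec), i) for i, s in enumerate(sentences)]
    # Sort by descending score; ties keep the original document order.
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


def extract(sentences, reference, k):
    """Keep the k best-ranked sentences, restored to document order."""
    chosen = sorted(rank_sentences(sentences, reference)[:k])
    return [sentences[i] for i in chosen]


article = [
    "The council approved the new budget on Monday.",
    "Rain is expected later this week.",
    "The budget increases funding for schools.",
]
reference = "The council approved a budget that increases school funding."
print(extract(article, reference, k=2))
```

Restoring document order after ranking is a simple design choice that tends to help readability; it is not, by itself, the cohesion method evaluated in the thesis.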

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-185729
Date January 2022
Creators Andersson, Elsa
Publisher Linköpings universitet, Institutionen för datavetenskap
Source Sets DiVA Archive at Uppsala University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
