
FINE-TUNE A LANGUAGE MODEL FOR TEXT SUMMARIZATION (BERTSUM) ON EDGAR-CORPUS

Financial reports contain a great deal of useful information for investors, but extracting it is time-consuming. We consider text summarization a feasible way to address this. In this thesis, we implement BERTSUM, a state-of-the-art language model for text summarization, and evaluate the results with ROUGE metrics. The experiment was carried out on EDGAR-CORPUS, a novel, large-scale financial dataset. BERTSUM with a transformer achieves the best performance, with a ROUGE-L F1 score of 9.26%. We also hand-picked some model-generated summaries that contained common errors and investigated their causes. The results were then compared with previous research. The ROUGE-L F1 value in the previous study was much higher than ours; we think this is due to the length of the financial reports.

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-477220
Date: January 2022
Creators: Niu, Yijie
Publisher: Uppsala universitet, Statistiska institutionen
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
