Financial reports contain a great deal of information that is useful to investors, but extracting it is time-consuming. We consider text summarization a feasible way to address this. In this thesis, we implement BERTSUM, a state-of-the-art language model for text summarization, and evaluate its output with ROUGE metrics. The experiment was carried out on EDGAR-CORPUS, a novel, large-scale financial dataset. BERTSUM with a transformer achieves the best performance, with a ROUGE-L F1 score of 9.26%. We also hand-picked model-generated summaries that contained common errors and investigated their causes. The results were then compared with previous research. The ROUGE-L F1 value reported in the previous study was much higher than ours; we think this is due to the length of the financial reports.
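For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a sentence-level ROUGE-L F1 score can be computed from the longest common subsequence (LCS) of two token lists. It is not the evaluation code used in the thesis: the function names `lcs_length` and `rouge_l_f1`, the lowercased whitespace tokenization, and the equal weighting of precision and recall are all assumptions made for illustration, and established ROUGE toolkits may differ in tokenization, stemming, and summary-level aggregation.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Rolling-row dynamic programme: prev[j] holds the LCS length of the
    # tokens of `a` seen so far against the first j tokens of `b`.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Sentence-level ROUGE-L F1 between a candidate and a reference summary.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

Calling `rouge_l_f1(generated_summary, reference_summary)` returns a value between 0 and 1. Because recall is divided by the reference length and precision by the candidate length, a large mismatch between the two lengths pulls the score down even when the overlapping content is relevant.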
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-477220 |
Date | January 2022 |
Creators | Niu, Yijie |
Publisher | Uppsala universitet, Statistiska institutionen |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |