Global ETD Search

FINE-TUNE A LANGUAGE MODEL FOR TEXT SUMMARIZATION (BERTSUM) ON EDGAR-CORPUS

Financial reports contain a great deal of information that is useful to investors, but extracting it manually is time-consuming. We consider automatic text summarization a feasible way to reduce this effort. In this thesis, we implement BERTSUM, a state-of-the-art language model for text summarization, and evaluate its output with ROUGE metrics. The experiments were carried out on EDGAR-CORPUS, a novel, large-scale financial dataset. BERTSUM with a transformer achieves the best performance, with a ROUGE-L F1 score of 9.26%. We also hand-picked model-generated summaries that contained common errors and investigated their causes. The results were then compared with previous research. The ROUGE-L F1 score reported in the previous study was much higher than ours; we attribute this to the length of the financial reports.
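For readers unfamiliar with the metric quoted above, the following is a minimal sketch of a sentence-level ROUGE-L F1 computation based on the longest common subsequence (LCS) of tokens. It is an illustration only, not the scoring tool used in the thesis, which the abstract does not specify. The function names `lcs_length` and `rouge_l_f1` are hypothetical, and the lowercasing and whitespace tokenization (with no stemming) are simplifying assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Sentence-level ROUGE-L F1 (assumption: lowercased whitespace tokens)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    generated = "the company reported higher revenue"
    gold = "the company reported revenue growth this year"
    print(f"ROUGE-L F1: {rouge_l_f1(generated, gold):.4f}")
```

Because recall is the LCS length divided by the reference length, the score depends on how much of the reference a summary covers, so scores measured on documents of very different lengths are not directly comparable.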

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-477220

Machine Learning

Natural Language Processing

Text Summarization

Transformers

Neural Networks

Probability Theory and Statistics

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-477220
Date	January 2022
Creators	Niu, Yijie
Publisher	Uppsala universitet, Statistiska institutionen
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
