
Labelling factual information in legal cases using fine-tuned BERT models

Labelling factual information at the token level in legal cases requires legal expertise and is time-consuming. This thesis proposes a transfer-learning approach in which pre-trained state-of-the-art BERT models are fine-tuned to perform this labelling task. It investigates whether models pre-trained solely on a legal corpus outperform a BERT model pre-trained on a generic corpus, and how the models behave as the number of cases in the training sample varies. This work showed that with 40-60 professionally annotated cases, the models' metric scores are stable and on par with those obtained using the full sample of 100 cases. Moreover, the BERT model pre-trained on a generic corpus is a strong baseline, and pre-training BERT solely on a legal corpus is not crucial for this task.
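Token-level labelling with BERT is usually framed as token classification. One practical step is mapping labels annotated on whole words onto BERT's sub-word tokens. The abstract does not say how the thesis handled this, so the following is only a minimal sketch under stated assumptions: `word_ids` is the per-token word index list that a Hugging Face fast tokenizer returns via `word_ids()` (with `None` for special tokens such as [CLS] and [SEP]), and `-100` is used because it is PyTorch's default `ignore_index` for `CrossEntropyLoss`. The function name `align_labels` and the label scheme are hypothetical.

```python
from typing import List, Optional

# Assumption: PyTorch's CrossEntropyLoss skips targets equal to -100 by default.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int],
                 label_all_subtokens: bool = False) -> List[int]:
    """Hypothetical helper: map word-level labels onto sub-word tokens.

    word_ids[i] is the index of the original word that token i came from,
    or None for special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            # Special token: exclude it from the loss.
            aligned.append(IGNORE_INDEX)
        elif wid != previous:
            # First sub-token of a word carries the word's label.
            aligned.append(word_labels[wid])
        else:
            # Continuation sub-token: repeat the label or exclude it from the loss.
            aligned.append(word_labels[wid] if label_all_subtokens else IGNORE_INDEX)
        previous = wid
    return aligned


# Hypothetical example: "The defendant appealed" tokenized as
# [CLS] the defend ##ant appealed [SEP], with labels 0 = not factual, 1 = factual.
print(align_labels([None, 0, 1, 1, 2, None], [0, 1, 1]))
# [-100, 0, 1, -100, 1, -100]
```

Labelling only the first sub-token keeps the loss counted once per word; setting `label_all_subtokens=True` is a common alternative that trains on every sub-token instead.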

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-447230
Date: January 2021
Creators: Wenestam, Arvid
Publisher: Uppsala universitet, Statistiska institutionen
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
