Labelling factual information at the token level in legal cases requires legal expertise and is time-consuming. This thesis applies transfer learning, fine-tuning pre-trained state-of-the-art BERT models to perform this labelling task. It investigates whether models pre-trained solely on a legal corpus outperform a BERT model trained on a generic corpus, and how the models behave as the number of cases in the training sample varies. The results show that the models' metric scores are stable and on par when using 40-60 professionally annotated cases compared with the full sample of 100 cases. Moreover, the generically pre-trained BERT model is a strong baseline, and a BERT model pre-trained solely on a legal corpus is not crucial for this task.
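A minimal sketch of the kind of fine-tuning setup the abstract describes, assuming the Hugging Face `transformers` library; the model name, label scheme, and training settings are illustrative assumptions, not the thesis' actual configuration.

```python
# Sketch: fine-tuning a pre-trained BERT for token-level classification.
# Checkpoint, labels, and hyperparameters below are assumptions for illustration.
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          TrainingArguments, Trainer)

labels = ["O", "B-FACT", "I-FACT"]      # hypothetical token label scheme
model_name = "bert-base-cased"          # generic BERT; a legal-domain checkpoint
                                        # could be swapped in for comparison

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels))

args = TrainingArguments(
    output_dir="bert-legal-token-labelling",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

# `train_dataset` / `eval_dataset` would hold tokenised cases with labels
# aligned to sub-word tokens; they are omitted here.
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```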
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-447230 |
Date | January 2021 |
Creators | Wenestam, Arvid |
Publisher | Uppsala universitet, Statistiska institutionen |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |