Global ETD Search

Development of a Semantic Search Tool for Swedish Legal Judgements Based on Fine-Tuning Large Language Models

Large language models (LLMs) are very large deep learning models that are pretrained on huge amounts of data. Among them are sentence bidirectional encoder representations from transformers (SBERT) models, to which advanced training methods such as the transformer-based sequential denoising auto-encoder (TSDAE), generative query network (GenQ), and an adaptation of generative pseudo labelling (GPL) can be applied. This thesis project aims to develop a semantic search tool for Swedish legal judgments in order to overcome the limitations of traditional keyword search in legal document retrieval. To this end, a model adept at capturing the semantic nuances of legal language was developed by applying natural language processing (NLP) and fine-tuning LLMs such as SBERT with TSDAE, GenQ, and the adaptation of GPL. To generate labelled data from unlabelled data, a fine-tuned GPT-3.5 model was used; generating labelled data with a generative model was crucial for training SBERT efficiently. The evaluation of the search tool demonstrates that it can accurately retrieve relevant documents based on semantic queries and significantly improve the efficiency and accuracy of legal research. GenQ proved to be the most efficient training method for this use case.

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-533649

Large Language Models

Semantic search

Fine-tune Embedding model

Computer and Information Sciences

Data- och informationsvetenskap

Engineering and Technology

Teknik och teknologier

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-533649
Date	January 2024
Creators	Mikkelsen Toth, Sebastian
Publisher	Uppsala universitet, Signaler och system
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	UPTEC F, 1401-5757 ; 24050
