• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Development of a Semantic Search Tool for Swedish Legal Judgements Based on Fine-Tuning Large Language Models

Mikkelsen Toth, Sebastian January 2024 (has links)
Large language models (LLMs) are very large deep learning models which are retrained on a huge amount of data. Among the LLMs are sentence bidirectional encoder representations from transformers (SBERT) where advanced training methods such as transformer-based denoising autoEncoder (TSDAE), generative query network (GenQ) and an adaption of generative pseudo labelling (GPL) can be applied. This thesis project aims to develop a semantic search tool for Swedish legal judgments in order to overcome the limitations of traditional keyword searches in legal document retrieval. For this aim, a model adept at understanding the semantic nuances of legal language has been developed by leveraging natural language processing (NLP) and fine- tuning LLMs like SBERT, using advanced training methods such as TSDAE, GenQ, and an adaption of GPL. To generate labeled data out of unlabelled data, a GPT3.5 model was used after it was fine-tuned. The generation of labeled data with the use of a generative model was crucial for this project to train the SBERT efficiently. The search tool has been evaluated. The evaluation demonstrates that the search tool can accurately retrieve relevant documents based on semantic queries and simnifically improve the efficiency and accuracy of legal research. GenQ has been shown to be the most efficient training method for this use case.

Page generated in 0.097 seconds