Global ETD Search

RAG-based data extraction : Mining information from second-life battery documents

As Large Language Models (LLMs) continue to evolve, methods for minimizing hallucinations are being developed so that models give more truthful answers. Retrieval-Augmented Generation (RAG) supplies the model with external data on which its answers should be based. This project uses RAG in a data extraction pipeline tailored to second-life batteries. The prompts are pre-defined, so the user only provides the documents to be analyzed; this ensures that the answers are in the correct format for further data processing. To handle different document types, each document is first labeled, after which extraction specific to that document type is applied. The best performance is achieved by grouping questions, which allows the model to reason about which questions are relevant so that no hallucinations occur. The model performs equally well with two or three document types, and it is clear that a pipeline of this type is well suited to today's models. Further improvements can be achieved by using models with a larger context window and by first applying Optical Character Recognition (OCR) to read text from the documents.
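The two-stage flow described in the abstract (label the document first, then run grouped, pre-defined questions for that document type) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the thesis: the document types, the question groups, the prompt wording, the word-overlap retriever, and the `llm` callable, which stands in for whatever model the pipeline actually calls.

```python
from typing import Callable, Dict, List

# Hypothetical document types and question groups (not from the thesis).
QUESTION_GROUPS: Dict[str, List[List[str]]] = {
    "datasheet": [["nominal capacity", "nominal voltage"], ["cell chemistry"]],
    "test_report": [["state of health", "cycle count"]],
}

# Hypothetical pre-defined prompts; the user supplies only the documents.
LABEL_PROMPT = (
    "Classify the document as one of: {labels}. "
    "Answer with the label only.\n\nContext:\n{context}"
)
EXTRACT_PROMPT = (
    "Using only the context, answer each field. "
    "Write 'N/A' when the context does not contain the answer.\n"
    "Fields: {fields}\n\nContext:\n{context}"
)


def retrieve(chunks: List[str], query: str, k: int = 2) -> List[str]:
    """Toy retriever: rank chunks by how many words they share with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def run_pipeline(chunks: List[str], llm: Callable[[str], str]) -> dict:
    """Label the document, then ask each question group for that label."""
    # Stage 1: initial labeling, here based on the first chunks of the document.
    label_context = "\n".join(chunks[:2])
    label = llm(
        LABEL_PROMPT.format(labels=", ".join(QUESTION_GROUPS), context=label_context)
    ).strip()
    if label not in QUESTION_GROUPS:
        raise ValueError(f"unexpected document label: {label!r}")

    # Stage 2: one prompt per question group, each with its own retrieved context.
    answers = []
    for group in QUESTION_GROUPS[label]:
        context = "\n".join(retrieve(chunks, " ".join(group)))
        answers.append(
            llm(EXTRACT_PROMPT.format(fields=", ".join(group), context=context))
        )
    return {"type": label, "answers": answers}
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; a real pipeline would likely replace the word-overlap retriever with an embedding-based one.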

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-533357

RAG

Retrieval-Augmented Generation

LLM

Data extraction

second-life battery

data extraction pipeline


Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-533357
Date	January 2024
Creators	Edström, Jesper
Publisher	Uppsala universitet, Avdelningen för systemteknik
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	UPTEC F, 1401-5757 ; 24025

