This thesis explores methods of creating an information retrieval (IR) model within the FinTech domain. Given the domain-specific and data-scarce environment, methods of artificially generating data to train and evaluate IR models are implemented and their limitations discussed. The generative model GPT-J 6B is used to generate pseudo-queries for a document corpus, resulting in training and test sets of 148 and 166 query-document pairs, respectively. Transformer-based models, both fine-tuned and original versions, are tested against the baseline model BM25, which has historically been regarded as an effective document retrieval model. The models are evaluated using mean reciprocal rank at k (MRR@k) and the time cost of retrieving relevant documents. The main finding is that BM25 performs well compared to the transformer alternatives, reaching the highest score at MRR@2 = 0.612. For MRR@5 and MRR@10, a combination of BM25 and a cross-encoder slightly outperforms the baseline, reaching MRR@5 = 0.655 and MRR@10 = 0.672. However, the performance gain is slim and may not be enough to justify an implementation. Finally, further research using real-world data is required to establish whether transformer-based models are more robust in a real-world setting.
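The evaluation metric named in the abstract, MRR@k, can be sketched as follows. This is a minimal illustration of the standard definition (the reciprocal of the rank of the first relevant document within the top k results, averaged over queries), not the thesis's actual evaluation code; the function name and input layout are assumptions for the example.

```python
def mrr_at_k(ranked_lists, relevant_ids, k):
    """Mean reciprocal rank at cutoff k.

    ranked_lists: one ranked list of document ids per query.
    relevant_ids: the relevant document id for each query
                  (one relevant document per query, as in a
                  pseudo-query/document pair setup).
    """
    total = 0.0
    for ranking, rel in zip(ranked_lists, relevant_ids):
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id == rel:
                total += 1.0 / rank  # reciprocal rank of the first hit
                break
        # if the relevant document is outside the top k, it contributes 0
    return total / len(ranked_lists)
```

For example, if two queries place their relevant documents at ranks 1 and 2, `mrr_at_k` with k = 2 returns (1/1 + 1/2) / 2 = 0.75; increasing k can only raise the score, which is consistent with the reported MRR@2 < MRR@5 < MRR@10.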
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-226528 |
Date | January 2024 |
Creators | Hansen, Jesper |
Publisher | Umeå universitet, Institutionen för matematik och matematisk statistik |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |