Semantic search in historical documentation

Many organisations struggle with data digitisation and continuous data gathering. The data often ends up stored in outdated systems that are difficult to search. In this thesis, we use the engineering method to investigate the feasibility of using artificial intelligence to search a large corpus of data and find accurate answers. To that end, we conducted a literature review of existing solutions that offer flexibility and support artificial intelligence operations for searching databases, which led us to choose OpenSearch. Within OpenSearch, we ran an experiment to determine which sentence transformer, used to embed the contextual meaning of sentences, is best suited to semantic search in the database. We evaluated the sentence transformers' performance on the MS MARCO dataset, measuring both speed and accuracy. The experiment showed that two sentence transformers outperformed the rest by a slight margin, and that all the sentence transformers performed similarly overall. Notably, neither the sentence transformers specifically dedicated to semantic search nor those with larger dimensions performed better. These results also show that existing search engines can easily be combined with artificial intelligence to search documentation semantically, and that organisations could use this approach to handle a large corpus of data.
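As a rough illustration of the embedding-based ranking step the abstract refers to, the following minimal sketch ranks documents by cosine similarity to a query vector. It rests on several assumptions: the three-dimensional vectors are hand-made stand-ins for sentence-transformer output, cosine similarity is used only as a common choice and is not confirmed as the thesis's measure, and the names `cosine_similarity`, `rank_documents`, and the document identifiers are hypothetical. In a setup like the one described, OpenSearch would handle indexing and nearest-neighbour lookup rather than this hand-written loop.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Score every document against the query and return the top_k best."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real sentence transformer would produce
# vectors with hundreds of dimensions.
doc_vecs = {
    "pump_manual": [0.9, 0.1, 0.0],
    "invoice_1998": [0.0, 0.2, 0.9],
    "valve_report": [0.7, 0.6, 0.1],
}
query_vec = [1.0, 0.2, 0.0]

results = rank_documents(query_vec, doc_vecs, top_k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

Documents whose vectors point in nearly the same direction as the query rank first, which is why a query can match a document that shares its meaning without sharing its exact words.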

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:mdh-67550
Date: January 2024
Creators: Wiklund, Edvin; Maranan Hansson, Ivan Kelly
Publisher: Mälardalens universitet, Akademin för innovation, design och teknik
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess