Many organisations face problems with digitising data and gathering it continuously. They often store this data in outdated systems that are difficult to search. In this thesis, we use the engineering method to investigate the feasibility of using artificial intelligence to search a large corpus of data and find accurate answers. To achieve this goal, we conducted a literature review of existing solutions that offer flexibility and support artificial-intelligence-based search in databases. The review led us to choose OpenSearch. Within OpenSearch, we conducted an experiment to determine which sentence transformer, a model that embeds the contextual meaning of sentences, is best suited for semantic search in the database. We then evaluated the sentence transformers' performance on the MS MARCO dataset, measuring both speed and accuracy. The experiment showed that two sentence transformers outperformed the rest by a slight margin, and that all the sentence transformers performed similarly overall. Notably, sentence transformers dedicated specifically to semantic search, and those with larger embedding dimensions, did not perform better. Furthermore, the results showed that existing search engines and artificial intelligence can easily be combined to search documentation semantically, and that this approach could be used within organisations to handle a large corpus of data.
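The pipeline described above (embed passages once, embed each query, rank passages by similarity, then measure accuracy and speed) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the thesis: the `embed` function is a hypothetical stand-in (an L2-normalised bag-of-words vector) rather than a real sentence transformer; ranking is a brute-force in-memory cosine comparison rather than OpenSearch's vector index; accuracy is measured as MRR@10, the usual MS MARCO passage-ranking metric, although the abstract does not say which metric the thesis used; and the passages, queries, and relevance labels are invented toy data, not MS MARCO.

```python
import math
import time


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def embed(text: str, vocab: list[str]) -> list[float]:
    """Hypothetical stand-in for a sentence transformer: an L2-normalised
    bag-of-words vector over a fixed vocabulary. A real model would place
    paraphrases close together even when they share no words."""
    counts: dict[str, int] = {}
    for tok in tokenize(text):
        counts[tok] = counts.get(tok, 0) + 1
    vec = [float(counts.get(term, 0)) for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(passages: dict[str, str]) -> tuple[list[str], dict[str, list[float]]]:
    """Embed every passage once, as a vector index would at ingest time."""
    vocab = sorted({tok for text in passages.values() for tok in tokenize(text)})
    return vocab, {pid: embed(text, vocab) for pid, text in passages.items()}


def search(query: str, vocab: list[str], index: dict[str, list[float]], k: int = 10) -> list[str]:
    """Rank passage ids by cosine similarity (a dot product of unit vectors)."""
    q = embed(query, vocab)
    scores = {pid: sum(a * b for a, b in zip(q, vec)) for pid, vec in index.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def evaluate(queries, qrels, vocab, index, k: int = 10) -> tuple[float, float]:
    """Return (MRR@k, mean query latency in seconds)."""
    reciprocal_ranks, latencies = [], []
    for qid, text in queries.items():
        start = time.perf_counter()
        ranking = search(text, vocab, index, k)
        latencies.append(time.perf_counter() - start)
        relevant = qrels.get(qid, set())
        # Reciprocal rank of the first relevant passage, or 0 if none is in the top k.
        rr = next((1.0 / r for r, pid in enumerate(ranking, start=1) if pid in relevant), 0.0)
        reciprocal_ranks.append(rr)
    n = len(queries)
    return sum(reciprocal_ranks) / n, sum(latencies) / n


# Toy data for illustration only (not MS MARCO).
passages = {
    "p1": "opensearch supports vector search",
    "p2": "the thesis was published in 2024",
    "p3": "sentence transformers map sentences to vectors",
}
queries = {
    "q1": "what do sentence transformers do",
    "q2": "when was the thesis published",
    "q3": "vector search with sentence transformers",
}
qrels = {"q1": {"p3"}, "q2": {"p2"}, "q3": {"p3"}}

vocab, index = build_index(passages)
mrr, latency = evaluate(queries, qrels, vocab, index)
print(f"MRR@10 = {mrr:.3f}, mean latency = {latency * 1e3:.3f} ms")
```

In this toy run, `q3` ranks `p1` above the relevant `p3` because the stand-in embedding only rewards shared words, so that query contributes a reciprocal rank of 0.5 and MRR@10 comes out at about 0.833. Swapping `embed` for a real sentence transformer is exactly the variable the thesis experiment compares.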
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:mdh-67550 |
Date | January 2024 |
Creators | Wiklund, Edvin; Maranan Hansson, Ivan Kelly |
Publisher | Mälardalens universitet, Akademin för innovation, design och teknik |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |