
Semantic search in historical documentation

Wiklund, Edvin, Maranan Hansson, Ivan Kelly January 2024
Many organisations struggle with data digitisation and continuous data gathering, often storing their data in outdated systems that are difficult to search. In this thesis, we use the engineering method to investigate the feasibility of using artificial intelligence to search a large corpus of data and find accurate answers. To that end, we conducted a literature review of existing solutions that offer flexibility and support artificial intelligence operations for searching databases, which led us to choose OpenSearch. Within OpenSearch, we conducted an experiment to determine which sentence transformer, used to embed the contextual meaning of sentences, is best suited to semantic search in the database. We evaluated the sentence transformers' performance on the MS MARCO dataset, measuring both speed and accuracy. Two sentence transformers outperformed the rest by a slight margin, but all of them performed similarly overall. Notably, neither the sentence transformers dedicated specifically to semantic search nor those with larger embedding dimensions performed better. These results show that existing search engines can easily be combined with artificial intelligence to search documentation semantically, and that this approach could be used within organisations to handle a large corpus of data.
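
As a rough illustration of the kind of evaluation the abstract describes, the sketch below ranks documents by cosine similarity between embedding vectors and scores the rankings with mean reciprocal rank (MRR@10), timing the ranking step. It is a minimal sketch under stated assumptions, not the authors' setup: the abstract does not name the accuracy metric (MRR@10 is assumed here because it is commonly reported for MS MARCO passage ranking), the two-dimensional vectors are hand-made stand-ins for sentence-transformer output, and OpenSearch is not involved. All names (`cosine`, `rank`, `mrr_at_k`, the toy ids) are hypothetical.

```python
import math
import time


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vecs):
    """Return document ids sorted by descending similarity to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank of the first relevant document within the top k."""
    total = 0.0
    for qid, ranked in rankings.items():
        for pos, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant[qid]:
                total += 1.0 / pos
                break
    return total / len(rankings)


# Toy embeddings (assumption: a real run would obtain these from a sentence transformer).
docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
queries = {"q1": [0.9, 0.1], "q2": [0.1, 0.9]}
relevant = {"q1": {"d1"}, "q2": {"d3"}}

# Measure speed (wall-clock time for ranking) and accuracy (MRR@10).
start = time.perf_counter()
rankings = {qid: rank(vec, docs) for qid, vec in queries.items()}
elapsed = time.perf_counter() - start
score = mrr_at_k(rankings, relevant)

print(f"MRR@10 = {score:.2f}, ranking time = {elapsed:.6f} s")
```

Comparing several embedding models in this style would mean repeating the same loop with each model's vectors and comparing the resulting score and timing pairs.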
