Return to search

Investigations of Free Text Indexing Using NLP : Comparisons of Search Algorithms and Models in Apache Solr / Undersöka hur fritextindexering kan förbättras genom NLP

As Natural Language Processing progresses societal and applications like OpenAI obtain more considerable popularity in society, businesses encourage the integration of NLP into their systems. Both to improve the user experience and provide users with their requested information. For case management systems, a complicated task is to provide the user with relevant documents, since customers often have large databases containing similar information. This presumes that the user needs to match the requested topic perfectly. Imagine if there was a solution to search for context, instead of formulating the perfect prompt, via established NLP models like BERT. Imagine if the system understood its content. This thesis aims to investigate how a free text index can be improved using NLP from a user perspective and implement it. Using AI to help a free text index, in this case, Apache Solr, can make it easier for users to find the specific content the users are looking for. It is interesting to see how the search can be improved with the help of NLP models and present a more relevant result for the user. NLP can improve user prompts, known as queries, and assist in indexing the information. The task is to conduct a practical investigation by configuring the free text database Apache Solr, with and without NLP support. This is investigated by learning the search models' content, letting the search models provide their relevant search results, for some user queries, and evaluating the results. The investigated search models were a string-based model, an OpenNLP model, and BERT models segmented on paragraph level and sentence level. A hybrid search model of OpenNLP and BERT, on paragraph level, was the best solution overall.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-196156
Date January 2023
CreatorsSundstedt, Alfred
PublisherLinköpings universitet, Interaktiva och kognitiva system
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.002 seconds