Text has historically been the way knowledge is preserved and acquired, and text data makes up a growing part of today's digital footprint, along with the need to query that data for information. Seeking information is a constant, ongoing process and a crucial part of many systems all around us. The ability to perform fast and effective searches is a must when dealing with vast amounts of data. This thesis implements an information retrieval system based on the Swedish Defence Force's profession guide, with the aim of producing a system that retrieves relevant professions from user-defined queries of varying size. A number of Natural Language Processing techniques are investigated and implemented. To transform the gathered profession descriptions, a document embedding model, doc2vec, was implemented, resulting in document vectors that are compared to find similarities between documents. The final system was evaluated by domain experts, represented by active military personnel, who quantified the relevance of the retrieved professions into a measurable performance. The system managed to retrieve relevant information for 46.6% and 56.6% of the long and short text inputs, respectively, resulting in a much more generalized and capable system compared to the search function available in the profession guide today.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-516522 |
Date | January 2023 |
Creators | Harju Schnee, Andreas |
Publisher | Uppsala universitet, Datalogi |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | UPTEC IT, 1401-5749 ; 23038 |