Improving an Information Retrieval System by Using Machine Learning to Improve User Relevance Feedback / Förbättring av ett informationssökningssystem genom att använda maskininlärning för att förbättra relevansåterkoppling från en användare

This thesis aims to improve the performance of an existing information retrieval system that uses relevance feedback to perform query expansion. Improving this system is an ongoing goal because the retrieved documents serve as the basis for various data analysis tasks, so high precision and recall are important. When executing a query, a user can choose to give relevance feedback: the user marks documents in the search result as relevant or irrelevant and reruns the search, and the original query is then expanded based on this feedback. The approach presented in this thesis uses the documents marked as relevant or irrelevant to train a classifier that labels the remaining unmarked documents in the search result as relevant, irrelevant, or unknown. The aim is to classify these unmarked documents and add them to the set of feedback documents used for query expansion. The underlying assumption is that the more feedback a user gives, the better the query expansion performs. The system developed in this thesis is built for and evaluated on English-language data. The results show that integrating the classifier into the existing system improved performance in three out of four use cases. The existing system already performs well, but small improvements are important, so integrating the classifier into the existing system would be beneficial.
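The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not name the classifier, the confidence rule, or the expansion method, so everything below is an assumption made for illustration: a small multinomial Naive Bayes model, a hypothetical 0.8 confidence threshold below which a document stays "unknown", and a simple term-count expansion. All names (`TinyNaiveBayes`, `classify`, `augment_feedback`, `expand_query`) are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.labels}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict_proba(self, doc):
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for c in self.labels:
            n_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for tok in tokenize(doc):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log(
                        (self.word_counts[c][tok] + 1) / (n_words + len(self.vocab))
                    )
            log_scores[c] = score
        # Normalize in a numerically stable way.
        peak = max(log_scores.values())
        exp = {c: math.exp(s - peak) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}


def classify(model, doc, threshold=0.8):
    """Return the most probable label, or 'unknown' if confidence is too low."""
    proba = model.predict_proba(doc)
    best = max(proba, key=proba.get)
    return best if proba[best] >= threshold else "unknown"


def augment_feedback(model, relevant, irrelevant, unmarked, threshold=0.8):
    """Add confidently classified unmarked documents to the feedback sets."""
    relevant, irrelevant = list(relevant), list(irrelevant)
    for doc in unmarked:
        label = classify(model, doc, threshold)
        if label == "relevant":
            relevant.append(doc)
        elif label == "irrelevant":
            irrelevant.append(doc)
    return relevant, irrelevant


def expand_query(query, relevant, irrelevant, n_terms=2):
    """Append the terms that occur most often in relevant minus irrelevant docs."""
    scores = Counter()
    for doc in relevant:
        scores.update(tokenize(doc))
    for doc in irrelevant:
        scores.subtract(tokenize(doc))
    query_terms = tokenize(query)
    candidates = [t for t, s in scores.most_common() if s > 0 and t not in query_terms]
    return query_terms + candidates[:n_terms]


# Toy data, invented purely for illustration.
marked_relevant = ["neural network training data", "machine learning classifier training"]
marked_irrelevant = ["football match results", "football league table"]
unmarked = [
    "classifier training with machine learning",
    "football results today",
    "weather forecast tomorrow",
]

model = TinyNaiveBayes().fit(
    marked_relevant + marked_irrelevant,
    ["relevant"] * 2 + ["irrelevant"] * 2,
)
labels = [classify(model, doc) for doc in unmarked]
relevant, irrelevant = augment_feedback(model, marked_relevant, marked_irrelevant, unmarked)
expanded = expand_query("machine learning", relevant, irrelevant)
print(labels)
print(expanded)
```

In this toy run the weather document shares no vocabulary with the marked documents, so the model has no evidence either way and the document stays "unknown" and is left out of the feedback set. The threshold trades coverage against the risk of feeding wrongly classified documents into the query expansion.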

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-185184

Computer Science

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-185184
Date	January 2016
Creators	Nordin, Alexandra
Publisher	KTH, Skolan för datavetenskap och kommunikation (CSC)
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
