Return to search

Improving Performance of Biomedical Information Retrieval using Document-Level Field Boosting and BM25F Weighting

Corpora of biomedical information typically contains large amounts of ambiguous data, as proteins and genes can be referred to by a number of different terms, making information retrieval difficult. This thesis investigates a number of methods attempting to increase precision and recall of searches within the biomedical domain, including using the BM25F model for scoring documents and using Named Entity Recognition (NER) to identify biomedical entities in the text. We have implemented a prototype for testing the approaches, and have found that by using a combination of several methods, including using three different NER models at once, a significant increase (up to 11.5%) in mean average precision (MAP) is observed over our baseline result.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ntnu-11979
Date January 2010
CreatorsJervidalo, Jørgen
PublisherNorges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, Institutt for datateknikk og informasjonsvitenskap
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0013 seconds