Searching for, and identifying Protein Information in the Literature

As research papers grow in volume and number, locating articles that mention specific protein names or protein-protein interactions (PPIs) remains difficult. The underlying cause is the long-standing problem of extracting protein names and PPIs from biomedical papers and articles. The goal of this thesis was to investigate an approach that uses the Lucene framework to store and index articles found in biomedical databases, and to efficiently identify the protein names and possible interactions they contain. The system, dubbed MasterPPI, locates protein names and PPI keywords with the help of two dictionaries. Once these are found and labeled, it reports a PPI whenever an interaction keyword appears in a sentence between two protein names. When tested against the test collection from the IAS subtask of the BioCreAtIvE2 challenge, the prototype system achieved an F-score of 0.34, showing that the system has potential but needs a great deal of further work.
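The sentence-level rule described in the abstract can be illustrated with a minimal Python sketch. It is not the MasterPPI implementation: it leaves out the Lucene indexing step, and the two toy dictionaries, the tokenizer, and the function name `find_interactions` are assumptions made purely for illustration.

```python
import re

# Hypothetical toy dictionaries; the dictionaries used in the thesis are not shown in this record.
PROTEIN_NAMES = {"p53", "mdm2", "brca1"}
INTERACTION_KEYWORDS = {"binds", "interacts", "phosphorylates", "inhibits"}


def find_interactions(sentence):
    """Return (protein, keyword, protein) triples found in a single sentence."""
    # Simple tokenizer (an assumption): runs of letters, digits, and hyphens.
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    lowered = [t.lower() for t in tokens]

    # Positions of tokens that match the protein-name dictionary.
    protein_positions = [i for i, t in enumerate(lowered) if t in PROTEIN_NAMES]

    triples = []
    # For each pair of consecutive protein mentions, look for a keyword in between.
    for left, right in zip(protein_positions, protein_positions[1:]):
        for k in range(left + 1, right):
            if lowered[k] in INTERACTION_KEYWORDS:
                triples.append((tokens[left], lowered[k], tokens[right]))
                break  # one keyword is enough to report the pair
    return triples


if __name__ == "__main__":
    print(find_interactions("MDM2 binds p53 in the nucleus."))
```

The sketch only considers adjacent protein mentions and single-token names; a real dictionary matcher would also need to handle multi-word protein names and spelling variants.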

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ntnu-11977
Date January 2010
Creators Klæboe, Espen
Publisher Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, Institutt for datateknikk og informasjonsvitenskap
Source Sets DiVA Archive at Upsalla University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
