As the number of research papers grows, it remains difficult to locate articles that mention specific protein names or protein-protein interactions. The underlying cause is the long-standing problem of extracting protein names and protein-protein interactions from biomedical papers and articles. The goal of this thesis was to investigate an approach that uses the Lucene framework to store and index articles from biomedical databases, and to efficiently identify the protein names and possible interactions they contain. The system, dubbed MasterPPI, locates protein names and protein-protein interaction keywords with the help of two dictionaries. Once these are found and labeled, it reports a protein-protein interaction whenever an interaction keyword occurs between two protein names in the same sentence. When tested against the test collection from the IAS subtask of the BioCreAtIvE2 challenge, the prototype system achieved an F-score of 0.34, showing that the system has potential but needs a great deal of further work.
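The sentence-level rule described in the abstract can be illustrated with a minimal Python sketch. It is not the MasterPPI implementation, which is built on Lucene. The two small dictionaries, the function name `find_interactions`, and the simple alphanumeric tokenizer are assumptions made only for this illustration.

```python
import re

# Hypothetical toy dictionaries; the abstract does not list the
# contents of the two dictionaries used by the thesis.
PROTEIN_NAMES = {"brca2", "rad51", "p53", "mdm2"}
INTERACTION_KEYWORDS = {"binds", "interacts", "inhibits", "phosphorylates"}


def find_interactions(sentence):
    """Return (protein, keyword, protein) triples for one sentence.

    A triple is reported when an interaction keyword occurs between
    two protein names in the sentence.
    """
    # Assumed tokenizer: lowercase runs of letters and digits.
    tokens = re.findall(r"[A-Za-z0-9]+", sentence.lower())
    protein_pos = [i for i, t in enumerate(tokens) if t in PROTEIN_NAMES]
    keyword_pos = [i for i, t in enumerate(tokens) if t in INTERACTION_KEYWORDS]

    triples = []
    for n, first in enumerate(protein_pos):
        for second in protein_pos[n + 1:]:
            for k in keyword_pos:
                # The keyword must lie strictly between the two names.
                if first < k < second:
                    triples.append((tokens[first], tokens[k], tokens[second]))
    return triples


if __name__ == "__main__":
    print(find_interactions("BRCA2 binds RAD51 in vivo."))
```

A sentence that contains two protein names but no keyword between them, such as "BRCA2 and RAD51 were studied.", yields no triple under this rule.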
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ntnu-11977 |
Date | January 2010 |
Creators | Klæboe, Espen |
Publisher | Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |