Global ETD Search

Identification of Publications on Disordered Proteins from PubMed

Indiana University-Purdue University Indianapolis (IUPUI) / The literature on disordered proteins has been on the rise. As the number of publications increases, manually identifying the relevant publications and protein information to add to a centralized repository (called DisProt) is becoming both arduous and critical. Existing search facilities on PubMed can retrieve a seemingly large number of publications based on keywords, but they do not support ranking those publications by the probability that the protein names mentioned in a given abstract will be added to DisProt. This thesis explores a novel system that uses disorder predictors and context-based dictionary methods to quickly identify publications on disordered proteins in the PubMed database.

NLProt, which is built around Support Vector Machines, is used to identify protein names, and PONDR-FIT, an Artificial Neural Network-based meta-predictor, is used to identify protein disorder. The work done in this thesis is of immediate significance for identifying disordered protein names.

We tested the new system on 100 abstracts from DisProt (these abstracts had been found relevant to disordered proteins and were added to DisProt manually by the annotators). The system had an accuracy of 87% on this test set. We then took another 100 recently added abstracts from PubMed and ran our algorithm on them; this time it had an accuracy of 68%. We suggest improvements to increase the accuracy and believe that this system can be applied to identifying disordered proteins from the literature.

DisProt, Database, Software Tool

Proteins -- Analysis

Bioinformatics

Database searching

Genomics -- Data processing

Identifier	oai:union.ndltd.org:IUPUI/oai:scholarworks.iupui.edu:1805/2885
Date	07 August 2012
Creators	Sirisha, Peyyeti
Contributors	Xia, Yuni; Dunker, A. Keith; Chen, Jake
Source Sets	Indiana University-Purdue University Indianapolis
Language	en_US
Detected Language	English
