This study focuses on predicting the subcellular localization of proteins. Subcellular localization
information is important for protein function annotation, which is a fundamental problem in computational
biology. For this problem, a classification system is built that has two main parts: a predictor that uses
a feature mapping technique to extract biologically meaningful information from protein sequences,
and a client/server architecture for searching and predicting subcellular localizations. In the first part of the
thesis, we describe a feature mapping technique based on frequent patterns. Frequent patterns in a protein
sequence dataset are identified using a search technique based on the Apriori property, and the distribution
of these patterns over a new sample is used as a feature vector for classification.
The effect of several feature selection methods on classification performance is investigated, and the best
one is applied. The method is assessed on the subcellular localization
prediction problem with 4 compartments (endoplasmic reticulum (ER) targeted, cytosolic, mitochondrial, and nuclear),
using the same dataset as P2SL. Our method improved the overall accuracy to 91.71%, from the
81.96% originally obtained by P2SL. In the second part of the thesis, a client/server architecture is designed and implemented
based on Simple Object Access Protocol (SOAP) technology, which provides a user-friendly interface for accessing the
protein subcellular localization predictions. The client part is a Cytoscape plug-in that is used for functional
enrichment of biological networks. Rather than using subcellular localization information one protein at a time,
this plug-in lets biologists analyze a set of genes/proteins from a system-level view.
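
The frequent-pattern feature mapping summarized above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say what form the patterns take, so the sketch assumes they are contiguous residue substrings, that a pattern is "frequent" when it occurs in at least min_support training sequences, and that the feature vector holds raw occurrence counts. The names frequent_patterns, feature_vector, min_support, and max_len are hypothetical.

```python
from collections import Counter


def frequent_patterns(sequences, min_support, max_len=3):
    """Level-wise search for frequent contiguous substrings.

    Assumption (not from the thesis): a pattern is a contiguous substring,
    and its support is the number of sequences that contain it.
    """
    # Level 1: single residues that appear in enough sequences.
    counts = Counter()
    for seq in sequences:
        counts.update(set(seq))
    prev = {p for p, c in counts.items() if c >= min_support}
    frequent = sorted(prev)

    k = 2
    while prev and k <= max_len:
        counts = Counter()
        for seq in sequences:
            seen = set()
            for i in range(len(seq) - k + 1):
                cand = seq[i:i + k]
                # Apriori pruning: a length-k pattern can only be frequent
                # if both of its length-(k-1) sub-patterns are frequent.
                if cand[:-1] in prev and cand[1:] in prev:
                    seen.add(cand)
            counts.update(seen)  # count each pattern once per sequence
        prev = {p for p, c in counts.items() if c >= min_support}
        frequent.extend(sorted(prev))
        k += 1
    return frequent


def count_overlapping(sequence, pattern):
    """Count occurrences of pattern in sequence, allowing overlaps."""
    m = len(pattern)
    return sum(1 for i in range(len(sequence) - m + 1)
               if sequence[i:i + m] == pattern)


def feature_vector(sequence, patterns):
    """Map a new sequence to per-pattern occurrence counts (an assumed encoding)."""
    return [count_overlapping(sequence, p) for p in patterns]


# Toy data, invented purely for illustration.
train = ["MKLV", "MKLA", "AKLV"]
patterns = frequent_patterns(train, min_support=2, max_len=3)
print(patterns)
print(feature_vector("MKLVKL", patterns))
```

In a pipeline like the one the abstract describes, such vectors would then go through feature selection and a classifier. Those stages are omitted here because the abstract does not name the methods used.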
Identifier | oai:union.ndltd.org:METU/oai:etd.lib.metu.edu.tr:http://etd.lib.metu.edu.tr/upload/3/12608914/index.pdf |
Date | 01 September 2007 |
Creators | Alay, Gokcen |
Contributors | Atalay, Volkan |
Publisher | METU |
Source Sets | Middle East Technical Univ. |
Language | English |
Detected Language | English |
Type | M.S. Thesis |
Format | text/pdf |
Rights | To liberate the content for public access |