Prediction Of Protein Subcellular Localization Using Global Protein Sequence Feature

The problem of identifying genes in eukaryotic genomic sequences by computational methods has attracted considerable research attention in recent years. Many early approaches to the problem focused on prediction of individual functional elements and compositional properties of coding and non-coding deoxyribonucleic acid (DNA) in entire eukaryotic gene structures. More recently, a number of approaches have been developed which integrate multiple types of information, including structure, function and genetic properties of proteins. Knowledge of the structure of a protein is essential for describing and understanding its function. In addition, the subcellular localization of a protein can be used to partially characterize it. In this study, a method for the prediction of protein subcellular localization based on primary sequence data is described. Primary sequence data for a protein is its amino acid sequence. A frequency value is computed for each amino acid at each given position. The assigned frequencies are used in a new encoding scheme that conserves biological information based on the point accepted mutation (PAM) substitution matrix. This method can be used to predict nuclear sequences, cytosolic sequences, mitochondrial targeting peptides (mTP) and signal peptides (SP). For clustering purposes, besides well-known traditional techniques, principal component analysis (PCA) and self-organizing maps (SOM) are used. For classification purposes, support vector machines (SVM), a method of statistical learning theory recently introduced to bioinformatics, are used. The aim of combining feature extraction, clustering and classification methods is to design an accurate system that predicts the subcellular localization of the proteins presented to it. According to its architecture, our scheme for combining several methods is a cascading, or serial, combination. In the cascading architecture, the output of one method serves as the input of the next.
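The position-wise frequency encoding and the cascading combination can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the thesis implementation: the function names, the toy equal-length sequence windows, and the choice to encode each residue by its observed frequency are all assumptions, and the PAM-based part of the encoding is deliberately left out because the abstract does not specify it.

```python
from collections import Counter


def positional_frequencies(sequences):
    """Relative frequency of each amino acid at each position.

    Assumes all sequences are windows of the same length (hypothetical setup).
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    freqs = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sequences)
        freqs.append({aa: n / len(sequences) for aa, n in counts.items()})
    return freqs


def encode(sequence, freqs):
    """Replace each residue by its frequency at that position (0.0 if unseen)."""
    return [freqs[pos].get(aa, 0.0) for pos, aa in enumerate(sequence)]


def cascade(*stages):
    """Serial combination: the output of each stage is the input of the next."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run


# Toy windows, invented for illustration only.
windows = ["MKLA", "MKVA", "MRLS", "MKLS"]
freqs = positional_frequencies(windows)
features = [encode(w, freqs) for w in windows]
```

In a full system the stages passed to `cascade` would be the feature extraction, clustering (PCA or SOM) and SVM classification steps; here `cascade` only shows how each stage consumes the previous stage's output.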

Identifier: oai:union.ndltd.org:METU/oai:etd.lib.metu.edu.tr:http://etd.lib.metu.edu.tr/upload/3/1135292/index.pdf
Date: 01 August 2003
Creators: Bozkurt, Burcin
Contributors: Atalay, Volkan
Publisher: METU
Source Sets: Middle East Technical Univ.
Language: English
Detected Language: English
Type: M.S. Thesis
Format: text/pdf
Rights: To liberate the content for public access