The availability of new technologies has supplied life scientists with large amounts of experimental data. The data sets are large not only in the number of observations, but also in the number of recorded features. One of the aims of modeling is to explain a given phenomenon in the simplest possible way, hence the need to select suitable features. We extended a Monte Carlo-based approach to selecting statistically significant features with the discovery of feature interdependencies, and used it to model sequence-function relationships in proteins. Our approach led to compact and easy-to-interpret predictive models. First, we represented protein sequences in terms of their physicochemical properties. We then applied our feature selection and discovered feature interdependencies. Finally, we constructed predictive models based on, e.g., decision trees or rough sets. We applied the method to two important biological problems: 1) HIV-1 resistance to reverse transcriptase-targeted drugs and 2) post-translational modifications (PTMs) of proteins. In the case of HIV resistance, we were able not only to predict whether a mutated protein is resistant to a drug, but also to suggest some new, previously neglected mutations that possibly contribute to drug resistance. For all these mutations we proposed probable molecular mechanisms of action based on the literature and 3D structure studies. In the case of PTMs, we built high-accuracy models of modifications. Unlike other methods, ours allowed us to resolve whether the closest neighborhood of a residue (the nanomer) is sufficient to determine its modification status. Importantly, our method yields networks of interdependent physicochemical properties of amino acids that show how these properties collaborate in establishing a given modification.
We believe that the presented methods will help researchers to analyze a large class of important biological problems and will guide them in their research.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-109873 |
Date | January 2009 |
Creators | Kierczak, Marcin |
Publisher | Uppsala universitet, Centrum för bioinformatik, Uppsala : Acta Universitatis Upsaliensis |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Doctoral thesis, comprehensive summary, info:eu-repo/semantics/doctoralThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, 1651-6214 ; 688 |