In recent years, high-throughput technologies have generated a rapidly increasing amount of experimental data. Despite these large quantities of protein-related data and the development of computational prediction methods, the function of many proteins is still unknown. In the human proteome, at least 20% of the annotated proteins are not characterized. Thus, the question of how to predict a protein's function from its amino acid sequence remains unanswered for many proteins. Classical bioinformatics approaches for function prediction infer function from well-characterized homologs, which are identified by sequence similarity. However, these methods fail to identify distant homologs with low sequence similarity. Because protein structure is more conserved than sequence within protein families, structure-based methods (e.g. fold recognition) may recognize structural similarities even at low sequence similarity and thereby provide information for function inference. Fold recognition methods have already proven successful for individual proteins, but automating them for high-throughput application is difficult, mainly because of their high false positive rate. Automated identification of remote homologs based on fold recognition would allow a significant improvement in the functional annotation of proteins. My approach was to combine structure-based computational prediction methods with experimental data from genome-wide RNAi screens, improving the analysis of protein structure prediction results and thereby supporting the establishment of functional hypotheses.
In the first part of my thesis, I characterized proteins from the Ska complex by computational methods. I showed the benefit of including experimental information to identify remote homologs: integrating functional data helped to reduce the number of false positives in fold recognition results and made it possible to establish interesting functional hypotheses based on high-confidence structural predictions. Based on the structural hypothesis of a GLEBS motif in c13orf3 (Ska3), I derived a potential molecular mechanism that could explain the observed phenotype.
In the second part of my thesis, my goal was to develop computational tools and automated analysis techniques for high-throughput, structure-based functional annotation. I designed and implemented key tools that were successfully integrated into a computational platform called StrAnno, which I set up together with my colleagues. These novel computational modules include a domain prediction algorithm and a graphical overview that facilitates and accelerates the analysis of results.
StrAnno can be seen as a first step towards automatic functional annotation of proteins by structure-based methods. First, integrating and combining various sequence-based computational tools with data from functional databases substantially facilitates the analysis of long hit lists to identify promising candidates for further study. Second, the post-processing tools I developed accelerate the evaluation of structural and functional hypotheses: various filters remove false positives from the threading result lists, and the graphical overview greatly simplifies the analysis of the possible true positives. With these two essential benefits, fold recognition techniques become applicable to large-scale approaches. By applying this methodology to hits from a genome-wide cell cycle RNAi screen and evaluating structural hypotheses by molecular modeling techniques, I aimed to associate biological functions with human proteins and to link the RNAi phenotype to a molecular function. For two selected human proteins, c20orf43 and HJURP, I established interesting structural and functional hypotheses. These predictions were based on templates with low sequence identity (10-20%). The uncharacterized human protein c20orf43 might be an E3 SUMO ligase that could be involved either in DNA repair or in rRNA regulatory processes. Based on the structural hypotheses for two domains of HJURP, I predicted a potential link to ubiquitylation processes and direct DNA binding. In addition, I substantiated the cell cycle arrest phenotype of these two genes upon RNAi knockdown.
Fold recognition methods are a promising alternative for functional annotation of proteins that escape sequence-based annotation due to their low sequence identity to well-characterized protein families. The structural and functional hypotheses I established in my thesis open the door to investigate the molecular mechanisms of previously uncharacterized proteins, which may provide new insights into cellular mechanisms.
Identifier | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:25989 |
Date | 16 April 2012 |
Creators | Sontheimer, Jana |
Contributors | Pisabarro, M. Teresa, Stewart, Francis, Buchholz, Frank, Technische Universität Dresden |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text |
Rights | info:eu-repo/semantics/openAccess |