Molecular biology revolves primarily around proteins and genes (DNA and RNA), which work together to drive a wide range of biomolecular systems. To comprehend any biological phenomenon, from basic cell division to a disease as complex as cancer, it is therefore fundamental to decode the functional dynamics of proteins and genes. Computational approaches are now widely used to supplement traditional experimental approaches, but each automated approach has its own advantages and limitations. In this thesis, major shortcomings of existing computational approaches are identified, and alternative methods that are fast yet precise are proposed.
First, a strong need for reliable automated protein function prediction is identified. Almost half of existing protein functional interpretations remain enigmatic, and the lack of a universal functional vocabulary compounds the problem. NRProF, a novel neural-response-based method, is proposed for protein functional annotation. The neural response algorithm simulates how the human brain classifies images; the same principle is applied here to classify proteins. Using the hierarchical structure of the Gene Ontology (GO) as background, NRProF assigns a protein of interest to a specific GO category and thereby assigns the corresponding function.
With reliable protein functional annotations established, collaborations between proteins and genes are studied next. Interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs) are fundamental to gene regulation and are highly specific, even against an evolutionary background. To explain this binding specificity, a co-evolutionary (Co-Evo) relationship is hypothesized, and Pearson correlation and Mutual Information (MI) metrics are used to validate the hypothesis. Residue-level MI is used to infer the specific binding residues of TFs and their corresponding TFBSs, which supports a thorough understanding of gene regulatory mechanisms and can aid targeted gene therapies.
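As a rough illustration of the two metrics named above, the following Python sketch computes mutual information between two aligned columns of discrete symbols (for example, a TF residue position and a TFBS base position observed across the same set of species) and the Pearson correlation between two numeric vectors. The function names, the input layout, and the use of log base 2 are assumptions made for illustration; the abstract does not specify how the thesis computes these quantities.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length symbol sequences.

    xs and ys are assumed to be aligned columns, one symbol per species.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length numeric vectors."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_a * var_b)
```

In this sketch, a column pair that always changes together yields high MI, while an independent pair yields MI near zero; with few species the estimate is biased upward, so real analyses typically need a correction or a permutation baseline.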
After TF and TFBS associations are characterized, the interplay between genes is abstracted as Gene Regulatory Networks. Several methods based on expression correlations have been proposed to infer gene networks. However, most of them ignore the dynamic delay embedded in gene regulation, which is induced by complex molecular interactions and other chaotic cellular mechanisms. This delay is especially apparent in high-frequency time-series expression data. DDGni, a novel network inference strategy, is proposed by adopting a gapped Smith-Waterman algorithm. Gaps accommodate expression delays, and local alignment unveils short regulatory windows that traditional methods overlook.
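The idea of aligning expression profiles with gaps can be sketched with a generic Smith-Waterman local alignment. This is not the DDGni implementation: the discretization of each time series into up/down/flat symbols, the scoring values, and the linear gap penalty are all assumptions chosen to keep the example small.

```python
def discretize(series, eps=0.0):
    """Encode consecutive changes in a time series as 'U' (up), 'D' (down), 'F' (flat).

    The symbol alphabet and the eps threshold are illustrative assumptions.
    """
    symbols = []
    for prev, cur in zip(series, series[1:]):
        delta = cur - prev
        if delta > eps:
            symbols.append("U")
        elif delta < -eps:
            symbols.append("D")
        else:
            symbols.append("F")
    return "".join(symbols)


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between symbol strings a and b (linear gap penalty).

    Scoring parameters are hypothetical defaults, not values from the thesis.
    """
    cols = len(b) + 1
    prev_row = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        cur_row = [0] * cols
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur_row[j] = max(
                0,                      # start a new local alignment
                prev_row[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev_row[j] + gap,      # gap in b (delay in one profile)
                cur_row[j - 1] + gap,   # gap in a (delay in the other profile)
            )
            best = max(best, cur_row[j])
        prev_row = cur_row
    return best
```

Under these assumptions, two genes whose profiles share a short pattern offset by a few time points still receive a high score, because the zero floor restricts scoring to the matching window and the gap moves absorb the delay. A pairwise score matrix over all genes could then be thresholded to propose network edges.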
In addition to gene-level expression data, recent studies have demonstrated the merits of exon-level RNA-Seq data in profiling splice variants and constructing gene networks. However, the large number of exons relative to the small sample size limits their practical application. SpliceNet, a novel method based on Large Dimensional Trace, is proposed to infer isoform-specific co-expression networks from exon-level RNA-Seq data. It provides a more comprehensive picture of complex diseases by inferring network rewiring between normal and diseased samples at isoform resolution. It can be applied to any exon-level RNA-Seq data and exon array data.
In summary, this thesis first identifies major shortcomings of existing computational approaches to the functional association of proteins and genes, and then develops several tools, namely NRProF, Co-Evo, DDGni and SpliceNet. Collectively, they offer a comprehensive picture of the biomolecular system under study. / published_or_final_version / Biochemistry / Doctoral / Doctor of Philosophy
Identifier | oai:union.ndltd.org:HKU/oai:hub.hku.hk:10722/206477 |
Date | January 2014 |
Creators | Yalamanchili, Hari Krishna |
Contributors | Wang, JJ, Chin, FYL |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Source Sets | Hong Kong University Theses |
Language | English |
Detected Language | English |
Type | PG_Thesis |
Rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works., Creative Commons: Attribution 3.0 Hong Kong License |
Relation | HKU Theses Online (HKUTO) |