Global ETD Search

441	Computer Modelling and Simulations of Enzymes and their Mechanisms Alonso, Hernan, hernan.alonso@anu.edu.au January 2006 (has links) Although the tremendous catalytic power of enzymes is widely recognized, their exact mechanisms of action are still a source of debate. In order to elucidate the origin of their power, it is necessary to look at individual residues and atoms, and establish their contribution to ligand binding, activation, and reaction. Given the present limitations of experimental techniques, only computational tools allow for such detailed analysis. During my PhD studies I have applied a variety of computational methods, reviewed in Chapter 2, to the study of two enzymes: DfrB dihydrofolate reductase (DHFR) and methyltetrahydrofolate: corrinoid/iron-sulfur protein methyltransferase (MeTr). ¶ The DfrB enzyme has intrigued microbiologists since it was discovered thirty years ago, because of its simple structure, enzymatic inefficiency, and its insensitivity to trimethoprim. This bacterial enzyme shows neither structural nor sequence similarity with its chromosomal counterpart, despite both catalysing the reduction of dihydrofolate (DHF) using NADPH as a cofactor. As numerous attempts to obtain experimental structures of an enzyme ternary complex have been unsuccessful, I combined docking studies and molecular dynamics simulations to produce a reliable model of the reactive DfrB·DHF·NADPH complex. These results, combined with published empirical data, showed that multiple binding modes of the ligands are possible within DfrB. ¶ Comprehensive sequence and structural analysis provided further insight into the DfrB family. The presence of the dfrB genes within integrons and their level of sequence conservation suggest that they are old structures that had been diverging well before the introduction of trimethoprim. 
Each monomer of the tetrameric active enzyme presents an SH3-fold domain; this is a eukaryotic auxiliary domain never found before as the sole domain of a protein, let alone as the catalytic one. Overall, DfrB DHFR seems to be a poorly adapted catalyst, a minimalistic enzyme that promotes the reaction by facilitating the approach of the ligands rather than by using specific catalytic residues. ¶ MeTr initiates the Wood-Ljungdahl pathway of anaerobic CO2 fixation. It catalyses the transfer of the N5-methyl group from N5-methyltetrahydrofolate (CH3THF) to the cobalt centre of a corrinoid/iron-sulfur protein. For the reaction to occur, the N5 position of CH3THF is expected to be activated by protonation. As experimental studies have led to conflicting suggestions, computational approaches were used to address the activation mechanism. ¶ Initially, I tested the accuracy of quantum mechanical (QM) methods to predict protonation positions and pKas of pterin, folate, and their analogues. Then, different protonation states of CH3THF and active-site aspartic residues were analysed. Fragment QM calculations suggested that the pKa of N5 in CH3THF is likely to increase upon protein binding. Further, ONIOM calculations which accounted for the complete protein structure indicated that active-site aspartic residues are likely to be protonated before the ligand. Finally, solvation and binding free energies of several protonated forms of CH3THF were compared using the thermodynamic integration approach. Taken together, these preliminary results suggest that further work with particular emphasis on the protonation state of active-site aspartic residues is needed in order to elucidate the protonation and activation mechanism of CH3THF within MeTr. computational biology molecular dynamics docking free energy protonation drug resistance protein flexibility ligand binding dihydrofolate reductase methyl transferase
442	The Maximum Clique Problem: Algorithms, Applications, and Implementations Eblen, John David 01 August 2010 (has links) Computationally hard problems are routinely encountered during the course of solving practical problems. This is commonly dealt with by settling for less than optimal solutions, through the use of heuristics or approximation algorithms. This dissertation examines the alternate possibility of solving such problems exactly, through a detailed study of one particular problem, the maximum clique problem. It discusses algorithms, implementations, and the application of maximum clique results to real-world problems. First, the theoretical roots of the algorithmic method employed are discussed. Then a practical approach is described, which separates out important algorithmic decisions so that the algorithm can be easily tuned for different types of input data. This general and modifiable approach is also meant as a tool for research so that different strategies can easily be tried for different situations. Next, a specific implementation is described. The program is tuned, by use of experiments, to work best for two different graph types, real-world biological data and a suite of synthetic graphs. A parallel implementation is then briefly discussed and tested. After considering implementation, an example of applying these clique-finding tools to a specific case of real-world biological data is presented. Results are analyzed using both statistical and biological metrics. Then the development of practical algorithms based on clique-finding tools is explored in greater detail. New algorithms are introduced and preliminary experiments are performed. Next, some relaxations of clique are discussed along with the possibility of developing new practical algorithms from these variations. Finally, conclusions and future research directions are given. 
FPT paraclique NP-complete bioinformatics correlation software Bioinformatics Computational Biology Discrete Mathematics and Combinatorics Software Engineering Theory and Algorithms
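As an illustration of what solving maximum clique exactly can look like, the following is a minimal sketch using Bron–Kerbosch enumeration with pivoting plus a simple size bound. This is a standard textbook technique chosen for illustration; it is not taken from the dissertation, whose own algorithms may differ substantially. The function name `max_clique` and the adjacency-dictionary input format are assumptions.

```python
def max_clique(adj):
    """Return one maximum clique of an undirected graph.

    adj maps each vertex to the set of its neighbours
    (a hypothetical input format chosen for this sketch).
    """
    best = []

    def expand(clique, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            # The current clique is maximal; keep it if it is the largest so far.
            if len(clique) > len(best):
                best = list(clique)
            return
        # Bound: even adding every remaining candidate cannot beat the best.
        if len(clique) + len(candidates) <= len(best):
            return
        # Pivot on the vertex with the most candidate neighbours, so that
        # only the pivot's non-neighbours need to be branched on.
        pivot = max(candidates | excluded, key=lambda v: len(adj[v] & candidates))
        for v in list(candidates - adj[pivot]):
            expand(clique + [v], candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand([], set(adj), set())
    return sorted(best)


# A triangle (1, 2, 3) with a two-edge tail attached at vertex 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(max_clique(graph))  # [1, 2, 3]
```

The pivot rule and the bound are the kind of separable algorithmic decisions that could be swapped out when tuning for a particular class of graphs.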
443	Efficient Algorithms for Comparing, Storing, and Sharing Large Collections of Phylogenetic Trees Matthews, Suzanne 2012 May 1900 (has links) Evolutionary relationships between a group of organisms are commonly summarized in a phylogenetic (or evolutionary) tree. The goal of phylogenetic inference is to infer the best tree structure that represents the relationships between a group of organisms, given a set of observations (e.g. molecular sequences). However, popular heuristics for inferring phylogenies output tens to hundreds of thousands of equally weighted candidate trees. Biologists summarize these trees into a single structure called the consensus tree. The central assumption is that the information discarded has less value than the information retained. But, what if this assumption is not true? In this dissertation, we demonstrate the value of retaining and studying tree collections. We also conduct an extensive literature search that highlights the rapid growth of trees produced by phylogenetic analysis. Thus, high performance algorithms are needed to accommodate this increasing production of data. We created several efficient algorithms that allow biologists to easily compare, store and share tree collections over tens to hundreds of thousands of phylogenetic trees. Universal hashing is central to all these approaches, allowing us to quickly identify the shared evolutionary relationships contained in tree collections. Our algorithms MrsRF and Phlash are the fastest in the field for comparing large collections of trees. Our algorithm TreeZip is the most efficient way to store large tree collections. Lastly, we developed Noria, a novel version control system that allows biologists to seamlessly manage and share their phylogenetic analyses. Our work has far-reaching implications for both the biological and computer science communities. 
We tested our algorithms on four large biological datasets, each consisting of 20,000 to 150,000 trees over 150 to 525 taxa. Our experimental results on these datasets indicate the long-term applicability of our algorithms to modern phylogenetic analysis, and underscore their ability to help scientists easily exchange and analyze their large tree collections. In addition to contributing to the reproducibility of phylogenetic analysis, our work enables the creation of test beds for improving phylogenetic heuristics and applications. Lastly, our data structures and algorithms can be applied to managing other tree-like data (e.g. XML). computer science computational biology, bioinformatics systematic biology biology evolutionary tree phylogenetic tree tree collections phylogeny compression version control
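The idea of using hashing to identify relationships shared across a tree collection can be sketched as follows. Each tree is reduced to its set of non-trivial bipartitions (splits), which are stored in hash-based containers so that shared splits can be counted and the Robinson-Foulds distance between two trees computed. This sketch relies on Python's built-in hashing of `frozenset` rather than universal hash functions, and it is not the implementation of MrsRF, Phlash, or TreeZip. The nested-tuple tree encoding and the names `splits`, `rf_distance`, and `split_support` are assumptions made for illustration.

```python
from collections import Counter


def splits(tree, taxa):
    """Return the non-trivial bipartitions of a tree.

    tree is a nested tuple of taxon names (hypothetical encoding);
    taxa is the full collection of taxon names.
    """
    taxa = frozenset(taxa)
    reference = min(taxa)  # decides which side of a split is stored
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        # Keep only splits with at least two taxa on each side.
        if 2 <= len(below) <= len(taxa) - 2:
            # Canonical form: the side that does not contain the reference taxon.
            found.add(taxa - below if reference in below else below)
        return below

    leaves_below(tree)
    return found


def rf_distance(tree_a, tree_b, taxa):
    """Robinson-Foulds distance: the number of splits found in exactly one tree."""
    return len(splits(tree_a, taxa) ^ splits(tree_b, taxa))


def split_support(trees, taxa):
    """Count in how many trees of a collection each split occurs."""
    counts = Counter()
    for tree in trees:
        counts.update(splits(tree, taxa))
    return counts


taxa = "ABCDE"
t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
print(rf_distance(t1, t2, taxa))  # 2
```

Because every tree is reduced to hashable splits, comparing or summarising a large collection becomes a matter of dictionary lookups rather than pairwise tree traversals.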
444	Computational Prediction of Gene Function From High-throughput Data Sources Mostafavi, Sara 31 August 2011 (has links) A large number and variety of genome-wide genomics and proteomics datasets are now available for model organisms. Each dataset on its own presents a distinct but noisy view of cellular state. However, collectively, these datasets embody a more comprehensive view of cell function. This motivates the prediction of function for uncharacterized genes by combining multiple datasets, in order to exploit the associations between such genes and genes of known function--all in a query-specific fashion. Commonly, heterogeneous datasets are represented as networks in order to facilitate their combination. Here, I show that it is possible to accurately predict gene function in seconds by combining multiple large-scale networks. This facilitates function prediction on-demand, allowing users to take advantage of the persistent improvement and proliferation of genomics and proteomics datasets and continuously make up-to-date predictions for large genomes such as humans. Our algorithm, GeneMANIA, uses constrained linear regression to combine multiple association networks and uses label propagation to make predictions from the combined network. I introduce extensions that result in improved predictions when the number of labeled examples for training is limited, or when an ontological structure describing a hierarchy of gene function categorization scheme is available. Further, motivated by our empirical observations on predicting node labels for general networks, I propose a new label propagation algorithm that exploits common properties of real-world networks to increase both the speed and accuracy of our predictions. Computational Biology Machine Learning Predicting Gene Function Biological Networks Combining High-Throughput Data Sources 0984 0800 0715
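The two steps named in the abstract, combining association networks and propagating labels over the combined network, can be sketched as below. This is a simplified illustration and not GeneMANIA's implementation: the network weights are taken as given (in the abstract they come from constrained linear regression, which is not reproduced here), and the propagation is a generic Jacobi iteration for the linear system (I + D - W) f = y, where W is the combined weight matrix and D its diagonal degree matrix. The function names `combine` and `propagate` and the list-of-lists matrix format are assumptions.

```python
def combine(networks, weights):
    """Weighted sum of several association networks (square weight matrices)."""
    n = len(networks[0])
    return [
        [sum(a * net[i][j] for a, net in zip(weights, networks)) for j in range(n)]
        for i in range(n)
    ]


def propagate(w, y, iters=200):
    """Spread initial labels y over the network w.

    y holds +1 for known positives, -1 for known negatives, and 0 for
    unlabeled genes (an assumed encoding). Each sweep replaces a gene's
    score by its own label plus the weighted scores of its neighbours,
    normalised by one plus its degree.
    """
    n = len(y)
    degree = [sum(row) for row in w]
    f = list(y)
    for _ in range(iters):
        f = [
            (y[i] + sum(w[i][j] * f[j] for j in range(n))) / (1.0 + degree[i])
            for i in range(n)
        ]
    return f


# Two toy networks over four genes that together form the path 0-1-2-3.
net_a = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
net_b = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
combined = combine([net_a, net_b], [1.0, 1.0])
scores = propagate(combined, [1, 0, 0, -1])
print([round(s, 3) for s in scores])  # [0.571, 0.143, -0.143, -0.571]
```

Genes closer to the positive example end up with higher scores, which is the basis for ranking uncharacterized genes against a query.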
446	Computational Molecular Engineering Nucleic Acid Binding Proteins and Enzymes Reza, Faisal January 2010 (has links) Interactions between nucleic acid substrates and the proteins and enzymes that bind and catalyze them are ubiquitous and essential for reading, writing, replicating, repairing, and regulating the genomic code by the proteomic machinery. In this dissertation, computational molecular engineering furthered the elucidation of spatial-temporal interactions of natural nucleic acid binding proteins and enzymes and the creation of synthetic counterparts with structure-function interactions at predictive proficiency. We examined spatial-temporal interactions to study how natural proteins can process signals and substrates. The signals, propagated by spatial interactions between genes and proteins, can encode and decode information in the temporal domain. Natural proteins evolved through facilitating signaling, limiting crosstalk, and overcoming noise locally and globally. Findings indicate that fidelity and speed of frequency signal transmission in cellular noise was coordinated by a critical frequency, beyond which interactions may degrade or fail. The substrates, bound to their corresponding proteins, present structural information that is precisely recognized and acted upon in the spatial domain. Natural proteins evolved by coordinating substrate features with their own. Findings highlight the importance of accurate structural modeling. We explored structure-function interactions to study how synthetic proteins can complex with substrates. These complexes, composed of nucleic acid containing substrates and amino acid containing enzymes, can recognize and catalyze information in the spatial and temporal domains. Natural proteins evolved by balancing stability, solubility, substrate affinity, specificity, and catalytic activity. 
Accurate computational modeling of mutants with desirable properties for nucleic acids while maintaining such balances extended molecular redesign approaches. Findings demonstrate that binding and catalyzing proteins redesigned by single-conformation and multiple-conformation approaches maintained this balance to function, often as well as or better than those found in nature. We enabled access to computational molecular engineering of these interactions through open-source practices. We examined the applications and issues of engineering nucleic acid binding proteins and enzymes for nanotechnology, therapeutics, and in the ethical, legal, and social dimensions. Findings suggest that such access and applications can make engineering biology more widely adopted, easier, more effective, and safer. / Dissertation Engineering, Biomedical Computer Science Biology, Bioinformatics binding protein computational biology molecular engineering nucleic acid protein design synthetic biology
447	The Effect of Structural Microheterogeneity on the Initiation and Propagation of Ectopic Activity in Cardiac Tissue Hubbard, Marjorie Letitia January 2010 (has links) Cardiac arrhythmias triggered by both reentrant and focal sources are closely correlated with regions of tissue characterized by significant structural heterogeneity. Experimental and modeling studies of electrical activity in the heart have shown that local microscopic heterogeneities which average out at the macroscale in healthy tissue play a much more important role in diseased and aging cardiac tissue, which has low levels of coupling and abnormal or reduced membrane excitability. However, it is still largely unknown how various combinations of microheterogeneity in the intracellular and interstitial spaces affect wavefront propagation in these critical regimes. ¶ This thesis uses biophysically realistic 1-D and 2-D computer models to investigate how heterogeneity in the interstitial and intracellular spaces influences both the initiation of ectopic beats and the escape of multiple ectopic beats from a poorly coupled region of tissue into surrounding well-coupled tissue. An approximate discrete monodomain model that incorporates local heterogeneity in both the interstitial and intracellular spaces was developed to represent the tissue domain. ¶ The results showed that increasing the effective interstitial resistivity in poorly coupled fibers alters the distribution of electrical load at the microscale and causes propagation to become more like that observed in continuous fibers. In poorly coupled domains, this nearly continuous state is modulated by cell length and is characterized by decreased gap junction delay, sustained conduction velocity, increased sodium current, reduced maximum upstroke velocity, and increased safety factor. 
In inhomogeneous fibers with adjacent well-coupled and poorly coupled regions, locally increasing the effective interstitial resistivity in the poorly coupled region reduces the size of the focal source needed to generate an ectopic beat, reduces dispersion of repolarization, and delays the onset of conduction block that is caused by source-load mismatch at the boundary between well-coupled and poorly coupled regions. In 2-D tissue models, local increases in effective interstitial resistivity as well as microstructural variations in cell arrangement at the boundary between poorly coupled and well-coupled regions of tissue modulate the distribution of maximum sodium current, which facilitates the unidirectional escape of focal beats. Variations in the distribution of sodium current as a function of cell length and width lead to directional differences in the response to increased effective interstitial resistivity. Propagation in critical regimes such as the ectopic substrate is very sensitive to source-load interactions and local increases in maximum sodium current caused by microheterogeneity in both intracellular and interstitial structure. / Dissertation Engineering, Biomedical Biophysics, Medical action potential propagation bidomain model cardiac electrophysiology computational biology gap junction coupling interstitial space
448	Modeling Multi-factor Binding of the Genome Wasson, Todd Steven January 2010 (has links) Hundreds of different factors adorn the eukaryotic genome, binding to it in large numbers. These DNA binding factors (DBFs) include nucleosomes, transcription factors (TFs), and other proteins and protein complexes, such as the origin recognition complex (ORC). DBFs compete with one another for binding along the genome, yet many current models of genome binding do not consider different types of DBFs together simultaneously. Additionally, binding is a stochastic process that results in a continuum of binding probabilities at any position along the genome, but many current models tend to consider positions as being either binding sites or not. ¶ Here, we present a model that allows a multitude of DBFs, each at different concentrations, to compete with one another for binding sites along the genome. The result is an 'occupancy profile', a probabilistic description of the DNA occupancy of each factor at each position. We implement our model efficiently as the software package COMPETE. We demonstrate genome-wide and at specific loci how modeling nucleosome binding alters TF binding, and vice versa, and illustrate how factor concentration influences binding occupancy. Binding cooperativity between nearby TFs arises implicitly via mutual competition with nucleosomes. Our method applies not only to TFs, but also recapitulates known occupancy profiles of a well-studied replication origin with and without ORC binding. ¶ We then develop a statistical framework for tuning our model concentrations to further improve its predictions. Importantly, this tuning optimizes with respect to actual biological data. 
We take steps to ensure that our tuned parameters are biologically plausible. ¶ Finally, we discuss novel extensions and applications of our model, suggesting next steps in its development and deployment. / Dissertation Biology, Bioinformatics Computer Science Statistics Boltzmann chains Computational Biology DNA binding Hidden Markov models Statistical mechanics Transcription factors
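To make the notion of concentration-dependent competition concrete, here is a deliberately reduced sketch for a single genomic position. Each factor's statistical weight is taken to be its concentration times its affinity for that position, and occupancy probabilities follow from normalising against all competing states, including the unbound state. This is a standard competitive-binding calculation and not the COMPETE model itself, which works along the whole genome and handles factors that cover many positions. The function name `site_occupancy` and its dictionary inputs are assumptions.

```python
def site_occupancy(concentrations, affinities):
    """Occupancy probabilities at one position under mutually exclusive binding.

    concentrations and affinities map factor names to positive numbers
    (hypothetical units chosen so that their product is dimensionless).
    """
    weights = {name: concentrations[name] * affinities[name] for name in concentrations}
    # Partition function: the unbound state has weight 1.
    z = 1.0 + sum(weights.values())
    occupancy = {name: w / z for name, w in weights.items()}
    occupancy["unbound"] = 1.0 / z
    return occupancy


low = site_occupancy({"TF": 1.0, "nucleosome": 2.0}, {"TF": 3.0, "nucleosome": 1.0})
high = site_occupancy({"TF": 4.0, "nucleosome": 2.0}, {"TF": 3.0, "nucleosome": 1.0})
print(low["TF"], high["TF"])  # 0.5 0.8
```

Raising the TF concentration increases its occupancy at the expense of the nucleosome, which is the qualitative behaviour the abstract describes.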
449	Genomic data mining for the computational prediction of small non-coding RNA genes Tran, Thao Thanh Thi 20 January 2009 (has links) The objective of this research is to develop a novel computational prediction algorithm for non-coding RNA (ncRNA) genes using features computable for any genomic sequence without the need for comparative analysis. Existing comparative-based methods require the knowledge of closely related organisms in order to search for sequence and structural similarities. This approach imposes constraints on the type of ncRNAs, the organism, and the regions where the ncRNAs can be found. We have developed a novel approach for ncRNA gene prediction without the limitations of current comparative-based methods. Our work has established an ncRNA database required for subsequent feature and genomic analysis. Furthermore, we have identified significant features from folding-, structural-, and ensemble-based statistics for use in ncRNA prediction. We have also examined higher-order gene structures, namely operons, to discover potential insights into how ncRNAs are transcribed. The ability to automatically identify ncRNAs on a genome-wide scale is immensely powerful when incorporated into a pipeline for large-scale genome annotation. This work will contribute to a more comprehensive annotation of ncRNA genes in microbial genomes to meet the demands of functional and regulatory genomic studies. Bioinformatics Non-coding RNA genes Operon prediction Neural networks Computational biology Non-coding RNA Data mining Genomics Data processing Genomes Data processing
450	Identification of topological and dynamic properties of biological networks through diverse types of data Guner, Ugur 23 May 2011 (has links) It is becoming increasingly important to understand biological networks in order to understand complex diseases, identify novel, safer protein targets for therapies and design efficient drugs. 'Systems biology' has emerged as a discipline to uncover biological networks through genomic data. Computational methods for identifying these networks have become immensely important and, within the discipline of 'Systems Biology', have grown in number in parallel with the increasing amount of genomic data. In this thesis we introduce novel computational methods for identifying topological and dynamic properties of biological networks. Biological data is available in various forms. Experimental data on the interactions between biological components provides a connectivity map of the system as a network of interactions, and time series or steady state experiments on concentrations or activity levels of biological constituents give a dynamic picture of the web of these interactions. Biological data is usually scarce relative to the number of components in the networks and is subject to high levels of noise. The data is available from various resources; however, it can have missing information and inconsistencies. Hence it is critical to design intelligent computational methods that can incorporate data from different resources while accounting for the noise component. This thesis is organized as follows. Chapters 1 and 2 will introduce the basic concepts for biological network types. Chapter 2 will give a background on biochemical network identification data types and computational approaches for reverse engineering of these networks. Chapter 3 will introduce our novel constrained total least squares approach for recovering network topology and dynamics through noisy measurements. 
We proved our method to be superior to existing reverse engineering methods. Chapter 4 is an extension of Chapter 3 in which a Bayesian parameter estimation algorithm is presented that is capable of incorporating noisy time series and prior information about the connectivity of the network. The quality of prior information is critical for inferring the dynamics of the networks. The major drawback of prior connectivity data is the presence of false negatives, that is, missing links. Hence, powerful link prediction methods are necessary to identify missing links. At this juncture, a novel link prediction method is introduced in Chapter 5. This method is capable of predicting missing links in connectivity data. An application of this method to protein-protein association data from a literature mining database will be demonstrated. In Chapter 6 a further extension into link prediction applications will be given. An interesting application of these methods is the prediction of adverse drug effects. Adverse effects are the major reason for the failure of drugs in the pharmaceutical industry; therefore, it is very important to identify potential toxicity risks early in the drug development process. Motivated by this, Chapter 6 introduces our computational framework that integrates drug-target, drug-side effect, pathway-target and mouse phenotype-mouse genes data to predict side effects. Chapter 7 will give the significant findings and overall achievements of the thesis. Subsequent steps that can follow the work presented here to improve network prediction methods will be suggested. Constrained total least squares Biological networks Target identification Adverse event prediction Reverse engineering Systems biology Molecular biology Computational biology
