1 |
STRUCTURAL DETERMINANTS OF REPLACEMENT RATE HETEROGENEITY
Raftis, Francis 07 1900
Protein sequences display replacement rate heterogeneity across sites. In an earlier
work, half of the causal site-wise variation in replacement rates was explained
by a simple linear regression model consisting of terms for the solvent exposure of
each residue, distance from the active site, and glycines in unusual main-chain conformations.
Replacement rates vary not only across sites; they may also vary over
time. In this study, we apply the linear regression model to phylogenies divided
into subtrees to see if lineage-specific rate shifts have a structural basis that can be
detected by the model. This approach is applied to two different data sets. The first
set consists of phylogenies containing two representative structures, divided into
subtrees such that one structure is present in each subtree. These structures have
little or no obvious functional divergence between them. The model is tested with
permutations of subtrees and structures from each subtree. While there is a slight
effect of the specific structure on the fit of the model, the specific subtree has a
greater effect. The second data set involves homologous structure pairs where the
quaternary structure has changed at some point in the phylogeny. These pairs are
examined to see how the change in constraint on the new interface sites affects the
replacement rate and its relationship with other structural factors. We find that the
unique interfaces are as conserved as the shared ones, and they exhibit a different
relationship between replacement rates and indicators of constraint than the shared
interfaces or other protein sites. We also find that the unique interfaces display
characteristic amino acid preferences that may identify interfaces that are still in
the process of stabilizing.
Thesis / Master of Science (MSc)
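The abstract describes a linear regression of per-site replacement rates on three structural terms. As a rough illustration only, the sketch below fits such a model by ordinary least squares using the standard library. The function names, the encoding of the glycine term as a 0/1 indicator, and all numbers are assumptions made for this example; none of them come from the thesis.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrices are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_site_rate_model(rates, exposure, active_site_dist, unusual_gly):
    """Least-squares fit of site rates; returns (coefficients, R^2).

    Coefficient order: intercept, solvent exposure, distance from the
    active site, unusual-glycine indicator (0/1 encoding is an assumption).
    """
    rows = [[1.0, e, d, g]
            for e, d, g in zip(exposure, active_site_dist, unusual_gly)]
    k = 4
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, rates)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_rate = sum(rates) / len(rates)
    ss_res = sum((y - f) ** 2 for y, f in zip(rates, fitted))
    ss_tot = sum((y - mean_rate) ** 2 for y in rates)
    return coef, 1.0 - ss_res / ss_tot


# Synthetic data: every value below is made up purely for illustration.
rng = random.Random(0)
n_sites = 200
exposure = [rng.random() for _ in range(n_sites)]
dist = [rng.uniform(0.0, 30.0) for _ in range(n_sites)]
gly = [1.0 if i % 20 == 0 else 0.0 for i in range(n_sites)]
rates = [0.2 + 1.0 * e + 0.03 * d - 0.5 * g + rng.gauss(0.0, 0.3)
         for e, d, g in zip(exposure, dist, gly)]

coef, r_squared = fit_site_rate_model(rates, exposure, dist, gly)
print("coefficients:", [round(c, 3) for c in coef])
print("R^2:", round(r_squared, 3))
```

Fitting the same function separately to rates estimated from each subtree would be one way to compare model fit across lineages, in the spirit of the permutation test described above.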
|
2 |
Improved Bayesian methods for detecting recombination and rate heterogeneity in DNA sequence alignments
Mantzaris, Alexander Vassilios January 2011
DNA sequence alignments are usually not homogeneous. Mosaic structures may result as a consequence of recombination or rate heterogeneity. Interspecific recombination, in which DNA subsequences are transferred between different (typically viral or bacterial) strains, may result in a change of the topology of the underlying phylogenetic tree. Rate heterogeneity corresponds to a change of the nucleotide substitution rate. Various methods for simultaneously detecting recombination and rate heterogeneity in DNA sequence alignments have recently been proposed, based on complex probabilistic models that combine phylogenetic trees with factorial hidden Markov models or multiple changepoint processes. The objective of my thesis is to identify potential shortcomings of these models and explore ways to improve them. One shortcoming that I have identified is related to an approximation made in various recently proposed Bayesian models. The Bayesian paradigm requires the solution of an integral over the space of parameters. To render this integration analytically tractable, these models assume that the vectors of branch lengths of the phylogenetic tree are independent among sites. While this approximation reduces the computational complexity considerably, I show that it leads to the systematic prediction of spurious topology changes in the Felsenstein zone, that is, the area in the branch lengths configuration space where maximum parsimony consistently infers the wrong topology due to long-branch attraction. I demonstrate these failures by using two Bayesian hypothesis tests, based on an inter- and an intra-model approach to estimating the marginal likelihood. I then propose a revised model that addresses these shortcomings, and demonstrate its improved performance on a set of synthetic DNA sequence alignments systematically generated around the Felsenstein zone.
The core model explored in my thesis is a phylogenetic factorial hidden Markov model (FHMM) for detecting two types of mosaic structures in DNA sequence alignments, related to recombination and rate heterogeneity. The focus of my work is on improving the modelling of the latter aspect. Earlier research efforts by other authors have modelled different degrees of rate heterogeneity with separate hidden states of the FHMM. Their work fails to appreciate the intrinsic difference between two types of rate heterogeneity: long-range regional effects, which are potentially related to differences in the selective pressure, and the short-term periodic patterns within the codons, which merely capture the signature of the genetic code. I have improved these earlier phylogenetic FHMMs in two respects. Firstly, by sampling the rate vector from the posterior distribution with RJMCMC, I have made the modelling of regional rate heterogeneity more flexible, and I infer the number of different degrees of divergence directly from the DNA sequence alignment, thereby dispensing with the need to arbitrarily select this quantity in advance. Secondly, I explicitly model within-codon rate heterogeneity via a separate rate modification vector. In this way, the within-codon effect of rate heterogeneity is imposed on the model a priori, which facilitates the learning of the biologically more interesting effect of regional rate heterogeneity a posteriori. I have carried out simulations on synthetic DNA sequence alignments, which have borne out my conjecture. The existing model, which does not explicitly include the within-codon rate variation, has to model both effects with the same modelling mechanism. As expected, it was found to fail to disentangle these two effects. In contrast, I have found that my new model clearly separates within-codon rate variation from regional rate heterogeneity, resulting in more accurate predictions.
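To make the idea of a separate within-codon rate modification vector concrete, here is a minimal sketch. It assumes, purely for illustration, that the effective rate at a site is the product of a regional rate and a modifier chosen by codon position. The function name, the multiplicative form, and the numbers are hypothetical and are not taken from the thesis.

```python
def effective_site_rates(regional_rates, codon_modifiers, first_codon_position=0):
    """Combine regional rates with a within-codon rate modification vector.

    regional_rates: one rate per alignment site (e.g. piecewise constant
        over regions).
    codon_modifiers: three multipliers, one per codon position.
    first_codon_position: codon position (0, 1 or 2) of the first site.
    The multiplicative combination is an assumption of this sketch.
    """
    if len(codon_modifiers) != 3:
        raise ValueError("codon_modifiers must have exactly three entries")
    return [
        rate * codon_modifiers[(i + first_codon_position) % 3]
        for i, rate in enumerate(regional_rates)
    ]


# Two regions of six sites each; the second region evolves twice as fast.
regional = [1.0] * 6 + [2.0] * 6
# Illustrative modifiers: third codon position fastest, second slowest.
modifiers = [0.5, 0.25, 2.0]
print(effective_site_rates(regional, modifiers))
```

Keeping the periodic modifier fixed in this way leaves only the regional component to be learned, which is the separation of effects the abstract argues for.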
|
3 |
AsymmeTree: A Flexible Python Package for the Simulation of Complex Gene Family Histories
Schaller, David, Hellmuth, Marc, Stadler, Peter F. 15 January 2024
AsymmeTree is a flexible and easy-to-use Python package for the simulation of gene family
histories. It simulates species trees and considers the joint action of gene duplication, loss, conversion,
and horizontal transfer to evolve gene families along the species tree. To generate realistic scenarios,
evolution rate heterogeneity from various sources is modeled. Finally, nucleotide or amino acid
sequences (optionally with indels, among-site rate heterogeneity, and invariant sites) can be simulated
along the gene phylogenies. For all steps, users can choose from a spectrum of alternative methods
and parameters. These choices include most options that are commonly used in comparable tools but
also some that are usually not found, such as the innovation model for species evolution. While output
files for each individual step can be generated, AsymmeTree is primarily intended to be integrated in
complex Python pipelines designed to assess the performance of data analysis methods. It allows the
user to interact with, analyze, and possibly manipulate the simulated scenarios. AsymmeTree is freely
available on GitHub.
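For readers unfamiliar with among-site rate heterogeneity and invariant sites, the following standard-library sketch draws per-site rates from a common textbook formulation: a mean-one gamma distribution mixed with a proportion of invariant sites. It does not use or reproduce AsymmeTree's API; the function name and parameter names are hypothetical.

```python
import random


def sample_site_rates(n_sites, alpha, p_invariant, rng):
    """Draw one substitution rate per site.

    With probability p_invariant a site is invariant (rate 0); otherwise
    its rate is gamma distributed with shape alpha and mean 1.
    This is a generic sketch, not AsymmeTree's implementation.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= p_invariant <= 1.0:
        raise ValueError("p_invariant must lie in [0, 1]")
    rates = []
    for _ in range(n_sites):
        if rng.random() < p_invariant:
            rates.append(0.0)
        else:
            # Shape alpha with scale 1/alpha gives a mean of 1.
            rates.append(rng.gammavariate(alpha, 1.0 / alpha))
    return rates


rng = random.Random(2024)
rates = sample_site_rates(n_sites=12, alpha=0.5, p_invariant=0.2, rng=rng)
print([round(r, 3) for r in rates])
```

Smaller values of `alpha` concentrate most sites near rate zero with a few fast sites, while large values make the rates nearly uniform.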
|