Global ETD Search

1	Gene phylogenies and protein–protein interactions: possible artifacts resulting from shared protein interaction partners Campos, Paulo R.A., de Oliveira, Viviane M., Wagner, Günter P., Stadler, Peter F. 10 December 2018 The study of gene families critically depends on the correct reconstruction of gene genealogies, as, for instance, in the case of transcription factor genes such as the Hox and Dlx gene families. Proteins belonging to the same family are likely to share some of the same protein interaction partners and may thus face a similar selective environment. This common selective environment can induce co-evolutionary pressures and thus give rise to correlated rates and patterns of evolution among members of a gene family. In this study, we simulate the evolution of a family of sequences that share a set of interaction partners. Depending on the amount of sequence dedicated to protein–protein interaction and on the relative rate parameters of sequence evolution, three outcomes are possible. If the fraction of the sequence dedicated to interaction with common co-factors is low and the time since divergence is short, the trees based on sequence information tend to be correct. If the time since gene duplication is long, two outcomes are observed in our simulations. If the rate of evolution of the interaction partner is small compared with the rate of evolution of the focal protein family, the reconstructed trees tend towards star phylogenies. As the rate of evolution of the interaction partner approaches that of the focal protein family, the reconstructed phylogenies tend to be incorrectly resolved. We conclude that the genealogies of gene families can be hard to estimate, in particular if the proteins interact with a conserved set of binding partners, as is likely the case for transcription factors.
2	Ordinal and convex assumptions in phylogenetic tree reconstruction Candy, Robin January 2014 Phylogenetics is a field primarily concerned with reconstructing the evolutionary history of present-day species. Evolutionary history is often modeled by a phylogenetic tree, similar to a family tree. To recreate a phylogenetic tree from information about current species, one needs to make assumptions about the evolutionary process. These assumptions can range from fully parametrised models of evolution to simple observations. This thesis looks at the reconstruction of phylogenetic trees under two different assumptions. The first, known as the ordinal assumption, has been studied previously and asserts that as species evolve, they become more dissimilar. The second, the convex assumption, has not previously been studied in this context and asserts that the changes species undergo to become dissimilar are progressively larger than the current differences between those species. This thesis presents an overview of mathematical results in tree reconstruction from dissimilarity maps (also known as distance matrices) and develops techniques for reasoning about the ordinal and convex assumptions. In particular, three main results are presented: a complete classification of phylogenetic trees with four leaves under the ordinal assumption; a partial classification of phylogenetic trees with four leaves under the convex assumption; and an independent proof of a result on the relationship between ultrametrics and the ordinal assumption. phylogenetic ordinal assumption convex assumption tree reconstruction quartet reconstruction
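A standard baseline for four-leaf (quartet) reconstruction from a dissimilarity map is the four-point condition for tree metrics. Of the three ways to pair up four leaves, the pairing with the smallest summed distances is the one separated by the internal edge. Ultrametrics satisfy the stronger three-point condition: in every triple, the two largest distances are equal. The minimal Python sketch below checks both conditions on dict-of-dicts maps. It uses the actual distance values, whereas the ordinal and convex assumptions studied in the thesis work from weaker information, so treat it as background rather than the thesis's method. The function names and toy distances are illustrative assumptions.

```python
from itertools import combinations


def quartet_split(d, leaves):
    """Four-point condition (illustrative helper): for a tree metric, the
    pairing {{a, b}, {c, e}} with the smallest d(a, b) + d(c, e) is the
    one separated by the internal edge."""
    a, b, c, e = leaves
    pairings = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]

    def pair_sum(p):
        (x, y), (u, v) = p
        return d[x][y] + d[u][v]

    best = min(pairings, key=pair_sum)
    return frozenset(frozenset(pair) for pair in best)


def is_ultrametric(d, leaves, tol=1e-9):
    """Three-point condition: in every triple of leaves, the two largest of
    the three pairwise distances are equal (up to tol)."""
    for x, y, z in combinations(leaves, 3):
        s = sorted([d[x][y], d[x][z], d[y][z]])
        if abs(s[2] - s[1]) > tol:
            return False
    return True


def symmetric(pairs, leaves):
    """Build a symmetric dict-of-dicts dissimilarity map from pair values."""
    d = {x: {} for x in leaves}
    for (x, y), v in pairs.items():
        d[x][y] = d[y][x] = v
    return d


# Hypothetical toy map from the rooted tree ((A:1, B:1):1, (C:1, D:1):1).
clock = symmetric({("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 4,
                   ("B", "C"): 4, ("B", "D"): 4, ("C", "D"): 2}, "ABCD")
print(sorted(sorted(p) for p in quartet_split(clock, "ABCD")))  # [['A', 'B'], ['C', 'D']]
print(is_ultrametric(clock, "ABCD"))                            # True
```

Lengthening one pendant edge (A:2 instead of A:1) keeps the same quartet split, but the map is then no longer ultrametric.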
3	Protein folding and phylogenetic tree reconstruction using stochastic approximation Monte Carlo Cheon, Sooyoung 17 September 2007 Recently, the stochastic approximation Monte Carlo algorithm has been proposed by Liang et al. (2005) as a general-purpose stochastic optimization and simulation algorithm. An annealing version of this algorithm was developed for folding problems of small real proteins. The numerical results indicate that, as a stochastic optimization algorithm, it outperforms simulated annealing and conventional Monte Carlo algorithms. We also propose a method for using secondary structure in protein folding. The predicted protein structures are rather close to the true structures. Phylogenetic trees have long been used in biology to represent evolutionary relationships among species and genes graphically. An understanding of evolutionary relationships is critical to the appropriate interpretation of bioinformatics results. A method combining the sequential structure of phylogenetic trees with stochastic approximation Monte Carlo was developed for phylogenetic tree reconstruction. The numerical results indicate that it can escape from local traps and converges much faster to the global likelihood maximum than other phylogenetic tree reconstruction methods, such as BAMBE and MrBayes. Stochastic Approximation Monte Carlo Protein Folding Phylogenetic tree reconstruction
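The core stochastic approximation Monte Carlo (SAMC) loop can be sketched on a one-dimensional toy problem. This is a minimal sketch that assumes the usual SAMC setup:

- the sample space is partitioned into energy subregions;
- a Metropolis sampler targets exp(-U(x)/tau), reweighted by a per-subregion log-weight theta;
- after every step, theta is nudged so that each subregion is visited about equally often.

The energy function, partition, step size, and gain schedule t0/max(t0, t) are illustrative assumptions, and the names `energy` and `samc_minimize` are hypothetical. This is neither the annealing variant nor the tree sampler developed in the thesis.

```python
import math
import random


def energy(x):
    """Hypothetical multimodal toy energy; global minimum U(0) = -1."""
    return 0.1 * x * x - math.cos(3.0 * x)


def samc_minimize(n_iter=20000, n_bins=12, lower=-1.5, width=1.0,
                  tau=1.0, t0=1000, step=1.0, seed=0):
    """Illustrative SAMC run on [-10, 10]; returns (best x, best U, theta)."""
    rng = random.Random(seed)
    theta = [0.0] * n_bins      # log-weights of the energy subregions
    target = 1.0 / n_bins       # desired visiting frequency per subregion

    def region(u):
        # Map an energy value to its subregion index, clamped at both ends.
        return min(n_bins - 1, max(0, int((u - lower) / width)))

    x = 8.0
    u = energy(x)
    best_x, best_u = x, u
    for t in range(1, n_iter + 1):
        y = x + rng.uniform(-step, step)          # symmetric random-walk proposal
        if -10.0 <= y <= 10.0:                    # outside the domain: reject
            v = energy(y)
            # Metropolis ratio for the target exp(-U/tau - theta[region(U)])
            log_r = (theta[region(u)] - theta[region(v)]) - (v - u) / tau
            if log_r >= 0 or rng.random() < math.exp(log_r):
                x, u = y, v
        if u < best_u:
            best_x, best_u = x, u
        # Stochastic-approximation update: raise theta where the chain is,
        # lower it elsewhere, with a gain that decays after t0 iterations.
        gamma = t0 / max(t0, t)
        j = region(u)
        for k in range(n_bins):
            theta[k] += gamma * ((1.0 if k == j else 0.0) - target)
    return best_x, best_u, theta


best_x, best_u, theta = samc_minimize()
print(f"best x = {best_x:.3f}, U = {best_u:.3f}")
```

Raising theta in a visited subregion lowers that subregion's weight. This pushes the sampler out of regions it has already explored, which is the mechanism behind the "escaping from local traps" behaviour described above.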
4	Computational problems in evolution : Multiple alignment, genome rearrangements, and tree reconstruction Elias, Isaac January 2006 Reconstructing the evolutionary history of a set of species is a fundamental problem in biology. This thesis concerns computational problems that arise in different settings and stages of phylogenetic tree reconstruction, as well as in other contexts. The contributions include: • A new distance-based tree reconstruction method with optimal reconstruction radius and optimal runtime complexity. The result includes a greatly simplified proof that the NJ algorithm also has optimal reconstruction radius. (co-author Jens Lagergren) • NP-hardness results for the most common variants of Multiple Alignment. In particular, it is shown that SP-score, Star Alignment, and Tree Alignment are NP-hard for all metric symbol distances over all binary or larger alphabets. • A 1.375-approximation algorithm for Sorting By Transpositions (SBT). SBT is the problem of sorting a permutation using as few block transpositions as possible. The complexity of this problem is still open, and improving on the best known 1.5-approximation ratio had been an open problem for ten years. The 1.375-approximation algorithm is based on a new upper bound on the diameter of 3-permutations. Moreover, a new lower bound on the transposition diameter of the symmetric group is presented, and the exact transposition diameter of simple permutations is determined. (co-author Tzvika Hartman) • Approximation, fixed-parameter tractable, and fast heuristic algorithms for two variants of the Ancestral Maximum Likelihood (AML) problem: when the phylogenetic tree is known and when it is unknown. AML is the problem of reconstructing the most likely genetic sequences of extinct ancestors, along with the most likely mutation probabilities on the edges, given the phylogenetic tree and the sequences at the leaves. (co-author Tamir Tuller) • An algorithm for computing the number of mutational events between aligned DNA sequences that is several hundred times faster than the well-known PHYLIP package. Since pairwise distance estimation is a bottleneck in distance-based phylogeny reconstruction, the new algorithm improves the overall running time of many distance-based methods by a factor of several hundred. (co-author Jens Lagergren) Distance Methods Tree reconstruction Sorting by Transpositions multiple alignment ancestral sequences Computer science
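For reference, the classical neighbor-joining (NJ) algorithm mentioned above can be sketched in a few lines of Python. This is the textbook Saitou and Nei procedure, not the new optimal-radius method from the thesis. The function name, the dict-of-dicts input, and the nested-tuple labels for internal nodes are illustrative choices.

```python
def neighbor_joining(labels, dist):
    """Textbook neighbor joining (illustrative). labels: taxon names;
    dist: dict-of-dicts of pairwise distances. Returns (node, node, length)
    edges; internal nodes are nested tuples of the clusters they join."""
    D = {a: dict(dist[a]) for a in labels}
    nodes = list(labels)
    edges = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other current nodes
        r = {a: sum(D[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair minimizing Q(a, b) = (n - 2) D(a, b) - r(a) - r(b)
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * D[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = (a, b)  # new internal node joining a and b
        la = 0.5 * D[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        edges += [(u, a, la), (u, b, D[a][b] - la)]
        # Distances from the new node to every remaining node
        D[u] = {}
        for c in nodes:
            if c != a and c != b:
                D[u][c] = D[c][u] = 0.5 * (D[a][c] + D[b][c] - D[a][b])
        nodes = [c for c in nodes if c != a and c != b] + [u]
    a, b = nodes
    edges.append((a, b, D[a][b]))
    return edges


# Hypothetical additive distances from the unrooted tree
# ((A:2, B:3), (C:4, D:1)) with an internal edge of length 1.
pairs = {("A", "B"): 5, ("A", "C"): 7, ("A", "D"): 4,
         ("B", "C"): 8, ("B", "D"): 5, ("C", "D"): 5}
dist = {x: {} for x in "ABCD"}
for (x, y), v in pairs.items():
    dist[x][y] = dist[y][x] = v
for edge in neighbor_joining(list("ABCD"), dist):
    print(edge)
```

On additive input like this, NJ recovers the generating tree: pendant lengths 2, 3, 4, and 1, and an internal edge of length 1. Each iteration scans all pairs, so this naive version runs in O(n^3) time overall.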
5	The Mystery of the Chaetognatha: A Molecular Phylogenetic Approach Using Pelagic Chaetognath Species on Pelican Island, Galveston, Texas Towers, Leah Nicole December 2010 The phylum Chaetognatha is a mysterious group of organisms whose evolutionary origins have eluded scientists for more than a century. Their unique morphology and developmental characteristics, such as whether they are protostomes (mouth develops from the blastopore; e.g. mollusks, annelids, arthropods) or deuterostomes (anus develops from the blastopore; e.g. echinoderms and chordates), offer few clues to their origins. Some early morphological studies argued, on the basis of gross ultrastructural data, that chaetognaths were derived mollusks or nematodes, while other studies focused on the coelomic cavity. Although 18S rRNA is widely used in molecular phylogeny studies, it has limitations such as susceptibility to long-branch attraction and a slow rate of evolutionary change. Long-branch attraction is a phenomenon in phylogenetic analyses in which rapidly evolving lineages are inferred to be closely related regardless of their true evolutionary relationships. Hence, this study complements 18S rRNA with other genes, such as the cytochrome oxidase genes. The cytochrome oxidase genes are highly conserved throughout eukaryotes and are less ambiguous to align than the ribosomal genes, making them better phylogenetic markers than the 18S rRNA gene. This study uses a molecular approach (ARDRA, PCR, phylogenetic tree reconstruction) to determine the phylogeny of pelagic chaetognaths found on Pelican Island, Galveston, Texas. The 18S rRNA, Cytochrome Oxidase I, and Cytochrome Oxidase II genes were used to help decipher the phylogeny of this group. All analyzed genes in this study (18S rRNA, COI, and COII) grouped the Pelican Island chaetognaths with the protostomes.
The maximum parsimony bootstrap tree for the 18S rRNA gene grouped the samples closest to the arthropods (protostomes). For COI, the minimum evolution bootstrap tree grouped the 8 collected samples most closely with two other protostome phyla, the mollusks and annelids. For COII, bootstrapping grouped the samples with the nematodes (>66 percent bootstrap support). My findings are significant because they support a protostome lineage for the Chaetognatha using three genes, one of which (COII) has not been widely studied in the Chaetognatha. ARDRA PCR chaetognath phylogeny 18s rRNA COI COII tree reconstruction MEGA 4.0
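A maximum parsimony bootstrap can be illustrated on the smallest unresolved case, four taxa. The Python sketch below scores each of the three unrooted quartet topologies with Fitch's algorithm. It then repeats the comparison on alignments resampled column by column with replacement, and reports how often each topology wins. Parsimony is also the criterion classically associated with long-branch attraction. The toy sequences, taxon names, and function names are hypothetical; the sketch does not reproduce the thesis's MEGA 4.0 analyses.

```python
import random


def fitch_step(x, y):
    """One Fitch step on two child state sets: intersection if non-empty
    (no extra change), otherwise union (one extra change)."""
    common = x & y
    return (common, 0) if common else (x | y, 1)


def quartet_length(seqs, split, cols):
    """Parsimony length of the unrooted quartet ((p, q), (r, s)) over cols."""
    (p, q), (r, s) = split
    total = 0
    for i in cols:
        left, c1 = fitch_step({seqs[p][i]}, {seqs[q][i]})
        right, c2 = fitch_step({seqs[r][i]}, {seqs[s][i]})
        _, c3 = fitch_step(left, right)
        total += c1 + c2 + c3
    return total


def bootstrap_support(seqs, n_rep=200, seed=1):
    """Fraction of column-resampled replicates in which each quartet
    topology has the minimum parsimony length (ties share the credit)."""
    t = sorted(seqs)
    splits = [((t[0], t[1]), (t[2], t[3])),
              ((t[0], t[2]), (t[1], t[3])),
              ((t[0], t[3]), (t[1], t[2]))]
    n_sites = len(seqs[t[0]])
    rng = random.Random(seed)
    support = {sp: 0.0 for sp in splits}
    for _ in range(n_rep):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        scores = {sp: quartet_length(seqs, sp, cols) for sp in splits}
        best = min(scores.values())
        winners = [sp for sp in splits if scores[sp] == best]
        for sp in winners:
            support[sp] += 1.0 / len(winners)
    return {sp: v / n_rep for sp, v in support.items()}


# Hypothetical aligned sequences (not data from the thesis).
seqs = {"A": "AAAAAAGGCC", "B": "AAAAAAGGCT",
        "C": "TTTTTTGGCC", "D": "TTTTTTGGTC"}
for split, value in bootstrap_support(seqs).items():
    print(split, round(value, 3))
```

In this toy alignment, every parsimony-informative column favours ((A, B), (C, D)), so that split receives nearly all of the bootstrap support.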
7	A Statistical Approach to Understand the Evolution of Exotic Butterfly Species / En statistisk metod för att förstå evolutionen av exotiska fjärilsarter Eliasson, Elin, Haraldsson, Rebecka January 2023 The alarming rate at which biodiversity is declining because of human activity has raised concerns about the well-being of our planet. Butterflies, which serve as pollinators, are an essential part of many ecosystems and are sensitive indicators of environmental change, so they can provide valuable insight into how ecosystems function and evolve. This thesis aims to build phylogenetic trees from butterfly DNA sequences and to compare different nucleotide substitution models and methods in order to better understand butterflies' evolution and genetic relationships. Our approach was to use Markov chain theory to investigate how the four nucleotides evolve. In the comparison of models, the General Time Reversible model, which has more degrees of freedom, was found to perform better than the K80 model. Although the Maximum Likelihood and Pairwise Distance methods yielded different transition rate matrices, no differences were registered between the reconstructed trees. Interestingly, the Q matrix was found to be similar across butterfly families. These findings suggest that a standard Q matrix could be used when estimating or inferring evolutionary relationships among butterflies, and probably among other animal groups. Further studies are needed, but this could improve the accuracy of phylogenetic estimates when dealing with small data sets. The information helps with reconstructing evolutionary relationships among species, thereby contributing to preserving biodiversity, the ecosystems to which species belong, and, in addition, humankind. Phylogenetics Tree reconstruction COI barcoding General Time Reversible Model K80 Model Evolution Butterflies Probability Theory and Statistics
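The Markov-chain view of nucleotide substitution in this abstract can be made concrete. A GTR rate matrix Q is built from six exchangeabilities and four base frequencies. K80 is the special case with equal frequencies and a single transition/transversion ratio kappa. Transition probabilities follow from P(t) = exp(Qt). The pure-Python sketch below assumes the exchangeability order AC, AG, AT, CG, CT, GT and normalizes Q to one expected substitution per unit time. It also includes the Kimura (1980) distance for comparison. All function names and parameter values are illustrative, and the thesis's own estimation procedure is not reproduced.

```python
import math


def gtr_q(rates, freqs):
    """GTR rate matrix (illustrative helper). rates: exchangeabilities in the
    assumed order AC, AG, AT, CG, CT, GT; freqs: frequencies of A, C, G, T.
    Rows sum to zero; Q is scaled to one expected substitution per unit time."""
    ac, ag, at, cg, ct, gt = rates
    s = [[0.0, ac, ag, at],
         [ac, 0.0, cg, ct],
         [ag, cg, 0.0, gt],
         [at, ct, gt, 0.0]]
    q = [[s[i][j] * freqs[j] for j in range(4)] for i in range(4)]
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    mu = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / mu for x in row] for row in q]


def k80_q(kappa):
    """K80 as a GTR special case: equal base frequencies, transitions
    (A<->G, C<->T) at kappa times the transversion rate."""
    return gtr_q([1.0, kappa, 1.0, 1.0, kappa, 1.0], [0.25] * 4)


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def expm(m, terms=20, squarings=10):
    """exp(M) for a 4x4 matrix: scale by 2**-squarings, sum a truncated
    Taylor series, then square the result back up."""
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(4)] for i in range(4)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in _matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(4)] for i in range(4)]
    for _ in range(squarings):
        result = _matmul(result, result)
    return result


def transition_probs(q, t):
    """P(t) = exp(Qt); P[i][j] = Pr(base j at time t | base i at time 0)."""
    return expm([[x * t for x in row] for row in q])


def k80_distance(p, q):
    """Kimura (1980) distance from the proportions of transition (p) and
    transversion (q) differences between two aligned sequences."""
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


P = transition_probs(k80_q(2.0), 0.3)
p_ts = P[0][2]                 # A -> G is a transition
p_tv = P[0][1] + P[0][3]       # A -> C and A -> T are transversions
print(round(k80_distance(p_ts, p_tv), 6))  # 0.3
```

Because Q is normalized, the K80 distance computed from the model's own expected difference proportions recovers the elapsed time t. The difference between the two models lies in their parameterization. GTR has six exchangeabilities (five free up to scale) plus base frequencies, whereas K80 has only kappa.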
8	Analysis and Reconstruction of the Hematopoietic Stem Cell Differentiation Tree: A Linear Programming Approach for Gene Selection Ghadie, Mohamed A. January 2015 Stem cells differentiate through an organized hierarchy of intermediate cell types to terminally differentiated cell types. This process is largely guided by master transcriptional regulators, but it also depends on the expression of many other types of genes. The discrete cell types in the differentiation hierarchy are often identified based on the expression or non-expression of certain marker genes. Historically, these have often been various cell-surface proteins, which are fairly easy to assay biochemically but are not necessarily causative of the cell type, in the sense of being master transcriptional regulators. This raises important questions about how gene expression across the whole genome controls or reflects cell state, and in particular, differentiation hierarchies. Traditional approaches to understanding gene expression patterns across multiple conditions, such as principal components analysis or K-means clustering, can group cell types based on gene expression, but they do so without knowledge of the differentiation hierarchy. Hierarchical clustering and maximization of parsimony can organize the cell types into a tree, but in general this tree is different from the differentiation hierarchy. Using hematopoietic differentiation as an example, we demonstrate that many genes other than marker genes can discriminate between different branches of the differentiation tree by proposing two models for detecting genes that are up-regulated or down-regulated in distinct lineages. We then propose a novel approach to solving the following problem: Given the differentiation hierarchy and gene expression data at each node, construct a weighted Euclidean distance metric such that the minimum spanning tree with respect to that metric is precisely the given differentiation hierarchy.
We provide a set of linear constraints that are provably sufficient for the desired construction and a linear programming framework to identify sparse sets of weights, effectively identifying genes that are most relevant for discriminating different parts of the tree. We apply our method to microarray gene expression data describing 38 cell types in the hematopoiesis hierarchy, constructing a sparse weighted Euclidean metric that uses just 175 genes. These 175 genes are different from the marker genes that were used to identify the 38 cell types, hence offering a novel alternative way of discriminating different branches of the tree. A DAVID functional annotation analysis shows that the 175 genes reflect major processes and pathways active in different parts of the tree. However, we find that there are many alternative sets of weights that satisfy the linear constraints. Thus, in the style of random-forest training, we also construct metrics based on random subsets of the genes and compare them to the metric of 175 genes. Our results show that the 175 genes frequently appear in the random metrics, which supports their significance from an empirical point of view as well. Finally, we show how our linear programming method is able to identify columns that were selected to build minimum spanning trees on the nodes of random variable-size matrices. Linear Programming Distance Metric Learning Machine Learning Feature Selection Tree Reconstruction Hierarchical Clustering Minimum Spanning Tree Clustering Optimization Maximum Parsimony Euclidean Distance Weighted Euclidean Stem Cell Differentiation Hematopoiesis Transcriptional Regulation Transcription Factor Gene Selection Gene Expression Microarray Cell Type Marker Gene Functional Annotation Random Forest Biological Function Regulation Statistical Significance Erythropoiesis Natural Killer Cell T Cell B Cell Granulocyte Monocyte Megakaryocyte Minimization Linear Constraint Cell Lineage
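The property that these linear constraints are designed to guarantee can be checked directly. Build the minimum spanning tree of the cell types under a weighted Euclidean metric, sqrt(sum_g w_g (x_g - y_g)^2) with w_g >= 0, and compare its edges with the given hierarchy. Below is a minimal Python sketch using Prim's algorithm. The two-gene profiles, cell-type names, weights, and function names are all hypothetical, and the linear program that selects sparse weights is not reproduced.

```python
import math


def weighted_dist(x, y, w):
    """Weighted Euclidean distance sqrt(sum_g w_g (x_g - y_g)^2), w_g >= 0."""
    return math.sqrt(sum(wg * (a - b) ** 2 for wg, a, b in zip(w, x, y)))


def mst_edges(profiles, w):
    """Prim's algorithm on the complete graph of cell types (illustrative);
    returns the MST as a set of frozenset({u, v}) edges."""
    names = list(profiles)
    in_tree = {names[0]}
    edges = set()
    while len(in_tree) < len(names):
        u, v = min(((a, b) for a in in_tree for b in names if b not in in_tree),
                   key=lambda e: weighted_dist(profiles[e[0]], profiles[e[1]], w))
        edges.add(frozenset((u, v)))
        in_tree.add(v)
    return edges


def reproduces_hierarchy(profiles, w, hierarchy):
    """True if the MST under weights w equals the given hierarchy's edges."""
    return mst_edges(profiles, w) == {frozenset(e) for e in hierarchy}


# Hypothetical two-gene expression profiles and differentiation hierarchy.
profiles = {"stem": (0.0, 0.0), "myeloid": (1.0, 5.0),
            "lymphoid": (-1.0, 5.0), "erythroid": (2.0, 0.0)}
hierarchy = [("stem", "myeloid"), ("stem", "lymphoid"), ("myeloid", "erythroid")]
print(reproduces_hierarchy(profiles, (1.0, 0.0), hierarchy))  # True
print(reproduces_hierarchy(profiles, (1.0, 1.0), hierarchy))  # False
```

Here, zeroing the weight of the second gene is what makes the minimum spanning tree match the hierarchy. This mirrors, in miniature, why sparse weights can single out the genes that discriminate between branches.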
