1

Fast Hash-Based Algorithms for Analyzing Large Collections of Evolutionary Trees

Sul, Seung Jin December 2009 (has links)
Phylogenetic analysis can easily produce tens of thousands of equally plausible evolutionary trees. Consensus trees and topological distance matrices are often used to summarize the evolutionary relationships among the trees of interest. However, current approaches are not designed to analyze very large tree collections. In this dissertation, we present two fast algorithms, HashCS and HashRF, for analyzing large collections of evolutionary trees based on a novel hash table data structure, which provides a convenient and fast way to store and access the bipartition information collected from the tree collections. Our HashCS algorithm is a fast O(nt) technique for constructing consensus trees, where n is the number of taxa and t is the number of trees. By processing the bipartition information in our hash table, HashCS constructs strict and majority consensus trees. In addition to a consensus algorithm, we design a fast topological distance algorithm called HashRF to compute the t × t Robinson-Foulds (RF) distance matrix, which requires O(nt^2) running time. An RF distance matrix provides many data-mining opportunities to help researchers understand the evolutionary relationships contained in their collection of trees. We also introduce a series of extensions based on HashRF to provide researchers with a more convenient set of tools for analyzing their trees. We provide extensive experimentation on the practical performance of our hash-based algorithms across a diverse collection of biological and artificial trees. Our results show that both algorithms easily outperform existing consensus and RF matrix implementations; for example, on our biological trees, HashCS and HashRF are 1.8 and 100 times faster than PAUP*, respectively. We show two real-world applications of our fast hashing algorithms: (i) comparing phylogenetic heuristic implementations, and (ii) clustering and visualizing trees. In the first application, we design novel methods to compare PaupRat and Rec-I-DCM3, two popular phylogenetic heuristics that use the maximum parsimony criterion, and show that RF distances are more effective than parsimony scores at identifying heterogeneity within a collection of trees. In the second application, we show empirically how to determine the distinct clusters of trees within large tree collections, using two different techniques to identify distinct tree groups. Both techniques show that partitioning the trees into distinct groups and summarizing each group separately represents the data better. Additional benefits of our approach are better consensus trees as well as insightful information about the convergence behavior of phylogenetic heuristics. Our fast hash-based algorithms give scientists very powerful tools for analyzing the relationships within their large phylogenetic tree collections in new and exciting ways. Our work opens many directions for future research, including detecting convergence and designing better heuristics, and our hash tables admit many potential extensions; for example, the same hashing structure could be used to design algorithms for computing other distance metrics such as Nearest Neighbor Interchange (NNI), Subtree Pruning and Regrafting (SPR), and Tree Bisection and Reconnection (TBR) distances.
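The core idea behind both algorithms, hashing each bipartition so that identical bipartitions from different trees land in the same bucket, can be illustrated compactly. The sketch below uses plain Python dictionaries in place of the dissertation's custom universal hash table, and represents trees as nested tuples; it is an illustrative reconstruction of the approach, not the HashCS/HashRF implementation itself.

```python
from itertools import combinations

def bipartitions(tree, taxa):
    """Collect the non-trivial bipartitions of `tree` (a nested tuple of
    taxon names, e.g. ((("A", "B"), "C"), ("D", "E"))) as canonical
    frozensets."""
    all_taxa = frozenset(taxa)
    found = set()

    def clade_of(node):
        if isinstance(node, str):                # leaf
            return frozenset([node])
        clade = frozenset().union(*(clade_of(child) for child in node))
        if 1 < len(clade) < len(all_taxa) - 1:   # skip trivial splits
            # canonicalize: keep the lexicographically smaller side
            found.add(min(clade, all_taxa - clade, key=sorted))
        return clade

    clade_of(tree)
    return found

def rf_matrix(trees, taxa):
    """All-pairs Robinson-Foulds distances in one pass over a shared
    bipartition table (a dict standing in for the hash table)."""
    table = {}                                   # bipartition -> tree ids
    sizes = []
    for i, t in enumerate(trees):
        bs = bipartitions(t, taxa)
        sizes.append(len(bs))
        for b in bs:
            table.setdefault(b, []).append(i)
    n = len(trees)
    shared = [[0] * n for _ in range(n)]
    for occupants in table.values():             # trees sharing this split
        for i, j in combinations(occupants, 2):
            shared[i][j] += 1
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = (sizes[i] + sizes[j]
                                       - 2 * shared[i][j]) // 2
    return dist
```

Once the table is built, each pairwise RF distance is half the size of the symmetric difference of the two trees' bipartition sets, so the matrix falls out of the shared counts without ever comparing two trees' full bipartition lists against each other.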
2

Approximation Algorithms for Constructing Evolutionary Trees

Huang, Chia-Mao 10 August 2001 (has links)
In this thesis, we propose heuristic algorithms to construct evolutionary trees under the distance-based model. For a distance matrix of any type, the problem of constructing a minimum ultrametric tree (MUT), whose scoring function is the minimum tree size, is NP-hard. Furthermore, the problem of constructing an approximate ultrametric tree with approximation error ratio within $n^{\epsilon}$, $\epsilon > 0$, is also NP-hard. When the distance matrix is metric, the problem is called the triangle minimum ultrametric tree problem ($\triangle$MUT). For $\triangle$MUT, a previous approximation algorithm achieves error ratio $\leq 1.5(\lceil \log n \rceil + 1)$. We propose an improvement with error ratio $\leq \lceil \log_{\alpha} n \rceil + 1 \cong 1.44 \lceil \log n \rceil + 1$, where $\alpha = \frac{\sqrt{5}+1}{2}$, for solving the $\triangle$MUT problem. We also propose a heuristic algorithm, based on a clustering scheme, to obtain a good leaf-node circular order. We then design a dynamic programming algorithm to construct the optimal ultrametric tree for a fixed leaf-node circular order. The time complexity of the dynamic programming algorithm is $O(n^3)$ if the scoring function is the minimum tree size or $L^1$-min increment.
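The $O(n^3)$ dynamic program over a fixed leaf order admits a compact interval-DP formulation. The sketch below is a plausible reconstruction under the minimum tree size scoring function, with each interval's root height set to half the maximum pairwise distance inside the interval (which dominates every input distance and is monotone across nested intervals); the thesis's exact recurrence, its handling of the circular order's rotations, and the $L^1$-min increment variant are not reproduced here.

```python
def min_ultrametric_tree_size(d, order):
    """Interval DP: minimum total edge length of an ultrametric tree whose
    leaves appear in the fixed order `order` (one rotation of the circular
    order). `d` is a symmetric distance matrix, dict-of-dicts keyed by the
    names in `order`. Illustrative reconstruction, not the thesis's exact
    algorithm."""
    n = len(order)
    # h[i][j]: root height of the subtree over order[i..j], set to half the
    # maximum pairwise distance in the interval (computed incrementally)
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            h[i][j] = max(h[i][j - 1],
                          max(d[order[u]][order[j]] / 2 for u in range(i, j)))
    # S[i][j]: minimum tree size over order[i..j]; split at every k and pay
    # the two edges from the root down to the subtrees' roots
    S = [[0.0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            S[i][j] = min(S[i][k] + S[k + 1][j]
                          + (h[i][j] - h[i][k]) + (h[i][j] - h[k + 1][j])
                          for k in range(i, j))
    return S[0][n - 1]
```

Because an interval's pairwise maximum is never smaller than that of any sub-interval, the edge lengths h[i][j] - h[i][k] and h[i][j] - h[k+1][j] are always nonnegative, so every tree the recurrence builds is a valid ultrametric tree dominating the input distances.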
3

A Novel Quartet-Based Method for Inferring Evolutionary Trees from Molecular Data

Tarawneh, Monther January 2008 (has links)
Doctor of Philosophy (PhD) / Molecular evolution is the key to explaining the divergence of species and the origin of life on earth. The main task in the study of molecular evolution is the reconstruction of evolutionary trees from sequence data of the current species. This thesis introduces a novel algorithm for inferring evolutionary trees from genetic data using a quartet-based approach. The new method recursively merges subtrees based on a global statistic provided by the global quartet weight matrix. The quartet weights can be computed using several methods. Since the quartet weight computation is the most expensive procedure in this approach, the new method enables the parallel inference of large evolutionary trees. Several techniques were developed to deal with quartet inaccuracies. In addition, the new method is flexible in that it can combine morphological and molecular phylogenetic analyses to yield more accurate trees. We also introduce the concept of a critical point, where more than one merge is possible for the same subtree. The critical point concept provides more detailed information about the relationships between species and shows how close they are, enabling us to detect other reasonable trees. We evaluated the algorithm on both synthetic and real data sets. Experimental results show that the new method achieves significantly better accuracy than existing methods.
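The abstract notes that quartet weights can be computed in several ways. One natural choice, sketched below under that assumption, scores the three unrooted topologies of each quartet from a distance matrix via the four-point condition: the pairing with the smallest sum of within-pair distances is preferred. The linear normalization of weights here is an illustrative choice, not necessarily the thesis's.

```python
from itertools import combinations

def quartet_weights(dist, taxa):
    """For every quartet {a,b,c,d}, weight the three possible splits
    ab|cd, ac|bd, ad|bc from a distance matrix `dist` (dict of dicts).
    Smaller within-pair distance sums get larger weights; the three
    weights of a quartet sum to 1."""
    weights = {}
    for a, b, c, d in combinations(taxa, 4):
        sums = {
            (frozenset([a, b]), frozenset([c, d])): dist[a][b] + dist[c][d],
            (frozenset([a, c]), frozenset([b, d])): dist[a][c] + dist[b][d],
            (frozenset([a, d]), frozenset([b, c])): dist[a][d] + dist[b][c],
        }
        total = sum(sums.values()) or 1.0
        weights[frozenset((a, b, c, d))] = {
            split: (total - s) / (2 * total) for split, s in sums.items()
        }
    return weights
```

Since every quartet is scored independently, the loop over combinations(taxa, 4) parallelizes trivially, which is consistent with the abstract's observation that the expensive quartet-weight computation is what makes parallel inference of large trees possible.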
4

Efficient Algorithms for Comparing, Storing, and Sharing Large Collections of Phylogenetic Trees

Matthews, Suzanne May 2012 (has links)
Evolutionary relationships between a group of organisms are commonly summarized in a phylogenetic (or evolutionary) tree. The goal of phylogenetic inference is to infer the best tree structure that represents the relationships between a group of organisms, given a set of observations (e.g. molecular sequences). However, popular heuristics for inferring phylogenies output tens to hundreds of thousands of equally weighted candidate trees. Biologists summarize these trees into a single structure called the consensus tree. The central assumption is that the information discarded has less value than the information retained. But, what if this assumption is not true? In this dissertation, we demonstrate the value of retaining and studying tree collections. We also conduct an extensive literature search that highlights the rapid growth of trees produced by phylogenetic analysis. Thus, high performance algorithms are needed to accommodate this increasing production of data. We created several efficient algorithms that allow biologists to easily compare, store and share tree collections containing tens to hundreds of thousands of phylogenetic trees. Universal hashing is central to all these approaches, allowing us to quickly identify the shared evolutionary relationships contained in tree collections. Our algorithms MrsRF and Phlash are the fastest in the field for comparing large collections of trees. Our algorithm TreeZip is the most efficient way to store large tree collections. Lastly, we developed Noria, a novel version control system that allows biologists to seamlessly manage and share their phylogenetic analyses. Our work has far-reaching implications for both the biological and computer science communities. We tested our algorithms on four large biological datasets, each consisting of 20,000 to 150,000 trees over 150 to 525 taxa. Our experimental results on these datasets indicate the long-term applicability of our algorithms to modern phylogenetic analysis, and underscore their ability to help scientists easily exchange and analyze their large tree collections. In addition to contributing to the reproducibility of phylogenetic analysis, our work enables the creation of test beds for improving phylogenetic heuristics and applications. Finally, our data structures and algorithms can be applied to managing other tree-like data (e.g. XML).
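The universal hashing the abstract describes can be sketched as follows: every taxon receives random coefficients, and a bipartition hashes to the sum of its taxa's coefficients modulo a prime, so identical bipartitions from different trees collide deterministically and shared relationships surface in a single pass. The class below is a minimal illustration with assumed parameters (two independent hash functions over well-known primes), not the implementation used by MrsRF, Phlash, or TreeZip.

```python
import random

class BipartitionHasher:
    """Universal hashing of bipartitions. A second, independent hash is
    kept alongside the first to distinguish true matches from accidental
    collisions, as in double-hashing schemes; the primes and seed are
    assumed values for illustration."""

    def __init__(self, taxa, seed=42):
        rng = random.Random(seed)
        self.m1, self.m2 = 10**9 + 7, 10**9 + 9   # two large primes
        self.coef1 = {t: rng.randrange(self.m1) for t in taxa}
        self.coef2 = {t: rng.randrange(self.m2) for t in taxa}

    def hash(self, clade):
        # Callers should pass a canonical side of the split, e.g. the
        # side that does not contain some fixed reference taxon.
        h1 = sum(self.coef1[t] for t in clade) % self.m1
        h2 = sum(self.coef2[t] for t in clade) % self.m2
        return h1, h2
```

Because the hash of a clade is a sum, it can also be maintained incrementally as taxa enter or leave a clade during a tree traversal, which is what makes one-pass processing of very large collections cheap.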
5

Mathematical models to investigate the relationship between cross-immunity and replacement of influenza subtypes

Asaduzzaman, S M 08 January 2018 (has links)
A pandemic subtype of influenza A sometimes replaces (e.g., in 1918, 1957, 1968) and sometimes coexists with (e.g., in 1977) the previous seasonal subtype. This research aims to determine conditions for replacement or coexistence of influenza subtypes. We formulate a hybrid model for the dynamics of influenza A epidemics that takes into account cross-immunity between influenza strains, which depends on the most recent seasonal infection. A combination of theoretical and numerical analyses shows that for very strong cross-immunity between seasonal and pandemic subtypes, the pandemic cannot invade; for strong and weak cross-immunity there is coexistence; and for intermediate levels of cross-immunity the pandemic may replace the seasonal subtype. Cross-immunity between seasonal strains is also a key factor in our model because it has a major influence on the final size of seasonal epidemics and on the distribution of susceptibility in the population. To determine this cross-immunity, we design a novel statistical method that uses a theoretical model and clinical data on attack rates and vaccine efficacy among school children for two seasons after the 1968 A/H3N2 pandemic. This model incorporates the distribution of susceptibility and the dependence of cross-immunity on the antigenic distance of drifted strains. We find that the cross-immunity between an influenza strain and the mutant that causes the next epidemic is 88%. Our method also gives an estimate of 2.15 for the basic reproduction number of the 1968 pandemic influenza. Our hybrid model agrees qualitatively with the observed subtype replacement or coexistence in 1957, 1968 and 1977. However, our model with the homogeneous mixing assumption significantly overestimates the pandemic attack rate. Thus, we modify the model to incorporate heterogeneity in the contact rates of individuals. Using the determined values of cross-immunity and the basic reproduction number, this modification lowers the pandemic attack rate slightly, but it remains higher than the observed attack rates. / Graduate
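The homogeneous-mixing attack rate the abstract refers to can be computed from the textbook final-size relation for groups that differ only in susceptibility. The sketch below uses the abstract's estimates (basic reproduction number 2.15, cross-immunity 0.88) together with an assumed fraction of fully naive individuals; the thesis's hybrid model, which tracks infection history season by season, is more detailed than this special case.

```python
import math

def pandemic_attack_rates(R0, cross_immunity, p_naive, iters=200):
    """Final-size relation for an SIR-type pandemic under homogeneous
    mixing, with two groups: fully susceptible individuals (fraction
    p_naive) and individuals whose prior seasonal infection reduces
    their susceptibility by the cross-immunity factor. Solved by
    fixed-point iteration on the group attack rates z_i."""
    groups = [(p_naive, 1.0),
              (1.0 - p_naive, 1.0 - cross_immunity)]
    z = [0.5, 0.5]                       # initial attack-rate guesses
    for _ in range(iters):
        # cumulative force of infection generated by all groups
        lam = R0 * sum(p * zi for (p, _), zi in zip(groups, z))
        z = [1.0 - math.exp(-sigma * lam) for _, sigma in groups]
    return z

# Invasion threshold: R0 * (p_naive + (1 - p_naive) * (1 - c)) > 1.
# With an assumed 60% naive fraction this is about 1.39, so the pandemic
# invades; with a 20% naive fraction it is below 1 and the iteration
# converges to zero, the no-invasion regime the abstract describes.
z_naive, z_protected = pandemic_attack_rates(2.15, 0.88, 0.60)
```

Extending this two-group calculation with group-specific contact rates is the kind of heterogeneity modification the abstract describes for lowering the predicted pandemic attack rate.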
