Global ETD Search

1	Fast Hash-Based Algorithms for Analyzing Large Collections of Evolutionary Trees Sul, Seung Jin December 2009 Phylogenetic analysis can easily produce tens of thousands of equally plausible evolutionary trees. Consensus trees and topological distance matrices are often used to summarize the evolutionary relationships among the trees of interest. However, current approaches are not designed to analyze very large tree collections. In this dissertation, we present two fast algorithms, HashCS and HashRF, for analyzing large collections of evolutionary trees. Both are based on a novel hash table data structure that provides a convenient and fast way to store and access the bipartition information collected from the tree collections. Our HashCS algorithm is a fast $O(nt)$ technique for constructing consensus trees, where $n$ is the number of taxa and $t$ is the number of trees. By reprocessing the bipartition information in our hash table, HashCS constructs strict and majority consensus trees. In addition to a consensus algorithm, we design a fast topological distance algorithm called HashRF to compute the $t \times t$ Robinson-Foulds (RF) distance matrix, which requires $O(nt^2)$ running time. An RF distance matrix provides plenty of data-mining opportunities to help researchers understand the evolutionary relationships contained in their collection of trees. We also introduce a series of extensions based on HashRF to provide researchers with a more convenient set of tools for analyzing their trees. We provide extensive experimentation regarding the practical performance of our hash-based algorithms across a diverse collection of biological and artificial trees. Our results show that both algorithms easily outperform existing consensus and RF matrix implementations. For example, on our biological trees, HashCS and HashRF are 1.8 and 100 times faster than PAUP*, respectively.
We show two real-world applications of our fast hashing algorithms: (i) comparing phylogenetic heuristic implementations, and (ii) clustering and visualizing trees. In our first application, we design novel methods to compare PaupRat and Rec-I-DCM3, two popular phylogenetic heuristics that use the Maximum Parsimony criterion, and show that RF distances are more effective than parsimony scores at identifying heterogeneity within a collection of trees. In our second application, we empirically show how to determine the distinct clusters of trees within large tree collections. We use two different techniques to identify distinct tree groups. Both techniques show that partitioning the trees into distinct groups and summarizing each group separately gives a better representation of the data. Additional benefits of our approach are better consensus trees and insight into the convergence behavior of phylogenetic heuristics. Our fast hash-based algorithms provide scientists with very powerful tools for analyzing the relationships within their large phylogenetic tree collections in new ways. Our work opens many directions for future research, including detecting convergence and designing better heuristics. Furthermore, our hash tables can be extended in several ways. For example, our hashing structure can also be used to design algorithms for computing other distance metrics such as Nearest Neighbor Interchange (NNI), Subtree Pruning and Regrafting (SPR), and Tree Bisection and Reconnection (TBR) distances. Keywords: phylogenetic analysis; evolutionary tree; hash; consensus tree; Robinson-Foulds distance
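The idea of collecting bipartitions in a hash table and deriving both an RF matrix and a majority consensus from it can be illustrated with a minimal Python sketch. This is not the HashCS or HashRF implementation from the dissertation: it assumes trees are given as nested tuples of taxon names, it uses an ordinary Python dict keyed by frozensets in place of the dissertation's hash functions, and the helper names (bipartitions, build_table, rf_matrix, majority_splits) are hypothetical.

```python
from itertools import combinations

def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def bipartitions(tree):
    """Collect the non-trivial bipartitions of one tree.

    Each split is stored as the side that does NOT contain a fixed
    anchor taxon, so the same split always gets the same key.
    """
    all_taxa = leaves(tree)
    anchor = min(all_taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - below if anchor in below else below
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        return below

    visit(tree)
    return found

def build_table(trees):
    """Map each bipartition to the indices of the trees containing it."""
    table, counts = {}, []
    for idx, tree in enumerate(trees):
        splits = bipartitions(tree)
        counts.append(len(splits))
        for split in splits:
            table.setdefault(split, []).append(idx)
    return table, counts

def rf_matrix(trees):
    """RF(i, j) = splits(i) + splits(j) - 2 * shared(i, j)."""
    table, counts = build_table(trees)
    t = len(trees)
    shared = [[0] * t for _ in range(t)]
    for members in table.values():
        for i, j in combinations(members, 2):
            shared[i][j] += 1
            shared[j][i] += 1
    return [[0 if i == j else counts[i] + counts[j] - 2 * shared[i][j]
             for j in range(t)] for i in range(t)]

def majority_splits(trees):
    """Bipartitions present in more than half of the trees."""
    table, _ = build_table(trees)
    return {s for s, members in table.items() if len(members) > len(trees) / 2}

if __name__ == "__main__":
    trees = [
        (("A", "B"), "C", ("D", "E")),
        (("A", "C"), "B", ("D", "E")),
        (("D", "E"), ("C", ("A", "B"))),  # same topology as the first tree
    ]
    for row in rf_matrix(trees):
        print(row)
```

Because every tree is visited once to fill the table and the matrix is then filled from the table entries alone, no pair of trees is ever compared directly, which is the property the abstract attributes to the hash-based approach.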
2	Neighbourhoods of Phylogenetic Trees: Exact and Asymptotic Counts de Jong, Jamie Victoria January 2015 A central theme in phylogenetics is the reconstruction and analysis of evolutionary trees from a given set of data. To determine the optimal search methods for the reconstruction of trees, it is crucial to understand the size and structure of neighbourhoods of trees under tree rearrangement operations. The diameter and the size of the immediate neighbourhood of a tree have been well studied; however, little is known about the number of trees at distance two, three or (more generally) k from a given tree. In this thesis we explore previous results on the size of these neighbourhoods under common tree rearrangement operations (NNI, SPR and TBR). We obtain new results concerning the number of trees at distance k from a given tree under the Robinson-Foulds (RF) metric and the Nearest Neighbour Interchange (NNI) operation, and the number of trees at distance two from a given tree under the Subtree Prune and Regraft (SPR) operation. We also obtain an exact count for the number of pairs of binary phylogenetic trees that share a first RF or NNI neighbour. Keywords: phylogenetic tree; splits; Robinson-Foulds metric; tree rearrangements; asymptotics
3	Edit distance metrics for measuring dissimilarity between labeled gene trees Briand, Samuel 08 1900 Phylogenetic trees are instruments of evolutionary biology offering great insight for comparative genomics. They provide mechanisms to model the kinship relations between species or members of gene families as a function of taxonomic diversity. They also provide evidence and insights into the evolutionary history, structure, and variation of biological processes.
However, traditional phylogenetic inference methods have a reputation for being error-prone. Therefore, comparing and analysing phylogenetic trees is indispensable for obtaining the best interpretation of the biological information they can provide. We start by assessing existing related work on inferring, comparing, and analysing phylogenetic trees, evaluating its strengths and flaws, and discussing avenues for future improvement. The second part of this thesis focuses on the development of efficient and accurate metrics to analyse and compare pairs of gene trees with labeled internal nodes. We show that our extension of the popular Robinson-Foulds metric is useful for the preliminary analysis and comparison of labeled gene trees under various evolutionary models that may involve various evolutionary events. Keywords: evolution; edit distance; gene tree; labeled tree; Robinson-Foulds; tree metric; evolutionary history
