Spelling suggestions: "subject:"phylogenetic reconstruction""
1 |
An Approximation Algorithm for Character Compatibility and Fast Quartet-based Phylogenetic Tree ComparisonTsang, John January 2000 (has links)
Phylogenetic analysis, or the inference of evolutionary history is done routinely by biologists and is one of the most important problems in systematic biology. In this thesis, we study two computational problems in the area. First, we study the evolutionary tree reconstruction problem under the character compatibility (CC) paradigm and give a polynomial time approximation scheme (PTAS) for a variation of the formulation called fractional character compatibility (FCC), which has been proven to be NP-hard. We also present a very simple algorithm called the Ordinal Split Method (OSM) to generate bipartitions given sequence data, which can be served as a front-end to the PTAS. The performance of the OSM and the validity of the FCC formulation are studied through simulation experiments. The second part of this thesis presents an efficient algorithm to compare evolutionary trees using the quartet metric. Different evolutionary hypothesis arises when different data sets are used or when different tree inference methods are applied to the same data set. Tree comparisons are routinely done by biologists to evaluate the quality of their tree inference experiments. The quartet metric has many desirable properties but its use has been hindered by its relatively heavy computational requirements. We address this problem by giving the first O(n^2) time algorithm to compute the quartet distance between two evolutionary trees.
|
2 |
An Approximation Algorithm for Character Compatibility and Fast Quartet-based Phylogenetic Tree ComparisonTsang, John January 2000 (has links)
Phylogenetic analysis, or the inference of evolutionary history is done routinely by biologists and is one of the most important problems in systematic biology. In this thesis, we study two computational problems in the area. First, we study the evolutionary tree reconstruction problem under the character compatibility (CC) paradigm and give a polynomial time approximation scheme (PTAS) for a variation of the formulation called fractional character compatibility (FCC), which has been proven to be NP-hard. We also present a very simple algorithm called the Ordinal Split Method (OSM) to generate bipartitions given sequence data, which can be served as a front-end to the PTAS. The performance of the OSM and the validity of the FCC formulation are studied through simulation experiments. The second part of this thesis presents an efficient algorithm to compare evolutionary trees using the quartet metric. Different evolutionary hypothesis arises when different data sets are used or when different tree inference methods are applied to the same data set. Tree comparisons are routinely done by biologists to evaluate the quality of their tree inference experiments. The quartet metric has many desirable properties but its use has been hindered by its relatively heavy computational requirements. We address this problem by giving the first O(n^2) time algorithm to compute the quartet distance between two evolutionary trees.
|
3 |
From Homologous Genes to Phylogenetic Species Trees: On Tree Representations of Binary RelationsWieseke, Nicolas 27 September 2017 (has links)
Orthology and paralogy distinguish whether a pair of genes originated by a speciation or a gene duplication event, whereas xenology refers to horizontal gene transfer. These concepts play a key role in phylogenomics and species tree inference is one of its prevalent tasks. Commonly, species tree inference is performed using sequence-based phylogenetic methods which heavily rely on the initial data sets to be solely composed of 1:1 orthologs. Such approaches are strongly restricted to a small set of genes that provide information about the species tree. In this work, it is shown that the restriction to 1:1 orthologs is not necessary to reconstruct a reliable hypothesis on the evolutionary history of species.
Besides orthology, knowledge on all three major driving forces of gene evolution can be considered: speciation, gene duplication, and horizontal gene transfer. The corresponding concepts of orthology, paralogy, and xenology imply binary relations on pairs of genes. These relations, in turn, convey meaningful phylogenetic information and allow the inference of plausible phylogenetic species trees.
To this end, it is shown that orthology, paralogy, and xenology have to fulfill certain mathematical properties. In particular, they have to be representable as a tree – the so-called gene tree. This work investigates the theoretical concepts of tree representable sets of binary relations to unfold the underlying mathematical structure. Various novel characterizations for those relations are given and the close connection between tree representable sets of binary relations and cographs, symbolic ultrametrics, and so-called unp 2-structures is revealed. Based on the novel characterizations, polynomial-time recognition algorithms for tree representable sets of relations are presented. In the case, a set of relations is tree representable, the corresponding tree representation can be found in polynomial time as well.
Moreover, for the NP-complete problems of editing a given set of relations to its closest tree representable set, exact algorithms are developed by means of formulations as integer linear program. Finally, all algorithms have been implemented in the software ParaPhylo, a species tree inference method based on orthology and paralogy data. It is demonstrated on simulated data sets, as well as real-life data sets, that non-trivial phylogenies can indeed be reconstructed from tree-free orthology estimates alone.
|
4 |
Fylogenetické studie polyploidního rodu Curcuma L. / Phylogenetic Studies in the Polyploid Genus Curcuma L.Záveská, Eliška January 2014 (has links)
1 Phylogenetic Studies in the Polyploid Genus Curcuma L. SUMMARY Curcuma is genetically one of the most complex genera within the tropical family Zingiberaceae, with hybridization and polyploidization being the major forces in its evolution. In this thesis, I have focused mainly on the genetic background of Curcuma species variation, relationships and overall genome structure, as a key to solve long standing taxonomic problems. Results of my molecular studies on the genus Curcuma performed since 2007 represent an extension of ongoing taxonomic and nomenclatural work started by Jana Leong- Škorničková in 2000. The first part of the thesis consists of a broad, general introduction to the subject to reflect the current state of knowledge, formulate the major problems to be confronted within the genus, and summarise the major results of the studies presented in the second part of the thesis. As the main obstacles in studying Curcuma are consequences of its reticulate evolution, it is also outlines the importance of understanding the genetic background and species relationships using molecular markers. Common molecular methods used for assessing phylogenetic relationships on the intraspecific and infrageneric levels - AFLP and sequencing of selected markers from cpDNA, nrDNA and nDNA - are described, with the...
|
5 |
Uma abordagem multi-objetivo e multimodal para reconstrução de arvores filogeneticas / A multimodal and multiobjective approach for phylogenetic trees reconstructionSilva, Ana Estela Antunes da, 1965- 12 December 2007 (has links)
Orientador: Fernando Jose Von Zuben / Tese (doutorado) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação / Made available in DSpace on 2018-08-12T21:45:18Z (GMT). No. of bitstreams: 1
Silva_AnaEstelaAntunesda_D.pdf: 8601078 bytes, checksum: 494abd829c21ee91c2a7003c33fdf0a1 (MD5)
Previous issue date: 2007 / Resumo : A reconstrução de árvores filogenéticas pode ser interpretada como um processo sistemático de proposição de uma descrição arbórea para as diferenças relativas que se observam em conjuntos de atributos genéticos homólogos de espécies sob comparação. A árvore filogenética resultante apresenta uma certa topologia, ou padrão de ancestralidade, e os comprimentos dos ramos desta árvore são indicativos do número de mudanças evolutivas desde a divergência do ancestral comum. Tanto a topologia quanto os comprimentos de ramos são hipóteses descritivas de eventos não-observáveis e condicionais, razão pela qual tendem a existir diversas hipóteses de alta qualidade para a reconstrução, assim como múltiplos critérios de desempenho. Esta tese (i) aborda árvores sem raiz; (ii) enfatiza os critérios de quadrados mínimos, evolução mínima e máxima verossimilhança; (iii) propõe uma extensão ao algoritmo Neighbor Joining que oferece múltiplas hipóteses de alta qualidade para a reconstrução; e (iv) descreve e utiliza uma nova ferramenta para otimização multiobjetivo no contexto de reconstrução filogenética. São considerados dados artificiais e dados reais na apresentação de resultados, os quais apontam
vantagens e aspectos diferenciais das metodologias propostas / Abstract: The reconstruction of phylogenetic trees can be interpreted as a systematic process of proposing an arborean description to the relative dissimilarities observed among sets of homologous genetic attributes of species being compared. The resulting phylogenetic tree presents a certain topology, or ancestrality pattern, and the length of the edges of the tree will indicate the number of evolutionary changes since the divergence from the common ancestor. Both topology and edge lengths are descriptive hypotheses of non-observable and conditional events, which implies the existence of diverse high-quality hypotheses for the reconstruction, as long as multiple performance criteria. This thesis (i) deals with unrooted trees; (ii) emphasizes the least squares, minimum evolution, and maximum likelihood criteria; (iii) proposes an extension to the Neighbor Joining algorithm which offers multiple high-quality reconstruction hypotheses; and (iv) describes and uses a new tool for multiobjective optimization in the context of phylogenetic reconstruction. Artificial and real datasets are considered in the presentation of results, which points to some advantages and distinctive aspects of the proposed methodologies / Doutorado / Engenharia de Computação / Doutor em Engenharia Elétrica
|
6 |
Analysis of the subsequence composition of biosequencesCunial, Fabio 07 May 2012 (has links)
Measuring the amount of information and of shared information in biological strings, as well as relating information to structure, function and evolution, are fundamental computational problems in the post-genomic era. Classical analyses of the information content of biosequences are grounded in Shannon's statistical telecommunication theory, while the recent focus is on suitable specializations of the notions introduced by Kolmogorov, Chaitin and Solomonoff, based on data compression and compositional redundancy. Symmetrically, classical estimates of mutual information based on string editing are currently being supplanted by compositional methods hinged on the distribution of controlled substructures.
Current compositional analyses and comparisons of biological strings are almost exclusively limited to short sequences of contiguous solid characters. Comparatively little is known about longer and sparser components, both from the point of view of their effectiveness in measuring information and in separating biological strings from random strings, and from the point of view of their ability to classify and to reconstruct phylogenies. Yet, sparse structures are suspected to grasp long-range correlations and, at short range, they are known to encode signatures and motifs that characterize molecular families.
In this thesis, we introduce and study compositional measures based on the repertoire of distinct subsequences of any length, but constrained to occur with a predefined maximum gap between consecutive symbols. Such measures highlight previously unknown laws that relate subsequence abundance to string length and to the allowed gap, across a range of structurally and functionally diverse polypeptides. Measures on subsequences are capable of separating only few amino acid strings from their random permutations, but they reveal that random permutations themselves amass along previously undetected, linear loci. This is perhaps the first time in which the vocabulary of all distinct subsequences of a set of structurally and functionally diverse polypeptides is systematically counted and analyzed.
Another objective of this thesis is measuring the quality of phylogenies based on the composition of sparse structures. Specifically, we use a set of repetitive gapped patterns, called motifs, whose length and sparsity have never been considered before. We find that extremely sparse motifs in mitochondrial proteomes support phylogenies of comparable quality to state-of-the-art string-based algorithms. Moving from maximal motifs -- motifs that cannot be made more specific without losing support -- to a set of generators with decreasing size and redundancy, generally degrades classification, suggesting that redundancy itself is a key factor for the efficient reconstruction of phylogenies. This is perhaps the first time in which the composition of all motifs of a proteome is systematically used in phylogeny reconstruction on a large scale.
Extracting all maximal motifs, or even their compact generators, is infeasible for entire genomes. In the last part of this thesis, we study the robustness of measures of similarity built around the dictionary of LZW -- the variant of the LZ78 compression algorithm proposed by Welch -- and of some of its recently introduced gapped variants. These algorithms use a very small vocabulary, they perform linearly in the input strings, and they can be made even faster than LZ77 in practice. We find that dissimilarity measures based on maximal strings in the dictionary of LZW support phylogenies that are comparable to state-of-the-art methods on test proteomes. Introducing a controlled proportion of gaps does not degrade classification, and allows to discard up to 20% of each input proteome during comparison.
|
Page generated in 0.1199 seconds