Global ETD Search

1	Large-scale analysis of phylogenetic search behavior Park, Hyun Jung 15 May 2009 (has links) Phylogenetic analysis, which infers evolutionary trees, is used in all branches of biology. Applications include designing more effective drugs, tracing the transmission of deadly viruses, and guiding conservation and biodiversity efforts. Most analyses rely on effective heuristics for obtaining accurate trees. However, relatively little work has been done to analyze quantitatively the behavior of phylogenetic heuristics in tree space. This is important, because a better understanding of local search behavior can facilitate the design of better heuristics, which ultimately leads to more accurate depictions of the true evolutionary relationships. In order to access and analyze the tree search space, we implement an effective local search heuristic. Having an effective heuristic that can open the space is important, since no search heuristic in this field can effectively provide data collection control. So we have implemented and evaluated a search heuristic, Simple Local Search or SLS, that works reasonably well in the space. Our investigations led to several interesting observations about the behavior of a search heuristic and the tree search space. We studied the correlation of tree features of search path trees, where tree features refer to the parsimony score, the Robinson-Foulds distance and the homoplasy measure. Most importantly, parsimony score was highly correlated with Robinson-Foulds distance only in trees that lie on the search path to a local optimum. We also note that the scores of neighborhoods along search paths improve together as a local search progresses. Correlations of tree features of search path trees are particularly useful in characterizing and controlling a search path. 
This paper proposes one possible stopping criterion, tested on three biological datasets, that uses the correlation between the parsimony score and the RF distance of search path trees to maximize the tree search results while minimizing computational time. Also, the observation that scores of a neighborhood on a search path improve together gives us a significant amount of flexibility in selecting the next pivot of a search without losing performance. Our long-term goal is to develop an effective search heuristic that can deal with large-scale tree space in reasonable time. Improved knowledge about the tree search space and the search heuristic can provide a reasonable starting point toward the goal. phylogenetic trees maximum parsimony
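The parsimony score that the abstract above treats as a tree feature can be illustrated with Fitch's small-parsimony algorithm. The following is only a minimal sketch, not the SLS heuristic from the thesis: it assumes rooted binary trees written as nested tuples, equal-length aligned sequences, and unit cost for every state change. The names `fitch_site` and `parsimony_score` are hypothetical and chosen for illustration.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name (str) or a pair of subtrees (rooted, binary).
# This representation is an assumption made for the sketch.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate state set, minimum number of changes) for one site."""
    if isinstance(tree, str):
        # Leaf: the only candidate state is the observed one, at no cost.
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum the per-site Fitch costs over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total
```

For example, with a toy alignment `{"A": "AAG", "B": "AAG", "C": "CAT", "D": "CGT"}`, calling `parsimony_score((("A", "B"), ("C", "D")), alignment)` scores the tree that groups A with B, and a lower score means a more parsimonious tree.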
2	Novel Mathematical Aspects of Phylogenetic Estimation Fischer, Mareike January 2009 (has links) In evolutionary biology, genetic sequences carry with them a trace of the underlying tree that describes their evolution from a common ancestral sequence. Inferring this underlying tree is challenging. We investigate some curious cases in which different methods like Maximum Parsimony, Maximum Likelihood and distance-based methods lead to different trees. Moreover, we state that in some cases, ancestral sequences can be more reliably reconstructed when some of the leaves of the tree are ignored - even if these leaves are close to the root. While all these findings show problems inherent to either the assumed model or the applied method, sometimes an inaccurate tree reconstruction is simply due to insufficient data. This is particularly problematic when a rapid divergence event occurred in the distant past. We analyze an idealized form of this problem and determine a tight lower bound on the growth rate for the sequence length required to resolve the tree (independent of any particular branch length). Finally, we investigate the problem of intermediates in the fossil record. The extent of ‘gaps’ (missing transitional stages) has been used to argue against gradual evolution from a common ancestor. We take an analytical approach and demonstrate why, under certain sampling conditions, we may not expect intermediates to be found. phylogenetics maximum parsimony maximum likelihood fossils
4	Phylogeny of Five Taxa in the Felsenstein and Farris Zones Lam, Eric Trung 18 March 2021 (has links) Mathematical conditions which showed where parsimony was not consistent for four taxa were first introduced by Felsenstein in 1978. This region was subsequently labelled the "Felsenstein zone". Following Felsenstein's findings, 'frequentists' conjectured that for five taxa there would also be a region in parameter space where parsimony is not consistent. In response, 'cladists' claimed that parsimony was consistent in a different region of parameter space, which is called the "Farris zone". However, no analytical description of the region in which this consistency occurs has been made. Furthermore, no mathematical extensions of this Felsenstein theory to five or more taxa have been made. The same is true for the Farris zone. In this thesis, we give a complete account of the Felsenstein zone and Farris zone for four and five taxa and interpret these in terms of the shape of the phylogenetic tree. Phylogeny Maximum Parsimony Consistency Felsenstein Zone Farris Zone
5	Advanced methods to solve the maximum parsimony problem / Méthodes avancées pour la résolution du problème de maximum parcimonie Vazquez ortiz, Karla Esmeralda 14 June 2016 (has links) La reconstruction phylogénétique est considérée comme un élément central de divers domaines comme l’écologie, la biologie et la physiologie moléculaire pour lesquels les relations généalogiques entre séquences d’espèces ou de gènes, représentées sous forme d’arbres, peuvent apporter des éclairages significatifs à la compréhension de phénomènes biologiques. Le problème de Maximum de Parcimonie est une approche importante pour résoudre la reconstruction phylogénétique en se basant sur un critère d’optimalité pour lequel l’arbre comprenant le moins de mutations est préféré. Dans cette thèse nous proposons différentes méthodes pour s’attaquer à la nature combinatoire de ce problème NP-complet. Premièrement, nous présentons un algorithme de Recuit Simulé compétitif qui nous a permis de trouver des solutions de meilleure qualité pour un ensemble de problèmes. Deuxièmement, nous proposons une nouvelle technique de Path-Relinking qui semble intéressante pour comparer des arbres mais pas pour trouver des solutions de meilleure qualité. Troisièmement, nous donnons le code d’une implantation sur GPU de la fonction objectif dont l’intérêt est de réduire le temps d’exécution de la recherche pour des instances dont la longueur des séquences est importante. Finalement, nous introduisons un prédicteur capable d’estimer le score optimum pour un vaste ensemble d’instances avec une très grande précision. / Phylogenetic reconstruction is considered a central underpinning of diverse fields like ecology, molecular biology and physiology where genealogical relationships of species or gene sequences represented as trees can provide the most meaningful insights into biology. 
Maximum Parsimony (MP) is an important approach to the phylogenetic reconstruction problem, based on an optimality criterion under which the tree that minimizes the total number of genetic transformations is preferred. In this thesis we propose different methods to cope with the combinatorial nature of this NP-complete problem. First, we present a competitive Simulated Annealing algorithm which helped us find trees of better parsimony score than the ones that were known for a set of instances. Second, we propose a Path-Relinking technique that appears to be suitable for tree comparison but not for finding trees of better quality. Third, we give a GPU implementation of the objective function of the problem that can reduce the runtime for instances that have a large number of residues per taxon. Finally, we introduce a predictor that is able to estimate the best parsimony score of a huge set of instances with high accuracy. Reconstruction phylogénétique Parcimonie maximum Optimisation combinatoire Recuit Simulé Phylogenetics Maximum parsimony Combinatorial optimization Simulated annealing 510
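The abstract above does not describe how its Simulated Annealing algorithm is built, so the following is only a generic textbook sketch of the technique, not the thesis's method. It assumes a score to be minimized (for phylogenetics this would be a parsimony score), a user-supplied neighbor function (for trees, plausibly a rearrangement move), Metropolis acceptance, and a geometric cooling schedule. All names and default parameter values are hypothetical.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")  # type of a candidate solution (e.g. a tree)


def simulated_annealing(
    initial: S,
    score: Callable[[S], float],
    neighbor: Callable[[S, random.Random], S],
    t_start: float = 1.0,
    t_end: float = 1e-3,
    cooling: float = 0.95,
    steps_per_temp: int = 50,
    seed: int = 0,
) -> Tuple[S, float]:
    """Minimize `score` with Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    current, current_score = initial, score(initial)
    best, best_score = current, current_score
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            candidate = neighbor(current, rng)
            delta = score(candidate) - current_score
            # Always accept improvements; accept a worse candidate with
            # probability exp(-delta / t), which shrinks as t decreases.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_score = candidate, current_score + delta
                if current_score < best_score:
                    best, best_score = current, current_score
        t *= cooling  # geometric cooling schedule
    return best, best_score
```

On a toy problem such as minimizing `(x - 7) ** 2` over the integers with a neighbor that steps by plus or minus one, the loop behaves like a hill-descent that tolerates occasional uphill moves early on. In a phylogenetic setting the same loop would be driven by a parsimony score and a tree-rearrangement neighbor, both of which are assumptions here.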
6	Relative Timing of Intron Gain and a New Marker for Phylogenetic Analyses Lehmann, Jörg 12 June 2014 (has links) (PDF) Despite decades of effort by molecular systematists, the trees of life of eukaryotic organisms still remain partly unresolved or in conflict with each other. An ever increasing number of fully-sequenced genomes of various eukaryotes makes it possible to consider gene and species phylogenies at genome scale. However, such phylogenomics-based approaches also revealed that more taxa and more and more gene sequences are not the ultimate solution to fully resolve these conflicts, and that there is a need for sequence-independent phylogenetic meta-characters that are derived from genome sequences. Spliceosomal introns are characteristic features of eukaryotic nuclear genomes. The relatively rare changes of spliceosomal intron positions have already been used as genome-level markers, both for the estimation of intron evolution and phylogenies, however with variable success. In this thesis, a specific subset of these changes is introduced and established as a novel phylogenetic marker, termed near intron pair (NIP). These characters are inferred from homologous genes that contain mutually-exclusive intron presences at pairs of coding sequence (CDS) positions in close proximity. The idea that NIPs are powerful characters is based on the assumption that both very small exons and multiple intron gains at the same position are rare. To obtain sufficient numbers of NIP character data from genomic and alignment data sets in a consistent and flexible way, the implementation of a computational pipeline was a main goal of this work. Starting from orthologous (or, more generally, homologous) gene datasets comprising genomic sequences and corresponding CDS transcript annotations, the multiple alignment generation is an integral part of this pipeline. The alignment can be calculated at the amino acid level utilizing external tools (e.g. 
transAlign) and results in a codon alignment via back-translation. Guided by the multiple alignment, the positionally homologous intron positions should become apparent when mapped individually for each transcript. The pipeline proceeds at this stage to output portions of the intron-annotated alignment that contain at least one candidate NIP character. In a subsequent pipeline script, these collected so-called NIP region files are finally converted to binary state characters representing valid NIPs, subject to quality filter constraints concerning, e.g., the amino acid alignment conservation around intron loci and splice sites. The computational pipeline tools enable the researcher to build NIP character matrices that can be used for tree inference, e.g., using the maximum parsimony approach. In a first NIP-based application, the phylogenetic position of major orders of holometabolic insects (more specifically: the Coleoptera-Hymenoptera-Mecopterida trifurcation) was evaluated in a cladistic sense. As already suggested during a study on the eIF2gamma gene based on two NIP cases (Krauss et al. 2005), the genome-scale evaluation supported Hymenoptera as sister group to an assemblage of Coleoptera and Mecopterida, in agreement with other studies, but contradicting the previously established view. As part of the genome paper describing a new species of twisted-wing parasites (Strepsiptera), the NIP method was employed to help resolve their phylogenetic position within (holometabolic) insects. Together with analyses of sequence patterns and a further meta-character, it revealed twisted-wing parasites as being the closest relatives of the mega-diverse beetles. NIP-based reconstructions of the metazoan tree covering a broad selection of representative animal species also identified some weaknesses of the NIP approach that may suffer e.g. 
from alignment/ortholog prediction artifacts (depending on the depth of range of taxa) and systematic biases (long branch attraction artifacts, due to unequal evolutionary rates of intron gain/loss and the use of the maximum parsimony method). In a further study, the identification of NIPs within the recently diverged genus Drosophila could be utilized to characterize recent intron gain events that apparently involved several cases of intron sliding and tandem exon duplication, albeit the mechanisms of gain for the majority of cases could not be elucidated. Finally, the NIP marker could be established as a novel phylogenetic marker, in particular dedicated to complementarily explore the wealth of genome data for phylogenetic purposes and to address open questions of intron evolution. Phylogenetischer Marker Intron-Position Intron-Evolution phylogenetic marker intron position intron evolution maximum parsimony rare genomic change ddc:500
7	A Systematic Revision of the Carex Nardina Complex (Cyperaceae) Sawtell, Wayne MacLeod January 2012 (has links) The Carex nardina complex is a group of one to three species (C. nardina, C. hepburnii, C. stantonensis) and six taxa of unispicate sedges (Cyperaceae), the taxonomy of which has been controversial since the 1800s. As initial DNA phylogenies suggested that the complex was nested within Carex section Filifoliae and sister to C. elynoides, a species often confused with C. nardina and sympatric with it in the western North American Cordillera, analyses were conducted to determine whether C. hepburnii, C. stantonensis and other infraspecific taxa could be the result of hybridization. Morphometric and molecular analyses found no substantial evidence for hybridization and supported the recognition of no taxon beyond C. nardina. Consequently, this study concludes that the complex comprises a single variable species, Carex nardina, distributed throughout arctic North America south through the western Cordillera to New Mexico with a minor portion of its range in northeastern Russia, northwestern Scandinavia and Iceland. phylogenetic analysis taxonomy species complex arctic-alpine morphometric Cyperaceae Cariceae Carex principal components analysis maximum parsimony Bayesian analysis evolution
8	Enhance the understanding of whole-genome evolution by designing, accelerating and parallelizing phylogenetic algorithms Yin, Zhaoming 22 May 2014 (has links) The advent of new technology enhances the speed and reduces the cost of sequencing biological data. Making biological sense of this genomic data is a big challenge for algorithm design as well as for the high-performance computing community. There are many problems in Bioinformatics, such as how new functional genes arise, why genes are organized into chromosomes, how species are connected through the evolutionary tree of life, or why arrangements are subject to change. Phylogenetic analyses have become essential to research on the evolutionary tree of life. They can help us to track the history of species and the relationship between different genes or genomes through millions of years. One of the fundamentals of phylogenetic construction is the computation of distances between genomes. Because rearrangement events involve much more complicated combinatorial patterns, distance computation is still a hot topic that belongs as much to mathematics as to biology. For distance computation with an input of two genomes containing unequal gene contents (with insertions/deletions and duplications), the problem is especially hard. In this thesis, we discuss our contributions to distance estimation for unequal gene order data. The problem of finding the median of three genomes is the key process in building the most parsimonious phylogenetic trees from genome rearrangement data. For genomes with unequal contents, to the best of our knowledge, there is no algorithm that can help to find the median. In this thesis, we contribute to the median computation in two respects. 1) On the algorithm engineering side, we harness the power of streaming graph analytics methods to implement an exact DCJ median algorithm which runs as fast as the heuristic algorithm and can help construct a better phylogenetic tree. 
2) On the algorithmic side, we theoretically formulate the problem of finding the median for input genomes with unequal gene content, which leads to the design and implementation of an efficient Lin-Kernighan heuristic based median algorithm. Inferring phylogenies (evolutionary history) of a set of given species is the ultimate goal once the distance and median model are chosen. For more than a decade, biologists and computer scientists have studied how to infer phylogenies by measuring genome rearrangement events using gene order data. While evolution is not an inherently parsimonious process, maximum parsimony (MP) phylogenetic analysis has been widely applied to phylogeny inference to study the evolutionary patterns of genome rearrangements. MP phylogenetic analysis of genome rearrangements generally raises two problems: one is, given a set of modern genomes, how to compute the topology of the corresponding phylogenetic tree; the other is, given the topology of a model tree, how to infer the gene orders of the ancestral species. Assembling an MP phylogenetic tree constructor involves multiple NP-hard problems and, unfortunately, they are layered one on top of another, which means that solving one NP-hard problem requires solving multiple NP-hard sub-problems. For phylogenetic tree construction with unequal-content genomes as input, there are three layers of NP-hard problems. In this thesis, we mainly discuss our contributions to the design and implementation of the software package DCJUC (Phylogeny Inference using DCJ model to cope with Unequal Content Genomes), which can help to achieve both of these goals. Aside from the biological problems, another issue we must address is how to use the power of parallel computing to accelerate algorithms that handle huge data sets, such as high-resolution gene order data. 
For one thing, all of the methods for tackling these phylogenetic problems are based on branch-and-bound algorithms, which are quite irregular and unfriendly to parallel computing. To parallelize these algorithms, we need to improve the efficiency of localized memory access and load balancing so that each thread can reach its full potential. For another, there is a revolution taking place in computing with the availability of commodity graphical processors such as Nvidia GPU and with many-core CPUs such as Cray-XMT, or Intel Xeon Phi Coprocessor with 60 cores. These architectures provide a new way for us to achieve high performance at much lower cost. However, code for these machines is not easy to write, and scientific computing is hard to tune well on them. We explore the potential of these architectures to help us accelerate branch-and-bound-based phylogenetic algorithms. Maximum parsimony phylogeny DCJ distance DCJ median Parallel branch-and-bound Algorithms Phylogeny Cladistic analysis Nucleotide sequence Sequence alignment (Bioinformatics) Genomics Combinatorial optimization
9	Relative Timing of Intron Gain and a New Marker for Phylogenetic Analyses Lehmann, Jörg 12 February 2014 (has links) Despite decades of effort by molecular systematists, the trees of life of eukaryotic organisms still remain partly unresolved or in conflict with each other. An ever increasing number of fully-sequenced genomes of various eukaryotes makes it possible to consider gene and species phylogenies at genome scale. However, such phylogenomics-based approaches also revealed that more taxa and more and more gene sequences are not the ultimate solution to fully resolve these conflicts, and that there is a need for sequence-independent phylogenetic meta-characters that are derived from genome sequences. Spliceosomal introns are characteristic features of eukaryotic nuclear genomes. The relatively rare changes of spliceosomal intron positions have already been used as genome-level markers, both for the estimation of intron evolution and phylogenies, however with variable success. In this thesis, a specific subset of these changes is introduced and established as a novel phylogenetic marker, termed near intron pair (NIP). These characters are inferred from homologous genes that contain mutually-exclusive intron presences at pairs of coding sequence (CDS) positions in close proximity. The idea that NIPs are powerful characters is based on the assumption that both very small exons and multiple intron gains at the same position are rare. To obtain sufficient numbers of NIP character data from genomic and alignment data sets in a consistent and flexible way, the implementation of a computational pipeline was a main goal of this work. Starting from orthologous (or, more generally, homologous) gene datasets comprising genomic sequences and corresponding CDS transcript annotations, the multiple alignment generation is an integral part of this pipeline. The alignment can be calculated at the amino acid level utilizing external tools (e.g. 
transAlign) and results in a codon alignment via back-translation. Guided by the multiple alignment, the positionally homologous intron positions should become apparent when mapped individually for each transcript. The pipeline proceeds at this stage to output portions of the intron-annotated alignment that contain at least one candidate NIP character. In a subsequent pipeline script, these collected so-called NIP region files are finally converted to binary state characters representing valid NIPs, subject to quality filter constraints concerning, e.g., the amino acid alignment conservation around intron loci and splice sites. The computational pipeline tools enable the researcher to build NIP character matrices that can be used for tree inference, e.g., using the maximum parsimony approach. In a first NIP-based application, the phylogenetic position of major orders of holometabolic insects (more specifically: the Coleoptera-Hymenoptera-Mecopterida trifurcation) was evaluated in a cladistic sense. As already suggested during a study on the eIF2gamma gene based on two NIP cases (Krauss et al. 2005), the genome-scale evaluation supported Hymenoptera as sister group to an assemblage of Coleoptera and Mecopterida, in agreement with other studies, but contradicting the previously established view. As part of the genome paper describing a new species of twisted-wing parasites (Strepsiptera), the NIP method was employed to help resolve their phylogenetic position within (holometabolic) insects. Together with analyses of sequence patterns and a further meta-character, it revealed twisted-wing parasites as being the closest relatives of the mega-diverse beetles. NIP-based reconstructions of the metazoan tree covering a broad selection of representative animal species also identified some weaknesses of the NIP approach that may suffer e.g. 
from alignment/ortholog prediction artifacts (depending on the depth of range of taxa) and systematic biases (long branch attraction artifacts, due to unequal evolutionary rates of intron gain/loss and the use of the maximum parsimony method). In a further study, the identification of NIPs within the recently diverged genus Drosophila could be utilized to characterize recent intron gain events that apparently involved several cases of intron sliding and tandem exon duplication, albeit the mechanisms of gain for the majority of cases could not be elucidated. Finally, the NIP marker could be established as a novel phylogenetic marker, in particular dedicated to complementarily explore the wealth of genome data for phylogenetic purposes and to address open questions of intron evolution. info:eu-repo/classification/ddc/500 ddc:500
10	Evoluce velikosti mozku u letounů (Chiroptera) / Evolution of brain size in bats (Chiroptera) Králová, Zuzana January 2010 (has links) According to the prevailing doctrine, brain size has mainly increased throughout the evolution of mammals and reductions in brain size were rare. On the other hand, the energetic costs of developing and maintaining a big brain are high, so brain size reduction should occur whenever the respective selective pressure is present. Modern phylogenetic methods make it possible to test for the presence of an evolutionary trend and to infer the ancestral values of the trait in question based on knowledge of the phylogeny and trait values for recent species. However, this approach has rarely been applied to study brain evolution so far. In this thesis, I focus on bats (Chiroptera). Bats are a suitable group for demonstrating the importance of brain size reductions. Considering their energetically demanding mode of locomotion, they are likely to have been under selection pressure for brain reduction. Furthermore, there is a large amount of data available on body and brain mass of recent species. Finally, phylogenetic relationships among bats are relatively well resolved. My present study is based on body masses and brain masses of 334 recent bat species (Baron et al., 1996) and on a phylogeny obtained by adjusting an existing bat supertree (Jones et al., 2002) according to recent molecular studies. Analysing the data for...
