• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 2
  • Tagged with
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Efficient Computation and Application of Maximum Agreement Forests

Whidden, Chris 29 July 2013 (has links)
Rampant lateral gene transfer (LGT) among prokaryotes, hybridization in plants and other reticulate evolutionary processes invalidate typical phylogenetic tree models by violating the assumption that organisms only inherit genetic information from a single parent species. Comparing the different evolutionary histories of multiple genes is necessary to identify and assess these processes. In this work I develop efficient approximation and fixed-parameter algorithms for computing rooted maximum agreement forests (MAFs) and maximum acyclic agreement forests (MAAFs) of pairs of phylogenetic trees. Their sizes correspond to the subtree-prune-and-regraft (SPR) distance and the hybridization number of these pairs of trees, which are important measures of the dissimilarity of phylogenies used in studying reticulate evolution. Although these MAFs and MAAFs are NP-hard to compute, my fixed-parameter algorithms are practical because they scale exponentially with the computed distance rather than the size of the trees. I contribute efficient fixed-parameter algorithms for computing MAFs and MAAFs of two binary rooted trees and give the first efficient fixed-parameter and approximation algorithms for computing MAFs of two multifurcating rooted trees. My open-source implementation of the MAF algorithms is orders of magnitude faster than previous approaches, reducing the time required to compute SPR distances of 46 between trees of 144 species to fractions of a second whereas previous approaches required hours to compute SPR distances of 25. These fast MAF-based distance metrics enable the construction of supertrees to reconcile a collection of gene trees and rapid inference of LGT. Simulations demonstrate that supertrees minimizing the SPR distance are more accurate than other supertree methods under plausible rates of LGT. I constructed an SPR supertree from a phylogenomic dataset of 40,631 gene trees covering 244 genomes from several major bacterial phyla and inferred "highways" of gene transfer between these bacterial classes and genera; a small number of these highways connect distantly related genera and can highlight specific genes implicated in long-distance LGT. These fast MAF algorithms are thus practical and enable new analyses of reticulate evolution.
2

Cuts and Partitions in Graphs/Trees with Applications

Fan, Jia-Hao 16 December 2013 (has links)
Both the maximum agreement forest problem and the multicut on trees problem are NP-hard, thus cannot be solved efficiently if P /=NP. The maximum agreement forest problem was motivated in the study of evolution trees in bioinformatics, in which we are given two leaf-labeled trees and are asked to find a maximum forest that is a subgraph of both trees. The multicuton trees problem has applications in networks, in which we are given a forest and a set of pairs of termianls and are asked to find a cut that separates all pairs of terminals. We develop combinatorial and algorithmic techniques that lead to improved parameterized algorithms, approximation algorithms, and kernelization algorithms for these problems. For the maximum agreement forest problem, we proceed from the bottommost level of trees and extend solutions to whole trees. With this technique, we show that the maxi- mum agreement forest problem is fixed-parameterized tractable in general trees, resolving an open problem in this area. We also provide the first constant ratio approximation algorithm for the problem in general trees. For the multicut on trees problem, we take a new look at the problem through the eyes of vertex cover problem. This connection allows us to develop an kernelization algorithm for the problem, which gives an upper bound of O(k3) on the kernel size, significantly improving the previous best upper bound O(k6). We further exploit this connection to give a parameterized algorithm for the problem that runs in time O∗ (1.62k), thus improving the previous best algorithm of running time O∗ (2k). In the protein complex prediction problem, which comes directly from the study of bioinformatics, we are given a protein-protein interaction network, and are asked to find dense regions in this graph. We formulate this problem as a graph clustering problem and develop an algorithm to refine the results for identifying protein complexes. We test our algorithm on yeast protein- protein interaction networks, and we show that our algorithm is able to identify complexes more accurately than other existing algorithms.

Page generated in 0.1077 seconds