1 |
Error and Uncertainty in Computational Phylogenetics
Hanson-Smith, Victor, 1981- 12 1900
xi, 119 p. : ill. (some col.) / The evolutionary history of protein families can be difficult to study because the necessary ancestral molecules are often unavailable for direct observation. As an alternative, the field of computational phylogenetics has developed statistical methods to infer the evolutionary relationships among extant molecular sequences and their ancestral sequences. Typically, the methods of computational phylogenetic inference and ancestral sequence reconstruction are combined with other non-computational techniques in a larger analysis pipeline to study the inferred forms and functions of ancient molecules. Two major problems surrounding this analysis pipeline are computational error and statistical uncertainty. In this dissertation, I use simulations and analysis of empirical systems to show that phylogenetic error can be reduced by using an alternative search heuristic. I then use similar methods to reveal the relationship between phylogenetic uncertainty and the accuracy of ancestral sequence reconstruction. Finally, I provide a case study of a molecular machine in yeast to demonstrate all stages of the analysis pipeline.
This dissertation includes previously published co-authored material. / Committee in charge: John Conery, Chair;
Daniel Lowd, Member;
Sara Douglas, Member;
Joseph W. Thornton, Outside Member
|
2 |
Lineage specific evolution and phylogenetic analysis : a thesis presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biomathematics at Massey University, Palmerston North, New Zealand
Grievink, Liat Shavit January 2009
Phylogenetic models generally assume a homogeneous, time-reversible, stationary process. These assumptions are often violated by the real, far more complex, evolutionary process. This thesis is centered on non-homogeneous, lineage-specific properties of molecular sequences. It consists of several related but independent studies. LineageSpecificSeqgen, an extension to the Seq-Gen program that allows generation of sequences with changes in the proportion of variable sites, is introduced. This program is then used in a simulation study showing that changes in the proportion of variable sites can hinder tree estimation accuracy, and that tree reconstruction under the best-fit model chosen using a relative test can result in a wrong tree. In this case, the less commonly used absolute model-fit was a better predictor of tree estimation accuracy. This study found that increased taxon sampling of lineages that have undergone a change in the proportion of variable sites was critical for accurate tree reconstruction and that, in contrast to some earlier findings, the accuracy of maximum parsimony is adversely affected by such changes. This thesis also addresses the well-known long-branch attraction artifact. A nonparametric bootstrap test to identify changes in the substitution process is introduced, validated, and applied to the case of Microsporidia, a highly reduced intracellular parasite. Microsporidia was first thought to be an early-branching eukaryote, but is now believed to be sister to, or included within, fungi. Its apparent basal eukaryote position is considered a result of long-branch attraction due to an elevated evolutionary rate in the microsporidian lineage. This study shows that long-branch estimates and basal positioning of Microsporidia both correlate with increased proportions of radical substitutions in the microsporidian lineage. In simulated data, such increased proportions of radical substitutions lead to erroneous long-branch estimates.
These results suggest that the long microsporidian branch is likely to be a result of an increased proportion of radical substitutions on that branch, rather than increased evolutionary rate per se. The focus of the last study is the intriguing case of Mesostigma, a freshwater green alga for which contradictory phylogenetic relationships have been inferred. While some studies placed Mesostigma within the Streptophyta lineage (which includes land plants), others placed it as the deepest green algae divergence. This basal positioning is regarded as a result of long-branch attraction due to poor taxon sampling. Reinvestigation of a 13-taxon mitochondrial amino acid dataset and a sub-dataset of 8 taxa reveals that site sampling, and in particular the treatment of missing data, is just as important a factor for accurate tree reconstruction as taxon sampling. This study identifies a difficulty in recreating the long-branch attraction observed for the 8-taxon dataset in simulated data. The cause is likely to be the smaller number of amino acid characters per site in simulated data compared to real data, highlighting the fact that there are properties of the evolutionary process that are yet to be accurately modeled.
|