Fast and accurate estimation of large-scale phylogenetic alignments and trees

Phylogenetics is the study of evolutionary relationships.
Phylogenetic trees and alignments play important roles in a wide range
of biological research, including reconstruction of the Tree of Life
- the evolutionary history of all organisms on Earth - and the
development of vaccines and antibiotics.
Today's phylogenetic studies seek to reconstruct trees and alignments
on a greater number and variety of organisms than ever before,
primarily due to exponential growth in affordable sequencing and
computing power. The importance of phylogenetic trees and alignments
motivates the need for methods to reconstruct them accurately and
efficiently on large-scale datasets.

Traditionally, phylogenetic studies proceed in two phases: first, an
alignment is produced from biomolecular sequences with differing
lengths, and, second, a tree is produced using the alignment. My
dissertation presents the first empirical performance study of leading
two-phase methods on datasets with up to hundreds of thousands of
sequences. Relatively accurate alignments and trees were obtained
using methods with high computational requirements on datasets with a
few hundred sequences, but as datasets grew past 1000 sequences and up
to tens of thousands of sequences, the set of methods capable of
analyzing a dataset diminished and only the methods with the lowest
computational requirements and lowest accuracy remained.

Alternatively, methods have been developed to simultaneously estimate
phylogenetic alignments and trees. Methods that address the treelength
optimization problem - the most widely used approach for simultaneous
estimation - have not been shown to return more accurate trees and
alignments than two-phase approaches. I demonstrate that treelength
optimization under a particular class of optimization criteria
represents a promising means for inferring accurate trees and
alignments.
The other methods for simultaneous estimation are not known to
support analyses of datasets with even a few hundred sequences, due to
their high computational requirements.

The main contribution of my dissertation is SATe,
the first fast and accurate method for simultaneous
estimation of alignments and trees on datasets with up to several
thousand nucleotide sequences. SATe improves upon the alignment and
topological accuracy of all existing methods, especially
on the most difficult-to-align datasets, while retaining
reasonable computational requirements.

Identifier: oai:union.ndltd.org:UTEXAS/oai:repositories.lib.utexas.edu:2152/ETD-UT-2011-05-3489
Date: 06 July 2011
Creators: Liu, Kevin Jensen
Source Sets: University of Texas
Language: English
Detected Language: English
Type: thesis
Format: application/pdf
