
Linking phylogenetic models to population processes, from species trees to genomes

Phylogenetics is transitioning from a history of deep-time analyses with few genes to a future of full-genome data that allows species-level resolution at both deep and shallow time scales. Accompanying this transition is a new focus on demographic parameters, such as ancestral population sizes and gene flow events, in addition to the bifurcating trees that are the cornerstone of the field. As access to more data has highlighted shortcomings of traditional phylogenetic methods that do not account for recombination, selection, population size changes, and inter-species gene flow, the field is exploring new theory and methods to catch up with the data.

My thesis focuses on signals of demographic processes in genomic data. In exploring these processes, we attempt to avoid biases involved in simply extending old phylogenetic methods -- which have typically been applied to just a handful of genes -- to genomic datasets.

Chapter 1 introduces a tool, ipcoal, for simulating genomic data on phylogenetic trees within a framework that includes recombination and the ability to specify effective population sizes, gene flow events, recombination maps, and differences in generation times. This tool enables, to varying degrees, all further chapters.

Chapter 2 studies the effects of species tree demographic parameters on the resulting linkage among nearby local genealogies, including implications for gene tree and species tree inference.

Chapter 3 examines turnover in local histories along the genome using a theoretical framework, the MS-SMC, which links topological heterogeneity along the genome to the species tree model.

Chapter 4 introduces simcat, a machine-learning method that uses genome-wide SNP data to infer admixture events on a phylogeny without relying on gene tree inference. Bypassing gene tree inference is an important step toward reducing the impact of gene tree estimation error over deep evolutionary time scales. Behind the scenes, simcat uses ipcoal to train a machine-learning model that maps patterns in SNP data to the demographic scenarios that produced them.

These chapters demonstrate new phylogenetic theory and methods for refining our ability to infer historical processes at phylogenetic scales, while also illuminating the importance of population-scale processes like gene flow and recombination for shaping genomes sampled in the present day.

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/baxm-1a27
Date: January 2023
Creators: McKenzie, Patrick Franklin
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
