
Genealogy Reconstruction: Methods and applications in cancer and wild populations

Genealogy reconstruction is widely used in biology to study relationships among entities. Phylogenies, or evolutionary trees, show the differences between species. They are of profound importance because they help us better understand evolutionary processes. Pedigrees, or family trees, on the other hand, visualize the relatedness between individuals in a population. The reconstruction of pedigrees, and the inference of parentage in general, is now a cornerstone of molecular ecology. Applications include the direct inference of gene flow and the estimation of the effective population size and of parameters describing the population’s mating behaviour, such as rates of inbreeding.
In the first part of this thesis, we construct genealogies of various types of cancer. Histopathological classification of human tumors relies in part on the degree of differentiation of the tumor sample. To date, there is no objective systematic method to categorize tumor subtypes by maturation. We introduce a novel algorithm to rank tumor subtypes according to the dissimilarity of their gene expression from that of stem cells and fully differentiated tissue, and thereby construct a phylogenetic tree of cancer. We validate our methodology with expression data of leukemia and liposarcoma subtypes and then apply it to a broader group of sarcomas and of breast cancer subtypes. This ranking of tumor subtypes resulting from the application of our methodology allows the identification of genes correlated with differentiation and may help to identify novel therapeutic targets. Our algorithm represents the first phylogeny-based tool to analyze the differentiation status of human tumors.
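The ranking idea can be illustrated with a minimal sketch. It is not the thesis's phylogeny-based algorithm: it assumes a simple correlation distance, and the function names, the score formula, and the toy expression values are all hypothetical. It only shows what ordering subtypes by their dissimilarity from a stem-cell profile and a differentiated-tissue profile could look like.

```python
import math


def correlation_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (a zero variance would divide by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)


def rank_by_differentiation(subtypes, stem, differentiated):
    """Order subtype names from most stem-like to most differentiated.

    Illustrative score (an assumption, not taken from the thesis):
    d_stem / (d_stem + d_diff), near 0 for stem-like profiles and
    near 1 for profiles resembling differentiated tissue.
    """
    scores = {}
    for name, profile in subtypes.items():
        d_stem = correlation_distance(profile, stem)
        d_diff = correlation_distance(profile, differentiated)
        scores[name] = d_stem / (d_stem + d_diff)
    return sorted(scores, key=scores.get), scores


# Hypothetical toy profiles over four genes (made-up numbers, not real data).
stem_profile = [9, 8, 1, 1]
differentiated_profile = [1, 1, 9, 8]
toy_subtypes = {
    "subtype_C": [2, 1, 8, 9],
    "subtype_A": [8, 7, 2, 2],
    "subtype_B": [5, 4, 5, 5],
}

ranking, scores = rank_by_differentiation(
    toy_subtypes, stem_profile, differentiated_profile
)
print(ranking)
```

In this toy setting the subtype whose profile tracks the stem-cell reference is ranked first and the one tracking differentiated tissue last. A real analysis would work on normalized microarray data and on a tree rather than a single score.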
In contrast to asexually reproducing cancer cell populations, pedigrees of sexually reproducing populations cannot be represented by phylogenetic trees. Pedigrees are directed acyclic graphs (DAGs) and therefore more closely resemble phylogenetic networks, where reticulate events are indicated by vertices with two incoming arcs. In the second part of the thesis, we present a software package for pedigree reconstruction in natural populations using co-dominant genomic markers such as microsatellites and single nucleotide polymorphisms (SNPs). If available, the algorithm makes use of prior information such as known relationships (sub-pedigrees) or the age and sex of individuals. Statistical confidence is estimated by Markov chain Monte Carlo (MCMC) sampling. The accuracy of the algorithm is demonstrated for simulated data as well as an empirical data set with known pedigree. The parentage inference is robust even in the presence of genotyping errors. We further demonstrate the accuracy of the algorithm on simulated clonal populations. We show that the joint estimation of parameters of interest, such as the rate of self-fertilization or clonality, is possible with high accuracy even with marker panels of moderate power. Classical methods can only assign a very limited number of statistically significant parentages in this case and would therefore fail. The method is implemented in fast, easy-to-use, open-source software that scales to large datasets with many thousands of individuals.

Abstract
Acknowledgments
1 Introduction
2 Cancer Phylogenies
2.1 Introduction
2.2 Background
2.2.1 Phylogenetic Trees
2.2.2 Microarrays
2.3 Methods
2.3.1 Dataset compilation
2.3.2 Statistical Methods and Analysis
2.3.3 Comparison of our methodology to other methods
2.4 Results
2.4.1 Phylogenetic tree reconstruction method
2.4.2 Comparison of tree reconstruction methods to other algorithms
2.4.3 Systematic analysis of methods and parameters
2.5 Discussion
3 Wild Pedigrees
3.1 Introduction
3.2 The molecular ecologist’s tools of the trade
3.2.1 Sibship inference and parental reconstruction
3.2.2 Parentage and paternity inference
3.2.3 Multigenerational pedigree reconstruction
3.3 Background
3.3.1 Pedigrees
3.3.2 Genotypes
3.3.3 Mendelian segregation probability
3.3.4 LOD Scores
3.3.5 Genotyping Errors
3.3.6 IBD coefficients
3.3.7 Bayesian MCMC
3.4 Methods
3.4.1 Likelihood Model
3.4.2 Efficient Likelihood Calculation
3.4.3 Maximum Likelihood Pedigree
3.4.4 Full siblings
3.4.5 Algorithm
3.4.6 Missing Values
3.4.7 Allele frequencies
3.4.8 Rates of Self-fertilization
3.4.9 Rates of Clonality
3.5 Results
3.5.1 Real Microsatellite Data
3.5.2 Simulated Human Population
3.5.3 Simulated Clonal Plant Population
3.6 Discussion
4 Conclusions
A FRANz
A.1 Availability
A.2 Input files
A.2.1 Main input file
A.2.2 Known relationships
A.2.3 Allele frequencies
A.2.4 Sampling locations
A.3 Output files
A.4 Web 2.0 Interface
List of Figures
List of Tables
List of Abbreviations
Bibliography
Curriculum Vitae

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:11028
Date: 23 June 2010
Creators: Riester, Markus
Contributors: Stadler, Peter F., Bryant, David, Universität Leipzig
Source Sets: Hochschulschriftenserver (HSSS) der SLUB Dresden
Language: English
Detected Language: English
Type: doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text
Rights: info:eu-repo/semantics/openAccess
