1 |
A Bayesian approach to phylogenetic networks. Radice, Rosalba, January 2011
Traditional phylogenetic inference assumes that the history of a set of taxa can be explained by a tree. This assumption is often violated, as some biological entities can exchange genetic material, giving rise to non-treelike events often called reticulations. Failure to consider these events might result in incorrectly inferred phylogenies, with further consequences such as stagnant and less targeted drug development. Phylogenetic networks provide a flexible tool that allows us to model the evolutionary history of a set of organisms in the presence of reticulation events. In recent years, a number of methods addressing phylogenetic network reconstruction and evaluation have been introduced. One such method was proposed by Moret et al. (2004). They defined a phylogenetic network as a directed acyclic graph obtained by positing a set of edges between pairs of branches of an underlying tree to model reticulation events. More recently, two works using this definition of a phylogenetic network have appeared, by Jin et al. (2006) and by Snir and Tuller (2009). Both demonstrate the potential of maximum likelihood estimation for phylogenetic network reconstruction. We propose a Bayesian approach to the estimation of phylogenetic network parameters. We allow different phylogenies to be inferred at different parts of a DNA alignment in the presence of reticulation events, at the species level, by using the idea that a phylogenetic network can be naturally decomposed into trees. A Markov chain Monte Carlo algorithm is provided for posterior computation of the phylogenetic network parameters. A more general algorithm is also proposed, which allows the data to dictate how many phylogenies are required to explain them. This is achieved by using stochastic search variable selection. Both algorithms are tested on simulated data and also demonstrated on the ribosomal protein gene rps11 data from five flowering plants.
The proposed approach can be applied to a wide variety of problems which aim at exploring the possibility of reticulation events in the history of a set of taxa.
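The idea that a network "can be naturally decomposed into trees" can be illustrated with a minimal sketch. It assumes a network stored as a map from each non-root node to its list of parents, with reticulation nodes having two parents; the function name `displayed_trees` and the toy network are hypothetical and are not taken from the thesis. Keeping exactly one incoming edge per reticulation yields one candidate tree per combination of choices.

```python
from itertools import product


def displayed_trees(parents):
    """Enumerate the trees obtained by keeping one parent for every node.

    `parents` maps each non-root node to a list of its parents. Nodes with
    two parents are reticulations. Each result maps node -> single parent.
    """
    nodes = sorted(parents)
    choices = [parents[node] for node in nodes]
    return [dict(zip(nodes, pick)) for pick in product(*choices)]


# Hypothetical toy network: taxon "B" descends from both lineage x and lineage y.
network = {
    "x": ["root"],
    "y": ["root"],
    "A": ["x"],
    "B": ["x", "y"],  # reticulation node
    "C": ["y"],
}

trees = displayed_trees(network)
for tree in trees:
    print(tree)
```

With r reticulation nodes this enumeration produces 2^r parent maps, so it is only practical for small r. A method that assigns different trees to different parts of an alignment would then associate each alignment segment with one of these trees.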
|
2 |
Stochastic tree models and probabilistic modelling of gene trees of given species networks. Zhu, Sha (Joe), January 2013
In the pre-genomic era, the relationships among species and their evolutionary histories were often determined by examining the fossil record. In the genomic era, these relationships are identified by analysing genetic data, which also enables us to take a close-up view of the differences between individual samples. In either case, the relationships are often described by a tree-like structure or a network. In this thesis, we investigate some of the models that are used to describe these relationships.
This thesis can be divided into two main parts. The first part investigates the theoretical properties of several neutral tree models that are often considered in phylogenetics and population genetics, such as the Yule–Harding model, the proportional-to-distinguishable-arrangements (PDA) model and the Kingman coalescent.
The second part of the thesis is more computationally oriented: we focus on developing and implementing methods for calculating gene tree probabilities given species networks, and for simulating genealogies within species networks.
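As a small illustration of one of the neutral models named above, the following sketch simulates the height of a Kingman coalescent genealogy for a single population. It is not the thesis's software. It relies only on the standard property that, while k lineages remain, the waiting time to the next merger is exponential with rate k(k-1)/2 in coalescent time units. The function names are hypothetical.

```python
import random


def coalescent_tree_height(n, rng):
    """Draw the time to the most recent common ancestor of n sampled lineages."""
    height = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2  # number of lineage pairs that can coalesce
        height += rng.expovariate(rate)
    return height


def expected_height(n):
    """Closed-form expectation of the tree height: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [coalescent_tree_height(10, rng) for _ in range(20000)]
mean_height = sum(draws) / len(draws)
print(mean_height, expected_height(10))
```

The simulated mean should sit close to the closed-form value. Simulating genealogies inside a species network additionally requires tracking which network branch each lineage occupies, which this sketch does not attempt.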
|
3 |
Méthodes combinatoires de reconstruction de réseaux phylogénétiques / Combinatorial Methods for Phylogenetic Network Reconstruction. Gambette, Philippe, 30 November 2010
Phylogenetic networks generalize the tree model of evolution by allowing edges between branches of the tree to represent exchanges of genetic material between coexisting species. Many combinatorial approaches, based on the manipulation of finite sets of mathematical objects, have been designed to reconstruct such networks from data extracted from a set of contradictory gene trees. These approaches can be divided into several categories depending on the kind of input (triplets, quartets, clusters or splits) and on the structural restrictions they impose on the reconstructed networks. We analyze in particular the structure of one class of restricted networks, namely level-k phylogenetic networks, and adapt this level parameter to the unrooted context. We also give new combinatorial methods to reconstruct phylogenetic networks from clusters (a method implemented in the Dendroscope software) or from quartets. We study the limits of these combinatorial methods (complexity explosion, noise and silence in the data, ambiguity in the reconstructed network) and the ways to address them, in particular through appropriate data preprocessing. Finally, we illustrate the results of these reconstruction methods on real data, and we conclude on how to use them in a global methodology that integrates statistical aspects.
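To make the notion of cluster input concrete, here is a minimal sketch, assuming rooted gene trees written as nested tuples. It extracts the clusters (leaf sets below each node) of two trees and reports the pairs that cannot coexist in a single tree. The helper names `clusters` and `compatible` are hypothetical, and this is not the algorithm implemented in Dendroscope.

```python
def clusters(tree):
    """Return the set of clusters of a rooted tree given as nested tuples."""
    found = set()

    def leaves(node):
        if isinstance(node, tuple):
            below = frozenset().union(*(leaves(child) for child in node))
        else:
            below = frozenset([node])
        found.add(below)
        return below

    leaves(tree)
    return found


def compatible(c1, c2):
    """Two clusters fit in one tree iff they are disjoint or nested."""
    return c1.isdisjoint(c2) or c1 <= c2 or c2 <= c1


# Two hypothetical gene trees that disagree on the placement of taxon b.
tree1 = (("a", "b"), "c")
tree2 = ("a", ("b", "c"))

conflicts = [
    (sorted(c1), sorted(c2))
    for c1 in clusters(tree1)
    for c2 in clusters(tree2)
    if not compatible(c1, c2)
]
print(conflicts)
```

Each incompatible pair is evidence that no single tree displays both inputs, which is the situation in which a network with at least one reticulation becomes necessary.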
|
4 |
Genealogy Reconstruction. Riester, Markus, 02 July 2010
Genealogy reconstruction is widely used in biology when relationships among entities are studied. Phylogenies, or evolutionary trees, show the differences between species. They are of profound importance because they help us to better understand evolutionary processes. Pedigrees, or family trees, on the other hand, visualize the relatedness between individuals in a population. The reconstruction of pedigrees, and the inference of parentage in general, is now a cornerstone of molecular ecology. Applications include the direct inference of gene flow, estimation of the effective population size, and estimation of parameters describing the population's mating behaviour, such as rates of inbreeding.
In the first part of this thesis, we construct genealogies of various types of cancer. Histopathological classification of human tumors relies in part on the degree of differentiation of the tumor sample. To date, there is no objective, systematic method to categorize tumor subtypes by maturation. We introduce a novel algorithm to rank tumor subtypes according to the dissimilarity of their gene expression from that of stem cells and fully differentiated tissue, and thereby construct a phylogenetic tree of cancer. We validate our methodology with expression data of leukemia and liposarcoma subtypes and then apply it to a broader group of sarcomas and of breast cancer subtypes. The resulting ranking of tumor subtypes allows the identification of genes correlated with differentiation and may help to identify novel therapeutic targets. Our algorithm represents the first phylogeny-based tool to analyze the differentiation status of human tumors.
In contrast to asexually reproducing cancer cell populations, pedigrees of sexually reproducing populations cannot be represented by phylogenetic trees. Pedigrees are directed acyclic graphs (DAGs) and therefore more closely resemble phylogenetic networks, in which reticulate events are indicated by vertices with two incoming arcs. In the second part of the thesis, we present a software package for pedigree reconstruction in natural populations using co-dominant genomic markers such as microsatellites and single nucleotide polymorphisms (SNPs). If available, the algorithm makes use of prior information such as known relationships (sub-pedigrees) or the age and sex of individuals. Statistical confidence is estimated by Markov chain Monte Carlo (MCMC) sampling. The accuracy of the algorithm is demonstrated for simulated data as well as an empirical data set with known pedigree. The parentage inference is robust even in the presence of genotyping errors. We further demonstrate the accuracy of the algorithm on simulated clonal populations. We show that the joint estimation of parameters of interest, such as the rate of self-fertilization or clonality, is possible with high accuracy even with marker panels of moderate power. Classical methods can only assign a very limited number of statistically significant parentages in this case and would therefore fail. The method is implemented in fast, easy-to-use open-source software that scales to large datasets with many thousands of individuals.
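Parentage inference from co-dominant markers rests on Mendelian transmission. The following minimal sketch computes the probability of an offspring genotype given two candidate parents at a single diploid locus. It assumes error-free genotypes and unlinked alleles, and the function name `mendelian_prob` is hypothetical. It is not the likelihood model of the software package described above, which also handles genotyping errors and missing values.

```python
from fractions import Fraction
from itertools import product


def mendelian_prob(child, mother, father):
    """P(child genotype | parental genotypes) at one diploid locus.

    Genotypes are pairs of alleles. Each parent passes on either allele
    with probability 1/2, and allele order within a genotype is ignored.
    """
    target = tuple(sorted(child))
    hits = sum(
        1
        for maternal, paternal in product(mother, father)
        if tuple(sorted((maternal, paternal))) == target
    )
    return Fraction(hits, 4)


# Two heterozygous candidate parents at a microsatellite locus with alleles 1 and 2.
print(mendelian_prob((1, 2), (1, 2), (1, 2)))  # 1/2
print(mendelian_prob((1, 1), (1, 2), (1, 2)))  # 1/4
print(mendelian_prob((3, 3), (1, 2), (1, 2)))  # 0
```

Under the assumption of independent loci, multiplying such per-locus probabilities gives the likelihood of a candidate parent pair, and a zero at any locus excludes the pair unless genotyping errors are modelled.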
|
5 |
Le genre Leishmania : structure génétique et mécanismes d'évolution / The Leishmania genus: genetic structure and evolutionary mechanisms. El Baidouri, Fouad, 20 December 2012
Parasitic protozoa of the genus Leishmania infect many mammal species throughout the world. Transmitted by insect vectors, these pathogens are endemic in about a hundred countries. In humans, the incidence of the disease is estimated at about 2 million new cases each year. Many questions remain open today about the modes of reproduction and genetic exchange in these protozoa. They have implications for evolution, adaptation to new cycles, virulence, species identification, taxonomy and therapeutic management.
In this work, we first aimed to address these questions through a comprehensive genetic analysis of all Leishmania species from Eurasia and Africa, using an MLSA (MultiLocus Sequence Analysis) approach on a dataset of more than 220 strains from 43 countries. We then tried to understand the structure of the controversial viscerotropic group by participating in the analysis of isozyme data from more than 2200 strains of viscerotropic Leishmania. Finally, using data from a very small focus of the anthroponotic species Leishmania tropica, we contributed to the analysis of the isozyme characterization of this population and to the study of possible transmission cycles.
Our results highlight a number of important points. They first show that the Leishmania species found in Africa and Eurasia fall into seven distinct groups, which partially overlap the ten species defined so far mainly on biochemical criteria. The multilocus system we developed has higher resolution than the one based on isozymes. Implementing an MLST scheme for strain identification could contribute significantly to the therapeutic management of leishmaniases in the Old World. From the taxonomic and epidemiological points of view, the three viscerotropic species separated so far (L. infantum, L. donovani and L. archibaldi) appear to form a genetic continuum (species complex). Our analyses of isozyme data extended to 2200 strains point in the same direction.
Our results also show a remarkable phylogenetic congruence among the seven genomic loci studied by MLSA. This argues against genetic exchange and recombination between the different species, contrary to what had been assumed so far. Although potentially frequent, inter-specific hybridization would not contribute to the evolution of genotypes and would be a transient, unstable event, unable to become fixed in genomes. However, it will probably be necessary to explore different models exhaustively before drawing definitive conclusions.
Finally, the study of correlations between genetic structure and geographic distance reveals that many genotypes are geographically localized while some others are very widely dispersed, and no robust hypothesis can yet be proposed to account for these differences. Similarly, our analysis of isozyme data from L. tropica strains from the same Palestinian micro-focus seems to show an intra-specific heterogeneity which, despite the geographical proximity, could result from differentiated parasitic cycles.
|
6 |
Genealogy Reconstruction: Methods and applications in cancer and wild populations. Riester, Markus, 23 June 2010
Genealogy reconstruction is widely used in biology when relationships among entities are studied. Phylogenies, or evolutionary trees, show the differences between species. They are of profound importance because they help us to better understand evolutionary processes. Pedigrees, or family trees, on the other hand, visualize the relatedness between individuals in a population. The reconstruction of pedigrees, and the inference of parentage in general, is now a cornerstone of molecular ecology. Applications include the direct inference of gene flow, estimation of the effective population size, and estimation of parameters describing the population's mating behaviour, such as rates of inbreeding.
In the first part of this thesis, we construct genealogies of various types of cancer. Histopathological classification of human tumors relies in part on the degree of differentiation of the tumor sample. To date, there is no objective, systematic method to categorize tumor subtypes by maturation. We introduce a novel algorithm to rank tumor subtypes according to the dissimilarity of their gene expression from that of stem cells and fully differentiated tissue, and thereby construct a phylogenetic tree of cancer. We validate our methodology with expression data of leukemia and liposarcoma subtypes and then apply it to a broader group of sarcomas and of breast cancer subtypes. The resulting ranking of tumor subtypes allows the identification of genes correlated with differentiation and may help to identify novel therapeutic targets. Our algorithm represents the first phylogeny-based tool to analyze the differentiation status of human tumors.
In contrast to asexually reproducing cancer cell populations, pedigrees of sexually reproducing populations cannot be represented by phylogenetic trees. Pedigrees are directed acyclic graphs (DAGs) and therefore more closely resemble phylogenetic networks, in which reticulate events are indicated by vertices with two incoming arcs. In the second part of the thesis, we present a software package for pedigree reconstruction in natural populations using co-dominant genomic markers such as microsatellites and single nucleotide polymorphisms (SNPs). If available, the algorithm makes use of prior information such as known relationships (sub-pedigrees) or the age and sex of individuals. Statistical confidence is estimated by Markov chain Monte Carlo (MCMC) sampling. The accuracy of the algorithm is demonstrated for simulated data as well as an empirical data set with known pedigree. The parentage inference is robust even in the presence of genotyping errors. We further demonstrate the accuracy of the algorithm on simulated clonal populations. We show that the joint estimation of parameters of interest, such as the rate of self-fertilization or clonality, is possible with high accuracy even with marker panels of moderate power. Classical methods can only assign a very limited number of statistically significant parentages in this case and would therefore fail. The method is implemented in fast, easy-to-use open-source software that scales to large datasets with many thousands of individuals.
Abstract
Acknowledgments
1 Introduction
2 Cancer Phylogenies
2.1 Introduction
2.2 Background
2.2.1 Phylogenetic Trees
2.2.2 Microarrays
2.3 Methods
2.3.1 Dataset compilation
2.3.2 Statistical Methods and Analysis
2.3.3 Comparison of our methodology to other methods
2.4 Results
2.4.1 Phylogenetic tree reconstruction method
2.4.2 Comparison of tree reconstruction methods to other algorithms
2.4.3 Systematic analysis of methods and parameters
2.5 Discussion
3 Wild Pedigrees
3.1 Introduction
3.2 The molecular ecologist's tools of the trade
3.2.1 Sibship inference and parental reconstruction
3.2.2 Parentage and paternity inference
3.2.3 Multigenerational pedigree reconstruction
3.3 Background
3.3.1 Pedigrees
3.3.2 Genotypes
3.3.3 Mendelian segregation probability
3.3.4 LOD Scores
3.3.5 Genotyping Errors
3.3.6 IBD coefficients
3.3.7 Bayesian MCMC
3.4 Methods
3.4.1 Likelihood Model
3.4.2 Efficient Likelihood Calculation
3.4.3 Maximum Likelihood Pedigree
3.4.4 Full siblings
3.4.5 Algorithm
3.4.6 Missing Values
3.4.7 Allele frequencies
3.4.8 Rates of Self-fertilization
3.4.9 Rates of Clonality
3.5 Results
3.5.1 Real Microsatellite Data
3.5.2 Simulated Human Population
3.5.3 Simulated Clonal Plant Population
3.6 Discussion
4 Conclusions
A FRANz
A.1 Availability
A.2 Input files
A.2.1 Main input file
A.2.2 Known relationships
A.2.3 Allele frequencies
A.2.4 Sampling locations
A.3 Output files
A.4 Web 2.0 Interface
List of Figures
List of Tables
List of Abbreviations
Bibliography
Curriculum Vitae
|