1 |
UNSUPERVISED LEARNING IN PHYLOGENOMIC ANALYSIS OVER THE SPACE OF PHYLOGENETIC TREES
Kang, Qiwen 01 January 2019
A phylogenetic tree represents the evolutionary history of a set of species or other entities. Phylogenomics is a new field at the intersection of phylogenetics and genomics, and it is well known that statistical learning methods are needed to handle and analyze the large amounts of data that new technologies can generate relatively cheaply. Building on existing Markov models, we introduce a new method, CURatio, to identify outliers in a given gene data set. This method, which is intrinsically unsupervised, can find outliers among thousands of genes or more. This ability to analyze large numbers of genes (even with missing information) sets it apart from many parametric methods. At the same time, the exploration of statistical analysis in the high-dimensional space of phylogenetic trees continues, and many tree metrics have been proposed for statistical methodology. The tropical metric is one of them. We implement an MCMC sampling method to estimate the principal components in a tree space with the tropical metric, achieving dimension reduction and visualizing the result in a 2-D tropical triangle.
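As a rough illustration of the tropical metric mentioned above, the sketch below computes the standard tropical distance d_tr(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i) between two vectors. It assumes each tree has already been encoded as a vector of pairwise leaf-to-leaf distances listed in a fixed order; that encoding, the function name `tropical_distance`, and the toy vectors are illustrative assumptions, not taken from the thesis.

```python
from typing import Sequence


def tropical_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Tropical distance between two equal-length vectors.

    Assumption: u and v are tree encodings (e.g. pairwise leaf distances
    in a fixed order); the encoding itself is not specified by the abstract.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if not u:
        raise ValueError("vectors must be non-empty")
    diffs = [a - b for a, b in zip(u, v)]
    # Largest coordinate-wise difference minus the smallest one.
    return max(diffs) - min(diffs)


# Hypothetical pairwise-distance vectors for two trees on three leaves.
tree_u = [2, 4, 4]
tree_v = [4, 4, 2]
print(tropical_distance(tree_u, tree_v))
```

Adding the same constant to every coordinate of one vector leaves the distance unchanged, which is why this metric is usually defined on vectors taken up to such shifts.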
|
2 |
Conservation genetics and the Ctenosaura palearis clade
Pasachnik, Stesha Ann 01 August 2010
We are now in the midst of a mass extinction crisis. The top threats to biodiversity include habitat destruction, pollution, over-harvesting, and invasive species. The field of conservation genetics seeks to understand these threats and to devise management strategies that preserve taxa with the ability to cope with environmental change. Preserving genetic variation and the processes by which variation is created and maintained is vital to long-term conservation goals. Because conservation resources are limited, taxa and areas must be prioritized. Nine basic methods of prioritization have been developed. Though these methods differ, and thus so do the resulting target areas, many of them, including biodiversity hotspots, list Mesoamerica, where the highest diversity of iguanids confined to a single genus, Ctenosaura, occurs. Though ctenosaurs are the most diverse genus of iguanas, have the most Red-Listed species, lack protection, and are in danger of extinction, they have been overlooked. The Ctenosaura palearis complex occurs in central Mesoamerica and is made up of four endangered species. In order to aid in the conservation of this biodiversity, a multi-scale molecular evaluation of this complex was performed. I first used a species tree approach to elucidate the relationships between the focal species, showing that these species have gone through recent and rapid speciation, resulting in four closely related endemics. Thus, the nominal groupings should be upheld and given individual protection. Second, I evaluated the degree to which gene flow from the widely distributed congener threatens the genetic distinctiveness of the endemic C. bakeri. Low levels of introgression indicated no current threat. Hybridization could increase if habitat destruction or changes in relative abundance increase the probability of interbreeding. Continued monitoring of this situation is justified. Third, I used a variety of population genetic techniques to elucidate the genetic structure within and among populations of C. melanosterna. These results indicate that the populations in the Valle de Aguán and Cayos Cochinos are not interchangeable; thus, protection of both areas is necessary, and extreme caution should be used when implementing breeding and translocation programs. Local conservation efforts may be evaluated and developed using this information.
|
3 |
Étude de l'histoire évolutive des PI3K et des voies de signalisation associées / Evolutionary history of PI3Ks and related signalling pathways
Philippon, Héloïse 05 July 2016
The main goal of my thesis was to characterize the evolutionary history of signalling pathways through a twofold approach: (i) the phylogenetic analysis of their components; and (ii) the identification and characterization of their interactions through the analysis of the interactomes of model organisms. While many tools are available for reconstructing single-gene trees, only a few methods have been developed for studying a set of proteins involved in the same cellular process. However, inside the cell, most proteins interact with others. Initially, I studied the evolutionary history of the PI3K family (phosphatidylinositol 3-kinases). This first detailed phylogenetic analysis allowed me to set up a methodology suitable for signalling pathways. One important problem encountered in this study was the selection of alternative transcripts, which led me to develop dedicated software called BATfinder (Best Aligned Transcript finder). In order to study the AKT/mTOR signalling pathway, I implemented the methodology previously validated with the PI3Ks. This implementation took the form of an automated pipeline called EPINe (Easy Phylogenetics for Interaction Networks). This pipeline is theoretically usable for the phylogenetic analysis of any eukaryotic metabolic network.
|
4 |
Algorithmes de construction et correction d'arbres de gènes par la réconciliation
Lafond, Manuel 08 1900
Genes encode the biological functions of all living organisms and are the basic molecular units of heredity.
In order to explain the diversity of species that can be observed today, it is essential to understand how genes evolve. To do this, the past has to be recreated by inferring their phylogeny, i.e. a gene tree depicting the kinship relationships between the coding regions of living beings. Traditional phylogenetic inference methods were developed primarily to construct species trees and are based solely on DNA sequences. Genes, however, are rich in information, and only a few known reconstruction methods make use of their specific properties. In particular, the history of a gene family in terms of duplications and losses, obtained by reconciling a gene tree with a species tree, may allow us to detect weaknesses in a tree and improve it.
In this thesis, reconciliation is applied to the construction and correction of gene trees from three different angles:
1) We address the problem of resolving a non-binary gene tree. In particular, we present a linear-time algorithm that resolves a polytomy based on reconciliation.
2) We propose a new gene tree correction approach based on orthology and paralogy relations. Polynomial-time algorithms are presented for the following problems: modifying a gene tree so that it contains a given set of orthologous genes, and validating a set of partial orthology and paralogy relations.
3) We show how reconciliation can be used to "combine" multiple gene trees. Specifically, we study the problem of choosing a gene supertree based on its reconciliation cost.
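To make the notion of reconciliation concrete, here is a minimal sketch of the classic LCA-mapping reconciliation of a binary gene tree with a species tree, counting duplication nodes only. It is not the linear-time polytomy algorithm or the correction algorithms of the thesis. The nested-tuple tree encoding, the function names, and the toy gene family are all illustrative assumptions, and the clade lookup is deliberately naive rather than efficient.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# Assumed encoding: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every node of the species tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    found = [leaf_set(tree)]
    for child in tree:
        found.extend(clades(child))
    return found


def lca_clade(species: FrozenSet[str], all_clades: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Smallest species-tree clade containing `species` (its LCA)."""
    return min((c for c in all_clades if species <= c), key=len)


def count_duplications(gene_tree: Tree, species_tree: Tree,
                       species_of: Dict[str, str]) -> int:
    """Count gene-tree nodes mapped to the same species node as a child."""
    all_clades = clades(species_tree)
    duplications = 0

    def visit(node: Tree) -> FrozenSet[str]:
        nonlocal duplications
        if isinstance(node, str):
            return lca_clade(frozenset([species_of[node]]), all_clades)
        child_maps = [visit(child) for child in node]
        here = lca_clade(frozenset().union(*child_maps), all_clades)
        if here in child_maps:  # same mapping as a child: duplication
            duplications += 1
        return here

    visit(gene_tree)
    return duplications


# Hypothetical example: species A, B, C and a gene family with two copies in A.
species_tree = (("A", "B"), "C")
gene_tree = (("a1", "b1"), ("a2", "c1"))
species_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
print(count_duplications(gene_tree, species_tree, species_of))
```

In this sketch a gene-tree node is labelled a duplication when it maps to the same species-tree node as one of its children; all other internal nodes are treated as speciations. Losses, which also contribute to the reconciliation cost, are not counted here.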
|