  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Fractionation Statistics

Wang, Baoyong 01 May 2014 (has links)
Paralog reduction, the loss of duplicate genes after whole genome duplication (WGD), is a pervasive process. Whether this loss proceeds gene by gene or through deletion of multi-gene DNA segments is controversial, as is the question of fractionation bias, namely whether one homeologous chromosome is more vulnerable to gene deletion than the other. As a null hypothesis, we first assume deletion events, on one homeolog only, excise a geometrically distributed number of genes with unknown mean mu, and these events combine to produce deleted runs of length l, distributed approximately as a negative binomial with unknown parameter r, itself a random variable with distribution pi(.). A biologically more realistic model requires deletion events on both homeologs distributed as a truncated geometric. We simulate the distribution of run lengths l in both models, as well as the underlying pi(r), as a function of mu, and show how sampling l allows us to estimate mu. We apply this to data on a total of 15 genomes descended from 6 distinct WGD events and show how to correct the bias towards shorter runs caused by genome rearrangements. Because of the difficulty in deriving pi(.) analytically, we develop a deterministic recurrence to calculate each pi(r) as a function of mu and the proportion of unreduced paralog pairs. This is based on a computing formula containing nested sums. The parameter mu can be estimated based on run lengths of single-copy regions. We then reduce the computing formulae, at least in the one-sided case, to closed form. This virtually eliminates the computing time due to highly nested summations. We formulate a continuous version of the fractionation process, deleting line segments of exponentially distributed lengths in analogy to geometrically distributed numbers of genes. We derive nested integrals and discover that the number of previously deleted regions to be skipped by a new deletion event is exactly geometrically distributed.
We undertook a large simulation experiment to show how to discriminate between the gene-by-gene duplicate deletion model and the deletion of a geometrically distributed number of genes. This revealed the importance of the effects of genome size N, the mean of the geometric distribution, the progress towards completion of the fractionation process, and whether the data are based on runs of deleted genes or undeleted genes.
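The one-sided null model described in this abstract can be illustrated with a small simulation. What follows is a minimal sketch and not the thesis's own code: the function names, the uniform choice of start position, and the rule that an event skips already-deleted genes and stops at the end of the chromosome are all assumptions made for illustration. Setting the mean to 1 gives the gene-by-gene model.

```python
import random
from itertools import groupby


def geometric(mean, rng):
    """Sample a length >= 1 from a geometric distribution with the given mean (mean >= 1)."""
    p = 1.0 / mean
    length = 1
    while rng.random() >= p:
        length += 1
    return length


def simulate_fractionation(n_genes, mean_len, target_deleted, seed=0):
    """Delete genes on one homeolog until a target proportion is gone.

    Assumption: each event starts at a uniformly chosen position and removes
    the next still-present genes, skipping genes deleted by earlier events.
    """
    rng = random.Random(seed)
    present = [True] * n_genes
    n_deleted = 0
    goal = int(target_deleted * n_genes)
    while n_deleted < goal:
        pos = rng.randrange(n_genes)
        to_remove = geometric(mean_len, rng)
        while to_remove > 0 and pos < n_genes:
            if present[pos]:
                present[pos] = False
                n_deleted += 1
                to_remove -= 1
            pos += 1
    return present


def deleted_run_lengths(present):
    """Lengths of maximal runs of consecutive deleted genes."""
    return [sum(1 for _ in group)
            for is_present, group in groupby(present)
            if not is_present]


if __name__ == "__main__":
    genome = simulate_fractionation(n_genes=10_000, mean_len=3.0, target_deleted=0.5)
    runs = deleted_run_lengths(genome)
    print("number of deleted runs:", len(runs))
    print("mean deleted run length:", sum(runs) / len(runs))
```

Comparing the run-length histogram for `mean_len=1.0` against larger means gives a rough feel for how the two deletion models might be told apart, although the estimators in the thesis are more involved than this sketch.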
2

Fractionation Statistics

Wang, Baoyong January 2014 (has links)
Paralog reduction, the loss of duplicate genes after whole genome duplication (WGD), is a pervasive process. Whether this loss proceeds gene by gene or through deletion of multi-gene DNA segments is controversial, as is the question of fractionation bias, namely whether one homeologous chromosome is more vulnerable to gene deletion than the other. As a null hypothesis, we first assume deletion events, on one homeolog only, excise a geometrically distributed number of genes with unknown mean mu, and these events combine to produce deleted runs of length l, distributed approximately as a negative binomial with unknown parameter r, itself a random variable with distribution pi(.). A biologically more realistic model requires deletion events on both homeologs distributed as a truncated geometric. We simulate the distribution of run lengths l in both models, as well as the underlying pi(r), as a function of mu, and show how sampling l allows us to estimate mu. We apply this to data on a total of 15 genomes descended from 6 distinct WGD events and show how to correct the bias towards shorter runs caused by genome rearrangements. Because of the difficulty in deriving pi(.) analytically, we develop a deterministic recurrence to calculate each pi(r) as a function of mu and the proportion of unreduced paralog pairs. This is based on a computing formula containing nested sums. The parameter mu can be estimated based on run lengths of single-copy regions. We then reduce the computing formulae, at least in the one-sided case, to closed form. This virtually eliminates the computing time due to highly nested summations. We formulate a continuous version of the fractionation process, deleting line segments of exponentially distributed lengths in analogy to geometrically distributed numbers of genes. We derive nested integrals and discover that the number of previously deleted regions to be skipped by a new deletion event is exactly geometrically distributed.
We undertook a large simulation experiment to show how to discriminate between the gene-by-gene duplicate deletion model and the deletion of a geometrically distributed number of genes. This revealed the importance of the effects of genome size N, the mean of the geometric distribution, the progress towards completion of the fractionation process, and whether the data are based on runs of deleted genes or undeleted genes.
3

Fractionation in the Evolution of Syntenic Homology in Coffea Arabica

Yu, Zhe 13 August 2021 (has links)
Gene loss is the obverse of novel gene acquisition by a genome through a variety of evolutionary processes. It serves a number of functional and structural roles, compensating for the energy and material costs of gene complement expansion. A type of gene loss widespread in the lineages of plant genomes is "fractionation" after whole genome doubling or tripling, where one of a pair or triplet of paralogous genes in parallel syntenic contexts is discarded. Based on previous mathematical work on the distribution of gap sizes caused by fractionation in synteny blocks, we studied fractionation in the evolutionary history of the allotetraploid Coffea arabica (CA) and its two diploid progenitors, C. canephora (CC) and C. eugenioides (CE), using annotated genome assemblies provided by the Arabica Coffee Genome Consortium. Taking advantage of synteny blocks produced by SynMap, we studied the fractionation process after speciation and tetraploidization events, including visualizing and modelling the distribution of deletion segments and the mechanisms of deletion events. We also extended the research to eight other plant species to verify the dominance of DNA excision over pseudogenization during fractionation and other gene loss.
4

Network analyses of proteome evolution and diversity

Coulombe-Huntington, Jasmin 12 March 2016 (has links)
The mapping of biomolecular interactions reveals that the function of most biological components depends on a web of interrelations with other cellular components, stressing the need for a systems-level view of biological functions. In this work, I explore ways in which the integration of network and genomic information from different organizational levels can lead to a better understanding of cellular systems and components. First, studying yeast, I show that the evolutionary properties of target genes constitute the dominant determinant of transcription factor (TF) evolutionary rate and that this evolutionary modularity is limited to activating regulatory relationships. I also show that targets of fast-evolving TFs show greater evolutionary expression changes and are enriched for niche-specific functions and other TFs. This work highlights the importance of trans-regulatory network evolution in species-specific gene expression and network adaptation. Next, I show that genes either lost or gained across fungal evolution are enriched in TFs and have very different network and genomic properties than universally conserved genes, including, in sharp contrast to other networks, a greater number of transcriptional regulators. Placing genes in the context of their evolutionary life-cycle reveals principles of network integration of gained genes and evidence for the progressive network and functional marginalization of genes as an evolutionary process preceding gene loss. In the final chapter, I study how alternative splicing (AS)-driven expansion of human proteome diversity leads to system-level complexity through the AS-mediated rewiring of the protein-protein interaction network. By overlaying different network and genomic datasets onto the first large-scale isoform-resolution interactome, I found that differentiating between splice variants is essential to capturing the full extent of the network's functional modularity. 
I also discovered that AS-mediated rewiring preferentially affects tissue-specific genes and that topologically different patterns of rewiring have distinct functional consequences. Furthermore, I found that most rewiring can be traced to the AS of evolutionarily conserved sequence modules, which promote or block interactions and tend to overlap linear motifs and disrupt known domain-domain interactions. Together, this work demonstrates that a network-level perspective and genomic data integration are essential to understanding the evolution and functional diversity of proteomes.
5

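Étude du processus de perte de gènes et de pseudogénisation. Intégration et informatisation des concepts de l’évolution biologique. Application à la lignée humaine depuis l'origine des Eucaryotes / Study of the process of gene loss and pseudogenization. Integration and computerization of the concepts of biological evolution. Application to the human lineage since the origin of eukaryotes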

Dainat, Jacques 16 October 2012 (has links)
Biology has undergone an extraordinary revolution with the arrival of numerous fully sequenced genomes. Analysing the quantity of information now available requires the creation and use of automated computational tools. The interpretation of biological data becomes meaningful only in the light of evolution, so evolutionary studies are indispensable for making sense of these data. In this context, the laboratory develops tools to study the evolution of genomes (and proteomes) through the mutations they have undergone. This thesis focuses specifically on unitary gene loss events. These events may reveal losses of function that are highly instructive for understanding the evolution of species. First, I developed the GLADX tool, which mimics human expertise to investigate unitary gene loss events automatically and accurately. These studies are based on the creation and interpretation of phylogenetic data, BLAST results, protein predictions, etc., in an automated setting. Second, I developed a strategy using GLADX to study, on a large scale, the loss of unitary genes during the evolution of the human proteome. As a first filter, the strategy analyses groups of orthologs produced by a clustering tool from the complete proteomes of numerous species. This analysis detected 6237 putative unitary gene losses in the human lineage. In-depth study of these losses with GLADX highlighted many problems related to the quality of the data available in databases.
6

Dynamics of the Bacterial Genome : Rates and Mechanisms of Mutation

Koskiniemi, Sanna January 2010 (has links)
Bacterial chromosomes are highly dynamic, continuously changing with respect to gene content and size via a number of processes, including deletions that result in gene loss. How deletions form and at what rates has been the focus of this thesis. In paper II we investigated how chromosomal location affects chromosomal deletion rates in S. typhimurium. Deletion rates varied more than 100-fold between different chromosomal locations and some large deletions significantly increased the exponential growth rate of the cells. Our results suggest that the chromosome is heterogeneous with respect to deletion rates and that deletions may be genetically fixed as a consequence of natural selection rather than by drift or mutational biases. In paper I we examined in a laboratory setting how rapidly reductive evolution, i.e. gene loss, could occur. Using a serial passage approach, we showed that extensive genome reduction potentially could occur on a very short evolutionary time scale. For most deletions we observed little or no homology at the deletion endpoints, indicating that spontaneous deletions often form through a RecA independent process. In paper III we examined further how large spontaneous deletions form and, unexpectedly, showed that 90% of all spontaneous chromosomal deletions required error-prone translesion DNA polymerases for their formation. We propose that the translesion polymerases stimulate deletion formation by allowing extension of misaligned single-strand DNA ends. In paper IV we investigated how the translesion DNA polymerase Pol IV, RpoS and different types of stresses affect mutation rates in bacteria. Derepression of the LexA regulon caused a small to moderate increase in mutation rates that was fully dependent on functional endonucleases but only partly dependent on translesion DNA polymerases. RpoS levels and growth stresses had only minor effects on mutation rates. 
Thus, mutation rates appear very robust and are only weakly affected by growth conditions and induction of translesion polymerases and RpoS.
7

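Approches algorithmiques pour l’inférence d’histoires de duplication en tandem avec inversions et délétions pour des familles multigéniques / Algorithmic approaches for inferring tandem duplication histories with inversions and deletions for multigene families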

Lajoie, Mathieu 08 1900 (has links)
Tandemly arrayed genes (TAGs) represent an important fraction of most genomes. A fundamental mechanism at the origin of TAG clusters is unequal crossing-over during meiosis, leading to the duplication of chromosomal segments containing one or more adjacent genes. Such duplications are called tandem duplications, as the duplicated segment is placed next to the original one on the chromosome. Different algorithms have been proposed to infer the tandem duplication history of a TAG cluster. However, their applicability is limited in practice since they do not take into account other frequent evolutionary events such as inversion, inverted duplication and deletion. In this thesis, we propose different algorithmic approaches that integrate these evolutionary events into the original tandem duplication model of evolution. Our contributions are summarized as follows:
  • We integrate inversion events into a tandem duplication model restricted to single-gene duplications, and we propose an exact algorithm that computes the minimum number of inversions explaining the evolution of a TAG cluster.
  • We generalize this model to the study of orthologous TAG clusters in different species.
  • We propose an algorithm that infers the evolutionary history of a TAG cluster through tandem duplication, inverted duplication, inversion and deletion of chromosomal segments containing one or more adjacent genes.
9

Évolution des chromosomes sexuels chez les plantes : développements méthodologiques et analyses de données NGS de Silènes / Sex chromosome evolution in plants : methodological developments and NGS data analysis in the Silene genus

Muyle, Aline 03 September 2015 (has links)
In many organisms, sex is determined by sex chromosomes. However, studies have been greatly limited by the paucity of sex chromosome sequences. Indeed, sequencing and assembling sex chromosomes is very challenging because of the large quantity of repetitive DNA that these chromosomes contain. In this PhD, a probabilistic method was developed to infer sex-linked genes from RNA-seq data of a family (parents and progeny of each sex). The method, called SEX-DETector, was tested on simulated and real data and should perform well on a wide variety of sex chromosome systems. This new method was applied to Silene latifolia, a dioecious plant with an XY system, for which partial sequence data on sex chromosomes are available (some of which were obtained during this PhD by BAC sequencing). SEX-DETector returned ∼1300 sex-linked genes. In S. latifolia, Y genes are less expressed than their X counterparts. Dosage compensation (a mechanism that corrects for reduced dosage due to Y degeneration in males) was previously tested in S. latifolia, but different studies returned conflicting results. The analysis of the new set of sex-linked genes confirmed the existence of dosage compensation in S. latifolia, which seems to be achieved by the hyperexpression of the maternal X chromosome in males. An imprinting mechanism might underlie dosage compensation in that species. The RNA-seq data were also used to study the evolution of differential expression between the sexes in S. latifolia, and revealed that in this species most changes have affected the female sex. The implications of our results for the evolution of dioecy and sex chromosomes in plants are discussed.
10

Algorithmes pour la réconciliation d’un arbre de gènes avec un arbre d’espèces / Algorithms for reconciling a gene tree with a species tree

Doyon, Jean-Philippe 04 1900 (has links)
A reconciliation between a gene tree and a species tree depicts an evolutionary scenario of the homologous genes in terms of gene duplications and gene losses. To infer such a reconciliation given a gene tree and a species tree, parsimony is generally used according to the number of gene duplications and/or losses. Reconciliation models are based on probabilistic or combinatorial criteria. The first paper defines a simple and more general combinatorial model of reconciliation which clearly identifies duplication and loss events and does not consider only the most parsimonious reconciliation. An architecture of all possible reconciliations is developed together with efficient algorithms (for counting, random generation, and exploration) to study combinatorial properties of the space of all reconciliations or of only the most parsimonious ones. Based on the classical birth-death process, an algorithm that computes the likelihood of a reconciliation has recently been proposed. The second paper uses this algorithm together with the combinatorial tools described above to compute efficiently, either exactly or approximately, the posterior probability of the reconciliations located in the considered subspace. Based on realistic gene duplication and loss rates and on real and simulated datasets of fungal gene families, our results suggest that the probability mass of the whole space of reconciliations is mostly located around the most parsimonious ones. In the context of posterior probability approximation, our approach is a valuable alternative to an MCMC method and can compete with a sophisticated, efficient, and exact computation of the probability of a given reconciliation.
The Gene Tree Parsimony (GTP) problem is to infer a species tree that minimizes the number of duplications and/or losses over a set of gene family trees. Based on a new approach that explores the whole species tree space for the considered taxa and an efficient computation of the reconciliation cost, the third paper describes a Branch-and-Bound algorithm that solves the GTP problem exactly. When the number of taxa considered is too large, our algorithm can naturally take into account predefined relationships between sets of taxa. We test our algorithm on a dataset of eukaryotic gene families spanning 29 taxa.
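To make the notion of a parsimonious reconciliation concrete, here is a minimal sketch of the standard least-common-ancestor (LCA) mapping used to count duplications. It is not the algorithm from the thesis: the nested-tuple tree encoding, the function names, and the convention that each gene-tree leaf is labelled with the name of its species are assumptions made for illustration, and losses are not counted.

```python
def collect_clades(tree, out):
    """Append the leaf set of every node of a nested-tuple tree to `out`; return the root's leaf set."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def lca_map(species_set, species_clades):
    """Smallest species-tree clade containing every species in `species_set`."""
    return min((c for c in species_clades if species_set <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes whose LCA mapping equals that of one of their children."""
    species_clades = []
    collect_clades(species_tree, species_clades)
    duplications = 0

    def visit(node):
        nonlocal duplications
        if isinstance(node, str):
            # Assumption: a gene-tree leaf is labelled by its species name.
            return lca_map(frozenset([node]), species_clades)
        child_maps = [visit(child) for child in node]
        mapped = lca_map(frozenset().union(*child_maps), species_clades)
        if mapped in child_maps:
            duplications += 1
        return mapped

    visit(gene_tree)
    return duplications


if __name__ == "__main__":
    species = (("A", "B"), "C")
    print(count_duplications((("A", "B"), "C"), species))
    print(count_duplications((("A", "B"), ("A", "C")), species))
```

A Gene Tree Parsimony search would evaluate a cost of this kind for each candidate species tree over all gene families; the branch-and-bound strategy described in the abstract is what avoids enumerating every candidate.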
