  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

High quality gene annotation for deep phylogenetic analysis

Indrischek, Henrike 27 August 2018
Gene prediction in newly sequenced genomes is a known challenge. Although sophisticated comparative pipelines are available, computationally derived gene models are often less than perfect. This is particularly true when multiple very similar paralogs are present. The issue is aggravated further when genomes are assembled only at a preliminary draft level to contigs or short scaffolds rather than to chromosomes. However, these genomes deliver valuable information for studying gene families. High-accuracy models of protein-coding genes are needed in particular for phylogenetics and for the analysis of gene family histories. In this dissertation, I established a tool, the ExonMatchSolver-pipeline (EMS-pipeline), that can assist the assembly of genes distributed across multiple fragments (e.g. contigs). In particular, the tool tackles the problem of identifying those coding exon groups that belong to the same paralogous genes in a fragmented genome assembly. The EMS-pipeline accommodates a homology search step with a protein input set consisting of several highly similar paralogs as query. The core of the pipeline uses an integer linear programming implementation to solve the paralog-to-contig assignment problem. An extension to the initial implementation estimates the number of paralogs encoded in the target genome and can handle several paralogs that are situated on the same genomic fragment. The EMS-pipeline was successfully applied to simulated data, to several showcase examples and to deuterostome genomes in a large-scale study on the evolution of the arrestin protein family. Especially at high genome fragmentation levels, the tool outperformed a naive assignment method. Arrestins are key signaling transducers that bind to activated and phosphorylated G protein-coupled receptors and can mediate their endocytosis into the cell.
The refined annotations of arrestins resulting from the application of the EMS-pipeline are more complete and accurate than those from a conventional database search strategy. With the applied strategy it was possible to map the duplication and deletion history of arrestin paralogs in detail, including tandem duplications, pseudogenizations and the formation of retrogenes. My results support the emergence of the four arrestin paralogs from a visual and a non-visual proto-arrestin. Surprisingly, the visual ARR3 was lost in the mammalian clades afrotherians and xenarthrans. Segmental duplications in specific clades and the 3R-WGD in the teleost stem lineage, on the other hand, must have given rise to new paralogs that show signatures of diversification in functional elements important for receptor binding and phosphate sensing. The four vertebrate orthology groups show an interesting pattern of divergence of three endocytosis motifs: the minor and major clathrin binding sites and the adapter protein-2 (AP-2) binding motif. Identifying such signatures, as well as residues that determine specificity between paralogs and are positively selected after duplication, was made possible by high-quality alignments obtained through genome inquiries, dense species sampling and the consideration of fragmented loci from poorly assembled genomes within the EMS-pipeline established in this dissertation.

Table of contents:
1 Introduction
  1.1 Basics and definitions
    1.1.1 What is a gene?
    1.1.2 What is a tree in phylogenetics?
    1.1.3 What are paralogs and orthologs?
    1.1.4 Central dogma in molecular biology: From DNA to protein
  1.2 Gene duplications as evolutionary playground
    1.2.1 Mechanisms of gene duplication
    1.2.2 Evolutionary fate of duplicated genes
  1.3 Identification and annotation of protein homologs
    1.3.1 Challenges of existing resources
    1.3.2 Similarity search approaches without consideration of the gene structure
    1.3.3 Gene structure aware gene annotation approaches
    1.3.4 Graph-based inference of orthology relationships
    1.3.5 Chance and challenge of fragmented assemblies
  1.4 Applied phylogenetic methods
    1.4.1 Phylogenetic inference in a nutshell
    1.4.2 Inference of natural selection in inter-species data sets
    1.4.3 Detection of specificity determining positions
  1.5 Multi-talents in cell signaling: The cytosolic arrestin proteins
    1.5.1 Functions of arrestins in cell signaling
    1.5.2 Arrestin activation by GPCR binding
    1.5.3 Functions of arrestins in cellular trafficking
    1.5.4 Evolution of arrestins
2 The ExonMatchSolver-pipeline
  2.1 Motivation
  2.2 Methods
    2.2.1 Pipeline overview
    2.2.2 Exon assembly as an assignment problem
    2.2.3 Solving the Paralog-to-Contig Assignment Problem
    2.2.4 Post-processing
    2.2.5 Implementation and usage
    2.2.6 Performance assessment by simulations
  2.3 Results
    2.3.1 Performance on simulated data
    2.3.2 Performance on real data - Two Showcase Examples
  2.4 Discussion
3 Evolution of the arrestin protein family in deuterostomes
  3.1 Motivation
  3.2 Material and Methods
    3.2.1 Database scan
    3.2.2 Detailed gene annotation
    3.2.3 Data resources used in the current study
    3.2.4 Alignment and building of phylogenetic trees
    3.2.5 Identification of specificity determining positions
    3.2.6 Testing for natural selection
    3.2.7 Assessment of conservation
    3.2.8 Parsimonious reconstruction of exon gain and loss events
  3.3 Results
    3.3.1 Evolution of the arrestin fold family based on database inquiries
    3.3.2 The refined arrestin annotations are more complete than database entries
    3.3.3 Arrestin paralog gain and loss patterns based on the refined annotations
    3.3.4 Evolution of arrestin functional elements
  3.4 Discussion
    3.4.1 Limitation of arrestin database annotations
    3.4.2 Arrestins in early vertebrate evolution
    3.4.3 Sub- and neofunctionalization as consequence of the 3R-WGD
    3.4.4 Independent arrestin duplications in deuterostomes
    3.4.5 Loss of arrestin paralogs in different vertebrate orders
    3.4.6 Previously unknown interaction partners and isoforms
4 Improvements on the ExonMatchSolver-pipeline
  4.1 Motivation
  4.2 Methods
    4.2.1 Estimation of the paralog number
    4.2.2 Subdivision of gene loci on the same contig
    4.2.3 Implementation details
    4.2.4 Assessment of the ExonMatchSolver-pipeline Version 2
  4.3 Results
  4.4 Discussion
5 Conclusion and Outlook
A Additional figures
B Additional tables
C CV
Bibliography
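
The paralog-to-contig assignment at the core of the EMS-pipeline can be illustrated with a toy example. The sketch below is not the pipeline's integer linear program: it replaces the ILP with exhaustive search over a tiny instance. The data layout (`hits`), the function names, the scores, and the single constraint used (a paralog may receive each exon at most once) are all assumptions made for illustration.

```python
from itertools import product


def score_choice(hits, contigs, choice):
    """Return the total score of one assignment, or None if it is infeasible."""
    covered = set()  # (paralog, exon) pairs already claimed by some contig
    total = 0
    for contig, paralog in zip(contigs, choice):
        for exon, scores in hits[contig].items():
            if (paralog, exon) in covered:
                return None  # the same exon of this paralog would be used twice
            covered.add((paralog, exon))
            total += scores.get(paralog, 0)
    return total


def assign_contigs(hits, paralogs):
    """Try every contig-to-paralog assignment and keep the best feasible one."""
    contigs = sorted(hits)
    best, best_score = None, None
    for choice in product(paralogs, repeat=len(contigs)):
        total = score_choice(hits, contigs, choice)
        if total is not None and (best_score is None or total > best_score):
            best, best_score = dict(zip(contigs, choice)), total
    return best, best_score


# Hypothetical homology-search scores: contig -> exon number -> paralog -> score.
example_hits = {
    "contig1": {1: {"A": 90, "B": 70}, 2: {"A": 85, "B": 60}},
    "contig2": {3: {"A": 80, "B": 75}},
    "contig3": {1: {"A": 72, "B": 88}, 2: {"A": 65, "B": 80}},
    "contig4": {3: {"A": 82, "B": 79}},
}

if __name__ == "__main__":
    assignment, total = assign_contigs(example_hits, ["A", "B"])
    print(assignment, total)
```

In this instance a naive per-contig choice would send both contig2 and contig4 to paralog A and so claim exon 3 twice; the joint optimisation avoids that. Exhaustive search grows exponentially with the number of contigs, which is one reason to hand the same objective and constraints to an ILP solver instead.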
12

The Orthology Road: Theory and Methods in Orthology Analysis

Hernandez Rosales, Maribel 09 June 2013
The evolution of biological species depends on changes in genes. Among these changes are the gradual accumulation of DNA mutations, insertions and deletions, duplication of genes, movements of genes within and between chromosomes, gene losses and gene transfer. As two populations of the same species evolve independently, they will eventually become reproductively isolated and become two distinct species. The evolutionary history of a set of related species through the repeated occurrence of this speciation process can be represented as a tree-like structure, called a phylogenetic tree or a species tree. Since duplicated genes in a single species also independently accumulate point mutations, insertions and deletions, they drift apart in composition in the same way as genes in two related species. The divergence of all the genes descended from a single gene in an ancestral species can also be represented as a tree, a gene tree that takes into account both speciation and duplication events. In order to reconstruct the evolutionary history from the study of extant species, we use sets of similar genes, with a relatively high degree of DNA similarity and usually with some functional resemblance, that appear to have been derived from a common ancestor. The degree of similarity among different instances of the “same gene” in different species can be used to explore their evolutionary history via the reconstruction of gene family histories, namely gene trees. Orthology refers specifically to the relationship between two genes that arose by a speciation event, recent or remote, rather than duplication. Comparing orthologous genes is essential to the correct reconstruction of species trees, so detecting and identifying orthologous genes is an important problem, and a longstanding challenge, in comparative and evolutionary genomics as well as phylogenetics. A variety of orthology detection methods have been devised in recent years.
Although many of these methods are dependent on generating gene and/or species trees, it has been shown that orthology can be estimated at acceptable levels of accuracy without having to infer gene trees and/or reconcile gene trees with species trees. Therefore, there is good reason to look at the connection of trees and orthology from a different angle: How much information about the gene tree, the species tree, and their reconciliation is already contained in the orthology relation among genes? Intriguingly, a solution to the first part of this question has already been given by Boecker and Dress [Boecker and Dress, 1998] in a different context. In particular, they completely characterized certain maps which they called symbolic ultrametrics. Semple and Steel [Semple and Steel, 2003] then presented an algorithm that can be used to reconstruct a phylogenetic tree from any given symbolic ultrametric. In this thesis we investigate a new characterization of orthology relations, based on symbolic ultrametrics, for recovering the gene tree. According to Fitch’s definition [Fitch, 2000], two genes are (co-)orthologous if their last common ancestor in the gene tree represents a speciation event. On the other hand, when their last common ancestor is a duplication event, the genes are paralogs. The orthology relation on a set of genes is therefore determined by the gene tree and an “event labeling” that identifies each interior vertex of that tree as either a duplication or a speciation event. In the context of analyzing orthology data, the problem of reconciling event-labeled gene trees with a species tree appears as a variant of the reconciliation problem where gene trees have no labels on their internal vertices. When reconciling a gene tree with a species tree, it can be assumed that the species tree is correct or, in the case of an unknown species tree, it can be inferred. Therefore it is crucial to know for a given gene tree whether there even exists a species tree.
In this thesis we characterize event-labeled gene trees for which a species tree exists and species trees to which event-labeled gene trees can be mapped. Reconciliation methods are not always the best options for detecting orthology. A fundamental problem is that, aside from multicellular eukaryotes, evolution does not seem to have conformed to the descent-with-modification model that gives rise to tree-like phylogenies. Examples include many cases of prokaryotes and viruses whose evolution involved horizontal gene transfer. To treat the problem of distinguishing orthology and paralogy within a more general framework, graph-based methods have been proposed to detect and differentiate among evolutionary relationships of genes in those organisms. In this work we introduce a measure of orthology that can be used to test graph-based methods and reconciliation methods that detect orthology. Using these results, we devise a new algorithm, BOTTOM-UP, that determines whether a map from the set of vertices of a tree to a set of events is a symbolic ultrametric. Additionally, we present a simulation environment designed to generate large gene families with complex duplication histories, on which reconstruction algorithms can be tested and software tools benchmarked.
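
Fitch's definition quoted in the abstract above can be made concrete with a few lines of code. The following is a minimal sketch, not the thesis's BOTTOM-UP algorithm: it assumes a gene tree encoded as nested tuples whose first element is the event label, and the names `leaves` and `lca_events` are hypothetical.

```python
from itertools import combinations


def leaves(node):
    """List the gene names below a node; a leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    _, *children = node
    return [name for child in children for name in leaves(child)]


def lca_events(node, out=None):
    """Map every unordered gene pair to the event label of its last common ancestor."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    event, *children = node
    # Two genes taken from different child subtrees meet for the first time at this node.
    for left, right in combinations(children, 2):
        for a in leaves(left):
            for b in leaves(right):
                out[frozenset((a, b))] = event
    for child in children:
        lca_events(child, out)
    return out


# Toy gene tree: a duplication followed by a speciation in each copy.
tree = ("duplication",
        ("speciation", "a1", "b1"),
        ("speciation", "a2", "b2"))

if __name__ == "__main__":
    for pair, event in sorted(lca_events(tree).items(), key=lambda kv: sorted(kv[0])):
        kind = "orthologs" if event == "speciation" else "paralogs"
        print(sorted(pair), kind)
```

Under this labeling, a1 and b1 are orthologs, whereas a1 and a2 (and likewise a1 and b2) are paralogs because their last common ancestor is the duplication at the root.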
13

Gene Family Histories: Theory and Algorithms

Schaller, David 02 November 2021
Detailed gene family histories and reconciliations with species trees are a prerequisite for studying associations between genetic and phenotypic innovations. Even though the true evolutionary scenarios are usually unknown, they impose certain constraints on the mathematical structure of data obtained from simple yes/no questions in pairwise comparisons of gene sequences. Recent advances in this field have led to the development of methods for reconstructing (aspects of) the scenarios on the basis of such relation data, which can most naturally be represented by graphs on the set of considered genes. We provide here novel characterizations of best match graphs (BMGs), which capture the notion of (reciprocal) best hits based on sequence similarities. BMGs provide the basis for the detection of orthologous genes (genes that diverged after a speciation event). There are two main sources of error in pipelines for orthology inference based on BMGs. Firstly, measurement errors in the estimation of best matches from sequence similarity in general lead to violations of the characteristic properties of BMGs. The second issue concerns the reconstruction of the orthology relation from a BMG. We show how to correct estimated BMGs into mathematically valid ones and how much information about orthologs is contained in BMGs. We then discuss implicit methods for horizontal gene transfer (HGT) inference that focus on pairs of genes that have diverged only after the divergence of the two species in which the genes reside. This situation defines the edge set of an undirected graph, the later-divergence-time (LDT) graph. We explore the mathematical structure of LDT graphs and show how much information about all HGT events is contained in such LDT graphs.
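
The notion of (reciprocal) best hits mentioned above can be sketched as follows. This is an illustrative approximation only: it estimates best matches from a similarity table, which is exactly the error-prone step the abstract describes, and the function names, gene names and scores are assumptions rather than anything taken from the thesis.

```python
def best_matches(sim, species):
    """Directed edges (x, y): y is a most similar gene to x within y's species."""
    edges = set()
    for x in species:
        # Group the genes of every other species separately.
        by_species = {}
        for y in species:
            if species[y] != species[x]:
                by_species.setdefault(species[y], []).append(y)
        for genes in by_species.values():
            top = max(sim[x][y] for y in genes)
            edges.update((x, y) for y in genes if sim[x][y] == top)
    return edges


def reciprocal(edges):
    """Keep only the pairs that are best matches in both directions."""
    return {frozenset((x, y)) for x, y in edges if (y, x) in edges}


# Hypothetical genes, species labels and pairwise similarity scores.
species = {"a1": "A", "a2": "A", "b1": "B"}
sim = {
    "a1": {"b1": 0.9},
    "a2": {"b1": 0.6},
    "b1": {"a1": 0.9, "a2": 0.6},
}

if __name__ == "__main__":
    edges = best_matches(sim, species)
    print(sorted(edges))
    print([sorted(pair) for pair in reciprocal(edges)])
```

Here a2 has b1 as its best match, but b1 prefers a1, so only the pair a1 and b1 is reciprocal. The directed edge set is the estimated best match graph; the reciprocal pairs are the usual starting point for calling orthologs.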
14

A novel approach to infer orthologs and produce gene annotations at scale

Kirilenko, Bogdan 21 October 2022
Owing to advances in DNA sequencing, the number of available genomes has increased rapidly in recent decades. The thousands of genomes already available today enable detailed comparative analyses, which are essential for answering relevant questions. These concern the association of genotype and phenotype, the study of the particular features of complex proteins, and the further development of medical applications. To answer all these questions, it is necessary to annotate protein-coding genes in newly sequenced genomes and to determine their homology relationships. However, existing methods of genome analysis are not designed for the volumes of data generated today. The central challenge in comparative genomics is therefore not the number of available genomes but the development of new methods for high-throughput data analysis. To address these problems, I propose a new paradigm for genome annotation and the inference of homology relationships that is based on the alignment of whole genomes. Whereas the methods currently applied for gene annotation and homology determination rely exclusively on coding sequences, including the surrounding, neutrally evolving genomic context could yield better and more complete annotations. The use of genome alignments allows the proposed methodology to be scaled freely to thousands of genomes. In this thesis I present TOGA (Tool to infer Orthologs from Genome Alignments), a bioinformatics method that implements this concept and combines homology classification and gene annotation in a single pipeline. TOGA uses machine learning to distinguish orthologs from paralogs based on the alignment of intronic and intergenic regions.
The benchmarking results show that TOGA outperforms conventional approaches within Placentalia. TOGA classifies homology relationships with high precision and reliably identifies inactivated genes as such. Earlier versions of TOGA were applied in several studies and used in two publications. In addition, TOGA was successfully used to annotate 500 mammalian genomes, the most comprehensive such dataset to date. These results show that TOGA has the potential to become an established method for gene annotation and to complement the techniques currently in use.
15

Identification Of Functionally Orthologous Protein Groups In Different Species Based On Protein Network Alignment

Yaveroglu, Omer Nebil 01 September 2010
In this study, an algorithm named ClustOrth is proposed for determining and matching functionally orthologous protein clusters in different species. The algorithm requires protein interaction networks of the organisms to be compared and GO terms of the proteins in these interaction networks as prior information. After determining the functionally related protein groups using the Repeated Random Walks algorithm, the method maps the identified protein groups according to the similarity metric defined. In order to evaluate the similarities of protein groups, graph theoretical information is used together with the context information about the proteins. The clusters are aligned using GO-term-based protein similarity measures defined in previous studies. These alignments are used to evaluate cluster similarities by defining a cluster similarity metric from protein similarities. The top-scoring cluster alignments are considered orthologous. Comparison against several data sources providing orthology information showed that the defined cluster similarity metric can be used to make inferences about the orthology of protein groups. Comparison with a protein orthology prediction algorithm named ISORANK also showed that the ClustOrth algorithm is successful in determining orthologies between proteins. However, the cluster similarity metric is too strict, and many cluster matches do not score highly on it. For this reason, the number of predictions made is low. This problem can be overcome by introducing additional sources of information about the proteins in the clusters when evaluating the clusters. The ClustOrth algorithm also outperformed the NetworkBLAST algorithm, which aims to find orthologous protein clusters by using protein sequence information directly to determine orthologies.
It can be concluded that this study is one of the leading studies addressing the protein cluster matching problem for identifying orthologous functional modules of protein interaction networks computationally.
16

Predicting gene–phenotype associations in humans and other species from orthologous and paralogous phenotypes

Woods, John Oates, III 21 February 2014
Phenotypes and diseases may be related to seemingly dissimilar phenotypes in other species by means of the orthology of underlying genes. Such "orthologous phenotypes," or "phenologs," are examples of deep homology, and one member of the orthology relationship may be used to predict candidate genes for its counterpart. (There exists evidence of "paralogous phenotypes" as well, but validation is non-trivial.) In Chapter 2, I demonstrate the utility of including plant phenotypes in our database, and provide as an example the prediction of mammalian neural crest defects from an Arabidopsis thaliana phenotype, negative gravitropism defective. In the third chapter, I describe the incorporation of additional phenotypes into our database (including chicken, zebrafish, E. coli, and new C. elegans datasets). I present a method, developed in coordination with Martin Singh-Blom, for ranking predicted candidate genes by way of a k nearest neighbors naïve Bayes classifier drawing phenolog information from a variety of species. The fourth chapter relates to a computational method and application for identifying shared and overlapping pathways which contribute to phenotypes. I describe a method for rapidly querying a database of phenotype-gene associations for Boolean combinations of phenotypes which yields improved predictions. This method offers insight into the divergence of orthologous pathways in evolution. I demonstrate connections between breast cancer and zebrafish methylmercury response (through oxidative stress and apoptosis); human myopathy and plant red light response genes, minus those involved in water deprivation response (via autophagy); and holoprosencephaly and an array of zebrafish phenotypes. In the first appendix, I present the SciRuby Project, which I co-founded in order to bring scientific libraries to the Ruby programming language. I describe the motivation behind SciRuby and my role in its creation.
Finally, in Appendix B, I discuss the first beta release of NMatrix, a dense and sparse matrix library for the Ruby language, which I developed in part to facilitate and validate rapid phenolog searches. In this work, I describe the concept of phenologs as well as the development of the necessary computational tools for discovering phenotype orthology relationships, for predicting associated genes, and for statistically validating the discovered relationships and predicted associations.
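
A Boolean combination of phenotypes, such as "myopathy and red light response, minus water deprivation response", amounts to set algebra over gene sets. The sketch below uses Python sets rather than the dissertation's own tooling. The gene identifiers are made-up placeholders, and the hypergeometric tail used to score an overlap is an assumed choice of test, not one the abstract specifies.

```python
from math import comb


def overlap_p_value(population, size_a, size_b, overlap):
    """P(X >= overlap) when size_b genes are drawn from a population holding size_a marked genes."""
    upper = min(size_a, size_b)
    favourable = sum(
        comb(size_a, i) * comb(population - size_a, size_b - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(population, size_b)


# Placeholder gene sets, assumed to be already mapped to shared ortholog identifiers.
genes = {
    "myopathy": {"g1", "g2", "g3", "g4"},
    "red_light_response": {"g2", "g3", "g4", "g7"},
    "water_deprivation_response": {"g4", "g8"},
}

# (red light response AND NOT water deprivation response), then intersect with myopathy.
plant_query = genes["red_light_response"] - genes["water_deprivation_response"]
candidates = genes["myopathy"] & plant_query

if __name__ == "__main__":
    print(sorted(candidates))
    # Assumed background of 20 shared orthologs for this toy calculation.
    print(overlap_p_value(20, len(genes["myopathy"]), len(plant_query), len(candidates)))
```

Storing each phenotype as a set (or as a row of a sparse Boolean matrix) keeps such queries cheap, since intersection, union and difference map directly onto the Boolean operators.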
17

Novel resources enabling comparative regulomics in forest tree species / Nya verktyg för komparativ regulomik i skogsträd

Sundell, David January 2017
Lignocellulosic plants are the most abundant source of terrestrial biomass and are one of the potential sources of renewable energy that can replace fossil fuels. For a country such as Sweden, where the forest industry accounts for 10% of total exports, increased biomass yield would bring large economic benefits. Research on wood development in conifer species, which make up the majority of Swedish forests, is limited; most research has been conducted in model angiosperm species such as Arabidopsis thaliana. However, the large evolutionary distance between angiosperms and gymnosperms (such as spruce and pine) limits the possibility of identifying orthologous genes and regulatory pathways by comparing sequence similarity alone. At such distances, identifying gene similarity is in most cases not sufficient, and additional information, such as gene expression data, is required for functional annotation. In this thesis, two high-spatial-resolution datasets profiling wood development were processed: one from the angiosperm tree Populus tremula and the other from the conifer Picea abies. Each dataset was published together with a web resource including tools for exploring gene expression, co-expression and functional enrichment of gene sets. One of the developed resources allows interactive, comparative co-expression analysis between species to identify conserved and diverged co-expression modules. These tools make it possible to identify conserved regulatory modules that can focus downstream research, and they provide biologists with a resource for identifying regulatory genes for targeted trait improvement.
18

Recherche automatisée de motifs dans les arbres phylogénétiques / Automatic phylogenetic tree pattern matching

Bigot, Thomas 05 June 2013
Phylogeny makes it possible to reconstruct the evolutionary history of sequences and of the species that carry them. Recent progress in sequencing methods has greatly increased the number of available sequences, and therefore the number of gene trees that can be built. The question that arises is how to optimise the extraction of information from these trees; such a search should be both exhaustive and efficient. To address this, my thesis consisted of writing and then using a suite of programs able to browse and annotate phylogenetic trees, named TPMS (Tree Pattern Matching Suite). The first of these programs, tpms_query, queries tree collections using a dedicated formalism. It enables the following. Detection of horizontal transfers: if, in a gene tree, a species is nested within a monophyletic group of species to which it is unrelated, one can infer a horizontal transfer, provided the organisms are prokaryotes or unicellular eukaryotes. Detection of orthology: if a part of a gene tree exactly matches the species tree, one can suppose that these genes form a set of orthologues. Validation of known phylogenies: when the species tree is subject to debate, a large collection of gene trees can be queried to count how many gene families match each hypothesis. Another program, tpms_computations, performs operations in parallel on all the trees; in particular, it offers automatic rooting of trees according to different criteria and the extraction of subtrees of orthologues (a single sequence per species). It also offers a method for the automatic detection of incongruences. The thesis presents the context, the algorithms underlying these programs, and several applications that have been made of them.
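
The idea of extracting subtrees of orthologues with a single sequence per species can be sketched in a few lines. This is a hypothetical illustration and not the TPMS implementation: it assumes trees encoded as nested tuples, leaf names of the form `species_gene`, and a helper named `single_copy_subtrees` invented for this example.

```python
def species_of(leaf):
    """Assumed naming convention: 'species_gene', for example 'human_g1'."""
    return leaf.split("_")[0]


def leaf_names(node):
    """List all leaf names below a node; a leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaf_names(child)]


def single_copy_subtrees(node):
    """Return the maximal subtrees in which every species occurs at most once."""
    found = [species_of(name) for name in leaf_names(node)]
    if len(set(found)) == len(found):
        return [node]  # no species is repeated below this node
    # Some species occurs twice, so look for smaller candidates in the children.
    return [sub for child in node for sub in single_copy_subtrees(child)]


# Toy gene tree with two copies of the gene in human and mouse.
gene_tree = ((("human_g1", "mouse_g1"), "fish_g1"), ("human_g2", "mouse_g2"))

if __name__ == "__main__":
    for subtree in single_copy_subtrees(gene_tree):
        print(subtree)
```

Each returned subtree is only a candidate orthologue set; a stricter check along the lines described in the abstract would also compare its topology with the species tree.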
19

Exploitation de marqueurs évolutifs pour l'étude des relations génotype-phénotype : application aux ciliopathies / Exploitation of evolutionary markers to explore genotype-phenotype relationships : applications to ciliopathies

Nevers, Yannis Alain 14 December 2018 (has links)
A l’ère des omiques, l’étude des relations génotype-phénotype repose sur l’intégration de données diverses décrivant des aspects complémentaires des systèmes biologiques. La génomique comparative offre un angle d’approche original, celui de l’évolution, qui permet d’exploiter la grande diversité phénotypique du Vivant. Dans ce contexte, mes travaux de thèse ont porté sur la conception de marqueurs évolutifs décrivant les gènes selon leur histoire évolutive. Dans un premier temps, j’ai construit une ressource d’orthologie complète, OrthoInspector 3.0 pour extraire une information évolutive synthétique des données génomiques. J’ai ensuite développé des outils d’exploration de ces marqueurs en relation avec les données fonctionnelles et/ou phénotypiques. Ces méthodes ont été intégrées à la ressource OrthoInspector ainsi qu’au réseau social MyGeneFriends et appliquées à l’étude des ciliopathies, conduisant à l’identification de 87 nouveaux gènes ciliaires. / In the omics era, the study of genotype-phenotype relations requires the integration of a wide variety of data to describe diverse aspects of biological systems. Comparative genomics provides an original perspective, that of evolution, allowing the exploitation of the wide phenotypic diversity of living species. My thesis focused on the design of evolutionary markers to describe genes according to their evolutionary history. First, I built an exhaustive orthology resource, called OrthoInspector 3.0, to extract synthetic evolutionary information from genomic data. I then developed methods to explore the markers in relation to functional or phenotypic data. These methods have been incorporated in the OrthoInspector resource, as well as in the MyGeneFriends social network and applied to the study of ciliopathies, leading to the identification of 87 new ciliary genes.
20

Orthologs, turn-over, and remolding of tRNAs in primates and fruit flies

Velandia-Huerto, Cristian A., Berkemer, Sarah J., Hoffmann, Anne, Retzlaff, Nancy, Romero Marroquín, Liiana C., Hernández-Rosales, Maribel, Stadler, Peter F., Bermúdez-Santana, Clara I. January 2016 (has links)
Background: Transfer RNAs (tRNAs) are ubiquitous in all living organisms. They implement the genetic code so that most genomes contain distinct tRNAs for almost all 61 codons. They behave similarly to mobile elements and proliferate in genomes, spawning both local and non-local copies. Most tRNA families are therefore typically present as multicopy genes. The members of the individual tRNA families evolve under concerted or rapid birth-death evolution, so that paralogous copies maintain almost identical sequences over long evolutionary time-scales. To a good approximation these are functionally equivalent. Individual tRNA copies are thus evolutionarily unstable and easily turn into pseudogenes and disappear. This leads to a rapid turnover of tRNAs and often large differences in the tRNA complements of closely related species. Since tRNA paralogs are not distinguished by sequence, common methods cannot be used to establish orthology between tRNA genes. Results: In this contribution we introduce a general framework to distinguish orthologs and paralogs in gene families that are subject to concerted evolution. It is based on the use of uniquely aligned adjacent sequence elements as anchors to establish syntenic conservation of sequence intervals. In practice, anchors and intervals can be extracted from genome-wide multiple sequence alignments. Syntenic clusters of concertedly evolving genes of different families can then be subdivided by list alignments, usually leading to small clusters of candidate co-orthologs. On the basis of recent advances in phylogenetic combinatorics, these candidate clusters can be further processed by cograph editing to recover their duplication histories. We developed a workflow that can be conceptualized as stepwise refinement of a graph of homologous genes. We apply this analysis strategy with different types of synteny anchors to investigate the evolution of tRNAs in primates and fruit flies. 
We identified a large number of tRNA remolding events concentrated at the tips of the phylogeny. With one notable exception, phylogenetically old tRNA remoldings do not change the isoacceptor class. Conclusions: Gene families evolving under concerted evolution are not amenable to classical phylogenetic analyses since paralogs maintain identical, species-specific sequences, precluding the estimation of correct gene trees from sequence differences. This leaves conservation of syntenic arrangements with respect to "anchor elements" that are not subject to concerted evolution as the only viable source of phylogenetic information. We have demonstrated here that a purely synteny-based analysis of tRNA gene histories is indeed feasible. Although the choice of synteny anchors influences the resolution, in particular when tight gene clusters are present, and the quality of sequence alignments, genome assemblies, and genome rearrangements limits the scope of the analysis, largely coherent results can be obtained for tRNAs. In particular, we conclude that a large fraction of the tRNAs are recent copies. This proliferation is compensated by rapid pseudogenization, as exemplified by many very recent alloacceptor remoldings.
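The anchor-based idea in this abstract can be sketched as follows. This is a simplified illustration, not the authors' workflow. It assumes a single chromosome per species, anchors given as coordinate-sorted `(coordinate, anchor_id)` pairs with identifiers shared across species, and no rearrangements; all names and coordinates are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def interval_of(position, anchors):
    """Return the (left_anchor_id, right_anchor_id) pair flanking a position.

    `anchors` is a list of (coordinate, anchor_id) sorted by coordinate.
    A missing flank at either end of the sequence is reported as None.
    """
    coords = [c for c, _ in anchors]
    i = bisect_right(coords, position)
    left = anchors[i - 1][1] if i > 0 else None
    right = anchors[i][1] if i < len(anchors) else None
    return (left, right)

def candidate_clusters(genes, anchors_by_species):
    """Group genes from all species by the anchor interval they fall into.

    `genes` is a list of (species, gene_name, position) tuples. Genes that
    share an interval across species are candidate co-orthologs.
    """
    clusters = defaultdict(list)
    for species, name, pos in genes:
        key = interval_of(pos, anchors_by_species[species])
        clusters[key].append((species, name))
    return dict(clusters)

# Hypothetical anchors and tRNA gene positions for two species.
anchors = {
    "human": [(100, "A"), (500, "B"), (900, "C")],
    "chimp": [(120, "A"), (480, "B"), (950, "C")],
}
genes = [("human", "tRNA-h1", 300),
         ("chimp", "tRNA-c1", 310),
         ("human", "tRNA-h2", 700)]
clusters = candidate_clusters(genes, anchors)
```

Here the two genes between anchors A and B form one candidate cluster, while the human gene between B and C has no counterpart in the other species. The further steps named in the abstract (list alignments, cograph editing) are not covered by this sketch.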
