Global ETD Search

11	Caracterização da região Bru1 no genoma da cultivar RB867515 (Saccharum spp.) utilizando sequenciamento de nova geração / Characterization of Bru1 region of sugarcane cultivar RB867515 using next generation sequencing Souza, Isabela Pavanelli de 25 September 2014
Submitted by Luciana Ferreira (lucgeral@gmail.com) on 2017-04-18T14:12:04Z; approved for entry into archive and made available in DSpace on 2017-04-18T14:13:10Z (GMT). Previous issue date: 2014-09-25. Funding: Financiadora de Estudos e Projetos (Finep) / Other.
Sugarcane is one of the world's most important crops because of the use of its by-products. Four countries, led by Brazil, supply the international sugar trade. Ethanol, another important sugarcane product and an alternative to sugar, has been in great demand in the Brazilian market as a non-fossil fuel. At approximately 10 Gb, the sugarcane genome is one of the most complex among crops. Its complete genome has not yet been sequenced, but recent genomic tools open up new possibilities for investigating it; given the large volume of data that can now be generated, the current demand is for efficient data-processing tools. We assembled and annotated a region of the genome of the Brazilian sugarcane cultivar RB867515 corresponding to eight previously published BAC (bacterial artificial chromosome) sequences of cultivar R570 stored in NCBI. Reads from high-quality paired-end libraries produced on the Illumina HiSeq 2000 platform were aligned against the eight R570 BACs with Bowtie2, and the aligned reads were assembled de novo with MaSuRCA. The scaffolds were aligned to the reference sequences with BWA-SW, and consensus sequences were obtained with the SAMtools mpileup option. cDNA reads from five plant tissues of 30 sugarcane genotypes, obtained by RNA-seq, were mapped onto the consensus sequences to identify gene regions, which were annotated with Blastx against the GenBank non-redundant protein database. Repetitive regions were identified with RepeatMasker and microsatellites with IMEX. The consensus sequences were then aligned with the matching R570 reference using ClustalW in the Mega software to determine the percentage of mismatches and conserved sites between them.
The assembly of the eight regions produced between 607 and 2,884 scaffolds longer than 1 kb, with the longest scaffold reaching about 21 kb. Consensus sequence length ranged from 81 to 142 kb, a recovery rate of 82% to 97% relative to the reference; together the sequences amounted to nearly 1 Mb of the RB867515 genome. We identified 5,145 repetitive elements, of which 4,662 were microsatellites and 460 were transposable elements, totalling 225 kb of repeated sequence. Among the mobile elements, retrotransposons predominate, accounting for 15% of the nucleotide composition (8% to 29% among BACs). The 134 genes identified on the eight BAC consensus sequences comprised a total of 243 kb, a density of one gene per 7.2 kb. The number of genes per BAC ranged from 11 to 26 (average 16), and gene length averaged 1,841 bp, ranging from 443 bp (BAC1) to 6,316 bp (BAC3). The percentage of mismatches between the RB867515 and R570 BACs ranged from 0.27% to 1.32%. The sugarcane BACs correspond to homeologous genomic regions, and the alignment between the two cultivars suggests high divergence within a homeologous group.
Sugarcane Genomics Bioinformatics Genome assembly Saccharum spp Illumina reads GENETICS::PLANT GENETICS
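The comparison step in entry 11 (percentage of mismatches between aligned RB867515 and R570 sequences) can be illustrated with a minimal sketch. The function name `mismatch_percentage`, the decision to skip gap columns, and the toy sequences are assumptions made for illustration; the thesis itself used ClustalW in Mega, and its exact counting convention is not stated in the abstract.

```python
def mismatch_percentage(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return the percentage of mismatched columns between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption of
    this sketch; other conventions count gaps as differences).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    mismatches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == gap or base_b == gap:
            continue  # ignore gap columns
        compared += 1
        if base_a != base_b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared


# Toy alignment: 10 columns, one gap column, one mismatch among the 9 compared.
print(round(mismatch_percentage("ACGTACGTAC", "ACGTTCG-AC"), 2))
```

On real data the two strings would be rows of a pairwise or multiple alignment of equal length.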
12	Improving genome assemblies of non-model non-vertebrate animals with long reads and Hi-C Guiglielmoni, Nadege 07 September 2021
The corpus of reference genomes is rapidly expanding as more and more genome assemblies are released for a wide variety of species. The constant progress in sequencing technologies led to the release in 2021 of the first complete, telomere-to-telomere, gapless assembly of a human genome, yet a myriad of eukaryote species still lack genomic resources. For animals, genomic projects have focused on species closely related to humans (vertebrates) and those with an impact on health and agriculture. By contrast, there is still a dearth of non-vertebrate genomes, so their tremendous diversity (about 95% of animal diversity) is poorly represented.
Haploid chromosome-level genome assemblies using long reads and chromosome conformation capture (such as Hi-C) have become a standard in recent publications. To provide a haploid representation of diploid and polyploid genomes, assemblers collapse haplotypes into a single sequence, yet they are sensitive to high levels of heterozygosity and often yield fragmented assemblies with artefactual duplications. I tackled these shortcomings with two strategies: improving collapsed assemblies with a comprehensive long-read assembly methodology tuned for highly heterozygous genomes; and separating haplotypes to obtain phased assemblies using long reads and Hi-C. The assemblies were finally brought to chromosome-level scaffolds with a new Hi-C scaffolder, which demonstrated its efficiency on genomes of non-model organisms.
These methods were applied to generate chromosome-level assemblies of three species for which no or few assemblies of closely related species were available: the bdelloid rotifer Adineta vaga, the coral Astrangia poculata, and the chaetognath Flaccisagitta enflata. These high-quality assemblies contribute to filling the current gaps in non-vertebrate genomics and pave the way for future sequencing initiatives aiming to generate such reference assemblies for all the species on Earth.
Doctorate in Sciences / info:eu-repo/semantics/nonPublished
Biomedical and agricultural sciences Genome assembly Long reads Hi-C Non-vertebrate animals
13	De novo algorithms to identify patterns associated with biological events in de Bruijn graphs built from NGS data / Algorithmes de novo pour l'identification de motifs associés à des événements biologiques dans les graphes de De Bruijn construits à partir de données NGS Ishi Soares de Lima, Leandro 23 April 2019
The main goal of this thesis is the development, improvement and evaluation of methods to process massive sequencing data, mainly short and long RNA-sequencing reads, to eventually help the community answer some biological questions, especially in the contexts of transcriptomics and alternative splicing. Our initial objective was to develop methods to process second-generation RNA-seq data through de Bruijn graphs to contribute to the literature on alternative splicing, which was explored in the first three works. The first paper (Chapter 3, paper [77]) explored the problems that repeats cause for transcriptome assemblers if not addressed properly. We showed that the sensitivity and the precision of our local alternative splicing assembler increased significantly when repeats were formally modeled. The second (Chapter 4, paper [11]) shows that annotating alternative splicing events with a single approach leads to missing a large number of candidates, many of which are significant. Thus, to comprehensively explore the alternative splicing events in a sample, we advocate the combined use of both mapping-first and assembly-first approaches. Given that de Bruijn graphs built from real RNA-seq data contain a huge number of bubbles, which is infeasible to analyse in practice, in the third work (Chapter 5, papers [1, 2]) we explored theoretically how to efficiently and compactly represent the bubble space through a bubble generator. Exploring and analysing the bubbles in the generator is feasible in practice and can be complementary to state-of-the-art algorithms that analyse a subset of the bubble space.
Collaborations and advances in sequencing technology encouraged us to work in other subareas of bioinformatics, such as genome-wide association studies, error correction, and hybrid assembly. Our fourth work (Chapter 6, paper [48]) describes an efficient method to find and interpret unitigs highly associated with a phenotype, especially antibiotic resistance, making genome-wide association studies more amenable to bacterial panels, especially plastic ones. In our fifth work (Chapter 7, paper [76]), we evaluate the extent to which existing long-read DNA error correction methods are capable of correcting high-error-rate RNA-seq long reads. We conclude that no tool outperforms all the others across all metrics or is the best suited in all situations, and that the choice should be guided by the downstream analysis. RNA-seq long reads provide a new perspective on how to analyse transcriptomic data, since they are able to describe the full-length sequences of mRNAs, which was not possible with short reads in several cases, even when using state-of-the-art transcriptome assemblers. As such, in our last work (Chapter 8, paper [75]) we explore a hybrid alternative splicing assembly method that uses both short and long reads in order to list alternative splicing events comprehensively, thanks to the short reads, guided by the full-length context provided by the long reads.
RNA-seq Short reads Long reads Alternative splicing De Bruijn graphs Bubbles Genome-wide association studies Error-correction 570.15
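To make the notion of a bubble in a de Bruijn graph (entry 13) concrete, here is a minimal sketch under simplifying assumptions: reads are error-free, only the forward strand is used (no reverse complements), and the helper names `build_de_bruijn`, `branching_nodes` and `merging_nodes` are hypothetical. It is not the bubble generator developed in the thesis; it only shows how two sequences differing at one position open and close a bubble.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix of the k-mer
    return graph


def branching_nodes(graph):
    """Nodes with more than one successor: candidate bubble openings."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


def merging_nodes(graph):
    """Nodes with more than one predecessor: candidate bubble closings."""
    predecessors = defaultdict(set)
    for node, successors in graph.items():
        for nxt in successors:
            predecessors[nxt].add(node)
    return sorted(node for node, pred in predecessors.items() if len(pred) > 1)


# Two toy sequences that differ at a single position.
reads = ["ACGTTGCA", "ACGATGCA"]
graph = build_de_bruijn(reads, k=4)
print(branching_nodes(graph), merging_nodes(graph))
```

In this toy graph the two paths leave a shared node and rejoin at another, which is the pattern associated with a variant or an alternative splicing event; real RNA-seq graphs contain far too many such patterns to enumerate naively, which is the problem the abstract describes.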
14	Les marchands de littérature : les nouveaux canaux de distribution d'oeuvres littéraires et de promotions de la lecture au Canada Potapowicz, Izabela January 2005
Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.
Reading Media Book club Community of readers Literary canons Literary institution "Canada Reads" "The Oprah's Book Club"
15	RIPPLiT and ChimeraTie: High throughput tools for understanding higher order RNP structures Metkar, Mihir 30 July 2018
More than 60 years after their discovery, little is known about how messenger RNAs (mRNAs) are packaged inside cells. To ensure efficient and accurate delivery of the intended message to its proper destination, the informational molecule must be packaged in a way that protects it from premature degradation but also allows proper decoding at the destination. However, very little is known about this fundamentally important step of mRNA packaging inside eukaryotic cells. To this end, we developed a novel approach, RIPPLiT (RNA ImmunoPrecipitation and Proximity Ligation in Tandem), to capture the 3D architecture of ribonucleoprotein particles (RNPs) of interest transcriptome-wide. To begin with, we applied RIPPLiT to the exon-junction complex (EJC), a set of proteins stably bound to spliced RNA. EJCs have been shown to interact with other proteins, such as SR and SR-like proteins, to form megadalton-sized complexes and help protect large regions of mRNAs. Thus, we hypothesized that these RNPs would provide an ideal system to elucidate the higher order organization of mRNPs. Preliminary analysis showed that the data obtained from RIPPLiT consisted of “chimeric reads”, reads with multiple RNA fragments ligated together, which could not be analyzed with any of the existing bioinformatics tools. Thus, we developed a new bioinformatics suite, ChimeraTie, to map, analyze and visualize chimeric reads. Performing polymer analysis on chimeric reads obtained for hundreds of mRNAs, we were able to predict that mRNPs are linearly and densely packed into flexible rod-like structures before they undergo translation. In this thesis, along with the detailed biological conclusions, I also provide a step-by-step manual for performing a RIPPLiT experiment and analyzing the ensuing data using ChimeraTie.
mRNA RNA-RNA interactions RNA higher order structure proximity ligations RIPPLiT ChimeraTie chimeric reads Biochemistry Bioinformatics Molecular Biology Structural Biology
16	Optimizing Parameters for High-quality Metagenomic Assembly Kumar, Ashwani 29 July 2015
No description available.
Microbiology
17	Nouvelles techniques informatiques pour la localisation et la classification de données de séquençage haut débit / Novel computational techniques for mapping and classification of Next-Generation Sequencing data Brinda, Karel 28 November 2016
Since their emergence around 2006, Next-Generation Sequencing (NGS) technologies have been revolutionizing biological and medical research. Instantly obtaining a large number of short or long reads from almost any biological sample enables detecting genomic variants, revealing the composition of species in a metagenome, deciphering cancer biology, decoding the evolution of living or extinct species, or understanding human migration patterns and human history in general. The pace at which the throughput of sequencing technologies is increasing surpasses the growth of storage and computing capacities, which creates new computational challenges in NGS data processing. In this thesis, we present novel computational techniques for the problems of read mapping and taxonomic classification.
With more than a hundred published mappers, read mapping might be considered fully solved. However, the vast majority of mappers follow the same paradigm, and little attention has been paid to non-standard mapping approaches. Here, we propose so-called dynamic mapping, which we show significantly improves the resulting alignments compared to traditional mapping approaches. Dynamic mapping is based on exploiting the information from previously computed alignments to improve the mapping of subsequent reads. We provide the first comprehensive overview of this method and demonstrate its qualities using Dynamic Mapping Simulator, a pipeline that compares various dynamic mapping scenarios to static mapping and iterative referencing. An important component of a dynamic mapper is an online consensus caller, i.e., a program that collects alignment statistics and guides updates of the reference on the fly. We provide OCOCO, the first online consensus caller, which maintains statistics for individual genomic positions using compact bit counters. Beyond its application to dynamic mapping, OCOCO can be employed as an online SNP caller in various analysis pipelines, enabling SNPs to be called from a stream without saving the alignments to disk.
Metagenomic classification of NGS reads is another major problem studied in the thesis. Given a database of thousands of reference genomes placed on a taxonomic tree, the task is to rapidly assign a huge number of NGS reads to tree nodes, and possibly estimate the relative abundance of the species involved. In this thesis, we propose improved computational techniques for this task. In a series of experiments, we show that spaced seeds consistently improve the classification accuracy. We provide Seed-Kraken, a spaced seed extension of Kraken, the most popular classifier at present. Furthermore, we suggest a new indexing strategy based on a BWT-index, obtaining a much smaller and more informative index compared to Kraken. We provide a modified version of BWA that improves the BWT-index for quick k-mer look-up.
Algorithms Next Generation Sequencing DNA reads Metagenomic classification Read mapping Consensus calling
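The spaced seeds mentioned in entry 17 can be sketched as follows. A spaced seed keeps only the positions of a window marked `1` in a mask, so a substitution at a `0` position does not change the extracted seed. The function name `spaced_kmers` and the mask `1101` are illustrative assumptions; this is not the code or the seed pattern of Seed-Kraken.

```python
def spaced_kmers(sequence: str, mask: str):
    """Yield spaced k-mers: for each window, keep only positions where mask is '1'."""
    if not mask or set(mask) - {"0", "1"}:
        raise ValueError("mask must be a non-empty string of '0' and '1'")
    keep = [i for i, bit in enumerate(mask) if bit == "1"]
    span = len(mask)
    for start in range(len(sequence) - span + 1):
        window = sequence[start:start + span]
        yield "".join(window[i] for i in keep)


# The two toy reads differ only at their third base, which the mask
# ignores in the first window, so the first seeds are identical.
seeds_a = list(spaced_kmers("ACGTA", "1101"))
seeds_b = list(spaced_kmers("ACTTA", "1101"))
print(seeds_a, seeds_b, seeds_a[0] == seeds_b[0])
```

A contiguous 4-mer would differ between the two reads in every window covering the substitution, whereas the spaced seed still matches in one of them, which gives an intuition for the tolerance to mismatches that spaced seeds provide.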
18	Rekonstrukce repetitivních elementů DNA / Reconstruction of Repetitive Elements in DNA Hypský, Jan January 2018
Eukaryotic genomes contain a large number of repetitive structures. Their detection and assembly are among the main challenges of bioinformatics today. This work includes a classification of repetitive DNA and presents an implementation of a novel de novo assembler focused on finding and reconstructing LTR retrotransposons and satellite DNA. The assembler takes as input short reads (single or paired-end) obtained from next-generation sequencing (NGS) machines. It is based on the Overlap-Layout-Consensus approach.
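The overlap step of an Overlap-Layout-Consensus assembler, as named in entry 18, can be sketched as an exact suffix-prefix match between two reads. The helper names `suffix_prefix_overlap` and `merge_reads` and the minimum overlap of 3 are assumptions for illustration; the thesis's assembler is not described in enough detail in the abstract to reproduce, and real assemblers tolerate sequencing errors.

```python
def suffix_prefix_overlap(a: str, b: str, min_length: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when no exact overlap of at least `min_length` exists.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    longest_possible = min(len(a), len(b))
    for length in range(longest_possible, min_length - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def merge_reads(a: str, b: str, min_length: int = 3):
    """Merge `b` onto the end of `a` if they overlap; return None otherwise."""
    overlap = suffix_prefix_overlap(a, b, min_length)
    return a + b[overlap:] if overlap else None


print(suffix_prefix_overlap("ATGCGTAC", "GTACCTGA"))  # overlap is "GTAC"
print(merge_reads("ATGCGTAC", "GTACCTGA"))
```

In a full pipeline, overlaps computed for all read pairs form the overlap graph, the layout step orders reads along paths in that graph, and the consensus step resolves each column of the stacked reads.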
19	Detekce CNV v sekvenačních datech / CNV detection in the sequencing data Pleskačová, Barbora January 2020
Copy number variation (CNV) detection in prokaryotic organisms is currently receiving more and more attention, mainly due to the association of CNV with pathogenicity and antibiotic resistance in bacteria. The algorithm designed in this thesis uses peak detection in sequencing coverage to detect CNV segments. Read coverage is commonly obtained by mapping the sequencing reads of one individual to an already known reference from another individual of the same species. However, two individuals will always differ in a certain number of genes, resulting in unmapped reads that are unnecessarily discarded. Therefore, this work assumes that the biological accuracy of CNV detection can be increased by using a new reference created from the same set of reads that is then mapped to it. Sequencing reads of Klebsiella pneumoniae individuals are used to verify this assertion.
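The idea of finding CNV segments as peaks in sequencing coverage (entry 19) can be illustrated with a deliberately simple sketch. It averages coverage in fixed windows and reports runs of windows whose mean exceeds a multiple of the median window mean. The function name `candidate_gain_segments`, the window size and the ratio of 1.5 are hypothetical choices, and this is not the algorithm designed in the thesis.

```python
from statistics import median


def candidate_gain_segments(coverage, window=3, ratio=1.5):
    """Return half-open (start, end) intervals with elevated windowed coverage."""
    means = [
        sum(coverage[i:i + window]) / window
        for i in range(0, len(coverage) - window + 1, window)
    ]
    if not means:
        return []
    baseline = median(means)  # typical coverage across windows
    segments = []
    current = None
    for index, mean in enumerate(means):
        start = index * window
        if mean > ratio * baseline:
            if current is None:
                current = [start, start + window]  # open a new segment
            else:
                current[1] = start + window  # extend the open segment
        elif current is not None:
            segments.append(tuple(current))
            current = None
    if current is not None:
        segments.append(tuple(current))
    return segments


# Toy per-base coverage: a threefold increase over positions 6 to 11.
coverage = [10] * 6 + [30] * 6 + [10] * 6
print(candidate_gain_segments(coverage))
```

Using the median as the baseline keeps a single amplified region from inflating the threshold; detecting losses would need a symmetric lower bound, which this sketch omits.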
20	Étude des chromosomes sexuels et du déterminisme du sexe chez les plantes : comparaison des systèmes Silene et Coccinia / A study of sex chromosomes and sex determination in plants : Silene and Coccinia systems comparison Fruchard, Cécile 09 July 2018
Although rarer than in animals, separate sexes (dioecy) have evolved in ∼15,600 angiosperm species (∼6% of all angiosperm species). How sex is controlled is a central question in plant sciences and also in agronomy, as many crops are dioecious (∼20% of crops) with only one useful sex (usually female). Only three master sex-determining genes have been identified in dioecious plants so far, namely in persimmon, asparagus and strawberry. Dioecy likely evolved several times independently in angiosperms, suggesting that sex-determining genes are of diverse origins. Hermaphroditism is the predicted ancestral state of the angiosperm flower. Two main pathways have been identified that explain the evolution from hermaphroditism towards dioecy: either through a monoecious state (with both unisexual male and female flowers on the same individual) or through a gynodioecious state (with females and individuals having hermaphroditic flowers). My aim is to compare two plant systems, each representing one of these two pathways. In Coccinia grandis, a Cucurbitaceae with an XY chromosome system, dioecy evolved through monoecy. In Silene latifolia, a well-studied dioecious plant with XY sex chromosomes, dioecy evolved through gynodioecy.
Three genes controlling monoecy have been identified in melon, and it was suggested that these genes act as sex-determining genes in closely related dioecious species such as C. grandis. I therefore chose a candidate gene approach in this species. Very few genetic and genomic resources are available for C. grandis, and we chose to use SEX-DETector, a probabilistic method that uses RNA-seq data to genotype parents and their offspring and infers sex-linked genes with no need for a reference genome. This method allowed me to identify 1,364 genes that are present on the sex chromosomes of C. grandis. I found that the sex chromosomes are enriched in sex-biased genes compared to autosomes, and I characterized Y chromosome degeneration in terms of decreased expression and gene loss. Finally, I showed that dosage compensation occurs in C. grandis. Testing of the three candidate genes is ongoing.
In S. latifolia, three regions involved in sex determination have already been identified on the Y chromosome. We chose to sequence this chromosome to identify sex-determining genes. The sequencing of Y chromosomes remains one of the greatest challenges of current genomics. The assembly step is very difficult because of their highly repeated content. Consequently, fully sequenced Y chromosomes are rare and mainly available in animals. To overcome the difficulty of assembling reads with many repeats, I used third-generation sequencing (TGS, producing long reads). I produced a dataset using the Oxford Nanopore MinION sequencer with Y chromosome DNA. Assembly was performed using a combination of Illumina, MinION and PacBio sequencing data. The final assembly had a total length of 563 Mb with a scaffold N50 of 6,114 bp, and contained 16,219 de novo annotated genes.
Angiosperms Sex determination Sex chromosomes Dioecy Long read sequencing MinION De novo genome assembly Silene latifolia 570
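Entry 20 reports a scaffold N50, a standard contiguity statistic: the length L such that scaffolds of length at least L together contain at least half of the assembled bases. A minimal sketch of the usual computation follows; the function name `n50` and the toy lengths are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length  # first length at which half the bases are covered


# Toy scaffold lengths: total 20, so half is 10; 6 + 5 = 11 reaches it.
print(n50([2, 3, 4, 5, 6]))
```

A low N50 relative to total assembly length, as in the figures quoted above, indicates that the assembly is spread over many short scaffolds.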
