Global ETD Search

11	Assembly of Two CCDD Rice Genomes, Oryza grandiglumis and Oryza latifolia, and the Study of Their Evolutionary Changes Alsantely, Aseel O. 01 1900 Every day, more than half of the world's population consumes rice as a primary dietary resource, making rice one of the most important food crops in the world. Rice and its wild relatives belong to the genus Oryza. Studying the genome structure, function, and evolution of Oryza species in a comparative genomics framework can provide a wealth of knowledge for improving valuable agronomic traits. The Oryza genus includes 27 species with 11 different genome types, as identified by genetic and cytogenetic analyses. Six genome types, including that of domesticated rice (O. sativa and O. glaberrima), are diploid, and the remaining five are tetraploid. Three of the tetraploid species (O. grandiglumis, O. latifolia, and O. alta) contain the CCDD genome type, which arose less than 2 million years ago. Polyploidization is one of the major contributors to evolutionary divergence and can thereby lead to adaptation to new environmental niches. An important first step in characterizing the polyploid Oryza species is generating a high-quality reference genome sequence. Until recently, however, generating such a fundamental resource for polyploid species has been challenging, primarily due to their genome complexity and repetitive sequence content. In this project, I generated two high-quality genome assemblies for O. grandiglumis and O. latifolia using PacBio long-read sequencing and an assembly pipeline that employed three genome assemblers (Canu/2.0, Mecat2, and Flye/2.5) and multiple rounds of sequence polishing with both Arrow and Pilon/1.23. After the primary assembly, contigs were arranged into pseudomolecules, and homoeologous chromosomes were assigned to their respective genome types (CC or DD).
Finally, the assemblies were extensively edited manually to close as many gaps as possible. Both assemblies were then compared for transposable element and structural variant content, both between species and between homoeologous chromosomes. This enabled us to study the evolutionary divergence of these two genomes and to explore the possibility of neo-domesticating either species in future research for my PhD dissertation. Genome Assembly Structural Variants Transposable Elements Oryza Species Plants
12	Outrageous orchid organellar genomes: Structural evolution and composition Valencia Duarte, Janice E 01 May 2022 Organellar genomes are remnants of more complex bacterial genomes that were reduced until they reached the simplest and most efficient content. Although regularly depicted as circular, these genomes can form other structures, such as linear, branched, or entangled chromosomes, or a combination of these. Nonetheless, their gene content is nearly constant across flowering plants, based on the many plastid genomes (plastomes) and the comparatively few mitochondrial genomes (mitogenomes) sequenced to date. Here, I explored the evolution of orchid organellar genomes from a phylogenetic perspective. Plastomes and mitogenomes were assembled from short paired-end reads obtained with Illumina sequencing technology. I developed a workflow to confidently recover plastid and mitochondrial sequences, even for regions without references in databases (chapter 1). Comparing taxa from all orchid subfamilies identified patterns of gain, loss, and rearrangement of coding and non-coding DNA. Plastid and mitochondrial protein-coding genes present in all samples were used to reconstruct the phylogenetic history of orchids, yielding topologically congruent phylogenies (chapter 1). Plastomes can degrade in heterotrophic species; however, this is not true for mixotrophic species, as I discovered by comparing albino and green individuals of the orchid Epipactis helleborine. The albino plants had not lost any genes, and their plastome sequence was almost identical to that of the photosynthetic plants (chapter 2). Whereas angiosperm plastomes are conserved in structure, content, and size, plant mitogenomes are highly variable in size, which can increase through the acquisition of foreign DNA via horizontal gene transfer.
In some orchids, the mitogenome hosts a sixteen-gene sequence that was transferred from a fungal mitogenome to a clade of epidendroid orchids 12 to 60 million years ago and has since been fragmented, conserved, or fully lost (chapter 3). Transfer RNA genes vary in number and origin throughout orchid evolution. I identified four different sources of these genes, three novel possible events in which native genes were replaced by plastid-derived genes, and seven tRNA remodeling events in orchids plus three more in other angiosperms (chapter 4). Our comparative studies conclude that three main dynamics shape organellar genomes: gain, loss, and rearrangement of genomic content. I presented examples of each in orchids (chapter 5). Additionally, I created two sets of genomic resources: one comprises eighteen new orchid mitogenomes and plastomes, and the other is a well-curated set of reference tRNA genes in mitogenomes, classified by origin. These results add to our knowledge of angiosperm organellar genomes and highlight the importance of comprehensive studies that interpret genomic changes in light of phylogenetic evolution. Genome assembly Horizontal gene transfer Mitogenome Orchidaceae Plastome tRNA
13	De novo Genome Assembly and SNP Marker Development of Pyrenophora semeniperda Soliai, Marcus Makina 17 March 2011 Pyrenophora semeniperda (anamorph Drechslera campulata) is a necrotrophic fungal seed pathogen of a variety of grass genera and species, including Bromus tectorum, an exotic grass that has invaded many natural ecosystems of the U.S. Intermountain West. As a natural seed pathogen of B. tectorum, P. semeniperda has potential as a biocontrol agent because it effectively kills dormant B. tectorum seeds; however, few genetic resources exist for this fungus. Here, we present the genome assembly of a P. semeniperda isolate sequenced with 454 GS-FLX genomic and paired-end pyrosequencing. The total assembly is 32.5 Mb and contains 11,453 gene models longer than 24 amino acids. The assembly contains a variety of predicted genes involved in pathogenic pathways typically found in necrotrophic fungi. In addition, 454 sequence reads were used to identify single nucleotide polymorphisms (SNPs) between two isolates of P. semeniperda. In total, 20 SNP markers were developed to assess recombination in 600 individual P. semeniperda isolates representing 36 populations from throughout the U.S. Intermountain West. Although 17 of the fungal populations were fixed at all SNP loci, linkage disequilibrium was determined in the remaining 18 populations. This research demonstrates the effectiveness of 454 GS-FLX sequencing technology for de novo assembly and marker development in filamentous fungal genomes. Many features of the assembly match those of other Pyrenophora genomes, including P. tritici-repentis and P. teres f. teres, which lends validity to our assembly. These findings provide a significant resource for furthering our understanding of P. semeniperda biology. 454 sequencing genome assembly SNPs linkage disequilibrium Animal Sciences
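The abstract reports that linkage disequilibrium (LD) was determined among the SNP markers but does not name the statistic. As a minimal sketch, assuming haploid isolates scored as 0/1 at each biallelic SNP, the standard pairwise coefficient D and its normalized form r² can be computed as below. The function name and genotypes are hypothetical, and the thesis may have used a different LD measure. When a locus is fixed in a population, r² is undefined, which is why LD cannot be estimated in populations fixed at all loci.

```python
from itertools import combinations


def pairwise_ld(locus_a, locus_b):
    """Return (D, r2) for two biallelic SNPs scored on the same haploid isolates.

    Each locus is a sequence of 0/1 allele codes, one entry per isolate.
    r2 is None when either locus is fixed (monomorphic) in the sample.
    """
    n = len(locus_a)
    if n == 0 or n != len(locus_b):
        raise ValueError("both loci must be scored on the same non-empty set of isolates")
    p_a = sum(locus_a) / n  # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n  # frequency of allele 1 at locus B
    # Frequency of isolates carrying allele 1 at both loci (haploid, so observed directly).
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return d, None
    return d, d * d / denom


# Hypothetical genotypes for six isolates from one population at three SNP markers.
population = {
    "snp1": [0, 0, 1, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1],
    "snp3": [1, 0, 1, 0, 1, 0],
}
for (name_a, geno_a), (name_b, geno_b) in combinations(population.items(), 2):
    d, r2 = pairwise_ld(geno_a, geno_b)
    print(name_a, name_b, f"D={d:.3f}", "r2=undefined" if r2 is None else f"r2={r2:.3f}")
```

Working with haploid isolates keeps the sketch simple, because two-locus haplotypes are observed directly rather than inferred from diploid genotypes.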
14	The Bioluminescence Heterozygous Genome Assembler Price, Jared Calvin 01 December 2014 High-throughput DNA sequencing technologies are currently revolutionizing the fields of biology and medicine by elucidating the structure and function of the components of life. Modern DNA sequencing machines typically produce relatively short reads of DNA which are then assembled by software in an attempt to produce a representation of the entire genome. Due to the complex structure of all but the smallest genomes, especially the abundant presence of exact or almost exact repeats, all genome assemblers introduce errors into the final sequence and output a relatively large set of contigs instead of full-length chromosomes (a contig is a DNA sequence built from the overlaps between many reads). These problems are dramatically worse when homologous copies of the same chromosome differ substantially. Currently such genomes are usually avoided as assembly targets and, when they are not avoided, they generally produce assemblies of relatively low quality. An improved algorithm for the assembly of such data would dramatically improve our understanding of the genetics of a large class of organisms. We present a unique algorithm for the assembly of diploid genomes which have a high degree of variation between homologous chromosomes. The approach uses coverage, graph patterns and machine-learning classification to identify haplotype-specific sequences in the input reads. It then uses these haplotype-specific markers to guide an improved assembly. We validate the approach with a large experiment that isolates and elucidates the effect of single nucleotide polymorphisms (SNPs) on genome assembly more clearly than any previous study. The experiment conclusively demonstrates that the Bioluminescence heterozygous genome assembler produces dramatically longer contigs with fewer haplotype-switch errors than competing algorithms under conditions of high heterozygosity.
genome genome assembly polymorphic polymorphism heterozygous haplotype algorithm Computer Sciences
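The abstract describes using coverage, graph patterns, and machine-learning classification to identify haplotype-specific sequence. The coverage part of that idea can be illustrated with a deliberately simplified sketch, assuming a diploid genome sequenced to a per-haplotype depth d: k-mers unique to one haplotype occur about d times, k-mers shared by both haplotypes about 2d times, and very rare k-mers are likely sequencing errors. The function names, thresholds, and toy reads are hypothetical; this is not the Bioluminescence algorithm, which also uses graph patterns and a trained classifier.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the reads (forward strand only, to keep the sketch short)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def classify_kmers(counts, haploid_depth, error_cutoff=2):
    """Label each k-mer by multiplicity: 'error' below error_cutoff, otherwise
    whichever expectation is closer: ~haploid_depth (one haplotype) or
    ~2 * haploid_depth (shared by both haplotypes)."""
    labels = {}
    for kmer, c in counts.items():
        if c < error_cutoff:
            labels[kmer] = "error"
        elif abs(c - haploid_depth) <= abs(c - 2 * haploid_depth):
            labels[kmer] = "haplotype-specific"
        else:
            labels[kmer] = "shared"
    return labels


# Toy diploid: two haplotypes differing by one SNP, each read 5 times, plus one erroneous read.
hap1 = "ACGTACGGTCA"
hap2 = "ACGTATGGTCA"
reads = [hap1] * 5 + [hap2] * 5 + ["ACGTTCGG"]
labels = classify_kmers(count_kmers(reads, k=4), haploid_depth=5)
for kmer in sorted(labels):
    print(kmer, labels[kmer])
```

Coverage alone becomes unreliable when depth varies along the genome or when repeats inflate counts, which is one reason a real assembler would combine it with graph structure, as the abstract describes.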
15	Characterization of the genetic diversity and thermal tolerance of Pocilloporid Corals in the Red Sea Buitrago-López, Carol 07 1900 This dissertation characterizes the genetic diversity and thermal tolerance of the coral holobionts Stylophora pistillata and Pocillopora verrucosa (family Pocilloporidae) along the Saudi Arabian Red Sea coast (~1500 km). Population genetic structure and holobiont diversity were assessed using genome-wide single nucleotide polymorphisms (SNPs) identified with reference-genome-based RAD-Seq, while the associated microbial communities of algal symbionts (Symbiodiniaceae) and bacteria were inferred from metabarcoding of the ITS2 region and the 16S rRNA gene, respectively. Thermal tolerance of Stylophora pistillata colonies was assessed using standardized short-term heat stress assays on the novel Coral Bleaching Automated Stress System (CBASS). Chapter 1 details the assembly and annotation of the P. verrucosa genome (~380 Mbp; 27,439 gene models), which was highly complete and compared well with the already available S. pistillata genome. Chapter 2 presents population genetic analyses of both coral species, which revealed pronounced differences in their population genetic structure. While P. verrucosa appeared to be highly connected across the Red Sea basin, except in the far south, S. pistillata showed a complex population genetic structure. Microbial communities of Symbiodiniaceae and bacteria were overall less diverse in P. verrucosa than in S. pistillata and followed an association pattern determined partly by the environment and partly by host genotype. Chapter 3 identifies thermally tolerant S. pistillata genotypes by comparing the heat stress responses of colonies collected at two sites within the same reef.
Ex situ heat-stress assays confirmed that colonies from the more thermally stable site (fore reef) were less thermally tolerant than their conspecifics from the back reef, where diel temperature is more variable. This chapter also highlights the utility of acute heat-stress assays as a tool for identifying thermotolerant colonies. Taken together, this dissertation provides a foundation for coral conservation in the Red Sea. It shows that genetic structure differs between coral species, suggesting that effective conservation through marine protected areas needs to incorporate data from multiple species. Coral population genetic data should further be complemented by thermal tolerance assays across the Red Sea to link genetic diversity with patterns of heat stress tolerance. Corals Population genetics Thermal tolerance Red Sea Genome assembly Pocilloporid
16	Draft Assembly and Baseline Annotation of the Ziziphus spina-christi Genome Shuwaikan, Raghad H. 07 1900 Third-generation sequencing has revolutionized our understanding of genomics and enabled in-depth exploration of complex plant genomes. In this project, I aimed to assemble and annotate the genome of Z. spina-christi, a plant native to Saudi Arabia, as part of the Kingdom of Saudi Arabia Native Genome Project established at the Center for Desert Agriculture at KAUST. First, a voucher plant was selected from the Al Lith region of western Saudi Arabia. Fresh leaf tissue was collected for high-molecular-weight (HMW) DNA extraction, and seed was collected for greenhouse propagation. After HMW DNA extraction, library construction, and PacBio HiFi sequencing, I generated a de novo assembly of the Z. spina-christi genome with the Hifiasm assembler, which yielded a 1.9 Gbp assembly with high levels of duplication. The assembled contigs were scaffolded using an in-house script based on the software RagTag, yielding a 406 Mbp scaffolded assembly with 331 gaps (85.45% of the estimated genome size). A preliminary analysis of transposable elements (TEs) in the assembly revealed a TE content of 32.36%, with long terminal repeat retrotransposons (LTR-RTs) as the major contributor. Baseline annotation was completed using OmicsBox, revealing 18,330 functional genes. This work provides the first genomic resource for the desert plant Z. spina-christi. To improve the assembly, I suggest scaffolding with optical mapping, long Nanopore reads, and Hi-C data to capture the spatial organization of the genome. Further experimental, genetic, and TE analyses are needed to explore the plant's resilience to abiotic stresses in extreme environments. sequencing genome assembly plant genomics plant genomes third-generation sequencing
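The reported figures imply an approximate genome-size estimate. This is worked arithmetic only, assuming the 85.45% refers to the 406 Mbp scaffolded assembly relative to the estimated genome size; the estimate itself is not stated in the abstract.

```latex
G_{\mathrm{est}} \approx \frac{406\ \mathrm{Mbp}}{0.8545} \approx 475\ \mathrm{Mbp},
\qquad
\frac{1.9\ \mathrm{Gbp}}{G_{\mathrm{est}}} \approx \frac{1900\ \mathrm{Mbp}}{475\ \mathrm{Mbp}} \approx 4
```

On this reading, the initial 1.9 Gbp Hifiasm assembly is roughly four times the expected genome size, in line with the high duplication levels the abstract reports.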
17	Using Pan-Genomes to Include Functional Data in Ancient Pathogen Studies / Ancient DNA and Gene Function Analyses Long, George S. January 2024 Ancient DNA analyses rely on reference genomes to authenticate and identify endogenous genomes. While this has led to many successful studies involving proboscideans, hominids, and ancient pathogens such as Yersinia pestis, our reliance on at most a small number of reference genomes greatly limits our ability to functionally describe the genome of interest. Further, given the existence of open bacterial pan-genomes and horizontal gene transfer, reference biases have likely been incorporated into studies and cited in subsequent ones as representative of past gene diversity. Implementing and standardizing the use of bacterial pan-genomes greatly diminishes the effect of these biases while also revealing the relative capabilities of the target genome compared with modern diversity. Describing an ancient strain by both its phylogenetic and functional similarities to modern strains allows a more nuanced analysis of the species' evolutionary history. Incongruences between the phylogeny and gene function are ripe for deeper analysis, and the implications of such findings resonate beyond the characterization of a single ancient genome. A pan-genome-centric approach to ancient bacterial studies offers significant improvements over the current paradigm. / Dissertation / Doctor of Philosophy (PhD) Ancient DNA Bioinformatics Genome Analyses Paleogenomics Genome Assembly Pan-genomes
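As a minimal sketch of the pan-genome comparison, assume gene clusters have already been called for each modern strain and for the ancient genome. The clustering step, strain names, and gene identifiers below are hypothetical. The functional profile then reduces to set operations: core genes are present in every modern strain, accessory genes only in some, and the ancient genome is described by the accessory genes it carries and how closely its accessory content resembles each modern strain. Jaccard similarity is used here for illustration; the abstract does not give the dissertation's actual metrics.

```python
def partition_pangenome(strain_genes):
    """Split gene clusters into core (in every strain) and accessory (in some)."""
    if not strain_genes:
        raise ValueError("need at least one modern strain")
    gene_sets = [set(genes) for genes in strain_genes.values()]
    core = set.intersection(*gene_sets)
    accessory = set.union(*gene_sets) - core
    return core, accessory


def profile_ancient(ancient_genes, strain_genes):
    """Describe an ancient genome's gene content against a modern pan-genome."""
    core, accessory = partition_pangenome(strain_genes)
    detected = set(ancient_genes)
    ancient_acc = detected & accessory
    similarity = {}
    for strain, genes in strain_genes.items():
        strain_acc = set(genes) & accessory
        union = ancient_acc | strain_acc
        # Jaccard similarity of accessory content (1.0 if both are empty).
        similarity[strain] = len(ancient_acc & strain_acc) / len(union) if union else 1.0
    return {
        "core_missing": sorted(core - detected),  # may reflect low coverage, not true loss
        "accessory_present": sorted(ancient_acc),
        "not_in_pangenome": sorted(detected - core - accessory),
        "accessory_similarity": similarity,
    }


# Hypothetical gene clusters for three modern strains and one ancient genome.
modern = {
    "strain_1": {"g1", "g2", "g3", "g4"},
    "strain_2": {"g1", "g2", "g3"},
    "strain_3": {"g1", "g2", "g5"},
}
report = profile_ancient({"g1", "g3", "g4", "g6"}, modern)
print(report)
```

Comparing this functional similarity with the strain's phylogenetic placement is one way to surface the kind of phylogeny-versus-function incongruences the abstract highlights.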
18	Analyse bioinformatique du génome et de l’épigénome du pommier / Bioinformatic analysis of the apple genome and epigenome Daccord, Nicolas 27 November 2018 Apple is one of the most consumed fruits in the world. Using the latest sequencing (PacBio) and optical mapping (BioNano) technologies, we generated a high-quality de novo assembly of the apple (Malus domestica Borkh.) genome.
We performed gene and transposable element annotation so that this assembly can be used as a reference genome. The high contiguity of the assembly allowed us to exhaustively detect transposable elements, which represent over half of the assembly, providing an unprecedented opportunity to investigate previously uncharacterized regions of a tree genome. We also found that the apple genome is entirely duplicated, as shown by the synteny links between chromosomes. Using whole-genome bisulfite sequencing (WGBS) and the assembly generated above, we produced genome-wide DNA methylation maps and showed a general correlation between DNA methylation near genes and gene expression. Moreover, we identified several differentially methylated regions (DMRs) between apple fruit and leaf methylomes, associated with candidate genes that could be involved in agronomically relevant traits such as fruit development. Finally, we developed a complete and easy-to-use pipeline that handles the full processing of WGBS data, from read mapping to DMR calling. It can handle datasets with a low number of biological replicates. Genome assembly Gene annotation Epigenetics Differential methylation 630
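The abstract describes a pipeline that goes from WGBS read mapping to DMR calling but does not give its statistics. As a minimal sketch, assuming per-cytosine counts of methylated and unmethylated reads are already available, a simple window-based comparison pools counts per window and flags windows whose methylation level differs by at least a threshold. The function names, window size, thresholds, and data are hypothetical. Real DMR callers also test significance across biological replicates and treat sequence contexts (CG, CHG, CHH) separately, which this sketch omits.

```python
from collections import defaultdict


def pool_by_window(calls, window):
    """Pool per-cytosine read counts into fixed windows.

    calls: iterable of (chrom, pos, methylated_reads, unmethylated_reads).
    Returns {(chrom, window_start): (methylated_reads, total_reads)}.
    """
    bins = defaultdict(lambda: [0, 0])
    for chrom, pos, meth, unmeth in calls:
        key = (chrom, pos // window * window)
        bins[key][0] += meth
        bins[key][1] += meth + unmeth
    return {key: (m, t) for key, (m, t) in bins.items()}


def candidate_dmrs(calls_a, calls_b, window=100, min_diff=0.3, min_depth=10):
    """Report windows whose methylation level differs by at least min_diff."""
    a = pool_by_window(calls_a, window)
    b = pool_by_window(calls_b, window)
    hits = []
    for key in sorted(a.keys() & b.keys()):
        (m_a, t_a), (m_b, t_b) = a[key], b[key]
        if t_a < min_depth or t_b < min_depth:
            continue  # too few reads in one sample to compare reliably
        diff = m_a / t_a - m_b / t_b  # difference in fraction of methylated reads
        if abs(diff) >= min_diff:
            hits.append((key[0], key[1], key[1] + window, round(diff, 3)))
    return hits


# Hypothetical CpG calls: (chrom, position, methylated reads, unmethylated reads).
fruit = [("chr1", 10, 9, 1), ("chr1", 50, 8, 2), ("chr1", 150, 5, 5)]
leaf = [("chr1", 10, 1, 9), ("chr1", 60, 2, 8), ("chr1", 160, 5, 5)]
print(candidate_dmrs(fruit, leaf))
```

Pooling reads per window trades positional resolution for depth, which matters when, as the abstract notes, only a few biological replicates are available.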
19	GATOOL - Genome Assembly Tool: a web tool for bacterial genome assembly Oliveira, Matheus Brito de 12 June 2017 Bacterial genome assembly is the process of reordering sequence fragments so that the original genome can be reconstructed. To maximize the quality of the assembly, several steps are required, such as read quality analysis, read preprocessing, and a repeat of the quality analysis to check the effectiveness of the preprocessing. Genome assembly is a complex step that depends on the sequencing technology used: different sequencers produce data with different characteristics, such as fragment size and throughput. Analyzing these characteristics requires several computational tools to support all of the processes above, and because the range of available software is broad and heterogeneous, users must learn to work with this computational diversity, often mastering knowledge outside the biological domain and leaving less time to pursue biological questions. In this context, we developed a pipeline that automates fragment analysis, read preprocessing, genome assembly, and contig ordering and orientation, with assembly as its main objective, managed by a web application called GATOOL (Genome Assembly Tool).
To evaluate the application's performance, we tested it on two prokaryotic samples, Bacillus amyloliquefaciens and Serratia marcescens, and on seven SRA samples. Both organisms were sequenced on the Ion PGM™ platform. The assemblers used were SPAdes and Velvet, both of which use the de Bruijn graph as their assembly paradigm; the resulting set of contigs was then ordered against a reference with CONTIGuator. We observed that the GATOOL interface allowed quick and easy execution of several steps in genome assembly, including automated assembly of the two prokaryotic species, thus making these processes accessible to any user. Genome assembly Bacterial NGS Pipeline
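SPAdes and Velvet are both described above as de Bruijn graph assemblers. The core idea can be sketched in a few lines, assuming error-free reads from a single strand: every k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and contigs are spelled along paths without branches (unitigs). The function names are hypothetical and not part of either tool. Real assemblers add error correction, reverse-complement handling, graph simplification, and repeat resolution on top of this.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell maximal non-branching paths (unitigs), the simplest kind of contig.

    Isolated cycles in which every node has one in-edge and one out-edge are
    not reported; a real assembler handles them separately.
    """
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1

    def one_in_one_out(node):
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for start in set(graph) | set(indegree):
        if one_in_one_out(start):
            continue  # interior node: it is spelled from the start of its path
        for nxt in graph.get(start, ()):
            contig = start + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return sorted(contigs)


# Error-free toy reads tiling the sequence ATGGCGTGCA (forward strand only).
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
print(unitigs(build_de_bruijn(reads, k=4)))
```

Adding a second sequence that differs by one base creates a bubble in the graph and splits the single contig into several unitigs, which is why assemblers need graph simplification and, for ordering against a reference, tools such as CONTIGuator.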
20	Hypothesis-free detection of genome-changing events in pedigree sequencing Garimella, Kiran January 2016 In high-diversity populations, a complete accounting of de novo mutations can be difficult to obtain. Most analyses identify such mutations by sequencing pedigrees on second-generation sequencing platforms and aligning the short reads to a reference assembly, the genomic sequence of a canonical member (or members) of a species. Often, large regions of the genomes under study may be highly diverged from the reference sequence, or not represented at all (e.g., the HLA, antigenic genes, or other regions under balancing selection). If the haplotypic background on which a mutation occurs is absent from the reference, events can easily be missed (as reads have nowhere to align) and false positives may abound (as the software forces the reads to align elsewhere). This thesis presents a novel method for de novo mutation discovery and allele identification. Rather than relying on alignment, our method is based on de novo assembly of short-read sequence data using a multi-color de Bruijn graph. In this data structure, each sample is assigned a unique index (or "color"), reads from each sample are decomposed into smaller subsequences of length k (or "kmers"), and color-specific adjacency information between kmers is recorded. Mutations can be discovered in the graph itself by searching for characteristic motifs (e.g., "bubble" motifs, indicative of a SNP or indel, and "linear" motifs, indicative of allelic and non-allelic recombination). De novo mutations differ from inherited mutations in that the kmers spanning the variant allele are absent in the parents; in a sense, they facilitate their own discovery by generating "novel" sequence. We exploit this fact to limit graph processing to only those regions containing these novel kmers. We verified our approach using simulations, validation, and visualization.
For simulation, we developed genome and read generation software, driven by empirical distributions computed from real data, to emit genomes with realistic features: recombinations, de novo variants, read fragment sizes, sequencing errors, and coverage profiles. In 20 artificial samples, our worst-case sensitivity and specificity for novel kmer recovery were approximately 98% and 100%, respectively. Not every novel stretch can be reconstituted as a variant, owing to errors and homology in the graph. In simulations, our false discovery rate was 10% for "bubble" events and 12% for "linear" events. For validation, we obtained a high-quality draft assembly for a single P. falciparum child using a third-generation sequencing platform. We discovered three de novo events in the draft assembly, all three of which are recapitulated in our calls on the second-generation sequencing data for the same sample; no false positives are present. For visualization, we developed an interactive web application capable of rendering a multi-color subgraph, which helps visually distinguish true variation from sequencing artifacts. We applied our caller to real datasets: 115 progeny across four previously analyzed experimental crosses of Plasmodium falciparum. We demonstrate our ability to access subtelomeric compartments of the genome, regions harboring antigenic genes under tremendous selective pressure, which are therefore highly divergent between geographically distinct isolates and routinely masked and ignored in reference-based analyses. We also show our caller's ability to recover an important form of structural de novo variation: non-allelic homologous recombination (NAHR) events, an important mechanism by which the pathogen diversifies its own antigenic repertoire. We demonstrate our ability to recover the few events known to exist in these samples, and we overturn some previous findings indicating exchanges between "core" (non-subtelomeric) genes.
We compute the SNP mutation rate to be approximately 2.91 per sample, the insertion and deletion mutation rates to be 0.55 and 1.04 per sample, respectively, the multi-nucleotide polymorphism rate to be 0.72 per sample, and the NAHR event rate to be 0.33 per sample. These findings are consistent across crosses. Finally, we investigated our method's scaling capabilities by processing a quintet of previously analyzed Pan troglodytes verus (western chimpanzee) samples. The chimpanzee genome is two orders of magnitude larger than the malaria parasite's (3,300 Mbp versus 23 Mbp), diploid rather than haploid, and poorly assembled, and the read dataset has lower coverage (20x versus 120x). Compared with Sequenom validation data as well as visual validation, our sensitivity is, as expected, low. However, this can be attributed to overly aggressive data cleaning by the de novo assembler on which our software is built. We discuss the specific changes that would likely be needed in future work to adapt our method to low-coverage samples.
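The multi-color de Bruijn graph and the "novel kmer" filter described in this abstract can be illustrated with a minimal sketch, assuming error-free single-strand reads and a parent-parent-child trio. The function names and sample data are hypothetical; this is not the thesis software, which is built on an existing de novo assembler and includes data cleaning and motif (bubble and linear) detection that are not shown here.

```python
from collections import defaultdict


def build_colored_graph(samples, k):
    """Build a minimal multi-color de Bruijn graph.

    samples maps a color (sample name) to its reads. Returns
      kmer_colors: k-mer -> set of colors in which it occurs
      edges: (k-mer, next k-mer) -> set of colors in which that adjacency occurs
    """
    kmer_colors = defaultdict(set)
    edges = defaultdict(set)
    for color, reads in samples.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                kmer_colors[kmer].add(color)
                if i + k < len(read):  # record the color-specific edge to the next k-mer
                    edges[(kmer, read[i + 1:i + k + 1])].add(color)
    return kmer_colors, edges


def novel_kmers(kmer_colors, child, parents):
    """K-mers present in the child but absent from every parent."""
    parent_set = set(parents)
    return {kmer for kmer, colors in kmer_colors.items()
            if child in colors and not (colors & parent_set)}


# Toy trio: the child carries a de novo T->A change at 0-based position 4.
samples = {
    "mother": ["ACGTTGCA"],
    "father": ["ACGTTGCA"],
    "child": ["ACGTAGCA"],
}
kmer_colors, edges = build_colored_graph(samples, k=4)
novel = novel_kmers(kmer_colors, "child", ["mother", "father"])
print(sorted(novel))
```

A single-base change produces up to k novel k-mers (here, the four k-mers overlapping position 4), which is the signal used to restrict graph processing to a few regions. In real data, sequencing errors also create novel k-mers, so some form of coverage-based cleaning is needed before events are called.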
