  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Polymorphism and Genome Assembly

Donmez, Nilgun 11 December 2012
When Darwin introduced natural selection in 1859 as a key mechanism of evolution, little was known about the underlying cause of variation within a species. Today we know that this variation is caused by acquired genomic differences between individuals. Polymorphism, defined as the existence of multiple alleles or forms at a genomic locus, is the technical term for such genetic variation. Polymorphism, along with reproduction and inheritance of genetic traits, is a necessary condition for natural selection and is crucial to understanding how species evolve and adapt. Many questions regarding polymorphism, such as why certain species are more polymorphic than others or how different organisms tend to favor some types of polymorphism over others, have the potential, when answered, to shed light on important problems in human medicine and disease research. Some of these studies require more diverse species and/or individuals to be sequenced. Of particular interest are species with the highest rates of polymorphism. For instance, the sequencing of the sea squirt genome led to exciting studies that would not have been possible on species with lower levels of polymorphism. Such studies form the motivation of this thesis. Genome sequencing is, nonetheless, a research subject in its own right. Recent advances in DNA sequencing technology have enabled researchers to undertake an unprecedented number of sequencing projects. These improvements in the cost and abundance of sequencing have revived interest in advancing the algorithms and tools used for genome assembly. A majority of these tools, however, have little or no support for highly polymorphic genomes, which, we believe, require specialized methods. In this thesis, we look at the challenges that polymorphism imposes on genome assembly and, following an overview of current and past methods, develop methods for polymorphic genome assembly. Though we borrow fundamental ideas from the literature, we introduce several novel concepts that can be useful not only for the assembly of highly polymorphic genomes but also for genome assembly and analysis in general.
3

Viral Quasispecies Reconstruction Using Next Generation Sequencing Reads

Tork, Bassam A 12 August 2013
The genomic diversity of viral quasispecies is a subject of great interest, especially for chronic infections. Characterization of viral diversity can be addressed by high-throughput sequencing technology (454 Life Sciences, Illumina, SOLiD, Ion Torrent, etc.). Standard assembly software was originally designed for single genome assembly and cannot be used to assemble and estimate the frequency of closely related quasispecies sequences. This work focuses on parsimonious and maximum likelihood models for assembling viral quasispecies and estimating their frequencies from 454 sequencing data. Our methods have been applied to several RNA viruses (HCV, IBV) as well as DNA viruses (HBV), genotyped using 454 Life Sciences amplicon and shotgun methods.
5

Short-read Chromosome Level Genome Assembly of Digitaria exilis

Gapa, Liubov 11 1900
Genomics has become an important tool in agriculture. Many modern crop breeding approaches, such as genomic selection and genome editing, require detailed information on the genomic composition of a crop species. However, the assembly of high-quality genome sequences is prone to technical artifacts that arise from inaccuracies in the sequencing technology and assembly algorithms. This is particularly true for the genomes of cereal crops, which are often very large, repeat-rich, and polyploid. Until recently, the highly continuous assembly of such cereal crop genomes from short-read data was mainly possible with proprietary assembly tools. In this work, we combined data generated with several short-read sequencing protocols and genomics technologies, including paired-end and mate-pair reads with multiple insert sizes, 10X linked reads, Hi-C contacts, and optical maps, to assemble a chromosome-level reference genome of Digitaria exilis (fonio millet) with open-source tools. Fonio millet is a semi-domesticated cereal orphan crop native to West Africa that has a high potential for desert agriculture. We implemented the TRITEX pipeline, a recently developed open-source pipeline for the assembly of large Triticeae genomes. We modified the pipeline to incorporate 10X and Hi-C reads into the assembly process independently. We then compared the TRITEX assembly to the fonio reference genome, which had previously been assembled from the same input data but using proprietary algorithms. We found the two assemblies highly similar in content, with high concordance in the local order (0.91 Pearson coefficient for alignments). However, we detected many small putative discrepancies between the two assemblies. While the TRITEX pipeline was able to produce a highly continuous genome assembly, further work is needed to characterize the putative discrepancies in more detail.
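The abstract above reports a Pearson coefficient of 0.91 for alignments between the two assemblies. As a rough illustration only, the sketch below shows how a Pearson coefficient could be computed from the paired coordinates of aligned blocks. The function name `pearson` and the coordinate lists are hypothetical, and the abstract does not describe the thesis's actual procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical start positions of the same aligned blocks in two assemblies.
positions_a = [10, 20, 30, 40, 50]
positions_b = [12, 19, 33, 41, 48]
print(pearson(positions_a, positions_b))
```

A value close to 1 would indicate that blocks appear in nearly the same relative order in both assemblies, while local rearrangements would pull the coefficient down.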
6

A Machine Learning Approach to Genome Assessment

Thrash, Charles Adam 09 August 2019
A key use of high-throughput sequencing technology is the sequencing and assembly of full genome sequences. These genome assemblies are commonly assessed using statistics relating to the contiguity of the assembly. Measures of contiguity are not strongly correlated with information about the biological completeness or correctness of the assembly, and a commonly reported metric, N50, can be misleading. Over the past ten years, multiple research groups have rejected the overuse of N50 and sought to develop more informative metrics. This research seeks to create a ranking method that includes biologically relevant information about the genome, such as its completeness and correctness. Approximately eight hundred genomes were initially selected, and information about their completeness, contiguity, and correctness was gathered using publicly available tools. Using this information, these genomes were scored by subject matter experts. This rating system was explored using supervised machine learning techniques. A number of classifiers and regressors were tested using cross-validation. Two metrics were explored in this research. First, a metric that describes the distance to the ideal genome was created as a way to explore the incorporation of human subject matter expert knowledge into the genome assembly assessment process. Second, random forest regression was found to be the method of supervised learning with the highest scores. A model created by an optimized random forest regressor was saved, and a tool was created to load the saved model and rank genomes provided by the end user. These metrics both serve as ways to incorporate human subject matter expert knowledge into genome assembly assessment.
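To make the remark about N50 concrete, here is a minimal sketch of the standard N50 definition: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembly size. The function name `n50` and the contig lengths are illustrative assumptions, not taken from the thesis.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First contig at which the cumulative length covers half the assembly.
        if running >= half_total:
            return length


# Hypothetical assembly: one long contig, two medium, five short.
full_assembly = [100, 50, 50, 20, 20, 20, 20, 20]
# Same assembly after simply discarding the short contigs.
filtered_assembly = [100, 50, 50]

print(n50(full_assembly))      # 50
print(n50(filtered_assembly))  # 100
```

In this toy case the N50 doubles just by throwing away the short contigs, although no sequence became more complete or more correct. This is one way a contiguity statistic can mislead.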
7

Computational methods for de novo assembly of next-generation genome sequencing data

Chikhi, Rayan 02 July 2012
In this thesis, we discuss computational methods (theoretical models and algorithms) to perform the reconstruction (de novo assembly) of DNA sequences from reads (short DNA sequences) produced by high-throughput sequencers. This problem is challenging, both theoretically and practically. The theoretical difficulty arises from the complex structure of genomes. The assembly process has to deal with reconstruction ambiguities. The reads can be consistent with up to an exponential number of reconstructions, yet only one is correct. Because it is impossible to determine which one, only a fragmented approximation of the genome is returned. The practical difficulty stems from the huge volume of data produced by sequencers, with high redundancy. Significant computing power is required to process it. As larger genomes and meta-genomes are being sequenced, the need for efficient computational methods for de novo assembly is increasing rapidly. This thesis introduces novel contributions to genome assembly, both in terms of incorporating more information to improve the quality of results, and efficiently processing data to reduce the computational complexity. Specifically, we propose a novel algorithm to quantify the maximum theoretical genome coverage achievable by sequencing data (paired reads), and apply this algorithm to several model genomes. We formulate a set of computational problems that take into account pairing information in assembly, and study their complexity. Then, two novel concepts that cover practical aspects of assembly are proposed: localized assembly and memory-efficient read indexing. Localized assembly consists of constructing and traversing a partial assembly graph. To save memory, we use data structures optimized specifically for the assembly task. These ingredients are implemented in a complete de novo assembly software package, the Monument assembler. Monument is compared with other state-of-the-art assembly methods. Finally, we conclude with a series of smaller projects, exploring concepts beyond classical de novo assembly.
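The abstract above mentions constructing and traversing an assembly graph and returning a fragmented result when the reconstruction is ambiguous. The sketch below illustrates that general idea with a small de Bruijn-style graph, a common choice of assembly graph. It is an assumption for illustration only and is not a description of how Monument works. The names `build_graph` and `extend` and the reads are hypothetical, and only forward branching is checked, which is simpler than what a real assembler does.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend(graph, seed):
    """Extend a contig from seed while the next node is unambiguous.

    Stops at a branch (several successors), a dead end, or a cycle.
    """
    contig = seed
    node = seed
    seen = {seed}
    while True:
        successors = graph.get(node, set())
        if len(successors) != 1:
            break  # dead end or ambiguous branch: stop and report a fragment
        (nxt,) = successors
        if nxt in seen:
            break  # avoid looping forever on a repeat
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Two overlapping reads give one unambiguous path.
print(extend(build_graph(["ACGTC", "GTCAG"], 3), "AC"))  # ACGTCAG
# Two reads that diverge after ACGT force the walk to stop at the branch.
print(extend(build_graph(["ACGTA", "ACGTC"], 3), "AC"))  # ACGT
```

Starting the walk from a seed and touching only the nodes it reaches hints at why a partial graph can be enough for a local reconstruction, although this toy version still builds the whole graph first.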
8

Genome assembly of the cichlid fish Astatotilapia latifasciata with focus in population genomics of B chromosome polymorphism

Jehangir, Maryam January 2017
Advisor: Cesar Martins / Abstract: B chromosomes (Bs) are additional to the standard chromosome set (As) and are present in all groups of eukaryotes. A reference genome is key to understanding the genomic aspects of an organism. Here, we present the de novo genome assembly of the cichlid fish A. latifasciata, a well-known model for studying Bs. The A. latifasciata genome had not been assembled before. The main focus of this study is to analyze and assemble the A. latifasciata genome without B (B-) and with B (B+) chromosomes. The assembled draft B- and B+ genomes comprised 774 Mb and 781 Mb, with scaffold N50 values of 1.8 Mb and 2.5 Mb, respectively, and spanned 23,391 genes. High-coverage Illumina sequencing data were obtained for males and females with 0B, 1B and 2B chromosomes to provide information on the population polymorphism of these genomes. We observed a high degree of genomic diversity in all analyzed genomes, showing a high rate/frequency of population polymorphism with no evident effect of B chromosome presence. However, B-specific single nucleotide polymorphisms were found in sequences located on the B chromosome. Whole-genome rearrangements (inter-chromosomal translocations) were detected in the B+ genome, and structural variations including insertions, deletions, inversions and duplications were predicted in a representative genomic region of the B chromosome. These results provide evidence that the existence of Bs in a genome should favour the accumu... (For the complete abstract, click the electronic access link below) / Master's
9

Genome assembly and metabolic pathway reconstruction of Pantoea ananatis LMG 20103

Chan, Wai Yin 13 October 2012
Next-generation sequencing (NGS) technologies have taken life science research into a new era. With the rapid advances in these technologies and the associated reduction in overall costs, the sequencing and assembly of genomes have come within reach of most laboratories. Studies related to the evolution, ecology and biology of an organism now rely heavily on genomic data, and a genome sequence has become an essential resource for the rapid progress and success of these studies. Pantoea ananatis is recognised as an emerging but rather unconventional pathogen capable of infecting a wide range of different hosts. Numerous plants of agricultural and economic importance, including maize, rice, onion, pineapple, melon, sudan grass and Eucalyptus trees, have been affected. With the outbreak of P. ananatis in a South African Eucalyptus nursery in 1998, it was realised that very little is known about this pathogen. A better understanding of the pathogenicity, metabolism and ecology of the bacterium is required to develop strategies for the control of the disease. During this study, the genome sequence of P. ananatis strain LMG 20103 was obtained using the Roche 454 technology. To aid in the assembly of this Eucalyptus pathogen’s genome sequence, the type strain of P. ananatis, LMG 2665, was also sequenced using Illumina’s Genome Analyzer (GA). A draft assembly of P. ananatis LMG 20103, consisting of 117 contigs, was generated after optimization of the Newbler assembly parameters and comparison with other genome assemblies and genomes. This study demonstrated that the assembly could be completed using both in vitro and in silico approaches such as contig scaffolding, gap closure with conventional PCR reactions and sequencing, manual curation and automated genome annotation. The final complete genome consisted of a 4 386 227 bp chromosome and a 317 146 bp mega-plasmid. With the complete genome sequence available, the reconstruction of the metabolic network of P. ananatis LMG 20103 was attempted using two pathway reconstruction pipelines, namely Pathway Tools and Model SEED. It was found that missing metabolic reactions and incomplete pathways in the draft metabolic networks were mainly caused by incorrect gene annotations or bioinformatic errors during the automated network reconstruction. These two pipelines differed substantially in the way network reconstruction is undertaken. By comparing the two proposed networks, annotation errors could be detected and corrected. Although some improvement could be made to the predicted network, further experimental data is still required to improve the accuracy of the draft metabolic network. Despite the amount of effort and cost, it is believed that the complete genome and a draft metabolic network of P. ananatis LMG 20103 will be a valuable resource for many subsequent studies to investigate the evolution and biology of this emerging plant pathogen. This information will be essential for the development of strategies to predict and control future disease outbreaks associated with this pathogen. / Dissertation (MSc)--University of Pretoria, 2012. / Microbiology and Plant Pathology / unrestricted
10

Expanding the Knowledgebase of Earth’s Microbiome Using Culture Dependent and Independent Methods

Murphy, Trevor 01 June 2021
Microorganisms are ubiquitous on Earth, yet their functions and ecological roles remain elusive. These microbes are investigated using culture-dependent and culture-independent methodologies. This study employs both methodologies to characterize: 1) the genomic potential of the novel deep-subsurface bacterial isolate Thermanaerosceptrum fracticalcis strain DRI-13T by combining next-generation and nanopore sequencing technologies and 2) the microbiome of the artificial marine environment for the Hawaiian Bobtail Squid in aquaculture using next-generation sequencing of the 16S rRNA gene. The microbial ecology of the deep subsurface remains understudied in terms of microbial diversity and function. The genomic information of DRI-13T revealed a potential for syntrophic relationships, diverse metabolic potential including prophages/antiviral defenses, and novel methylation motifs. Artificial marine environments housing the Hawaiian Bobtail Squid (Euprymna scolopes) contain microorganisms that can directly influence animal and aquaculture health. No studies presently show whether bacterial communities of the tank environment correlate with the health and productivity of E. scolopes. This study sought to address this by sampling from a year of unproductive aquaculture yield and comparing the bacterial communities with those from productive cohorts. Bacterial communities from unproductive samples show less bacterial diversity and abundance, coupled with shifts in bacterial composition. Differences in nitrate and pH levels between the tanks were found to strongly influence the bacterial populations of productive and unproductive cohorts.
