Return to search

Evaluation of de novo assembly using PacBio long reads

New sequencing technologies show promise for the construction of complete and accurate genome sequences, by a process called de novo assembly that joins reads by overlap to longer contiguous sequences without the need for a reference genome. High-quality de novo assembly leads to better understanding in genetic variations. The purpose of this thesis is to evaluate human genome sequences obtained from the PacBio sequencing platform, which is a new technology suitable for de novo assembly of large genomes. The evaluation focuses on comparing sequence identity between our own de novo assemblies and the available human reference and through that, benchmark accuracy of our data. Sequences that are absent from the reference genome, are investigated for potential unannotated genes coordinately. We also assess the complex structural variation using different approaches. Our assemblies show high consensus with the human reference genome, with ⇠ 98.6% of the bases in the assemblies mapped to the human reference. We also detect more than ten thousand of structural variants, including some large rearrangements, with respect to the reference.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-302744
Date January 2016
CreatorsChe, Huiwen
PublisherUppsala universitet, Institutionen för biologisk grundutbildning
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.002 seconds