• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • No language data
  • Tagged with
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Inverted Sequence Identification in Diploid Genomic Scaffold Assembly via Weighted MAX-CUT Reduction

Bodily, Paul Mark 25 June 2013 (has links) (PDF)
Virtually all genome assemblers to date are designed for use with data from haploid or homozygous diploid genomes. Their use on heterozygous genomic datasets generally results in highly-fragmented, error-prone assemblies, owing to the violation of assumptions during both the contigging and scaffolding phases. Of the two phases, scaffolding is more particularly impacted and algorithms to facilitate the scaffolding of heterozygous data are lacking. We present a stand-alone scaffolding algorithm, ScaffoldScaffolder, designed specifically for scaffolding diploid genomes. A fundamental step in the scaffolding phase is the assignment of sequence orientations to contigs within scaffolds. Deciding such an assignment in the presence of ambiguous evidence is what is termed the contig orientation problem. We define this problem using bidirected graph theory and show that it is equivalent to the weighted MAX-CUT problem. We present a greedy heuristic solution which we comparatively assess with other solutions to the contig orientation problem, including an advanced MAX-CUT heuristic. We illustrate how a solution to this problem provides a simple means of simultaneously identifying inverted haplotypes, which are uniquely found in diploid genomes and which have been shown to be involved in the genetic mechanisms of several diseases. Ultimately our findings show that due to the inherent biases in the underlying biological model, a greedy heuristic algorithm performs very well in practice, retaining a higher total percent of edge weight than a branch-and-bound semidefinite programming heuristic. This application exemplifies how existing graph theory algorithms can be applied in the development of new algorithms for more accurate assembly of heterozygous diploid genomes.

Page generated in 0.0499 seconds