The genetic structure of an intra-host viral population has an effect on many clinically important phenotypic traits such as escape from vaccine induced immunity, virulence, and response to antiviral therapies. Next-generation sequencing provides read-coverage sufficient for genomic reconstruction of a heterogeneous, yet highly similar, viral population; and more specifically, for the detection of rare variants. Admittedly, while depth is less of an issue for modern sequencers, the short length of generated reads complicates viral population assembly. This task is worsened by the presence of both random and systematic sequencing errors in huge amounts of data. In this dissertation I present completed work for reconstructing a viral population given next-generation sequencing data. Several algorithms are described for solving this problem under the error-free amplicon (or sliding-window) model. In order for these methods to handle actual real-world data, an error-correction method is proposed. A formal derivation of its likelihood model along with optimization steps for an EM algorithm are presented. Although these methods perform well, they cannot take into account paired-end sequencing data. In order to address this, a new method is detailed that works under the error-free paired-end case along with maximum a-posteriori estimation of the model parameters.
Identifer | oai:union.ndltd.org:GEORGIA/oai:scholarworks.gsu.edu:cs_diss-1086 |
Date | 12 August 2014 |
Creators | Mancuso, Nicholas |
Publisher | ScholarWorks @ Georgia State University |
Source Sets | Georgia State University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Computer Science Dissertations |
Page generated in 0.002 seconds