The ability of Next-Generation Sequencing (NGS) to produce massive quantities of genomic data inexpensively has made it possible to study the structure of viral populations from an infected host at an unprecedented resolution. As a result of a high rate of mutation and recombination events, an RNA virus exists as a heterogeneous "swarm". Virologists and computational epidemiologists widely use NGS data to study viral populations. However, discerning rare variants is complicated by errors introduced by the sequencing technology. We develop and implement a time- and cost-efficient strategy for NGS of multiple viral samples, and computational methods to analyze large quantities of NGS data and to handle sequencing errors. In particular, we present: (i) a combinatorial pooling strategy for massive NGS of viral samples; (ii) kGEM and 2SNV, methods for viral population haplotyping; (iii) ShotMCF, a Multicommodity Flow (MCF) based method for frequency estimation of viral haplotypes; (iv) QUASIM, an agent-based simulator of viral evolution that takes into account viral variants and immune response.
Identifier | oai:union.ndltd.org:GEORGIA/oai:scholarworks.gsu.edu:cs_diss-1131 |
Date | 08 August 2017 |
Creators | Artyomenko, Alexander |
Publisher | ScholarWorks @ Georgia State University |
Source Sets | Georgia State University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Computer Science Dissertations |