
Development of a data processing toolkit for the analysis of next-generation sequencing data generated using the primer ID approach

Philosophiae Doctor - PhD
Sequencing an HIV quasispecies with next-generation sequencing technologies yields a dataset with
significant amplification bias and errors resulting from both the PCR and sequencing steps. Both the
amplification bias and sequencing error can be reduced by labelling each cDNA (generated during the
reverse transcription of the viral RNA to DNA prior to PCR) with a random sequence tag called a Primer
ID (PID). Processing PID data requires additional computational steps, presenting a barrier to the
uptake of this method. MotifBinner is an R package designed to handle PID data with a focus on
resolving potential problems in the dataset.
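Because every PCR copy of one labelled cDNA carries the same PID, reads can be grouped by their tag before any error correction is attempted. The following is a minimal Python sketch of that grouping step, not MotifBinner's implementation (MotifBinner is an R package). It assumes a hypothetical read layout in which the PID occupies the first `PID_LEN` bases of each read; in practice the tag's position depends on the primer design. The function name `bin_by_pid` is illustrative only.

```python
from collections import defaultdict

# Hypothetical read layout (an assumption for illustration): the first
# PID_LEN bases of each read are the random Primer ID, and the remainder
# is the amplified viral sequence.
PID_LEN = 8


def bin_by_pid(reads, pid_len=PID_LEN):
    """Group reads into bins keyed by their Primer ID prefix."""
    bins = defaultdict(list)
    for read in reads:
        if len(read) <= pid_len:
            continue  # too short to hold both a PID and a sequence
        pid, seq = read[:pid_len], read[pid_len:]
        bins[pid].append(seq)
    return dict(bins)


reads = [
    "AAAACCCC" + "ACGTACGT",
    "AAAACCCC" + "ACGTACGA",  # same template, one PCR/sequencing error
    "GGGGTTTT" + "TTGTACGT",
]
bins = bin_by_pid(reads)
print({pid: len(seqs) for pid, seqs in bins.items()})
# prints {'AAAACCCC': 2, 'GGGGTTTT': 1}
```

Each bin then holds all reads descended from one original cDNA molecule, so disagreements within a bin point to PCR or sequencing errors rather than true viral diversity.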
MotifBinner groups sequences into bins by their PID tags, identifies and removes false unique bins
produced by sequencing errors in the PID tags, and removes outlier sequences from within each
bin. MotifBinner produces a consensus sequence for each bin, as well as a detailed report for the
dataset, listing the number of sequences per bin, the number of outlying sequences per bin, rates
of chimerism, the number of degenerate letters in the final consensus sequences and the most
divergent consensus sequences (potential contaminants).
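The three per-bin steps described above (removing false unique bins, removing outliers, building a consensus with degenerate letters) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not MotifBinner's algorithm: the thresholds `max_dist=1`, `size_ratio=10` and `max_mismatch_frac=0.1` are hypothetical, the function names are invented for this example, the reads in a bin are assumed to be aligned and of equal length, and the pairwise bin comparison is quadratic, which is fine for illustration but not for large datasets.

```python
from collections import Counter

# Two-base IUPAC ambiguity codes, used when a column is tied between two bases.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def hamming(a, b):
    """Mismatch count; any length difference counts as extra mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def drop_false_unique_bins(bins, max_dist=1, size_ratio=10):
    """Drop small bins whose PID is within max_dist of a much larger bin.

    Hypothetical heuristic: such a bin most likely arose from a sequencing
    error in the PID of a read that belongs to the larger bin.
    """
    kept = {}
    for pid, seqs in bins.items():
        is_offspring = any(
            other != pid
            and hamming(pid, other) <= max_dist
            and len(other_seqs) >= size_ratio * len(seqs)
            for other, other_seqs in bins.items()
        )
        if not is_offspring:
            kept[pid] = seqs
    return kept


def consensus(seqs):
    """Column-wise majority vote over aligned, equal-length reads.

    A two-way tie becomes an IUPAC letter; any wider tie becomes N.
    """
    out = []
    for column in zip(*seqs):
        counts = Counter(column).most_common()
        top = [base for base, n in counts if n == counts[0][1]]
        out.append(top[0] if len(top) == 1 else IUPAC.get(frozenset(top), "N"))
    return "".join(out)


def remove_outliers(seqs, max_mismatch_frac=0.1):
    """Keep reads differing from the bin consensus at <= max_mismatch_frac of sites."""
    ref = consensus(seqs)
    return [s for s in seqs if hamming(s, ref) <= max_mismatch_frac * len(ref)]


bins = {
    "AAAACCCC": ["ACGTACGTAC"] * 20 + ["ACGTTCGTAC"],
    "AAAACCCA": ["ACGTACGTAC"],  # one PID mismatch from a 21-read bin
    "GGGGTTTT": ["TTGTACGTAC", "TTGTACGTAC", "TTGAACGTAC", "CCCCCCCCCC"],
}
clean = drop_false_unique_bins(bins)
for pid, seqs in clean.items():
    print(pid, consensus(remove_outliers(seqs)))
```

With these toy bins, `AAAACCCA` is discarded as a false unique bin, the `CCCCCCCCCC` read is removed as an outlier, and the two remaining bins yield the consensus sequences `ACGTACGTAC` and `TTGTACGTAC`. A tied column such as `consensus(["ACGT", "ACGC"])` gives the degenerate letter `Y`, which is the kind of ambiguity a per-dataset report would count.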
We characterized the ability of the PID approach to reduce the effect of sequencing error, to detect
minority variants in viral quasispecies and to reduce the rate of PCR-induced recombination. We
produced reference samples with known variants at known frequencies to study the effect of
increasing the PCR elongation time, decreasing the number of PCR cycles, and partitioning the sample
by means of dPCR (droplet PCR) on PCR-induced recombination. After sequencing these artificial samples
with the PID approach, we compared each consensus sequence to the known variants. The relationship
between the sample preparation protocol and the characteristics of the resulting dataset is complex.
We produce a set of recommendations that can be used to choose the sample preparation protocol
best suited to a particular study.
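The comparison of consensus sequences against known variants can be illustrated with a short Python sketch. It labels each consensus as an exact match to a known variant, as a single-crossover chimera of two variants (the signature of PCR-induced recombination), or as unexplained, and then estimates variant frequencies and the chimera rate. This is a hypothetical sketch, not the analysis used in the thesis: the function names are invented, sequences are assumed to be aligned and of equal length, matching is exact (a real analysis would need to tolerate residual errors), and only two-parent, single-breakpoint chimeras are considered.

```python
from collections import Counter
from itertools import permutations


def hamming(a, b):
    """Mismatch count; any length difference counts as extra mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def classify(seq, variants):
    """Label a consensus as ('match', name), ('chimera', (left, right, k)) or ('other', nearest)."""
    for name, ref in variants.items():
        if seq == ref:
            return ("match", name)
    # Single crossover: prefix from one variant, suffix from another.
    for (left, lseq), (right, rseq) in permutations(variants.items(), 2):
        for k in range(1, len(seq)):
            if lseq[:k] + rseq[k:] == seq:
                return ("chimera", (left, right, k))  # earliest consistent breakpoint
    nearest = min(variants, key=lambda name: hamming(seq, variants[name]))
    return ("other", nearest)


def summarize(consensi, variants):
    """Return per-sequence labels, observed variant frequencies and the chimera rate."""
    labels = [classify(s, variants) for s in consensi]
    matched = Counter(name for kind, name in labels if kind == "match")
    n_matched = sum(matched.values())
    freqs = {name: matched[name] / n_matched for name in variants} if n_matched else {}
    chimera_rate = sum(kind == "chimera" for kind, _ in labels) / len(labels)
    return labels, freqs, chimera_rate


variants = {"A": "ACGTACGTAC", "B": "ACCTACGAAC"}
consensi = ["ACGTACGTAC", "ACGTACGTAC", "ACCTACGAAC", "ACGTACGAAC", "TTTTTTTTTT"]
labels, freqs, chimera_rate = summarize(consensi, variants)
print(labels)
print(freqs, chimera_rate)
```

Here `ACGTACGAAC` carries variant A's base at the first differing site and variant B's base at the second, so it is reported as an A/B chimera; the observed frequencies (2/3 for A, 1/3 for B) could then be compared with the frequencies at which the variants were mixed.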
The AMP trial infuses HIV-negative participants with the VRC01 antibody and monitors them for HIV
infection. Accurately timing each infection event and reconstructing the founder viruses of these
infections are critical for relating infection risk to antibody titer and to the homology between the
founder virus and the antibody binding sites. Dr. Paul Edlefsen at the Fred Hutchinson Cancer Research
Center developed a pipeline that performs infection timing and founder reconstruction. Here, we
document a portion of this pipeline, produce detailed tests for it, and investigate the robustness of
some of the tools used in the pipeline to violations of their assumptions.

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:uwc/oai:etd.uwc.ac.za:11394/6736
Date: January 2018
Creators: Labuschagne, Jan Phillipus Lourens
Contributors: Travers, Simon
Publisher: University of the Western Cape
Source Sets: South African National ETD Portal
Language: English
Detected Language: English
Rights: University of the Western Cape
