Comparing Two Algorithms for the Detection of Cross-Contamination in Simulated Tumor Next-Generation Sequencing Data

In this thesis, two algorithms that detect cross-contamination in tumor samples sequenced with next-generation sequencing (NGS) were evaluated. In genomic medicine, NGS is commonly used to sequence tumor DNA in order to detect disease-associated genetic variants and determine the most suitable treatment option. Targeted NGS panels are often employed to screen for genetic variation in a selection of specific tumor-associated genes. NGS processes samples from multiple patients in parallel, which poses a risk of cross-contamination between samples. Contamination is a significant issue in the interpretation of NGS results, as it can lead to the incorrect identification of genetic variants and, consequently, to incorrect treatment. Contamination detection is therefore a crucial quality control step in the analysis of NGS data. Numerous algorithms for detecting cross-contamination have been developed, but many of them are not suited to small, targeted NGS panels, and several were not developed for tumor data. In this thesis, GATK's CalculateContamination and a self-developed algorithm called ContaCheck were evaluated on simulated tumor NGS data. NGS samples were generated in silico with a Python script called BAMSynth and mixed to simulate cross-contamination at contamination rates between 1% and 50%. ContaCheck accurately detected contamination at rates from 3% to 50% and identified the correct contaminant with an accuracy of 94%. CalculateContamination, on the other hand, detected contamination at rates from 1% to 15% relatively accurately, but consistently failed to detect high-level contamination. The study showed that ContaCheck outperformed CalculateContamination on simulated NGS data. To determine which algorithm performs best on real data, and to assess ContaCheck's applicability in a clinical setting, both algorithms need to be further evaluated on real tumor NGS samples.
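To make the simulated contamination rates concrete, the following is a minimal Python sketch of the arithmetic behind mixing two samples at a given rate. It is not taken from BAMSynth, ContaCheck, or CalculateContamination; the function names `mixed_allele_fraction` and `reads_to_mix` are hypothetical, and the sketch assumes a simple linear mixture of reads that ignores tumor purity, copy-number changes, and sequencing error.

```python
def mixed_allele_fraction(af_sample, af_contaminant, rate):
    """Expected alt-allele fraction at one site after mixing reads.

    Assumption (illustrative only): reads mix linearly, so a fraction
    `rate` of the reads comes from the contaminant and the remainder
    from the original sample.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return (1.0 - rate) * af_sample + rate * af_contaminant


def reads_to_mix(total_reads, rate):
    """Split a target read count into (sample_reads, contaminant_reads)."""
    if total_reads < 0:
        raise ValueError("total_reads must be non-negative")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    contaminant_reads = round(total_reads * rate)
    return total_reads - contaminant_reads, contaminant_reads


if __name__ == "__main__":
    # Site where the sample is homozygous reference (0.0) and the
    # contaminant is homozygous alternate (1.0), at the rates bounding
    # the range studied in the thesis.
    for rate in (0.01, 0.03, 0.15, 0.50):
        af = mixed_allele_fraction(0.0, 1.0, rate)
        print(f"rate={rate:.2f} expected alt fraction={af:.2f}")
```

Under these assumptions, a site that is homozygous in the sample shifts away from an allele fraction of 0 or 1 in proportion to the contamination rate, which is one reason low rates such as 1% are hard to tell apart from sequencing noise.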

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-204445
Date: January 2024
Creators: Persson, Sofie
Publisher: Linköpings universitet, Bioinformatik
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess