In this thesis, two algorithms that detect cross-contamination in tumor samples sequenced with next-generation sequencing (NGS) were evaluated. In genomic medicine, NGS is commonly used to sequence tumor DNA to detect disease-associated genetic variants and determine the most suitable treatment option. Targeted NGS panels are often employed to screen for genetic variations in a selection of specific tumor-associated genes. NGS handles samples from multiple patients in parallel, which poses a risk of cross-contamination between samples. Contamination is a significant issue in the interpretation of NGS results, as it can lead to the incorrect identification of genetic variants and, consequently, incorrect treatment. Therefore, contamination detection is a crucial quality control step in the analysis of NGS data. Numerous algorithms for detecting cross-contamination have been developed, but many of them are not suited to small, targeted NGS panels, and several were not developed for tumor data. In this thesis, GATK's CalculateContamination and a self-created algorithm called ContaCheck were evaluated on simulated tumor NGS data. NGS samples were generated in silico with a Python script called BAMSynth and mixed to simulate cross-contamination at contamination rates between 1% and 50%. ContaCheck accurately detected contamination at rates from 3% to 50% and identified the correct contaminant with an accuracy of 94%. CalculateContamination, on the other hand, detected contamination at rates from 1% to 15% relatively accurately, but consistently failed to detect high-level contamination. The study showed that ContaCheck outperformed CalculateContamination on simulated NGS data. To determine which algorithm performs best on real data, and to assess ContaCheck's applicability in a clinical setting, both algorithms need to be further evaluated on real tumor NGS samples.
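To illustrate why mixing samples at a given contamination rate leaves a detectable signal, the following minimal Python sketch computes the expected allele fraction at a single variant site in a mixed sample. It is not taken from BAMSynth, ContaCheck, or CalculateContamination; the function name `mixed_allele_fraction` and its parameters are hypothetical, and the model assumes equal copy number and coverage in both samples, which real tumor data often violates.

```python
def mixed_allele_fraction(host_af, contaminant_af, contamination_rate):
    """Expected alternate-allele fraction at one site in a mixed sample.

    Hypothetical illustration only: assumes reads are drawn from the host
    sample with probability (1 - contamination_rate) and from the
    contaminant with probability contamination_rate, with equal copy
    number and coverage in both samples.
    """
    for name, value in (
        ("host_af", host_af),
        ("contaminant_af", contaminant_af),
        ("contamination_rate", contamination_rate),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Weighted average of the two samples' allele fractions.
    return (1.0 - contamination_rate) * host_af + contamination_rate * contaminant_af


if __name__ == "__main__":
    # Host is homozygous reference (0.0), contaminant homozygous alternate (1.0).
    for rate in (0.01, 0.03, 0.15, 0.50):
        print(rate, mixed_allele_fraction(0.0, 1.0, rate))
```

Under these assumptions, a site where the host is homozygous for the reference allele and the contaminant is homozygous for the alternate allele shows an allele fraction equal to the contamination rate itself, which is why low rates such as 1% can be hard to separate from sequencing noise.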
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-204445 |
Date | January 2024 |
Creators | Persson, Sofie |
Publisher | Linköpings universitet, Bioinformatik |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |