This thesis details the development of two computational tools, Telomerecat and Parabam, and their application to whole genome sequencing (WGS) data. Telomerecat is a tool for estimating telomere length from WGS data. Its strength lies in its broad applicability, which stems from a number of advantages over previous attempts to estimate telomere length from WGS. Chief amongst these advantages is that it makes no assumption about the underlying chromosome count or the size of the genome within input samples. This means that Telomerecat lends itself well to analysing cancer samples, where such assumptions are unfounded. It also means that it is applicable to non-human samples, a first for tools of its kind. Furthermore, a novel method for filtering reads derived from interstitial telomere sequences means that it does not rely on previously applied analyses, which are a source of bias. The other tool described in this thesis is Parabam. Parabam is the first tool of its kind to allow users to apply a function to all of the reads in sequence alignment files, in parallel. Furthermore, Parabam includes a novel method for iterating over index-sorted sequence files as if they were name-sorted. We provide evidence that Parabam is a quicker way to create complex subsets and statistics from sequence alignment files. In the latter half of the thesis we detail two applications of Telomerecat to large-scale WGS projects. The first application, to the Prostate ICGC UK cohort, reveals hitherto unreported associations between telomere length and previously identified molecular subtypes, as well as cancer stage. In the second application, to the NIHR BioResource - Rare Disease cohort, we discover a previously unidentified variant in DKC1 that we propose is directly linked to short telomeres and an immunodeficient phenotype.
Identifier | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:744843 |
Date | January 2018 |
Creators | Farmery, James Henry Royston |
Contributors | Lynch, Andy Graeme |
Publisher | University of Cambridge |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | https://www.repository.cam.ac.uk/handle/1810/275827 |