The study of DNA variation is fundamental to human genetics. Single nucleotide polymorphisms (SNPs) are simple to document because they can be captured in single DNA sequence reads. Larger structural variants, including duplications, insertions and deletions (collectively termed copy number variation, CNV), as well as inversions and translocations, are more challenging to discover. Recent studies using microarray and sequencing technologies have demonstrated the prevalence of structural variation in humans. Structural variants can disrupt genic and regulatory sequences, be associated with disease, and fuel evolution. Therefore, it is important to identify and characterize both SNPs and structural variants to fully understand their impact.
This thesis presents an analysis of structural variation in the human genome. The primary DNA sample used for my experiments is that of J. Craig Venter, also termed HuRef, which was the first personal human genome sequenced. I combined computational re-analysis of sequence data with microarray-based analysis, and detected 12,178 structural variants covering 40.6 Mb that were not reported in the initial sequencing study. The results indicated that the genomes of two individuals differed by 1.3% due to CNV, 0.3% due to inversion and 0.1% due to SNPs. Structural variation discovery depends on the strategy used: no single approach can readily capture all types of variation, and a combination of strategies is required.
I analyzed the formation mechanisms of all HuRef structural variants. The results showed that the relative proportion of mutational processes changed across the size range: the majority of small variants (<1 kb) were associated with nonhomologous processes and microsatellite events; medium-sized variants (<10 kb) were commonly related to minisatellites and retrotransposons; and large variants were associated with nonallelic homologous recombination.
Eight new breakpoint-resolved HuRef inversions were genotyped in populations to elucidate these understudied variants. I discovered that inversion structures could be complex, that inversions could create conjoined genes, and that their frequencies could exhibit population differentiation.
The data presented here contribute to our understanding of structural variation in humans. They show the need to use multiple strategies to identify variants, and they emphasize the importance of examining the full complement of variation in all biomedical studies.
Identifier | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OTU.1807/35919 |
Date | 09 August 2013 |
Creators | Pang, Wing Chun Andy |
Contributors | Scherer, Stephen W. |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | en_ca |
Detected Language | English |
Type | Thesis, Dataset |