Pathogenicity and selective constraint in the non-coding genome

Gene regulation plays a central role in evolution, organismal development, and disease. Despite this importance, few large-effect genetic variants in regulatory elements have been robustly associated with disease. In this work, my overarching aim was to gain a better understanding of the contribution of genetic variation in regulatory elements to Mendelian disorders, and I approached this problem from three different perspectives. I first sought to assess the contribution of regulatory variation to severe developmental disorders using sequence data from 8,000 affected individuals and their parents, and to identify individual elements with a high probability of harbouring pathogenic variants. Next, I used population genetic models and data from more than 28,000 whole-genome-sequenced individuals to examine the forces of selection operating on non-coding elements genome-wide. Finally, I conducted a pilot experiment to assay >50,000 different non-coding variants across more than 700 different non-coding elements, including variants observed in patients with developmental disorders, in a massively parallel reporter assay (MPRA), and collaborated on an assessment of the impact of patient mutations in eleven different enhancers using mouse transgenesis assays.

A few key results from the work are summarised below:

- I provide evidence that de novo SNVs in non-coding elements contribute to severe developmental disorders, and estimate that they contribute to 1-3% of cases not harbouring a likely diagnostic coding variant.
- These de novo SNVs reside primarily in highly evolutionarily conserved regulatory elements, and I estimate that a large fraction of conserved non-coding elements (50-70%) act as enhancers and a smaller subset (10-15%) have a function related to alternative splicing.
- Statistical modelling of the distribution of variants in developmental disorder patients suggests that a small fraction of bases (maximum likelihood estimate of 3%) within a disease-associated non-coding element are likely pathogenic with high penetrance when mutated.
- I develop a new genome-wide mutation rate model that accounts for a variety of germline features, including recombination rate, replication timing, sequence context, and histone marks, and that greatly outperforms models based on sequence context alone.
- I find evidence for widespread purifying selection in the non-coding genome that is correlated with nucleotide-level evolutionary conservation, even when the conserved nucleotides lie within otherwise poorly conserved sequence.
- I show that the selective constraint on small insertions and deletions is likely greater than the selective constraint on SNVs.
- I present data from a pilot experiment assessing more than 50,000 different non-coding variants in a massively parallel reporter assay conducted in both HeLa and neuroblastoma cells.

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:767732
Date: January 2019
Creators: Short, Patrick
Contributors: Hurles, Matthew
Publisher: University of Cambridge
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: https://www.repository.cam.ac.uk/handle/1810/289705
