Mutations cause genetic variation among cells within an individual as well as variation between individuals within a species. They are the fuel for evolution and contribute to most human diseases. Despite their importance, it remains unclear how mutagenesis and repair shape the mutation pattern in the human genome and how to interpret the impact of a mutation with respect to its ability to cause disease (referred to as pathogenicity). The availability of large-scale genomic data provides an opportunity to use machine learning methods to answer these questions.
This thesis is composed of two parts. In the first part, a single statistical model is applied to both germline and somatic mutations to compare the factors that determine local mutation rates. Notably, our model revealed that one determinant, expression level, has opposite effects on mutation rate in the two types of tissues: somatic mutation rates decrease with expression level, whereas germline mutation rates increase with it, indicating that the DNA damage or repair processes during transcription differ between soma and germline. In the second part, we developed a new neural-network-based machine learning method to predict the pathogenicity of missense variants. Besides predictors commonly used in previous methods, we included additional predictors at the variant level, such as the probability of being in a protein-protein interaction interface, and at the gene level, such as dosage sensitivity and the probability of protein complex formation. To benchmark real-world performance, we compiled somatic mutation data in cancer and germline de novo mutation data in developmental disorders. Our model achieved better performance in prioritizing pathogenic missense variants than previously published methods.
Identifier | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/D8Z625XQ |
Date | January 2018 |
Creators | Chen, Chen |
Source Sets | Columbia University |
Language | English |
Detected Language | English |
Type | Theses |