<p>A mathematical model is an abstraction that distills quantifiable behaviors and properties into a well-defined formalism in order to learn or predict something about a system. Such models may be as light as pencil-and-paper calculations on the back of an envelope or heavy enough to require modern supercomputers. They may be as simple as predicting the trajectory of a baseball or as complex as forecasting the weather. Using macromolecular protein structures as substrates, this thesis aims to improve upon and leverage mathematical models in order to address what is both a growing challenge and a burgeoning opportunity in the age of next-generation sequencing. The rapidly growing volume of data produced by emerging deep sequencing technologies is enabling more in-depth analyses of protein conservation than were previously possible. Increasingly, deep sequencing is bringing to light many disease-associated loci and localized signatures of strong conservation. These signatures in sequence space are the "shadows" of selective pressures that have acted on proteins over the course of many years. However, despite the growing abundance of data on such signatures, and the finer resolution with which they can be detected, an intuitive biophysical or functional rationale for these genomic shadows is often missing (such a rationale may otherwise be provided, for instance, by the need to engage in protein-protein interactions, undergo post-translational modification, or achieve a close-packed hydrophobic core). Allostery may frequently provide the missing conceptual link. Allosteric mechanisms act through changes in the dynamic behavior of protein architectures. Because selective evolutionary pressures often act through processes that are intrinsically dynamic, static renderings of protein structure can fail to provide any plausible rationale for constraint.
In the work outlined here, models of protein conformational change are used to predict allosteric residues that either <i>a)</i> act as essential cavities on the protein surface that serve as sources or sinks in allosteric communication; or <i>b)</i> function as important information flow bottlenecks within the allosteric communication pathways of the protein interior. Though most existing approaches entail computationally expensive methods (such as molecular dynamics, MD) or rely on less direct measures (such as sequence features), the framework discussed herein is both computationally tractable and fundamentally structural in nature – conformational change and topology are directly included in the search for allosteric residues – thereby enabling allosteric site prediction across the Protein Data Bank. Large-scale (i.e., general) properties of the predicted allosteric residues are then evaluated with respect to conservation. Multiple threads of evidence, drawing on different sources of data and a variety of metrics, demonstrate that the predicted allosteric residues tend to be significantly conserved across diverse evolutionary time scales. In addition, specific examples are discussed in which these residues help to explain disease-associated variants that were previously poorly understood. Finally, a practical and computationally rapid software tool that enables users to perform this analysis on their own proteins of interest has been made publicly available to the scientific community.</p>
Identifier | oai:union.ndltd.org:PROQUEST/oai:pqdtoai.proquest.com:10160849 |
Date | 16 September 2016 |
Creators | Clarke, Declan |
Publisher | Yale University |
Source Sets | ProQuest.com |
Language | English |
Detected Language | English |
Type | thesis |