Genetic sequences are being collected at an ever-increasing rate due to rapid cost reductions; however, experimental approaches to determine the structure and function of the protein(s) each gene encodes are not keeping pace. Computational methods are therefore needed to augment experimental structures with comparative (i.e. homology) models, which thread new sequences onto homologous structures and use physics-based methods to build residues, loops and domains. In addition, even experimental structure determination relies on analogous first-principles structure refinement and prediction algorithms to place structural elements that are not defined by the data alone.
Computational methods developed to find the global free energy minimum of an amino acid sequence (i.e. the protein folding problem) are increasingly successful, but limitations in accuracy and efficiency remain. Optimization efforts have focused on subsets of systems and environments by utilizing potential energy functions that include fixed charge force fields (Fiser, Do, & Sali, 2000; Jacobson et al., 2004), statistical or knowledge-based potentials (Das & Baker, 2008) and/or potentials incorporating experimental data (Brunger, 2007; Trabuco, Villa, Mitra, Frank, & Schulten, 2008).
Although these methods are widely used, limitations include 1) a target function global minimum that does not correspond to the actual free energy minimum and/or 2) search protocols that are inefficient or not deterministic due to rough energy landscapes characterized by large energy barriers between multiple minima.
Our Global Optimization Using Metadynamics and a Polarizable Force Field (GONDOLA) approach tackles the first limitation by incorporating experimental data (i.e. from X-ray crystallography, CryoEM or NMR experiments) into a hybrid target function that also includes information from a polarizable molecular mechanics force field (Lopes, Roux, & MacKerell, 2009; Ponder & Case, 2003). The second limitation is overcome by adding a time-dependent bias to the objective function, which drives the sampling of conformational space toward unexplored regions (Barducci, Bonomi, & Parrinello, 2011; Zheng, Chen, & Yang, 2008).
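The effect of a time-dependent bias can be illustrated with a minimal sketch. This is a toy one-dimensional example and not the GONDOLA implementation: the double-well energy, the Gaussian hill shape, the Metropolis sampler and every parameter value (`barrier`, `height`, `width`, `kT`, `step`, `deposit_every`) are assumptions chosen only for illustration.

```python
import math
import random


def double_well(x, barrier=5.0):
    """Toy energy (assumed): minima at x = -1 and x = +1, barrier at x = 0."""
    return barrier * (x * x - 1.0) ** 2


def bias(x, centers, height=0.5, width=0.2):
    """History-dependent bias: a sum of Gaussian hills at visited positions."""
    return sum(
        height * math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers
    )


def run_metadynamics(n_steps=5000, deposit_every=25, kT=0.5, step=0.1,
                     seed=0, use_bias=True):
    """Metropolis sampling of the toy energy plus a growing bias."""
    rng = random.Random(seed)
    x = -1.0          # start in the left-hand minimum
    centers = []      # positions where hills have been deposited
    trajectory = []
    for i in range(n_steps):
        trial = x + rng.uniform(-step, step)
        e_old = double_well(x) + bias(x, centers)
        e_new = double_well(trial) + bias(trial, centers)
        # Standard Metropolis acceptance on the biased energy.
        if e_new <= e_old or rng.random() < math.exp(-(e_new - e_old) / kT):
            x = trial
        # Periodically add a hill at the current position, which
        # penalizes revisiting it and pushes the walker elsewhere.
        if use_bias and i % deposit_every == 0:
            centers.append(x)
        trajectory.append(x)
    return trajectory, centers


if __name__ == "__main__":
    traj, hills = run_metadynamics()
    print(f"hills deposited: {len(hills)}, furthest x reached: {max(traj):.2f}")
```

As hills accumulate in the starting well, the biased energy there rises until the walker can cross the barrier and explore the second minimum. Running with `use_bias=False` gives the unbiased sampler for comparison.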
The GONDOLA approach incorporates additional efficiency constructs for search space exploration that include Monte Carlo moves and fine-grained minimization. Furthermore, the dimensionality of the search is reduced by fixing atomic coordinates of known structural regions while atoms of interest explore new coordinate positions. The overall approach can be used for optimization of side-chains (i.e. set side-chain atoms active while constraining backbone atoms), residues (i.e. side-chain atoms and backbone atoms active), ligand binding pose (i.e. set atoms along binding interface active), protein loops (i.e. set atoms connecting two terminating residues active) or even entire protein domains or complexes. Here we focus on using the GONDOLA general free energy driven optimization strategy to elucidate the structural details of protein loops, which are often missing from experimental structures due to conformational heterogeneity and/or limitations in the resolution of the data.
We first show that the correlation between experimental data and AMOEBA (i.e. a polarizable force field) structural minima is stronger than that for OPLS-AA (i.e. a fixed charge force field). This suggests that the higher order multipoles and polarization of the AMOEBA force field represent the true crystalline environment more accurately than the simpler OPLS-AA model. Thus, scoring and optimization of loops with AMOEBA is more accurate than with OPLS-AA, albeit at a slightly increased computational cost.
Next, missing PDZ domain protein loops and protein loops from a loop decoy data set were optimized for 5 ns using the GONDOLA approach (i.e. under the AMOEBA polarizable force field) as well as a commonly used global optimization procedure (i.e. simulated annealing under the OPLS-AA fixed charge force field). The GONDOLA procedure was shown to provide more accurate structures in terms of both experimental metrics (i.e. lower Rfree values) and structural metrics (i.e. using the MolProbity structure validation tool). In terms of Rfree, only one out of seven simulated annealing results was better than the GONDOLA global optimization result. Similarly, one simulated annealing loop had a better MolProbity score, but none of the simulated annealing loops was better in both categories. On average, GONDOLA achieved an Rfree value of 19.48 compared with 19.63 for simulated annealing, and the average MolProbity scores were 1.56 for GONDOLA and 1.75 for simulated annealing.
In addition to providing more accurate predictions, GONDOLA was shown to converge much faster than the simulated annealing protocol. Ten separate 5 ns optimizations of the four-residue loop missing from one of the PDZ domains were conducted: five using GONDOLA and five using the simulated annealing protocol. The four fastest-converging results belonged to the GONDOLA approach. Thus, this work demonstrates that GONDOLA is well-suited to refine or predict the coordinates of missing residues and loops because it is both more accurate and converges more rapidly.
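The idea of fixing known coordinates while only the atoms of interest move can be sketched as follows. This is a hypothetical toy model, not code from the thesis: the one-dimensional bead chain, the harmonic spring energy, the plain gradient descent minimizer and the names `chain_energy`, `chain_gradient` and `minimize_active` are all assumptions made for illustration.

```python
def chain_energy(coords, rest=1.0):
    """Toy 1D 'loop' (assumed): harmonic springs between consecutive beads."""
    return sum(
        (coords[i + 1] - coords[i] - rest) ** 2 for i in range(len(coords) - 1)
    )


def chain_gradient(coords, rest=1.0):
    """Analytic gradient of chain_energy with respect to each bead position."""
    grad = [0.0] * len(coords)
    for i in range(len(coords) - 1):
        d = 2.0 * (coords[i + 1] - coords[i] - rest)
        grad[i] -= d
        grad[i + 1] += d
    return grad


def minimize_active(coords, active, n_iter=500, rate=0.1):
    """Gradient descent that updates only the beads listed in `active`.

    Beads not in `active` keep their input coordinates, which reduces the
    dimensionality of the search to the active subset.
    """
    coords = list(coords)
    for _ in range(n_iter):
        grad = chain_gradient(coords)
        for i in active:
            coords[i] -= rate * grad[i]
    return coords


if __name__ == "__main__":
    # Beads 0 and 4 play the role of the two terminating residues (fixed);
    # beads 1-3 are the loop atoms that are allowed to move.
    start = [0.0, 0.2, 0.3, 2.9, 4.0]
    relaxed = minimize_active(start, active=[1, 2, 3])
    print([round(c, 3) for c in relaxed])
    print(chain_energy(start), chain_energy(relaxed))
```

Only three of the five coordinates are optimized here, so the anchors stay exactly where they started while the interior beads relax between them.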
Identifier | oai:union.ndltd.org:uiowa.edu/oai:ir.uiowa.edu:etd-6387 |
Date | 01 May 2016 |
Creators | Avdic, Armin |
Contributors | Schnieders, Michael J. |
Publisher | University of Iowa |
Source Sets | University of Iowa |
Language | English |
Detected Language | English |
Type | thesis |
Format | application/pdf |
Source | Theses and Dissertations |
Rights | Copyright 2016 Armin Avdic |