
Structure Determination of Proteins of Unknown Origin by a Marathon MR Protocol and Investigations on Parameters Important for Molecular Replacement Structure Solution

Occasionally, crystallisation of proteins works in mysterious ways! One might obtain crystals of a protein of unknown identity in place of the protein for which the crystallisation experiments were set up. If the investigator is not aware of such possibilities, valuable time and resources may be lost in attempting to determine the structure of such proteins. Instances of a non-target protein crystallising may never come to light, or may be recognised only when all attempts to determine the structure by conventional procedures fail after the diffraction data have been collected and processed. Usually, it is not possible to reproduce such crystals, as their occurrence is serendipitous. These rare crystallisation events are probably caused by fluctuations in environmental or crystallisation conditions and are not reproducible. They could also be due to contaminating microbes, which is more likely when the experimentalist is inexperienced. Because such crystals cannot be reproduced for heavy-atom derivatisation or anomalous-scattering experiments, experimental phasing of data collected on serendipitously obtained crystals can be a challenging task.
With the rapid increase in the number of structures deposited in the Protein Data Bank (PDB), molecular replacement (MR) has become the method of choice for structure determination in macromolecular X-ray crystallography. This is because a suitable phasing model can be selected for most target proteins on the basis of their sequence. However, if the identity of the target protein itself is uncertain, all attempts at structure determination using phasing models selected through a search based on the target protein sequence would fail. Sequence-independent ab initio phasing techniques such as the recently available ARCIMBOLDO (Meindl et al., 2012) could provide leads only if the non-target protein is an all-α protein and the associated diffraction data extend to a resolution better than 2 Å. Even then, the success rate of this technique is low. Hence, a sequence-independent method of structure determination is needed for such mysteriously obtained crystals. This thesis reports the crystal structures of serendipitously crystallised proteins, determined using a large-scale application of the molecular replacement technique (referred to in this thesis as MarathonMR). The thesis also presents an evaluation of molecular replacement strategies for structure determination.
The thesis begins with an overview of crystallographic methods of structure determination, with an emphasis on molecular replacement (Chapter 1). The most prominent result obtained in the course of these investigations pertains to a crystal obtained in 2011 during routine crystallisation of a viral protein mutant. Its cell parameters differed from those of crystals of other viral protein mutants crystallised earlier in the same laboratory. Unfortunately, this crystal form could not be reproduced in subsequent crystallisation trials. All attempts to determine the structure by conventional molecular replacement, using combinations of domains from a nearly identical virus coat protein protomer as the phasing model, failed, and the data were shelved as “not solvable” in late 2011. However, the crystal had diffracted to 1.9 Å and the data had excellent merging statistics. The data were therefore retrieved recently, and further attempts were made to determine the structure with phasing techniques that have since become available. Techniques such as AMPLE (Bibby et al., 2013) and Rosetta (DiMaio, 2013), which couple large-scale homology modelling with molecular replacement, did not lead to meaningful solutions. A couple of helices identified by ARCIMBOLDO (Meindl et al., 2012) were, in retrospect, neither correct nor sufficient to determine the entire structure. Given the excellent merging statistics, there was strong motivation to determine the structure, even though it meant developing a fresh protocol. It was at this time that we came across the work of Stokes-Rees and Sliz (2010), who had demonstrated that the structure of a protein of unknown identity can be determined by employing almost every known protein structure as a potential phasing model.
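Using almost every known structure as a candidate phasing model amounts to a screening loop: run MR with each model and keep the ones whose solutions look convincing. The Python sketch below illustrates that loop under stated assumptions. `run_mr` is a hypothetical wrapper around an MR program (not a real library call), the scores are the translation-function Z-score (TFZ) and log-likelihood gain (LLG) reported by Phaser-style searches, and the default cut-off of TFZ ≥ 8 is a commonly quoted rule of thumb, not necessarily the criterion used in this thesis.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class MRResult:
    model_id: str  # identifier of the phasing model, e.g. a PDB chain
    tfz: float     # translation-function Z-score of the top solution
    llg: float     # log-likelihood gain of the top solution


def screen_models(
    model_ids: Iterable[str],
    run_mr: Callable[[str], Optional[MRResult]],
    tfz_cutoff: float = 8.0,
) -> List[MRResult]:
    """Try every candidate phasing model and return plausible hits, best first.

    `run_mr` is a hypothetical wrapper around an MR program; it should
    return None when the search produces no solution at all.
    """
    hits: List[MRResult] = []
    for model_id in model_ids:
        result = run_mr(model_id)
        # Keep only solutions whose translation Z-score clears the cut-off.
        if result is not None and result.tfz >= tfz_cutoff:
            hits.append(result)
    # Rank by LLG, breaking ties by TFZ, so the most convincing hit comes first.
    hits.sort(key=lambda r: (r.llg, r.tfz), reverse=True)
    return hits
```

Because each MR run is independent of the others, a real campaign over thousands of models would distribute the loop body across a computing cluster rather than run it serially.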
The work reported in the thesis grew out of an earlier project that examined, through large-scale molecular replacement runs, the relationship between the properties of phasing models and the quality of the target protein model generated by MR. That project was initiated because of the realisation that the recent explosion in crystallographic structural studies has resulted in a near-complete exploration of the “fold space” of proteins, and the PDB now has a representative structure for most plausible protein folds. Some folds are highly represented in the PDB. Hence, if the diffraction data are of excellent quality, there is likely to be at least one homologue in the PDB that could serve as a phasing model to determine the structure of a protein of unknown identity. The single dataset that had diffracted to 1.9 Å resolution was therefore used to develop the MarathonMR procedure for structure determination. MarathonMR takes a sequence-independent approach to structure determination and employs large-scale molecular replacement calculations to identify the closest homologue (initially in structural terms). This protocol is described in Chapter 2 (Materials and methods) of the thesis. Through MarathonMR, the structure corresponding to this dataset, which had remained unsolved for 5 years, was finally determined. Owing to the high resolution and quality of the electron density map, nearly the complete sequence of the polypeptide could be deduced by inspecting the map. The protein was found to be a phosphate-binding protein from the soil bacterium Stenotrophomonas maltophilia (SmPBP). How the structure was determined, and possible explanations for the mysterious source of this protein, which had crystallised instead of the target protein, are discussed in Chapter 3. Though the MarathonMR procedure was developed to solve a single dataset, it was soon realised that the same procedure could be applied to other similar datasets, all of which had diffracted to reasonable resolution with good merging statistics but had remained unsolved for unknown reasons. Among these, one dataset, collected in 2007 and diffracting to 2.3 Å resolution, had cell parameters very close to those of SmPBP. Hence, a poly-alanine model of the SmPBP structure, which had been determined by then, was used as the phasing model for molecular replacement, and the structure was readily solved. Surprisingly, SmPBP had crystallised serendipitously not once but twice, independently, in the hands of two different investigators in the same laboratory: in 2007, giving crystals that diffracted to 2.3 Å resolution, and in 2011, giving crystals that diffracted to 1.9 Å resolution. The two structures are nearly identical, and a comparison of them is presented in Chapter 4.
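A poly-alanine phasing model, such as the one built from SmPBP, is usually made by truncating every side chain at Cβ so that the search model carries only fold information. The sketch below shows that truncation on PDB-format text under stated assumptions: fixed-column ATOM records, retention of N, CA, C, O and CB only, glycines left unchanged, and HETATM and ANISOU records discarded. It is an illustration, not necessarily how the models in this thesis were built; dedicated programs such as Chainsaw (CCP4) or phenix.sculptor also handle alternate conformations and other details that this sketch ignores.

```python
from typing import Iterable, List

# Atoms retained in a poly-alanine model: main chain plus C-beta.
KEEP_ATOMS = {"N", "CA", "C", "O", "CB"}


def to_polyalanine(pdb_lines: Iterable[str]) -> List[str]:
    """Truncate side chains at CB and rename non-glycine residues to ALA.

    Assumes fixed-column PDB ATOM records: atom name in columns 13-16
    and residue name in columns 18-20 (1-based).
    """
    out: List[str] = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record in ("HETATM", "ANISOU"):
            continue  # drop ligands, waters and anisotropic B-factor records
        if record == "ATOM":
            atom_name = line[12:16].strip()
            if atom_name not in KEEP_ATOMS:
                continue  # strip side-chain atoms beyond CB (and hydrogens)
            if line[17:20] != "GLY":
                line = line[:17] + "ALA" + line[20:]  # keep column widths intact
        out.append(line)  # CRYST1, TER, END and other records pass through
    return out
```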
The structure of SmPBP determined at 2.3 Å resolution by MarathonMR also corresponds to the dataset that had remained unsolved for the longest period (9 years). Its successful determination after such a long lapse emphasises the importance of carefully preserving X-ray diffraction data irrespective of the immediate outcome.
Chapter 5 of the thesis describes another instance of non-target protein crystallisation whose structure was determined using the MarathonMR procedure. The crystal was obtained during crystallisation of mutants of a survival protein (SurE), which is expressed in Salmonella typhimurium when the bacterium is subjected to environmental or internal stresses. The original investigator had used the structure of SurE as the phasing model to determine the structure from the mutant crystals and obtained a model with R and Rfree of 35% and 40%, respectively. However, the model did not refine further to lower R-factors, suggesting that the solution obtained might not be correct. MarathonMR indicated that the fold of the crystallised protein could be similar to that of glycerol dehydrogenase (GlyDH). As SurE shares some fold similarity with one of the domains of GlyDH, this may explain the limited success (R/Rfree of 35%/40%) achieved by the original investigator. As the merging statistics of this diffraction dataset were poor, the diffraction images were reprocessed with XDS within the xia2 automated data-processing pipeline. The data statistics indicated merohedral twinning (twin fraction of 14%). Nevertheless, using appropriate parameters, the structure obtained by MarathonMR could be refined to acceptable R/Rfree values with Refmac. Four protomers were present in the crystallographic asymmetric unit (ASU), and non-crystallographic symmetry averaging of the electron density over these four molecules further improved the map. As the data were limited to 2.7 Å resolution, it was not possible to identify every residue of the protein unambiguously from the electron density map alone. From the amino acids that could be identified with certainty, it was clear that the protein is a glycerol dehydrogenase from a species of the family Enterobacteriaceae.
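The two quantities that cast doubt on the SurE-based solution, R and Rfree, and the twinning that complicated refinement both have simple definitions. The sketch below assumes NumPy arrays of amplitudes and a single least-squares scale factor k (refinement programs such as Refmac use more elaborate scaling). It computes R = Σ| |Fobs| − k|Fcalc| | / Σ|Fobs| over the working reflections and Rfree over the test set, and shows algebraic detwinning for a two-domain (hemihedral) merohedral twin with twin fraction α, where each observed intensity is I_obs(h) = (1 − α)I(h) + αI(h′) and h′ is the twin-related reflection. The function names are illustrative.

```python
import numpy as np


def r_factor(f_obs: np.ndarray, f_calc: np.ndarray, k: float) -> float:
    """R = sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|)."""
    return float(np.sum(np.abs(f_obs - k * f_calc)) / np.sum(f_obs))


def r_work_r_free(f_obs, f_calc, free_flags):
    """Return (R_work, R_free) using one least-squares scale from the work set."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.abs(np.asarray(f_calc))  # also accepts complex structure factors
    free = np.asarray(free_flags, dtype=bool)
    work = ~free
    # k minimises sum((|Fobs| - k|Fcalc|)^2) over the working reflections only.
    k = np.sum(f_obs[work] * f_calc[work]) / np.sum(f_calc[work] ** 2)
    return r_factor(f_obs[work], f_calc[work], k), r_factor(f_obs[free], f_calc[free], k)


def detwin(i_obs_h, i_obs_twin, alpha):
    """Recover I(h) of a hemihedral twin from I_obs(h) and its twin mate I_obs(h').

    Inverts I_obs(h) = (1 - alpha) I(h) + alpha I(h'); valid only for alpha < 0.5.
    """
    if not 0.0 <= alpha < 0.5:
        raise ValueError("twin fraction must satisfy 0 <= alpha < 0.5")
    i_obs_h = np.asarray(i_obs_h, dtype=float)
    i_obs_twin = np.asarray(i_obs_twin, dtype=float)
    return ((1.0 - alpha) * i_obs_h - alpha * i_obs_twin) / (1.0 - 2.0 * alpha)
```

Direct detwinning amplifies errors as α approaches 0.5, which is why refinement against the twinned intensities themselves, as supported by Refmac, is generally preferred for substantially twinned data.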
Though the structure of a similar glycerol dehydrogenase has been reported from Serratia, there are clear differences at many unambiguously identified residues, suggesting that the protein is not from Serratia. The protein has been named EnteroGlyDH, as it most likely originates from a species of the family Enterobacteriaceae. The structure of the protein, its biochemical implications and possible reasons for the serendipitous crystallisation of this non-target protein are discussed.
Chapter 6 discusses the structure determination of an inorganic pyrophosphatase and of the catalytic domain of succinyl transferase, whose crystals had diffracted to 2.3 Å and 3.1 Å, respectively, but had remained unsolved. Neither dataset corresponds to the intended target protein. The dataset that turned out to be that of an inorganic pyrophosphatase was provided by a colleague from a different laboratory at the Indian Institute of Science. Interestingly, the investigator had taken this dataset to one of the CCP4 workshops and had tried to determine the structure with the help of experts there. Those attempts failed for reasons that are now obvious: the original investigator had been working under an erroneous assumption about the identity of the crystallised protein. As these enzymes are well studied, their structures and functions are discussed only briefly.
It is already well established that molecular replacement is used with increasing frequency as a phasing technique compared with experimental phasing techniques. With the ever-growing number of structures in the PDB, the high population of certain folds and the near plateau reached in the discovery of new folds, it is reasonable to expect that molecular replacement will be used even more frequently in the years to come. Hence, for any given diffraction dataset of a target protein, several homologous structures are very likely to be available in the PDB as potential phasing models, and it becomes important to understand how the choice of phasing model influences the quality and accuracy of the model generated through MR. To study this relationship, already known structures deposited in the PDB were re-determined, starting from their respective structure factors and various phasing models. Structures belonging to the TIM beta/alpha-barrel (SCOPe ID: c.1) and lysozyme-like (SCOPe ID: d.2) folds were chosen as targets. The structure of each target was re-determined serially, using poly-alanine models of all available unique homologues as phasing models. Owing to the multi-dimensional nature of this study, the results were represented as a graph of nodes and edges. The detailed methodology and the data representation model are discussed in Chapter 2 (Materials and methods). It was found that above a certain sequence-identity cut-off, the sequence identity between the phasing model and the target seems to have little influence on the quality and accuracy of the model generated through MR. Instead, other properties of the phasing model, such as its Rfree and real-space correlation coefficient (RSCC), influence the quality of the MR models. These results are discussed in Chapter 7.
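A node-and-edge representation of this kind can be sketched as a small directed graph in which every structure is a node and every MR run is an edge from the phasing model to the re-determined target, annotated with the quantities being compared. The class names and fields below (`MREdge`, `MRGraph`, `seq_identity`, `model_rfree`, `mr_rfree`) are hypothetical illustrations, not the data model actually used in the thesis. The example query returns the run whose MR model refined to the lowest Rfree, optionally restricted to runs above a sequence-identity cut-off.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Optional, Set


@dataclass(frozen=True)
class MREdge:
    model: str           # phasing model (poly-alanine), e.g. a PDB chain
    target: str          # known structure being re-determined
    seq_identity: float  # model-target sequence identity, in percent
    model_rfree: float   # Rfree of the deposited phasing model
    mr_rfree: float      # Rfree of the model obtained after MR and refinement


class MRGraph:
    """Directed graph in which an edge model -> target records one MR run."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        self.in_edges: DefaultDict[str, List[MREdge]] = defaultdict(list)

    def add_run(self, edge: MREdge) -> None:
        self.nodes.update((edge.model, edge.target))
        self.in_edges[edge.target].append(edge)

    def best_model(self, target: str, min_identity: float = 0.0) -> Optional[MREdge]:
        """Edge into `target` with the lowest MR Rfree above an identity cut-off."""
        candidates = [
            e for e in self.in_edges.get(target, []) if e.seq_identity >= min_identity
        ]
        return min(candidates, key=lambda e: e.mr_rfree, default=None)
```

Storing the phasing model's own Rfree on each edge makes it straightforward to ask whether model quality, rather than sequence identity, tracks the MR outcome, which is the comparison the paragraph above reports.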
Lessons learnt from the work reported in this thesis, possible logical and programmatic upgrades to the MarathonMR protocol, and future directions for studying the relationship between phasing models and the models generated through MR are discussed in the concluding chapter, Chapter 8 (Conclusion and future prospects).

Identifier: oai:union.ndltd.org:IISc/oai:etd.ncsi.iisc.ernet.in:2005/3148
Date: January 2016
Creators: Hatti, Kaushik S
Contributors: Srinivasan, N, Murthy, M R N
Source Sets: India Institute of Science
Language: en_US
Detected Language: English
Type: Thesis
Relation: G27854
