131 |
Secondary Structures in Proteins : Identification and Analyses
Kumar, Prasun January 2016
Proteins are large biomolecules consisting of one or more long chains of amino acid residues. They perform a vast array of functions within living organisms. In this thesis, we present analyses of different secondary structural elements (SSEs) in proteins and various methods developed for the same purpose. Using only the geometric parameters, a program for identification of SSEs has been developed, which is more sensitive to the local structural variations. An understanding of the factors that determine the length, geometry as well as location of a particular SSE in the protein is essential to fully appreciate their respective roles in protein structures. The comparative analysis of the geometry of α-helices identified by different programs showed that STRIDE assigned α-helices are more kinked. Conformation of Pro residues in α-helices has also been studied in detail. Several interesting conclusions are drawn from the comprehensive study of π-helices and PolyProline-II (PPII) helices. In the subsequent paragraphs, a brief summary of each chapter is provided.
The Introduction (Chapter 1) summarizes the relevant literature, covering both experimental and theoretical studies that explain the structural and functional importance of SSEs in proteins, and lays down the background for the subsequent chapters of the thesis. The major questions addressed and the main goals of this thesis are described to set the stage for the detailed discussions. The methodologies involved are discussed in Chapter 2. These include the protocol used for preparing non-redundant datasets of protein structures, the statistical methods used to test the significance of position-wise amino acid propensities, and the programs used during the course of the present investigations.
SSEs play an important role in the folding of proteins. However, identification of these SSEs in proteins is a common yet important concern in structural biology. Chapter 3 details a new method, ASSP (Assignment of Secondary Structure in Proteins), which uses only the path traversed by the Cα atoms of consecutive residues. The algorithm is based on the premise that the protein structure can be divided into continuous or uniform stretches, which can be defined in terms of helical parameters; depending on the values of these parameters, the stretches can be classified into different SSEs, viz. α-, 3₁₀- and π-helices, extended β-strands, and PPII and other left-handed helices. The methodology was validated using an unbiased clustering of these parameters for a dataset of 1008 protein chains, which indicated that there are seven well-defined clusters associated with different SSEs. Apart from α-helices and extended β-strands, 3₁₀- and π-helices were also found to occur in considerable numbers. Various analyses demonstrated that ASSP was able to discriminate non-α-helical segments from flanking α-helices, segments that were often identified as part of the α-helix by other algorithms. The standalone version of the program for the Linux as well as Windows operating systems is freely downloadable and the web server version is available at http://nucleix.mbu.iisc.ernet.in/assp/index.html.
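As an illustration of how helical parameters can be obtained from the Cα path alone, the following is a minimal sketch of the standard four-point construction (local twist, rise and radius from four consecutive Cα positions). It is not taken from the ASSP source code; the function names and the sign convention (negative twist for a left-handed turn) are assumptions made for this example.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def local_helix_params(p1, p2, p3, p4):
    """Return (twist in degrees, rise, radius) for four consecutive C-alpha atoms.

    Hypothetical helper for illustration. Collinear input points give a
    zero-length vector and raise ZeroDivisionError.
    """
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    # Differences of successive bond vectors are perpendicular to the local axis.
    v1, v2 = _sub(b1, b2), _sub(b2, b3)
    cos_t = _dot(v1, v2) / (_norm(v1) * _norm(v2))
    cos_t = max(-1.0, min(1.0, cos_t))
    twist = math.degrees(math.acos(cos_t))
    # Local axis direction from the cross product of the two perpendiculars.
    axis = _cross(v1, v2)
    n = _norm(axis)
    axis = tuple(x / n for x in axis)
    rise = _dot(b2, axis)
    radius = math.sqrt(_norm(v1) * _norm(v2)) / (2.0 * (1.0 - cos_t))
    # Assumed convention: report a left-handed turn as negative twist, positive rise.
    if rise < 0:
        twist, rise = -twist, -rise
    return twist, rise, radius

def ideal_helix(n, twist_deg, rise, radius):
    """Generate n points on an ideal helix along z (test data only)."""
    t = math.radians(twist_deg)
    return [(radius * math.cos(i * t), radius * math.sin(i * t), i * rise)
            for i in range(n)]

if __name__ == "__main__":
    # Idealised alpha-helix-like geometry; the numbers are illustrative inputs.
    print(local_helix_params(*ideal_helix(4, 100.0, 1.5, 2.3)))
```

Sliding this window along the chain gives one (twist, rise, radius) triple per residue quartet; stretches with uniform values could then be grouped into candidate SSEs, which is the general idea the abstract describes.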
Among all SSEs in proteins, α-helices are relatively well defined. However, a precise quantitative estimate of their geometrical features and identification of their terminal residues is difficult. Chapter 4 discusses in detail a set of major changes and updates implemented in the algorithm of the in-house program for analysis of the geometry of helices in proteins (HELANAL). The updated program, HELANAL-Plus, defines the helix parameters based on the path traced by Cα atoms alone and classifies the geometry of the helices into linear, curved, kinked and unassigned types by fitting a least-squares 3D line and sphere to the local helix origin points (LHOP). The geometry assigned using HELANAL-Plus is independent of the orientation of the helix in 3D space and also does not depend on the database from which it is taken. The program is made available as a webserver as well as a standalone version, and the helices can be viewed in the JmolApplet along with the best-fit helix axis, which makes HELANAL-Plus useful for analysing inter-helix interaction and packing. The utility of the webserver has been increased by incorporating SSE assignment programs such as ASSP, DSSP or STRIDE. Pro-kinked helices and their correlation with the UP and DOWN conformations of Pro were studied in more detail. HELANAL-Plus is available at http://nucleix.mbu.iisc.ernet.in/helanalplus/index.html. Linux/Unix and Windows compatible executables are also available for download.
The analyses of kinks in a dataset of helices indicated a correlation with a large radius of the cylinder encompassing the residue at which the kink was observed, and ASSP often identified that region as a π-helix. Earlier detailed analysis of π-helices was limited by the low frequency of their identification by different algorithms. ASSP identified 659 π-helices in 3582 protein chains, solved at resolution ≤ 2.5 Å and validated by MolProbity. Chapter 5 reports the detailed study of the functional and structural roles of π-helices along with the position-wise amino acid propensity within and around them. These helices were found to range from 5 to 18 residues in length, with the average twist and rise being 85.2° ± 7.2° and 1.28 ± 0.31 Å respectively. The investigation of π-helices illustrated that they occur mostly in conjunction with α-helices. The majority of π-helices with flanking α-helices at both termini were found to be conserved across a large number of structures within a protein family and to induce local distortions in the neighbouring α-helices. The presence of a π-helical fragment leads to appropriate orientation and positioning of the constituent residues, and hence facilitates favourable interactions and helps in proper folding of the protein chain. The comprehensive analyses of position-wise amino acid propensity within and around π-helices showed their unique preferences, which are different from those of α-helices. Additionally and most importantly, the study also brought to light the influence of π-helices on the residue preference in preceding or succeeding α-helices and vice versa.
The study of another important SSE in proteins, the PPII helix (Chapter 6), was inspired by the large number of occurrences of these helices and initiated with the aim of understanding their structural and functional roles. A PPII helix is defined as an extended, flexible left-handed helix without intra-helical H-bonds, and such helices are found to occur very frequently. ASSP identifies 3597 PPII helices in 3582 protein chains. Though PPII helices occur on a much smaller scale than α-helices and β-strands, their number is still greater than that of π-helices. The analyses of PPII helices revealed that almost 50% of them do not contain Pro residues and show a preference for polar residues. PPII helices were found in conjunction with major SSEs and they often connect them. These helices range from 3 to 13 residues in length, with the average twist and rise being -121.2° ± 9.2° and 3.0 ± 0.1 Å respectively. The analysis of various non-bonded interactions revealed the frequent presence of C-H…N and C-H…O interactions. The analysis of the amino acid preference within and around PPII helices showed avoidance of aromatic residues within the helix and a preference for Gly, Asn and Asp residues in the flanking regions. Detailed analyses of various functional and structural roles mediated by PPII helices have also been carried out.
Identification and analysis of non-bonded interactions within a molecule and with the surrounding molecules are an essential part of structural studies. Given the importance of these interactions, we have developed a new algorithm named MolBridge and Chapter 7 provides the detailed description about it. MolBridge is an easy to use algorithm based purely on geometric criteria that can identify all possible non-bonded interactions, such as hydrogen bond, halogen bond, cation…π, π…π and van der Waals, in small molecules as well as biomolecules. Various features available in the webserver make it more user-friendly and interactive. The Unix/Linux version of the program is freely downloadable and the web server version is available at http://nucleix.mbu.iisc.ernet.in/molbridge/index.php.
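To make the idea of a purely geometric criterion concrete, here is a minimal sketch of a hydrogen-bond test based on one distance and one angle. It is not MolBridge's actual code or parameter set; the function names and both cutoff values are assumptions chosen only for illustration.

```python
import math

# Illustrative cutoffs only; these are NOT the values used by MolBridge.
MAX_H_TO_ACCEPTOR = 2.5   # H...A distance cutoff in angstroms (assumed)
MIN_DHA_ANGLE = 120.0     # D-H...A angle cutoff in degrees (assumed)

def angle_deg(a, vertex, c):
    """Angle a-vertex-c in degrees for three 3D points."""
    v1 = [x - y for x, y in zip(a, vertex)]
    v2 = [x - y for x, y in zip(c, vertex)]
    cos_a = sum(p * q for p, q in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def is_hydrogen_bond(donor, hydrogen, acceptor,
                     max_dist=MAX_H_TO_ACCEPTOR, min_angle=MIN_DHA_ANGLE):
    """Hypothetical check: short H...A contact and a near-linear D-H...A angle."""
    close_enough = math.dist(hydrogen, acceptor) <= max_dist
    linear_enough = angle_deg(donor, hydrogen, acceptor) >= min_angle
    return close_enough and linear_enough

if __name__ == "__main__":
    donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    print(is_hydrogen_bond(donor, hydrogen, (2.9, 0.0, 0.0)))  # linear, short contact
    print(is_hydrogen_bond(donor, hydrogen, (1.0, 1.9, 0.0)))  # short but bent at 90 degrees
```

Other interaction types listed above (halogen bonds, cation…π, π…π) would follow the same pattern with different atoms, distances and angles.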
The overall conclusions from the current investigation and the possible future directions are presented in Chapter 8. Our findings suggest that the path traversed by Cα atoms is enough for the identification of SSEs. We believe that the various algorithms developed (ASSP, HELANAL-Plus and MolBridge) can provide a better understanding of the finer nuances of protein secondary structures. ASSP can make an important contribution to the understanding of comparatively less frequent structural motifs and to the identification of novel SSEs. The comprehensive study of π-helices gives in-depth insight into them, and the analysis of interspersed π-helices gives a comprehensive understanding of the local deformations and variations in helical segments.
Apart from the studies embodied in the thesis, the author has been involved in a few other studies, which are provided as an appendix:
Appendix A describes a program RNAHelix, which can regenerate duplexes from the dinucleotide step and base pair parameters for a given double helical DNA or RNA sequence. It can be used to generate/ regenerate the duplexes with the non-canonical base pairing as well.
|
132 |
(p)ppGpp and Stress Response : Decoding the Key Pathways by Small Molecule Analogues Biophysical Methods and Mass Spectrometry
Syal, Kirtimaan January 2015
Under hostile conditions, bacteria elicit a stress response. This stress response is regulated by a secondary messenger called (p)ppGpp. (p)ppGpp is involved in a wide range of functions such as GTP homeostasis, biofilm formation and cell growth. Its regulation and mode of action are not well understood. This work was initiated with the aim of gaining insights into the molecular basis of the stress response. (p)ppGpp was discovered on the chromatogram of a cell extract from starved E. coli cells. (p)ppGpp is synthesized and hydrolyzed by Rel/SpoT in Gram negative bacteria (such as E. coli), and by a bifunctional enzyme called Rel in Gram positive bacteria (such as Mycobacteria).
The obvious question that comes to mind is how the bifunctional Rel enzyme decides between synthesis and hydrolysis in Gram positive bacteria such as Mycobacterium. In our laboratory, it has been shown that the N-terminal domain of Rel shows unregulated (p)ppGpp synthesis, implying a regulatory role for the C-terminal domain. Also, a concurrent increase in the anisotropy of the Rel C-terminal domain with increasing concentration of pppGpp has been observed, indicating the binding of pppGpp to the C-terminal domain. We performed an isothermal calorimetry experiment to confirm that pppGpp binds to the C-terminal domain of the Rel enzyme. For identification of the binding region, the small molecule analogue 8-azido-pppGpp has been synthesized. This analogue is UV-crosslinked with the C-terminal domain of Rel and the specificity of the interaction has been determined by gel-based crosslinking experiments. The crosslinked protein has been subjected to in-gel trypsin digestion and analyzed by mass spectrometry. We identified two crosslinked peptides in the mass spectra of the trypsin digest of the crosslinked protein, where the identity of the parent peptide is confirmed by MS-MS analysis. Site-directed mutagenesis has been carried out based on the conservation of residues in the crosslinked peptides. Isothermal calorimetry analysis has been done in which Rel C-terminal domain mutants are titrated with pppGpp in order to detect any defect in binding due to the mutations. Mutations leading to reduced binding affinity of pppGpp for the Rel C-terminal domain have been introduced into the full-length Rel protein, and activity assays have been carried out to evaluate the effects of the mutations on synthesis and hydrolysis activity. In the mutants, synthesis activity is found to be increased with a concomitant reduction in hydrolysis activity. This indicates a feedback loop in which pppGpp binds to the Rel C-terminal domain to regulate its own synthesis and hydrolysis.
In E. coli, pppGpp binds to RNA polymerase and modulates transcription. The region where it binds is controversial. In addition, whether ppGpp and pppGpp have different binding sites on RNA polymerase is not known. The latter question becomes important in the light of evidence indicating differential regulation of transcription by ppGpp and pppGpp. We found that ppGpp and pppGpp have an overlapping binding site on RNA polymerase. 8-azido-ppGpp has been mapped on the β and β’ subunits, whereas the binding site of 8-azido-pppGpp has been located on the β’ subunit. We observed that 8-azido-pppGpp labels RNA polymerase more efficiently than ppGpp. pppGpp can compete out ppGpp, as illustrated by the DRaCALA assay and a gel-based crosslinking experiment. However, the RNAP from B. subtilis does not bind to (p)ppGpp.
(p)ppGpp is ubiquitous in bacteria but absent in mammals. Thus, blocking (p)ppGpp synthesis would impede the survival of bacteria without having any effect on humans. Recently, the compound Relacin was synthesized by another group in order to inhibit (p)ppGpp synthesis. The limitations of this compound are the high concentration (5 mM) required for inhibition and its low permeability across the membrane. Taking hints from this compound, we acetylated the 2’, 3’ and 5’ positions of the ribose ring and benzoylated the 2nd position of the guanine moiety in the guanosine molecule. We observed significant inhibition of in vitro pppGpp synthesis and biofilm formation. More studies will be conducted in the near future to test these compounds for their plausible functions.
In collaboration with Prof. Jayaraman (Organic Chemistry, IISc), many artificial glycolipids have been synthesized and tested for biological function. We observed that synthetic glycolipids exhibit a profound effect as inhibitors of key mycobacterial functions. These analogs impede biofilm formation and can plausibly affect long-term survival. Glycolipid analogs can compete with natural glycolipids and thus may help in understanding their functions. Our past and recent studies have shown that the synthetic glycolipids act as inhibitors of mycobacterial growth, sliding motility and biofilm formation. The major lacuna of these glycolipid inhibitors is the high concentration required; inhibition at nanomolar concentrations remains to be achieved. Issues surrounding the thick, waxy mycobacterial cell wall structures will continue to be the focus in manifold approaches to mitigate the detrimental effects of mycobacterial pathogens.
In chapter 1, introduction to the research work has been written and role of (p)ppGpp and its functions have been discussed. In chapter 2, novel binding site of pppGpp on Rel C-terminal domain and its regulatory role have been discussed. In chapter 3, differential binding of ppGpp and pppGpp to RNA polymerase has been discussed. In chapter 4, studies on natural and synthetic analogues of pppGpp have been presented. In chapter 5, synthetic glycolipids studies have been described. Chapter 6 summarizes all the chapters.
|
133 |
Modulation of Protein Stability and Function by Cysteine Mutations and Signal Peptides
Sharma, Likhesh January 2016
Chapter 1 gives a general introduction to the CXXC motif found in natural proteins. It then reviews the studies where disulphides were engineered in various proteins. The various strategies developed to engineer metal binding activity and redox activity are described. The objectives behind engineering the CXXC motif into a protein, such as imparting to it novel metal-binding and redox activities, are discussed next. Alternative strategies which achieve the same objectives are described as well. This chapter then introduces the model proteins used in the course of this thesis: maltose-binding protein (MBP) and E. coli thioredoxin (Trx). This chapter also briefly discusses the role of the signal peptide in protein export.
Chapter 2 describes the experimental studies and their results in which we introduced the widely occurring cysteine motif CXXC into the maltose binding protein (one-at-a-time, in five alpha-helices, at the N-termini) to test three hypotheses: 1) Does a disulphide bond form at the N-terminus? 2) Does the protein acquire any oxido-reductase activity? 3) Does it acquire new metal-binding properties?
The results confirmed: 1) Each cysteine pair forms a stable intrahelical disulphide bond under non-reducing conditions. 2) The five mutant proteins acquire considerable oxidoreductase activity, tested by the insulin aggregation assay. 3) The mutants acquire novel metal-binding properties for Ni2+, Cd2+, and Zn2+ upon reduction. Further, introducing the CXXC motif neither destabilizes the protein nor affects its global structure.
Our results demonstrated that introduction of CXXC motifs can be used to probe alpha-helix start sites and to introduce oxidoreductase and metal binding functionality into proteins.
Chapter 3 describes further experiments on a few of the metal ion binding mutants discussed in the previous chapter. We explore the effect and usefulness of reducing agents (DTT and TCEP) on the binding of metal salts to the CXXC mutants. We also studied the effect of metal salts on the thermal stability of the mutants and show that metal ions bind to the CXXC motif even when the protein is in the unfolded state. The chapter describes the use of an immobilized metal affinity chromatography (IMAC) based method for the purification of MBP mutants. Yields ranging from 60-85% were obtained for the three MBP mutants. The cysteines were located at different positions in these three MBP mutants (MBP 42-45 Cys, MBP 128-131 Cys, and MBP 356-359 Cys mutants). The yields for wild-type MBP, a single cysteine mutant (MBP S211C), a double cysteine mutant (MBP 230, 30) were all below 15%. Chapter 3 also reports a new crystal structure of the MBP 356-359 mutant in ligand-bound form: it crystallizes as an intermolecular dimer, bonded by two disulfides formed by the cysteines of the CXXC motif.
Chapter 4 describes the effects of inserting signal peptide sequences on protein folding and expression. We fused the malE and pelB signal sequences at the N-terminus of the model protein thioredoxin and observed that the wild-type and pelB fusion constructs are soluble when expressed, but the malE construct was targeted to inclusion bodies. Nonetheless, it could be refolded in vitro to yield a monomeric product with a secondary structure identical to the wild-type thioredoxin. This chapter also details the thermodynamic stability, aggregation propensity and activity of the purified recombinant proteins in comparison with the wild-type thioredoxin. The presence of the signal sequences reduces the thermodynamic stability and activity of the recombinants and increases their aggregation propensity, with malE having much larger effects than pelB. These studies show that besides acting as address labels, different signal sequences affect protein stability and aggregation differently.
Chapter 5 describes three different strategies to label a protein at different sites with cysteine-specific fluorophores using MBP as the model. The first strategy exploits the differential accessibility of residues within MBP in its maltose-bound and maltose-free states. The second strategy involves insertion of a 14-amino-acid loop called V3 from the HIV gp120 protein into MBP; anti-V3 antibodies shield the cysteine residue present inside the inserted loop, while we label another cysteine present outside the loop. In the third strategy, we introduce a third cysteine residue onto the background of the MBP mutant already containing a disulphide bridge at the N-terminus of one of its helices (discussed in Chapter 2). We label the third, free cysteine while the cysteines involved in the disulphide bridge remain protected. We observed successful differential labelling using the first strategy and also observed FRET between the fluorophore labels. For the second strategy, we showed that the V3 loop inserted in MBP indeed binds anti-V3 antibodies, and we could individually label all the mutants except D41C. The third strategy, based on the triple-cysteine mutant, was not successful because the fluorophore we chose (DBM) did not show site specificity and instead labelled all three cysteines. In addition, the disulphide bridge was not found to be present in the triple-cysteine mutant.
Chapter 6 discusses the synthesis, characterization and binding of various maltolipids (and their corresponding maltose-free controls) to MBP. The maltolipids were synthesised with varying linker lengths and anchor- & head-groups and then used to prepare liposomes and micelles. Although both liposomal and micellar forms could bind to MBP, only the micelles were screened subsequently for their ability to bind to MBP. The binding was assessed using various techniques such as fluorescence spectroscopy, gel filtration and thermal stability assay. We screened the maltolipids and determined how their anchor group, linker length and charge on the head group influence the binding of MBP to micelles formed by these maltolipids.
|
134 |
Structural Studies on SeMV Chimeras and TSV : Insights into Capsid Assembly
Gulati, Ashutosh January 2015
Assembly of virus capsid protein (CP) into icosahedrally symmetric particles is an intriguing and elegant process. In most cases of virus assembly, a large number of identical protein subunits self-assemble to generate a shell that protects the viral genome. Studies on virus assembly have resulted in a new scientific technique that uses these proteinaceous shells as nano-particles for a variety of biological applications. The current thesis deals with understanding the factors that govern the assembly of the Sesbania mosaic virus (SeMV) and a pleomorphic virus, Tobacco streak virus (TSV).
The CP of SeMV, a T=3 plant virus, consists of a disordered N-terminal R-domain and an ordered S-domain. The importance of the R-domain in assembly was probed by replacing it with polypeptides such as the B-domain of Staphylococcus aureus protein A and polypeptides P10 and P8 of SeMV. These chimeras assembled into T=3 or larger virus like particles (VLPs). Addition of divalent cations resulted in the formation of heterogeneous nucleoprotein complexes that disappeared upon treatment with EDTA/RNAse. One of the chimeras (N∆65-B), purified in a dimeric form by affinity chromatography, assembled into T=1 VLPs during crystallization. The three dimensional structure of these VLPs showed that they were devoid of divalent ions and that the B-domain was disordered. These studies demonstrate the importance of N-terminal residues and metal ions in virus assembly, and the robustness of the assembly process. Also, the B-domain was functional in N∆65-B VLPs, suggesting possible biotechnological applications.
Tobacco streak virus (TSV) is a polymorphic virus and a major plant pathogen. TSV capsids encapsidate the tri-partite ss-RNA genome of the virus in three spheroidal particles of diameters 27, 30 and 33 nm, respectively. CPs of ilarviruses are also involved in genome activation. The labile nature of ilarviruses has posed difficulties in their structure determination. This thesis describes the first crystal structure of truncated TSV-CP. The core of TSV CP conforms to the canonical β-barrel jelly roll tertiary structure found in other viral coat proteins. Dimers of CP with swapped C-terminal arms (C-arm) were observed in the two crystal structures determined. The C-arm was found to be flexible and responsible for the polymorphic and pleomorphic nature of TSV capsids. Mutations in the hinge region of the C-arm that reduce the flexibility resulted in the formation of more uniform particles. TSV CP was also found to be structurally similar to that of Alfalfa mosaic virus (AMV), accounting for the similar mechanism of genome activation in alfamoviruses and ilarviruses.
|
135 |
NMR Solution Structures of Human γC-Crystallin & the Intrinsically Disordered Viral Genome Linked Protein in the Free & Bound Form
Dixit, Karuna January 2016
This thesis describes the tertiary structures and dynamic studies of two protein systems. The first is the human γC-crystallin protein, which is present in the nucleus of the human eye lens, and the other is the plant viral protein VPg (an intrinsically disordered protein) in its free as well as its protease-bound forms. The structural studies described here have been carried out using high-resolution solution NMR spectroscopic methods.
Project I: Determination of solution structure and dynamics of Human γC-crystallin (HGC) using NMR spectroscopy
The crystallins are the most abundant proteins in the eye lens of vertebrates. These proteins are packed in short-range spatial order to provide the transparency and appropriate refractive index gradient that are required for vision. The crystallins belong to two gene families, which are categorized as the alpha and beta/gamma crystallins respectively. The classification on the basis of molecular size and structure results in the proteins being referred to as alpha, beta and gamma crystallins. Again, each of the crystallins has two or more subtypes. The stoichiometry of the subtypes of α, β and γ crystallins varies with the age of the organism, but the order of abundance remains β > α > γ irrespective of age. The most abundant crystallins in the nucleus (central region) of the eye lens are the γ-crystallins. In the human lens, only three members of the γ-crystallin family are mainly expressed, i.e. γS- (HGS), γC- (HGC) and γD- (HGD). HGS is expressed postnatally and thus is present mainly in the cortical region of the lens, unlike the HGC and HGD crystallins, which are present in the nucleus. It is known that aging and some cataract-associated genetic mutations alter the structure of these proteins. Other point mutations result in minimal structural perturbation but drastically lowered solubility. Mutation in human γC-crystallin leads to congenital cataracts such as Coppock-like cataract. Structural information is available for HGD and HGS, but no structure is available for HGC. However, a model structure has recently been reported for HGC based on a mouse ortholog. Based on this model structure, it was argued that HGC is an insoluble protein, which was explained by the lower magnitude of the dipole moment and fluctuation in the N-terminal domain of the model structure. However, it is shown here that HGC is a highly soluble protein.
The solution structure of human γC-crystallin has been determined by multidimensional triple-resonance NMR spectroscopy, using distance restraints from unambiguously assigned 1H-1H NOE peaks and dihedral angle restraints from HNHA and HNHB spectra. The 15N backbone relaxation study gave average T1 and T2 values of 0.729 ± 0.02 s and 0.060 ± 0.04 s, respectively, corresponding to an average rotational correlation time of 10.87 ns, which shows that human γC-crystallin (molecular weight 21 kDa, 173 residues) is a monomer in solution. The ensemble of 20 lowest energy structures shows a root mean square deviation of 0.60 ± 0.12 Å for the backbone atoms and 1.03 ± 0.09 Å for all heavy atoms. Comparison of the backbone atoms (C′, Cα and NH) of the calculated NMR structure with those of the X-ray crystal structure of mouse γC-crystallin shows that the structure of human γC-crystallin determined here is very similar, with an RMSD of 1.3 Å, which is not surprising given the 84.5% amino acid sequence identity between the two proteins.
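The link between the T1/T2 ratio and the rotational correlation time can be sketched with the widely used slow-tumbling approximation τc ≈ sqrt(6·T1/T2 − 7) / (4π·νN), where νN is the 15N resonance frequency. This is an illustrative estimate, not necessarily the procedure used in the thesis; the function name is hypothetical, and the spectrometer field is not stated in the abstract, so the 15N frequency in the usage example is an assumption.

```python
import math

def tau_c_from_t1_t2(t1_s, t2_s, nu_n_hz):
    """Estimate the rotational correlation time (seconds) from 15N T1 and T2.

    Uses the slow-tumbling approximation
        tau_c ~ sqrt(6 * T1/T2 - 7) / (4 * pi * nu_N),
    which neglects fast internal motion and chemical exchange.
    """
    term = 6.0 * (t1_s / t2_s) - 7.0
    if term <= 0.0:
        raise ValueError("T1/T2 ratio too small for this approximation")
    return math.sqrt(term) / (4.0 * math.pi * nu_n_hz)

if __name__ == "__main__":
    # T1 and T2 are the averages quoted above; the 15N frequency (60.8 MHz,
    # i.e. a 600 MHz 1H spectrometer) is an ASSUMPTION for illustration.
    tau = tau_c_from_t1_t2(0.729, 0.060, 60.8e6)
    print(f"tau_c = {tau * 1e9:.2f} ns")
```

Because the estimate scales inversely with νN, the value obtained depends directly on the field at which the relaxation data were recorded.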
More importantly, the NMR structure reported here shows subtle differences in the orientation of specific residues as well as in the domain interface between the human and mouse orthologs. The orientation of the calculated dipole moment for this NMR structure differs from that reported earlier for the model structure; however, it is similar to that of other known soluble proteins. The determined solution structure of human γC-crystallin also enables us to estimate the effect of cataract-associated mutations on the structure and properties of the protein. Several such mutations are already known, and the work presented here could shed light on the molecular basis of these cataracts.
Project II: Solution structural studies of intrinsically disordered protein VPg in free and bound forms from Sesbania mosaic virus
Sesbania mosaic virus (SeMV) is a plant virus, which infects the Sesbania grandiflora tree. SeMV belongs to Sobemovirus genus, which is not defined under any family. The length of this viral genome is ~4kb. This viral genome has four open-reading frames (ORF). ORF1 and ORF2 encode movement and coat proteins, respectively. ORF2 is again split into two ORFs i.e. ORF2a and ORF2b by a -1 shift in the reading frame and encode two polypeptide chains. These polypeptide chains generate several functional proteins upon polyprotein processing. Polyprotein processing is a mechanism employed by animal and plant viruses to produce several functional proteins from a single polypeptide chain. The two polyproteins expressed are catalytically cleaved by a serine protease, thus releasing the four proteins: VPg (viral protein genome linked), RdRP (RNA dependent RNA polymerase), P10, and P8.
VPg (“Viral Protein genome linked”), as its name suggests, is covalently linked to the 5′ end of the viral RNA. VPgs are generally known to be intrinsically disordered proteins and have many interacting partners. Intrinsically Disordered Proteins (IDPs) are not explained by the 3D structure–function dogma. However, they are important for biological functions such as molecular recognition, signal transduction and regulation. It is known that SeMV protease becomes inactive in the absence of the VPg domain at its C-terminus. VPgs of animal viruses are well studied compared to VPgs of plant viruses. The size of VPg varies across the Sobemovirus genus. It is important to know the structure of VPg since it is necessary for protease activity. The studies conducted here focus on the structural analysis of VPg in its free form and in its protease-bound form (VPg complex), as well as some aspects of full-length ProVPg.
For structural studies, two constructs of VPg as fusion proteins with a Cytb5 tag, one of them lacking 23 residues at its C-terminus, were designed using the pET21a(+) plasmid vector. Sub-cloning was also done to add a thrombin recognition site for removing the hexa-His tag from new constructs of full-length ProVPg and protease (PRO). These proteins were highly expressed, isotopically labeled and purified for NMR study. The sample used for structural studies of the ProVPg 23 complex was prepared using selectively protonated Ile, Leu and Val, and isotopically labeled (2H, 13C and 15N) VPg 23 protein.
VPg in its free form is an intrinsically disordered protein, and this has been confirmed by its dynamic nature observed using solution NMR spectroscopy. VPg binds to its partner protease and adopts a 3D structure, as shown here. The tertiary structure has been determined using distance restraints from 1HN-1HN NOEs and methyl 1HN NOEs, and dihedral angles predicted from analysis of chemical shift values. The tertiary structure of the ProVPg 23 complex has one β-sheet composed of three antiparallel β-strands, and an α-helix. The ensemble of 20 lowest energy structures shows a root mean square deviation of 0.42 ± 0.09 Å for the backbone atoms and 1.09 ± 0.11 Å for all heavy atoms for residues 15 to 50, which are primarily involved in structure formation. On the other hand, the RMSD is 2.34 ± 0.72 Å for the backbone and 2.55 ± 0.60 Å for all heavy atoms when all residues, including both termini, are considered. It has also been shown here that the tertiary fold of VPg is the same in full-length ProVPg and when complexed with the protease domain (PRO). The NMR structure reported here provides a structural basis for the origin of resonances in the up-field region of the one-dimensional proton spectrum of full-length ProVPg. The binding surface has been determined using HADDOCK, based on the structure of the ProVPg 23 complex determined here and the X-ray structure of PRO. The structural model of full-length ProVPg 23 shows the presence of an aromatic interaction between Trp271 of PRO and Trp46 of VPg, which is consistent with the earlier biochemical studies.
|
136 |
Structure Determination of Proteins of Unknown Origin by a Marathon MR Protocol and Investigations on Parameters Important for Molecular Replacement Structure Solution
Hatti, Kaushik S January 2016
Occasionally, crystallisation of proteins works in mysterious ways! One might obtain crystals of a protein of unknown identity in place of the protein for which crystallisation experiments were performed. If the investigator is not aware of such possibilities, valuable time and resources might be lost in attempting to determine the structure of such proteins. Instances of non-target protein getting crystallised may not come to light at all or may be realised only when attempts to determine the structure completely fail by conventional procedures after collecting and processing the diffraction data. Usually, it is not possible to reproduce the crystals of the same protein as their occurrence is serendipitous. Such rare instances of crystallisation are probably caused by fluctuating environmental or crystallisation conditions and are not reproducible. It could also be due to contaminating microbes, which is more likely when the experimentalist is not well experienced. Therefore, experimental phasing of the data collected on serendipitously obtained crystals could be a challenging task.
With the rapid increase in the number of structures deposited in the Protein Data Bank (PDB), molecular replacement has become the method of choice for structure determination in macromolecular X-ray crystallography. This is because it is possible to select a suitable phasing model for most target proteins based on their sequence information. However, if the identity of the target protein itself is uncertain, all attempts at structure determination using phasing models selected through a search based on the target protein sequence would fail. Sequence-independent ab initio phasing techniques such as ARCIMBOLDO (Meindl et al., 2012), which have recently become available, could provide leads only if the non-target protein is an all-α protein and the associated diffraction data extend to a resolution better than 2 Å. Even then, the success rate with this technique is low. Hence, it becomes important to employ a sequence-independent method of structure determination for such mysteriously obtained crystals. This thesis reports crystal structures of serendipitously crystallised proteins, determined using a large-scale application of the Molecular Replacement (MR) technique (referred to in this thesis as MarathonMR). This thesis also presents an evaluation of molecular replacement strategies for structure determination.
The thesis begins with an overview of crystallographic methods of structure determination with an emphasis on the method of molecular replacement (Chapter 1). The most prominent of the results obtained in the course of these investigations pertains to a crystal obtained during routine crystallisation of a viral protein mutant in the year 2011. The cell parameters were different from cell constants of crystals obtained with other known viral protein mutants crystallised earlier in the same laboratory. Unfortunately, this crystal could not be reproduced in the same form in subsequent crystallisation trials. All attempts to determine the structure through conventional molecular replacement techniques using a combination of domains from a nearly identical virus coat protein protomer as the phasing model had failed. The data was shelved as “not-solvable” in late 2011. However, the crystal had diffracted to 1.9 Å and had excellent merging statistics. Therefore, the data was retrieved recently and additional attempts were made to determine the structure through phasing techniques that have become available recently. Techniques such as AMPLE (Bibby et al., 2013) and Rosetta (DiMaio, 2013), which use large-scale homology models coupled with molecular replacement, did not lead to meaningful solutions. A couple of helices identified by ARCIMBOLDO (Meindl et al., 2012) were neither correct (retrospectively) nor sufficient to determine the entire structure. Given the excellent merging statistics of the crystal data, there was significant motivation to determine the structure, though it meant developing a fresh protocol. It was at this time that we came across the work of Stokes-Rees and Sliz (2010) in which they had demonstrated that it is possible to determine structure of proteins of unknown identity by employing almost every known protein structure as a potential phasing model.
The work reported in the thesis is a result of an earlier project to examine the relationship between properties of phasing models and the quality of the target protein model generated through MR by employing large-scale molecular replacement runs. This project was initiated because of the realisation that the recent explosion in crystallographic structural studies has resulted in near complete exploration of the “fold-space” of proteins and the PDB now has a representative structure for most plausible folds of proteins. Some folds are highly represented in the PDB. Hence, it is likely that there would be at least one homologue in the PDB which could be used as a phasing model to successfully determine the structure of a protein of unknown identity if the diffraction dataset is of excellent quality. Hence, the single dataset which had diffracted to 1.9 Å resolution was used to develop a MarathonMR procedure for structure determination. The MarathonMR procedure takes a sequence-independent approach to structure determination and employs large-scale molecular replacement calculations to identify the closest homologue (in structural terms initially). This protocol is described in Chapter 2 (Materials and methods) of the thesis. Through MarathonMR, the structure corresponding to the dataset which had remained unsolved for 5 years was finally determined. A nearly complete sequence of the polypeptide could be deduced by inspecting the electron density map, owing to the high resolution and quality of the map. The protein was found to be a phosphate binding protein from the soil bacterium Stenotrophomonas maltophilia (SmPBP). The way in which the structure was determined and possible explanations for the mysterious source of this protein, which had crystallised instead of the target protein, are discussed in Chapter 3.
Though the MarathonMR procedure was developed to solve a single dataset, it was soon realised that the same procedure could be applied to other similar datasets, all of which had diffracted to reasonable resolutions with good merging statistics but had remained unsolved for unknown reasons. Among such datasets, one which was collected in 2007 and had diffracted to 2.3 Å resolution had cell parameters very close to those of SmPBP. Hence, a poly-alanine model of the structure of SmPBP, which had been determined by then, was used as the phasing model to run molecular replacement and the structure was readily solved. It was surprising to note that SmPBP had crystallised serendipitously not once but twice, in 2011 in crystals that diffracted to 1.9 Å resolution and earlier in 2007 in crystals that diffracted to 2.3 Å resolution, independently by two different investigators in the same laboratory. Both the structures are nearly identical and a comparison of these structures is presented in Chapter 4.
Structure of SmPBP determined at 2.3 Å resolution by MarathonMR also corresponds to the dataset that had remained unsolved for the longest period of time (9 years). This success of structure determination after the lapse of such a long period emphasises the importance of carefully preserving X-ray diffraction data irrespective of its immediate outcome.
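The sequence-independent search at the heart of the MarathonMR idea can be pictured as a brute-force loop over candidate phasing models, ranked by how well each one explains the diffraction data. The sketch below only illustrates that idea under stated assumptions: `run_mr` is a hypothetical user-supplied callable that wraps whatever molecular replacement program is used and returns a single score (higher is better), and the model identifiers are placeholders. It does not reproduce the actual protocol described in Chapter 2 of the thesis.

```python
from typing import Callable, Iterable, List, Tuple


def marathon_search(
    candidates: Iterable[str],
    run_mr: Callable[[str], float],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Try every candidate phasing model and return the best-scoring ones.

    `run_mr` is a hypothetical wrapper: it takes a model identifier, runs
    molecular replacement against the fixed diffraction dataset and returns
    a score. A RuntimeError is treated as a failed run and skipped.
    """
    scored = []
    for model_id in candidates:
        try:
            score = run_mr(model_id)
        except RuntimeError:
            continue  # a failed MR run simply contributes no solution
        scored.append((model_id, score))
    # Highest score first; the top hits are the leads worth refining.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]
```

Because no sequence information enters the loop, the same search applies whether or not the crystallised protein is the intended target, which is the property the thesis relies on.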
In Chapter 5 of the thesis, another instance of non-target protein crystallisation, the structure of which was determined using the MarathonMR procedure, is described. The crystal was obtained while carrying out crystallisation of mutants of a survival protein (SurE) expressed in Salmonella typhimurium when the bacterium is subjected to environmental or internal stresses. The original investigator had used the structure of SurE as the phasing model to determine the structure of the mutant crystals and obtained a model with R and Rfree of 35% and 40%, respectively. However, the model did not refine further to lower R-factors, suggesting that the solution obtained may not be correct. MarathonMR indicated that the fold of the crystallised protein could be similar to that of glycerol dehydrogenase (GlyDH). As SurE shares some fold similarity with one of the domains of GlyDH, the original investigator might have been able to achieve limited success with R/Rfree factors of 35% and 40%, respectively. As the merging statistics for this diffraction dataset were poor, the diffraction images were reprocessed with the XDS program in the Xia2 automated spot processing pipeline. The data statistics indicated merohedral twinning (14%). However, using appropriate parameters, it was possible to refine the structure obtained by MarathonMR to acceptable R/Rfree using the Refmac program. Four protomers were present in the crystal asymmetric unit (ASU). Non-crystallographic symmetry averaging of electron density over these four molecules further improved the electron density. As the data was limited to 2.7 Å resolution, it was not possible to deduce the identity of every residue of the protein unambiguously based solely on the resulting electron density map. With the identity of the amino acids that could be deduced with certainty, it was clear that the protein is a glycerol dehydrogenase from a species of the Enterobacteriaceae family.
Though a similar structure of glycerol dehydrogenase has been reported from Serratia, there are clear differences in many unambiguously determined residues which suggest that the protein is not from Serratia. The protein has been named EnteroGlyDH as its source is likely to be a species of the Enterobacteriaceae family. The structure of the protein, its biochemical implications and possible reasons for the serendipitous crystallisation of a non-target protein are discussed.
Chapter 6 discusses the structure determination of an inorganic pyrophosphatase and catalytic domain of Succinyl transferase, the crystals of which had diffracted to 2.3 Å and 3.1 Å, respectively, but had remained unsolved. Neither of the datasets corresponds to the intended target proteins. The dataset corresponding to the protein whose structure was determined as that of an inorganic pyrophosphatase was provided by a colleague from a different laboratory in the Indian Institute of Science. It is interesting to note that the investigator had carried this dataset to one of the CCP4 workshops and had tried to determine the structure with the help of experts in the workshop. The attempts to determine its structure had however failed for reasons that are obvious now. The original investigator was unfortunately making efforts with an erroneous assumption on the identity of the target protein. As these enzymes are well studied, their structures and functions are briefly discussed.
It is already well established that molecular replacement is being used with increasing frequency as the phasing technique when compared to other experimental phasing techniques. With the ever growing number of structures in the PDB, high population of certain folds and a near-plateau attained in the identification and growth of new folds, it is reasonable to expect that molecular replacement will be used even more frequently in the years to come. Therefore, for carrying out molecular replacement for a given diffraction dataset of a target protein, it is very likely that several homologous structures would be available in the PDB that could be used as potential phasing models. Hence, it becomes important to understand the influence of the phasing model on the quality and accuracy of the model generated through MR to achieve the best structure solution. To understand this relationship between the phasing model and the model obtained by the MR protocol, re-determination of already known structures deposited in the PDB, starting with their respective structure factors and various phasing models, was initiated. Structures belonging to TIM beta/alpha-barrel (SCOPe ID: c.1) and Lysozyme-like (SCOPe ID: d.2) folds were chosen as targets. The structure of each target was re-determined serially starting with poly-alanine models of all available unique homologues as phasing models. Due to the multi-dimensional nature of this study, the results obtained were represented in a graphical form with nodes and edges. The detailed methodology of the work carried out and the data representation model are discussed in Chapter 2 (Materials and methods). It was found that after a certain sequence identity cut-off, sequence identity between phasing model and target seems to have little influence on the quality and accuracy of the model generated through MR. Instead, other qualities of the phasing model such as Rfree and RSCC influence the quality of MR models. These results are discussed in Chapter 7.
Lessons learnt from the work reported in this thesis are discussed in the concluding chapter. Possible logical and programmatic upgrades to the MarathonMR protocol, and future directions for studying the relationship between phasing models and models generated through MR, are discussed in Chapter 8 (Conclusion and future prospects).
|
137 |
Insights into Substrate Specificity in Sortase Enzymes from Structural Studies on a Novel Class of Housekeeping Sortase (SrtE); Identifying Functionally Important Cis-Peptide Containing Segments in Proteins and their utility in Molecular Function Annotation
Das, Sreetama January 2016
Understanding protein function is fundamental to the fields of protein engineering and drug design. While most of the previous efforts in this direction have focused on the sequence-structure-function paradigm, recent studies have pointed to protein dynamics as being integral to its activity. The work in the current thesis follows this overall theme of obtaining insights into protein function from its structure and dynamics. It can be broadly divided into two sections. In the first section, the thesis candidate has tried to elucidate the residues modulating the substrate specificity of a particular family of enzymes, known as sortases, through structural and computational studies (including dynamics simulations) on a novel member in the family. This work has been carried out in collaboration with Dr. R.P. Roy, National Institute of Immunology, New Delhi (biochemical characterization was performed by Mr. Vijay Pawale at Dr. Roy's laboratory). In the second half of this thesis, the candidate has described a structure-based method involving the use of cis-peptide containing segments for the function annotation of proteins. The incorporation of dynamics information leads to an improvement of our annotation approach, which is also demonstrated. This part of the work has been carried out in collaboration with Dr. Debnath Pal, Department of Computational and Data Sciences, Indian Institute of Science. Following is a chapter-wise description of the overall layout of the thesis.
Section I: Insight into substrate specificity in sortase enzymes from structural studies on a novel housekeeping sortase of class E (SrtE)
Chapter 1| A brief account of sortases: This chapter provides a brief survey of the literature on sortases and the scope of the work presented in the thesis.
Many surface proteins in Gram-positive bacteria are incorporated into the cell wall through covalent ligation by a class of cysteine transpeptidases known as Sortase. These surface proteins contain a cell wall sorting signal (CWSS) which is recognized by sortase, enzymatically cleaved and subsequently joined covalently to the pentaglycine branch of lipid II (a peptidoglycan precursor) in general, which is finally incorporated into the peptidoglycan cell wall.
Six classes of sortases have been identified on the basis of their sequence. These sortases differ in the substrate motif that they recognize and the function performed. The class A sortase (SrtA) is expressed ubiquitously in Gram-positive bacteria. It is involved in the cell surface anchoring of a large number of functionally distinct proteins which contain an LPXTG recognition motif in their CWSS, and is referred to as the 'house-keeping' sortase. Sortases of other types are not ubiquitous and are meant to perform specialized functions. Sortase B is involved in iron acquisition, sortase C in pilus formation and sortase D in sporulation. The substrate motifs recognized by these sortases are, in general, different from the recognition motif in SrtA substrates. Several Gram-positive bacteria with a high GC content in their genome have been suggested to use a sortase E (SrtE) instead of SrtA to perform the housekeeping activity. These sortase sequences share low identity with sortases of classes A-D. The substrates of SrtE have been proposed to contain an LAXTG recognition motif instead of LPXTG based on genomic analyses. Class F consists of sortases from several Actinobacteria. However, the biological function of these sortases is not well understood. To date, structures of sortases from classes A-D have been determined, all of which display an eight-stranded beta barrel fold (termed the sortase fold), a conserved catalytic triad of His-Cys-Arg and a TLXTC motif at the active site (C: catalytic Cysteine; X varies across the different classes of sortases). Sortase B and C are augmented by additional secondary structure features which are absent in sortase A. SrtA from Staphylococcus aureus is the most well studied among sortases of known structure.
Several of the surface proteins attached by sortases are responsible for bacterial virulence. SrtA deletion mutants have been found to exhibit reduced virulence without affecting cell viability. Moreover, the localization of sortase in the cell membrane and the absence of eukaryotic homologs have made sortase an attractive target for the development of novel therapeutics. In addition, the transpeptidase activity of sortase has found extensive applications in biotechnology. The prototype SrtA from Staphylococcus aureus is commonly used for these applications; however, its use is limited by its obligate Ca2+ ion-dependent activity and the stringent preference for an LPXTG motif. Hence, characterization of new sortases with altered substrate recognition profiles and rational modification of known sortases have tremendous potential for biotechnological applications and advancements.
While sortases of classes A-D have been studied extensively to date and their structures determined, no structural data is available for a class E sortase. The thesis candidate has solved the first high resolution crystal structure of a putative housekeeping Sortase E in Streptomyces avermitilis (SavSrtE), a bacterium with a GC rich genome. Biochemical experiments performed by our collaborator on this protein have demonstrated Ca2+ independent transpeptidase activity and a preference for LAXTG-containing peptides as its cognate substrate over the LPXTG motif that is recognized by sortase A. Moreover, the protein exhibits a preference for small uncharged residues in the position succeeding the penta-peptide motif. This thesis documents the results of crystal structure analyses, molecular docking studies and dynamics simulations to understand the structural basis for these experimental findings. Finally, sequence analyses were performed to detect possible residues which modulate substrate specificity. Based on these analyses, mutations were performed. The thesis also documents the crystal structure solution and analysis of an active site mutant (residue T196 at the position X in the TLXTC motif).
Chapter 2| Methods for the analyses of Sortase E from S. avermitilis (SavSrtE): This chapter provides a description of the procedures used to carry out the thesis work.
An N-terminus truncated construct (∆N50) of wild type SavSrtE and its mutant T196V were cloned, expressed and purified in the laboratory of our collaborator, Dr. R.P. Roy (NII, New Delhi), and provided to us for structure and sequence analyses. Initially, crystallization trials were carried out on the wild type protein using commercially available screening kits and the sitting drop vapor diffusion method. The condition which gave crystals was optimized further. Finally, diffraction quality crystals were obtained in a drop containing 1 μL of protein (4 mg/mL in 10 mM Tris-HCl buffer pH 7.2, 100 mM NaCl and 2 mM beta-mercaptoethanol) mixed with 1 μL solution of the crystallization condition containing 1.6 M ammonium sulfate, 0.1 M citric acid at pH 3.75 using the hanging drop vapor diffusion method. The crystals were cryo-protected in a 10% sucrose solution and diffraction data collected at the European Synchrotron Radiation Facility (BM-14, ESRF). The crystals diffracted to 1.65 Å. The protein crystallized in the P3221 space group with unit cell parameters a = b = 85.84 Å, c = 48.20 Å, α = β = 90°, γ = 120°. Calculation of the Matthews coefficient indicated the presence of one molecule in the asymmetric unit. T196V mutant protein yielded diffraction quality crystals in the same condition as the wild type protein. The crystals were cryo-protected using sucrose and diffraction data were collected at the BM-14 beamline.
The mutant crystals diffracted to 1.70 Å. The protein crystallized in the P3221 space group with unit cell parameters a = b = 84.98 Å, c = 48.00 Å, α = β = 90°, γ = 120° and one molecule in the asymmetric unit. The quality of the datasets was assessed by SFCHECK and data were found to be of appropriate quality for structure solution.
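The Matthews coefficient mentioned above is the unit cell volume divided by the total protein mass in the cell, and a value in the usual range for one molecule per asymmetric unit supports that assignment. The sketch below works through the arithmetic for the wild type cell. Two inputs are assumptions rather than values from the abstract: the molecular weight (20,000 Da is a placeholder, since the mass of the ∆N50 construct is not stated) and the number of asymmetric units per cell (6, the count of general positions for a P3(2)21-type trigonal space group). The factor 1.23 in the solvent estimate is the commonly used constant for proteins.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic angstroms from cell lengths (angstroms) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def matthews_coefficient(volume, asu_per_cell, molecules_per_asu, mw_da):
    """Matthews coefficient V_M in cubic angstroms per dalton."""
    return volume / (asu_per_cell * molecules_per_asu * mw_da)


def solvent_fraction(vm):
    """Approximate solvent fraction, using the conventional 1.23 constant."""
    return 1.0 - 1.23 / vm


# Wild type cell from the text; the molecular weight is a hypothetical placeholder.
volume = cell_volume(85.84, 85.84, 48.20, 90.0, 90.0, 120.0)
ASSUMED_MW_DA = 20000.0   # assumption: not given in the abstract
ASU_PER_CELL = 6          # assumption: general positions for this trigonal space group
vm_one_molecule = matthews_coefficient(volume, ASU_PER_CELL, 1, ASSUMED_MW_DA)
```

Repeating the calculation with two molecules per asymmetric unit halves the coefficient, so comparing the two values against the range typically seen for protein crystals is how the number of molecules is usually inferred.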
SavSrtE has low sequence identity (25-34%) to class A sortases of known structure. Hence the scaled data, sequence information and model coordinates (sortase A from Streptococcus agalactiae, PDB ID: 3rcc) were submitted to the MR (molecular replacement) phasing option in the EMBL-Hamburg AutoRickshaw pipeline. The model generated from the server was used as input to PHASER for MR. The MR solution was subjected to one cycle of rigid body refinement followed by several cycles of restrained refinement using REFMAC from the CCP4 suite, with alternate rounds of inspection and manual model building in COOT for model improvement. The convergence of the refinement procedure was checked from the reduction in R-factors. The most essential refinement statistics for the final models of the wild type protein and T196V mutant are tabulated below.
Table 1. Refinement statistics for the final models
                                   Wild type (5GO5)    T196V (5GO6)
Resolution (Å)                     1.65                1.70
Rwork / Rfree (%)                  16.11 / 19.05       17.31 / 20.82
R.M.S. bond lengths (Å)            0.012               0.019
R.M.S. bond angles (°)             1.53                1.89
Average B-factors (Å2)
  Protein                          19.1                32.5
  Water                            32.6                42.4
  SO42-                            58.7                60.8
  Gly                              36.0                -
Ramachandran map statistics
  Most favoured region (%)         86.8                89.8
  Additional allowed region (%)    13.2                10.2
  Generously allowed region (%)    0.0                 0.0
  Outliers (%)                     0.0                 0.0
The genome of S. avermitilis was searched using the ScanProsite tool to identify putative substrates, details of which are also documented in this chapter. Additionally, the thesis candidate performed Mutual Information analysis on an alignment of 1569 sortase sequences from different classes to identify the residues possibly regulating substrate specificity. Based on this analysis, mutations were performed of which the T196V mutant has been studied in this thesis.
Finally, this chapter describes the protocol used to perform protein peptide docking and subsequent molecular dynamics simulations to understand how dynamics may influence substrate specificity.
Chapter 3| Analyses of SavSrtE sequence and structure: This chapter provides a description of the analyses on the wild type SavSrtE and the T196V mutant.
The overall fold of SavSrtE is very similar to that observed in the structures of other sortases, although the sequence similarity to other classes is low. Variations are observed in the loop regions (longer β1/β2 and β6/β7 loops). The active site is formed by residues from the β2/H1 loop, β3/β4 loop, β4 strand, β6/β7 loop, β7 strand, β7/β8 loop and β8 strand. SavSrtE does not carry any cluster of electronegative residues close to the active site and is therefore expected to have Ca2+ ion independent activity, which is observed in biochemical experiments (Dr. R.P. Roy's lab). Comparison with other housekeeping sortases showed that the β6/β7 loop in SavSrtE is in a closed conformation, indicating the presence of a preformed binding pocket for the LAXTG substrate, in contrast to the prototype SrtA from Staphylococcus aureus, which requires a Ca2+ ion to stabilize the closed conformation. Moreover, a small pocket is observed adjacent to the catalytic triad which contained electron density fitting a Gly molecule. This pocket is proposed to be the binding site for the second substrate that resolves the protein-peptide intermediate through a nucleophilic attack. Our docking simulations showed that a Gly of a triglycine moiety can be positioned in this pocket.
Biochemical experiments established that SavSrtE recognizes the substrate motif LAXTG instead of LPXTG which is preferred by class A sortases. It also prefers Gly based nucleophiles as the second substrate. Additionally, the protein is found to prefer neutral residues over charged residues in the position succeeding the Gly of the LAXTG motif.
Structure analyses showed the presence of a bulky Tyr residue (Y112) at the active site pocket which, according to molecular docking studies, hinders the productive binding of Pro-containing peptides (LPXTG) over Ala-containing ones (LAXTG). The OH group of Y112 is involved in a hydrogen bond with the backbone nitrogen of the second Ala in the ALANT peptide but not in the Pro-containing peptide. Y112 is held rigidly in place via interactions with neighbouring residues and a network of hydrogen-bonded water molecules in the crystal structure. A Tyr residue is found to be present in an equivalent position in several sortase sequences of Class E, and may be a general feature responsible for the specificity of sortase Es to putative LAXTG-containing substrates in their genomes. It may be mentioned that class D sortases, which contain a Phe residue at the equivalent position, recognize the LPXTA substrate motif. The side chain of this Phe displays different rotamers in the NMR structure of Bacillus anthracis SrtD, pointing to its flexibility, whereas Y112 in S. avermitilis SrtE is rigid. In addition, molecular dynamics simulations on the models of protein-peptide complex (obtained from docking) showed that the two peptides have similar backbone dynamics, unlike the case of S. aureus SrtA where the Ala-containing peptide does not maintain a kinked conformation similar to the Pro-containing cognate peptide. Hence the Tyr at the active site appears to be the main factor behind the discrimination of the two peptides.
Substrate sequences in the S. avermitilis genome contain small neutral residues in the position succeeding the Thr-Gly peptide bond in the substrate. This preference is also observed in biochemical assays. Docking calculations showed that the protein cannot accommodate large side chains in the site where this residue is positioned.
To detect the residues involved in altering the substrate specificity of SavSrtE, we performed a multiple sequence alignment using 1569 sortase sequences and carried out mutual information (MI) analysis on this data. Our analysis implicated several residue pairs lining the active site pocket in modulating substrate specificity. These included the aforementioned Tyr residue as well as the position X (T196 in SavSrtE) in the TLXTC motif at the active site. Mutations were performed at these positions and crystallization trials performed. We could successfully crystallize and solve the structure of the T196V mutant, which has been documented in this thesis.
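Mutual information between two alignment columns measures how strongly the residue at one position predicts the residue at the other, which is why covarying positions stand out. The sketch below shows the basic calculation for a single pair of columns. It is a minimal illustration, not the thesis's implementation: gaps are treated as ordinary symbols, and none of the corrections commonly applied to MI scores on real alignments (for example for phylogenetic bias or small sample size) are included.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Each column is a string or list holding one residue per sequence,
    in the same sequence order.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (res_a, res_b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = p_ab * n * n / (count_a[res_a] * count_b[res_b])
        mi += p_ab * math.log2(ratio)
    return mi
```

For example, `mutual_information("AAVV", "TTSS")` describes two perfectly covarying positions, while `mutual_information("AAVV", "TSTS")` describes two independent ones. On a real alignment the score would be computed for every pair of columns and the highest-scoring pairs inspected.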
The mutant protein has the same overall structure as the wild type. Moreover, the catalytic Cys residue was observed to be unmodified in this structure, compared to the wild type which was presumably altered by β-mercaptoethanol added during protein purification. The mutated residue (Val) was found to have a different side chain rotamer than T196. Moreover, the absence of any polar atom in the side chain of V196 disrupted the hydrogen-bonded network of water molecules observed at the active site in the wild type structure. Experiments on the mutant showed a reduction in activity, implying that T196 is important for substrate recognition. The altered side chain orientation of V196 is expected to be responsible for the reduction in activity, though a peptide-bound crystal structure would be necessary to clearly understand the mechanism. In this respect, future crystallization trials may be performed with modified peptides that bind covalently to the active site Cys residue, similar to the strategy employed for S. aureus SrtA and Bacillus anthracis SrtA.
Our structure and sequence analyses have pointed to some residue positions responsible for the modified substrate specificity. While only one mutant has been characterized, the other mutants also need to be studied (through biochemical assays and structure analysis) to understand how they contribute to substrate recognition. In this context, double mutants may also be generated to understand the combined effect. For example, single mutations of E105 and E108 were found to reduce the activity of Staphylococcus aureus SrtA, while the double mutant resulted in Ca2+ ion independent activity. Additional structure and sequence analysis coupled with experiments are necessary to detect residues which may be mutated to enhance the activity of SavSrtE, similar to what has been performed for S. aureus SrtA.
To summarize, our studies show that the substrate specificity of SavSrtE is different from that of class A sortases, and provide an explanation for it using structure analyses and computation. This altered specificity profile, orthogonal to that of S. aureus SrtA, and Ca2+ ion independent activity make it a potential candidate for use in simultaneous conjugation of multiple peptide substrates to their target. Moreover, this structure may be used firstly as a model to design inhibitors for housekeeping SrtEs from pathogenic organisms like Corynebacterium diphtheriae and Tropheryma whipplei. Secondly, most of the previous studies on inhibitor design for sortases documented small molecules or peptidomimetics binding to the pocket of the first substrate. Since distinct binding pockets have been observed for the two substrates in SavSrtE, this information may be used to build inhibitors targeting the second pocket or spanning both the pockets.
Section II: Identifying functionally important cis-peptide containing segments in proteins and their utility in molecular function annotation
Chapter 4| Functionally important cis-peptide fragments in proteins: detection and relevance: This chapter describes the relevance of cis-peptides to protein function and a method to detect such functionally important cis-peptides in proteins.
Cis-peptide bonds are comparatively rare in proteins due to the steric strain associated with the 1,4-atomic clash in the peptide chain. Consequently, only about 0.03% of Xaa-Xnp (Xaa: any amino acid; Xnp: any amino acid other than Pro) peptide bonds occur in the cis conformation; the occurrence is somewhat higher (5%) for imino peptide-containing Xaa-Pro cases. Despite their low occurrence, cis-peptides have been found to be evolutionarily conserved, pointing to their important role in structure and function. Cis-Xnp peptide bonds exhibit a significant disposition towards ligand-binding sites and dimerization interfaces, whereas cis-Pro bonds have been found to occur in a rare 'touch-turn' motif at functional sites. Cis-trans isomerization is expected to play a regulatory role in many cellular processes. Non-conservation of these peptides is implicated in the evolution of different function among similar protein folds. Hence, there has been a renewed interest in detecting cis-peptides from residue patterns and linking them to molecular function.
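In a coordinate file, a cis-peptide bond is usually recognised from the omega dihedral angle defined by the atoms Cα(i), C(i), N(i+1) and Cα(i+1): values near 0° indicate cis and values near 180° indicate trans. The sketch below shows that geometric test. The 30° tolerance is an assumption chosen for illustration, and the code covers only this basic classification, not the clustering and annotation method developed in the thesis.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(p, q):
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def _cross(p, q):
    return (
        p[1] * q[2] - p[2] * q[1],
        p[2] * q[0] - p[0] * q[2],
        p[0] * q[1] - p[1] * q[0],
    )


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points (x, y, z)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def is_cis(omega_deg, tolerance=30.0):
    """True when omega lies within `tolerance` degrees of 0 (assumed cutoff)."""
    return abs(omega_deg) <= tolerance
```

Here the four points passed to `dihedral_deg` would be the Cα and C atoms of residue i followed by the N and Cα atoms of residue i+1.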
The importance of proteins as molecular 'workhorses' makes it imperative to understand how they function. However, a vast majority of the proteins catalogued in public sequence and structure databases do not have experimentally verified functional annotation. Experimental approaches cannot keep pace with the manual curation of these large numbers of un-annotated proteins. This necessitates the use of computational function prediction tools. The simplest prediction methods involve the assessment of similarity in sequence and three-dimensional structure with homologous proteins of known function. The presence of high overall similarity, however, does not predict function unambiguously, since certain protein folds are associated with multiple functions while proteins with different folds may share functional traits. Often proteins with different global structures are found to have structural similarity at the local level, in segments of residues that are responsible for the similarity in function. This has given rise to fragment-based (FB) function annotation methods. FB methods may involve locating functionally relevant surface patches or cavities formed by sequentially distant residues, or the presence of structurally conserved, contiguous residue fragments with proven relevance to function. The direct relevance of the cis-peptide bond to protein function suggests its use for the purpose of function annotation in an FB approach, yet no method exists to exploit it.
This chapter describes a method using geometric clustering and level-specific Gene Ontology (GO) molecular-function (MF) terms to identify, in a statistically significant manner, cis-peptide embedded fragments (henceforth referred to as cis-fragments) in a protein linked to its molecular function. Such fragments were associated with a GO MF based propensity value ≥ 20 at p-value ≤ 0.05, indicating the statistical significance of our results. The relevance of the identified cis-fragments to protein function was further verified through a literature survey. The features of these fragments are discussed in this chapter. Some of these fragments do not overlap with known PROSITE patterns, which demonstrates the utility of these fragments as sequence patterns. Moreover, the thesis candidate identified contiguous stretches of functionally important trans-peptide fragments and cis-fragments forming extended structure-based functional signatures.
Chapter 5| Use of functionally important cis-fragments in annotation: In this chapter, the candidate describes how a library of cis-peptide embedded fragments with proven association to molecular function can be useful for annotating proteins with known structure (and having cis-peptide) but unknown function.
The functionally important fragments detected in the previous chapter were searched for exact matches in sequence and cis-peptide in a test set of PDB entries of known function at different thresholds of sequence redundancy and p-value. Additionally, the match or mis-match in GO MF term between the functionally important fragment and the test protein was also evaluated. To assess the efficiency of our method in annotation, true positive rate (TPR) and false positive rate (FPR) were calculated at each threshold as follows:
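\[ \mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \quad \text{and} \quad \mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} \]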
The following table explains how the numbers of cases with TP, FP, etc. were assigned.
Cases with match in sequence    Match in cis-peptide    No match in cis-peptide
Match in GO MF                  TP                      FN
No match in GO MF               FP                      TN
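The assignment in the table above and the two rates can be sketched in a few lines of Python. This is only an illustration of the bookkeeping: the function names `classify` and `rates` and the toy counts are assumptions made here, not taken from the thesis.

```python
def classify(go_match: bool, cis_match: bool) -> str:
    """Assign a sequence-matched case to a cell of the table above."""
    if go_match:
        return "TP" if cis_match else "FN"
    return "FP" if cis_match else "TN"


def rates(cases):
    """Return (TPR, FPR) for an iterable of (go_match, cis_match) pairs."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for go_match, cis_match in cases:
        counts[classify(go_match, cis_match)] += 1
    tpr = counts["TP"] / (counts["TP"] + counts["FN"])
    fpr = counts["FP"] / (counts["FP"] + counts["TN"])
    return tpr, fpr


# Hypothetical toy data: (GO MF match?, cis-peptide match?)
toy_cases = (
    [(True, True)] * 9 + [(True, False)] * 1
    + [(False, True)] * 2 + [(False, False)] * 8
)
tpr, fpr = rates(toy_cases)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

Repeating the calculation at each redundancy and p-value threshold would give the points of a receiver operating characteristic plot.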
The cis-fragments alone were sufficient to identify other proteins with similar function. Over different thresholds, TPR >0.91 and FPR <0.23 were observed. Annotation recall benchmarks interpreted using a receiver operating characteristic (ROC) plot returned an area under the curve >0.9, corroborating the utility of the annotation method. Further, the applicability of our method in fragment-based function annotation is illustrated for cases where homology-based annotation transfer is not possible. The work presented here adds to the repertoire of function annotation approaches and also facilitates engineering, design and allied studies around the cis-peptide neighbourhood of proteins.
The results presented in chapters 4 and 5 have already been published (reprint enclosed) with the thesis candidate as the first author.
Chapter 6| Molecular dynamics information improves cis-peptide based function annotation of proteins:
The preceding chapters have demonstrated the use of functionally relevant cis-peptide segments in a homology-independent, fragment match-based protein function annotation method. However, proteins are not static molecules; their dynamics is integral to their activity. Hence we have incorporated the dynamics (obtained using an in-house coarse-grained forcefield) of functionally important cis-peptide segments in our annotation method. This is the first study to include both static and dynamics information to improve the prediction of protein molecular function.
To ascertain the improvement upon incorporating dynamics, the ACV-based dynamics profiles (details in chapter) were compared in a dataset consisting of 102 pairs each of positive data (PDB entries with match in fragment sequence and cis-peptide) and negative data (PDB entries with match in fragment sequence but no match in cis-peptide). Our analyses showed that using only cis-peptide information gave fewer false positives and a low FPR (0.11), which is desirable, but also a relatively low TPR (0.72). This is due to a large number of FN cases (trans-peptide with matching GO MF), which can arise when the cis-fragment undergoes cis-trans isomerization to accomplish its function and the coordinates of the segment in the test data have been obtained in the trans state, or when there is an error in the assignment of the omega angle during structure solution. On the other hand, using only dynamics information increases the numbers of both true and false positives and hence both the TPR (0.95) and the FPR (0.51). This is due to false-positive matches in cases where fragments with similar secondary structure show similar dynamics, but the proteins do not share a common function. Combining the predictions from the two methods reduces errors while detecting the true matches, thereby enhancing the utility of our method in function annotation (TPR: 0.95 and FPR: 0.07). Subsequently, we have combined static and dynamics information to annotate proteins of unknown function. A combined approach, therefore, opens up new avenues for improving existing automated function annotation methodologies.
The work described in this chapter has been submitted to a peer reviewed journal. Future prospects include the development of a web server to facilitate the application of our method by a wide research community. A possible improvement includes identification and comparison of the dynamics of additional sites close to the identified cis-fragment, in an automated manner, to improve the accuracy of our annotation.
Appendix 1 gives a description of the results of biochemical experiments performed in the laboratory of our collaborator Dr. R.P. Roy, NII, New Delhi.
Appendix 2 contains additional data supplementary to chapter 4.
Appendix 3 provides additional data supplementary to chapter 5.
Appendix 4 provides additional data supplementary to chapter 6.
Appendix 5 contains reprints of publications.
|
138 |
Structural Feature of Prokaryotic Promoters and their Role in Gene Expression
Aditya Kumar, * January 2015 (has links) (PDF)
Transcription initiation is an important step in the process of gene regulation in prokaryotes. Promoters are stretches of DNA sequence that are present in the upstream region of transcription start sites (TSSs), where RNA polymerase and other transcription factors bind to initiate transcription. Recent advances in sequencing technologies have resulted in a huge amount of raw data in the form of whole genome sequences. This sequence data has to be annotated in order to identify coding, non-coding and regulatory regions. Computational tools are useful for a quick and fairly reliable annotation of many genome sequences. Promoter prediction is an important step in the genome annotation process, which is needed not only for the validation of predicted genes but also for the identification of novel genes, especially those coding for non-coding RNA, which are missed by gene prediction programs. DNA sequence dependent structural properties such as DNA duplex stability, bendability and intrinsic curvature have been found to be associated with promoter regions in all domains of life. The work presented in this thesis focuses on the analysis of these structural features in the promoter regions of published prokaryotic transcriptome data. Furthermore, promoters were predicted using these structural features and their role in gene expression was studied. The organization of the thesis is as follows. An overview of the transcription machinery of prokaryotes, promoter architecture, available promoter prediction programs and sequence dependent structural features is presented in chapter 1.
Chapter 2 describes the datasets and methods used in the entire study.
Structural features of promoters associated with primary and operon TSSs of H. pylori 26695 genes and their orthologs (chapter 3)
Promoter regions in genomic sequences from all domains of life show similar trends in their structural properties such as stability, bendability and curvature. This chapter discusses the DNA duplex stability and bendability of various classes of promoter regions (based on the identification of different classes of transcription start sites, viz. primary, secondary, internal, operon TSSs etc., in a transcriptome study) of the Helicobacter pylori 26695 strain. It is found that the primary TSS and operon associated TSS promoters show significantly strong structural features in their promoter regions. The DNA free energy based promoter prediction tool PromPredict has been used to annotate promoters of different classes, and very high recall values (80%) are obtained for primary TSSs. Orthologous genes from 10 different strains of H. pylori show conservation of structural properties in promoter regions as well as coding regions. PromPredict annotates promoters of orthologous genes with very high recall and precision values. DNA duplex stability of the promoter region is conserved in the orthologous genes in the genomes of 10 different strains of Helicobacter pylori.
Sequence dependent structural features of promoters in prokaryotic transcriptome (chapter 4)
Next-generation sequencing studies have revealed that a wide range of transcripts, such as primary, internal, antisense and non-coding RNA, are present in the prokaryotic transcriptome and that a large fraction of them are functionally involved in various regulatory activities. Identification of promoters associated with different transcripts is important for the characterization of the transcriptome. The current chapter discusses DNA sequence dependent structural properties like stability, bendability and curvature in the promoter regions of six different prokaryotic transcriptomes (Helicobacter pylori, Anabaena, Synechocystis, Escherichia coli, Salmonella and Klebsiella). Using these structural features, promoters associated with different categories of transcripts, which constitute an integral part of the transcriptome, were predicted. Promoter annotation using structural features is fairly accurate and reliable as compared to the motif-based approach, since different categories of transcripts show poor sequence conservation in the promoter region. Most importantly, it is universal in nature, unlike the sequence-based approach, which is generally organism specific.
Role of sequence dependent structural properties in gene expression in prokaryotes (chapter 5)
DNA duplex stability, bendability and intrinsic curvature play crucial roles in the process of transcription initiation. Hence, in order to understand the relationship between these structural features and gene expression, the relative differences in stability, bendability and curvature in the promoter regions of high and low expressed genes were studied. It is found that these features are relatively accentuated in the promoter regions associated with high gene expression as compared to low gene expression. Promoter regions associated with high gene expression are annotated more reliably using DNA structural features, compared to those for low gene expression.
Sequence dependent structural properties in the promoter region of essential and non-essential genes of the prokaryotes (chapter 6)
Essential genes are the minimal possible set of genes required for the survival of an organism. These sets of genes can be identified by experiments such as single gene deletion and transposon mediated inactivation. Here, the analysis of DNA duplex stability and bendability in the promoter regions of essential and nonessential genes of prokaryotes is reported. It is found that the average free energy and bendability profiles are distinct in the promoter regions of essential and nonessential genes. Whole genome promoter predictions using the in-house program PromPredict have also been carried out for essential and nonessential genes.
Chapter 7 presents the summary and conclusion of the entire thesis work followed by future perspectives in the field.
Optimization of PromPredict algorithm and updating PromBase with newly sequenced genomes (Appendix A)
PromPredict is an in-house program, which is based on the relative stability of the DNA in flanking regions. It was found to perform well in predicting promoters across all organisms. In previous studies, it was observed that for organisms having low genomic GC content (<35%), promoter prediction resulted in low precision values, which indicates a higher false positive rate. Threshold values of the PromPredict algorithm were revised in order to optimize the algorithm for a low false positive rate. PromBase is a comparative genomics database of microbial genomes. It stores different genomic and structural properties of the microbial genomes. It also displays the predictions obtained from PromPredict in a graphical as well as tabular format. Newly sequenced genomes were downloaded from NCBI, processed using in-house programs and added to the MySQL database (the back end of PromBase). Stability profiles for predictions were also added for the RNA coding genes; earlier, only profiles for protein coding genes were displayed.
Comparative genomics of asymmetric gene orientation in prokaryotes (Appendix B)
Transcription proceeds in the 5’ to 3’ direction and hence has directionality. Prokaryotic genomes show asymmetry in gene orientation on leading and lagging strands. The different phyla of prokaryotes were analyzed in terms of asymmetry in gene orientation. It is found that organisms belonging to the phylum Firmicutes, which are known to have different DNA polymerase systems for replication, show high asymmetry in gene orientation.
|
139 |
Dynamics of Protein Kinases : Its Relationship to Functional Sites and States
Kalaivani, R January 2017 (has links) (PDF)
A cell is a highly complex, ordered and, above all, robust system. It copes with internal and external uncertainties like heterogeneous stimuli, errors in processing and execution, and changes within and outside the cell. Maintenance of such a system critically depends on a large body of signalling networks and associated regulatory mechanisms. Of the recurrent manoeuvres in cell signalling, protein phosphorylation is the most prominent, and is used as a switch to transmit information and effectuate various outcomes. It is estimated that 30% of the entire proteome of a typical eukaryotic cell is phosphorylated at one time or another, almost exclusively at the hydroxyl groups of one or more Ser(S)/Thr(T)/Tyr(Y) residues. This phosphorylation is accomplished through the transfer of the γ-phosphate of ATP in the presence of cations by a superfamily of enzymes called protein kinases, or STY kinases.
In accordance with widespread phosphorylation events, STY kinases form a large and diverse superfamily, constituting 2% of the proteins encoded in a eukaryotic genome and about 500 proteins in the human proteome. Distantly related STY kinases share less than 20% sequence identity, phosphorylate specific substrates, bind to distinct interaction partners, localise in different cellular compartments and are regulated by different mechanisms. Despite flexibly accommodating these specific attributes, all STY kinases share a conserved 3-dimensional fold and retain the catalytic function. Moreover, all STY kinases can be manipulated by the signalling machinery to be in the “on” (functionally active) or “off” (inactive) state, thereby adding another layer of regulatory control. Such versatility of the STY kinase domain in harbouring specific substrate recognition motifs, binding interfaces, domain architectures and functional states makes it one of the most influential players in cell signalling and a desirable drug target.
Despite decades of studies, a comprehensive understanding of the kinase domain and the features that dictate its catalytic activity and specificity is lacking. This is reflected by the fact that whereas kinases specifically bind and phosphorylate their cognate substrates, most drugs targeted at them are non-specific and beget cross-reactivity. This gap in understanding potentially arises from viewing STY kinases from the standpoint of sequences and structures alone. It is now well established that the function and regulation of a protein molecule, along with its stability and evolution, are closely related to its dynamics. On this premise, this thesis explores the mechanistic and dynamics underpinnings of STY kinases, and interprets them in the light of their multitude of functional responsibilities and specificities.
In Chapter 1 of the thesis, we broadly discuss the complexity of cell signalling and the pivotal role of STY kinases in it. After a brief introduction to cell signalling in eukaryotes, several signal cascades mediated by different secondary messengers (cAMP, cGMP, DAG, IP3) are described. In these signal pathways, modularity is identified as a recurrent theme at all levels of hierarchy: within domain, within protein, within signalling pathway and across signalling pathways. One such modular regulation, protein phosphorylation, is discussed in detail and its catalytic enzyme, the STY kinase, is introduced. An overview and historical perspective of the STY kinase superfamily is presented along with a review of the literature pertaining to their sequence, structure and catalytic function characteristics. We note that in the active state, all STY kinases adopt a specific spatial conformation characterised by precise positioning of crucial structural motifs, while the inactive state is usually a case of some deviation from these structural constraints.
Chapter 2 addresses a fundamental question in the protein dynamics and function paradigm. If the mobility and dynamics of a protein are intimately coupled to its function, how does this manifest in STY kinases? Is there a discernible inter-relationship between the mobility of an STY kinase and its functional competence? To answer these questions, we collated 55 crystal structures of 14 STY kinases from diverse groups and families, and subjected their kinase catalytic domains to Gaussian network model (GNM) based normal mode analysis (NMA). GNM models the kinase structure as a 3-dimensional mass-spring system in a coarse-grained fashion, with masses/nodes at Cα atom positions. Proximate Cα nodes, within a 7 Å distance cut-off, are connected by identical virtual springs, resulting in a simplified network of Cα-Cα bonded and non-bonded interactions modelled as harmonic potentials. Based purely on the topology of mechanical constraints imposed by the springs, GNM analytically determines the isotropic vibrational normal modes available to the kinase structure. This method approximates the energy of the protein structure harmonically, and thus any micro-motion of the kinase can be theoretically described by a linear combination of the calculated normal modes. It is known from previous studies that the modes of low frequencies correspond to biologically feasible and meaningful motions like hinge movements, protein folding and catalysis.
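The GNM construction described above can be sketched with NumPy as follows. This is a minimal illustration of the standard Kirchhoff-matrix formulation, not the code used in the thesis; the function name `gnm_global_mode` and the toy coordinates are assumptions, and the fluctuations are returned in relative units (the physical prefactor involving temperature and spring constant is omitted).

```python
import numpy as np


def gnm_global_mode(ca_coords, cutoff=7.0):
    """Return relative per-residue squared fluctuations in the slowest GNM mode.

    ca_coords: (N, 3) array of C-alpha positions in angstroms.
    cutoff:    distance cut-off (angstroms) for joining two nodes by a spring.
    """
    coords = np.asarray(ca_coords, dtype=float)
    # Pairwise C-alpha distances.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Kirchhoff (connectivity) matrix: -1 for each contact, node degree on the diagonal.
    kirchhoff = -(dist <= cutoff).astype(float)
    np.fill_diagonal(kirchhoff, 0.0)
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(kirchhoff)
    # Index 0 is the trivial zero mode of a connected network; index 1 is the global mode.
    if eigvals[1] < 1e-8:
        raise ValueError("network is disconnected at this cut-off")
    return eigvecs[:, 1] ** 2 / eigvals[1]


# Hypothetical toy "structure": 10 nodes on a line, 3.8 angstroms apart.
toy_chain = np.array([[3.8 * i, 0.0, 0.0] for i in range(10)])
fluct = gnm_global_mode(toy_chain)
print(np.round(fluct, 3))
```

For a real kinase domain the coordinates would come from the Cα records of a PDB entry; comparing the resulting profiles between the active and inactive structures of the same kinase corresponds to the analysis summarised in the next paragraph.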
We note that the multiple crystal structures analysed in each of the 14 STY kinases are identical in sequence and gross structural fold, and vary only in local backbone conformations corresponding to the functional state of the kinase (active/inactive). Upon examining the fluctuations of kinases in the normal mode of the lowest frequency (or, global mode), we found systematically higher structural fluctuations in the inactive states when compared to the corresponding active states. This observation held true within individual kinases and across all the 14 kinases. Taken together, more residues have higher fluctuations in the inactive states (n = 1095) than in the active states (n = 525; Chi-square test, p value < 0.05). This skewed fluctuation distribution is corroborated by the higher B-factors and conformational energies of the inactive state crystal structures. Moreover, high fluctuation is observed across the different inactive forms, except for a small fraction of DFG-“in” inactive conformations. Interestingly, the regions of differential fluctuation localised to the activation loop, catalytic loop, αC-helix and αG-helix, which are implicated in kinase function and regulation. Further investigation of 476 crystal structures of kinase complexes with other proteins revealed a remarkable correspondence of these regions of differential fluctuation to contact interfaces. We further verified that this differential fluctuation is not a trivial consequence of bound small molecules or mutations, but an inherent attribute of the kinase catalytic domains.
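As a rough check on the residue counts quoted above, the following sketch runs a one-degree-of-freedom chi-square goodness-of-fit test. The abstract does not state the null hypothesis that was used; an equal split between the two states is assumed here purely for illustration, and the function name is hypothetical.

```python
import math


def chi_square_equal_split(n_a: int, n_b: int):
    """Chi-square statistic and p-value (1 d.f.) against an assumed 50:50 split."""
    expected = (n_a + n_b) / 2
    chi2 = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    # For one degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Residues with higher fluctuation in the inactive (1095) vs active (525) states.
chi2, p_value = chi_square_equal_split(1095, 525)
print(f"chi2 = {chi2:.1f}, p = {p_value:.1e}")
```

Under this assumed null, the imbalance is far beyond the 0.05 threshold quoted in the text.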
In Chapter 3, we verified the accuracy of the differential fluctuation observed between the active and inactive STY kinases, as perceived from GNM based NMA, using the more rigorous method of molecular dynamics (MD) simulations. GNM is minimalistic in that the STY kinase catalytic domain is coarse-grained and reduced to a 3-dimensional mechanical network of Cα atom nodes. Thus, the role of side chains and their biophysical character, intra-protein interactions, mutations and bound factors are grossly overlooked. On this premise, we conducted all-atom MD simulations of 6 structural variants of cAMP-dependent protein kinase (PKA) for 1 μs each, using the AMBER ff14SB force-field. We chose 2 crystal structures of active and inactive PKA (PDB IDs 3FJQ and 1SYK respectively) whose kinase domains shared high structural similarity (gRMSD = 2.6 Å). They were modified in silico to obtain 6 starting structures for MD simulations: phosphorylated kinase domain in active and inactive states, kinase domain along with its C-terminal tail in active and inactive states, active kinase domain bound to ATP/2Mg2+, and unphosphorylated inactive kinase domain.
In the absence of external domains, the inactive kinase domain conformation elicits higher mobility in terms of Cα RMSD and Cα RMSF than the active kinase domain. Of the 255 residues in PKA, a remarkable 198 residues have higher Cα RMSF in the inactive state, with predominant contributions from the ATP binding loop, catalytic loop and αG-helix. In the presence of the C-terminal tail, the differential mobility of the kinase domain is exaggerated, with 241 out of 255 residues showing higher Cα RMSF in the inactive state. Upon close investigation, we found that in the presence of the C-terminal tail, although the mobility of residues is generally suppressed in both the functional states, a few functional regions like the activation loop and hinge residues experience higher Cα RMSF in the inactive state. This sheds light on the role of the C-terminal tail in the dynamics of the activation loop, potentially operating through the hinge residues. Absence of phosphorylation in the inactive kinase domain increases the mobility of residues in general, except for those in the αG-helix. When bound to ATP/2Mg2+, the active kinase domain (active-holo) showed higher mobility than the active-apo and inactive structures, contrary to the previous results and studies. Intrigued, we examined the simulation closely and found a transition of the active-holo structure to another conformation, named s2, at 450 ns. Upon analysis of the trajectory before the transition, the active-holo form was indeed found to be more stable and less mobile than the inactive state(s). Thus, all the inactive variants are found to be consistently more agile and mobile than their active counterparts, in agreement with the results obtained using NMA.
Chapter 4 discusses the transition of the active-holo simulation to a new state, named s2, characterises its structural features and explores the possibility of its functional relevance. In the previous chapter, while attempting to verify the presence of differential mobility between various active and inactive forms of PKA through MD simulations, we chanced upon the transition of an active PKA state bound to ATP/2Mg2+ (active-holo) to the s2 conformation. The s2 state has a Cα RMSD of up to 4.1 Å from the initial starting conformations, mainly contributed by the ATP binding loop, αB-helix, αC-helix and αG-helix, which are implicated in catalysis and substrate recognition. Once formed, s2 was stable and did not revert to the active-holo starting structure or any other conformation. We calculated all-vs.-all Cα RMSDs of the conformations sampled during the simulation and identified 3 time periods: 0 - 200 ns of initial conformations similar to the starting structure, 201 - 500 ns of transition, and 501 - 1000 ns of s2 conformations. Principal component analysis (PCA) of the Cα spatial positions during the entire trajectory also categorically exposed two energy wells corresponding to the initial and s2 conformations in the first and second PCs (variance = 56%). Upon systematically comparing the conformers sampled in MD with every known kinase structure, no structure hit within a Cα RMSD of 2 Å was found for conformers sampled after 500 ns, indicating that s2 is a novel and hitherto unknown conformation.
Investigation of persistent intra-protein interactions unique to the s2 state revealed two stabilising interactions: a salt bridge between K73 and E106 in the β-sheet behind the ATP binding cleft and a network of hydrophobic interactions anchoring the αC-helix to the αG-helix. Aside from these defining interactions, s2 is also characterised by a higher density of intra-protein hydrogen bonds, which stabilises it further. PCA of the MD trajectory indicates that the transition of active-holo to s2 is a process with at least 2 steps, the first being the salt bridge formation. Evolutionary conservation analysis shows that the crucial residues involved in the s2-specific interactions are not reliably conserved across PKAs of other organisms. However, convergence to s2 may still be feasible through other courses of stabilising interactions. From a functional perspective, the s2 conformation opens up the αG-helix away from the kinase core and mildly rearranges the catalytic cleft, thereby unmasking a hotspot for substrate binding that was absent in the initial structure. In an attempt to replicate the s2 conformation, we performed 4 repeat simulations of the same active-holo starting structure for 1 μs each. Although two of these independent simulations achieved the K73-E106 salt bridge, none of them reproduced the complete transition and convergence to s2. Instead, we sampled another set of novel conformations, s3, in one of the repeat simulations, indicating a disposition for the ATP bound PKA to sample different conformations. Comparative analysis suggests a potential role of the C-terminal tail in stabilising the active-holo conformation in physiological conditions.
Chapter 5 characterises the extent of conservation of structural fluctuations in homologous STY kinases and interprets the observations in the light of their regulatory diversity. Upon establishing that structural fluctuations of STY kinases carry activity-specific information (Chapter 2) and affirming the same using MD simulations (Chapter 3), we hypothesised that the mobility of STY kinases is an important consideration in understanding the basis of their regulatory features as well. In that case, one would expect the structural fluctuations to be better conserved in closely related STY kinases than in distantly related ones. To test our hypothesis, we collated 73 crystal structures containing an STY kinase domain in the active conformation and subjected them to GNM based NMA as described above. The global mode structural fluctuations of pairs of STY kinases of varying evolutionary divergence (same-protein, within-subfamily, within-family, within-group and across-groups) were analysed. We found that the closely related STY kinase pairs (of same-protein and within-subfamily categories) have more conserved and better correlated structural fluctuations than those that were distantly related (of within-group and across-group categories). This conservation of flexibility did not trivially follow from sequence/structure conservation, since a substantial 65.4% of the variation in fluctuations was not accounted for by variations in sequences and/or structures.
Across the 73 active STY kinases belonging to different groups, we identified a conserved flexibility signature defined by low magnitude fluctuations of residues in and around the catalytic loop. Interestingly, we also identified sub-structural residue-specific fluctuation profiles characteristic of kinases of different categories. Specifically, fluctuation patterns that are statistically unique to kinase groups (AGC, TK) and families (PKA, CDK) were recognised. These fluctuation signatures localise in sites known to participate in protein-protein interactions typical of the kinase group and family concerned. Thus, we report for the first time that residues characterised by fluctuations that are differentially conserved within a group/family are involved in interactions specific to the group/family. Having validated structural fluctuation as an indicative tool to understand kinase-specific interactions, we present an application of this understanding. In Src kinase, we identified local regions around the αG-helix that exhibit conserved differential fluctuations in comparison to its close relatives EGFR and Abl. Since specific fluctuations are correlated with specific binding, we propose this region as an attractive target for drug design, with minimal cross-reactivity. Overall, this chapter demonstrates the conservation of fluctuation in STY kinases and underscores the importance of considering fluctuations, over and above sequence and structural features, in understanding the roles of sites characteristic of kinases.
Chapter 6 documents the frequency of substitution of structural fluctuations in STY kinases over the course of divergent evolution. So far, we had established that structural fluctuations are evidently distinct in the varied functional states assumed by a single STY kinase (Chapters 2-3). In addition, fluctuations are differentially conserved within closely related kinases, but systematically vary across families (Chapter 5). In this chapter, we quantify the structural fluctuation variations in all residues of STY kinases put together. In a sense, this is the fluctuation space available for STY kinases across their functional states, binding modes, and regulatory mechanisms. To accomplish this, we systematically compiled all known eukaryotic kinase domain structures solved at resolutions better than 3 Å. These structures were then divided into wild-type (harbouring no mutations and having typical amino acids at critical functional sites), pseudo-kinase (harbouring no mutations, but having unconventional amino acids at critical functional sites), disease mutant (harbouring mutations that have implications in diseases) and mutant of unknown effect (harbouring mutations whose physiological manifestation is unknown) categories. Global mode structural fluctuations were determined for every kinase catalytic domain structure in each of the 4 listed categories.
Similar to Henikoff and Henikoff’s BLOSUM, which summarised the probability of all possible amino acid substitutions in homologous proteins, we documented a matrix of fluctuation substitution frequency in the conserved regions of wild-type kinases (named FLOSUM). We observe a positive correlation between the mean and variance of structural fluctuations at equivalent residue positions in wild-type kinase structures (Spearman rank order correlation, r = 0.69, p value < 1e-139). This implies that residues with low flexibility, such as those of the catalytic loop, do not adopt diverse fluctuations in different functional states or across kinases. Substitution with any other fluctuation is more heavily disfavoured at the lower range of flexibility than at the higher range. While we did not detect apparent differences in the FLOSUMs of wild-type, disease mutant and mutant of unknown effect structures, there is a remarkable distinction in the FLOSUM of pseudo-kinases. Fluctuation substitutions that are traditionally unfavourable in wild-type kinases are freely allowed in pseudo-kinases, thus exhibiting poor conservative substitution. Over and above the lack of conventional amino acids, poor conservation of structural fluctuations and favourable substitution of deviant fluctuations could render auxiliary functional character to the kinase domain in pseudo-kinases, despite their structural similarity to STY kinases. Taken together, this study summarises the structural fluctuation landscape of STY kinases in the form of a substitution matrix, which can serve as a model of flexibility substitution during protein evolution.
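As a rough illustration of how a BLOSUM-style matrix over fluctuations might be built, the sketch below computes log-odds scores for pairs of fluctuation bins observed at equivalent positions. It is not the thesis's procedure: the discretisation of fluctuations into integer bins, the function name `log_odds_matrix`, and the half-bit scaling (borrowed from BLOSUM) are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(columns, n_bins):
    """BLOSUM-style log-odds scores for pairs of fluctuation bins.

    columns: list of alignment columns; each column is a list of integer
             bin labels (0 .. n_bins - 1), one label per structure.
             (Binning the fluctuations is an assumption of this sketch.)
    Returns an n_bins x n_bins nested list; pairs never observed are None.
    """
    # Count every unordered pair of bins seen within a column.
    pair_counts = Counter()
    for col in columns:
        for a, b in combinations(col, 2):
            pair_counts[(min(a, b), max(a, b))] += 1
    total = sum(pair_counts.values())

    # Observed frequency q_ij of each unordered pair.
    q = {pair: count / total for pair, count in pair_counts.items()}

    # Marginal frequency of each bin: p_i = q_ii + sum_{j != i} q_ij / 2.
    p = [0.0] * n_bins
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    # Log-odds of observed versus expected pair frequency, in half-bits.
    matrix = [[None] * n_bins for _ in range(n_bins)]
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        score = 2 * math.log2(freq / expected)
        matrix[a][b] = matrix[b][a] = score
    return matrix


# Toy input: four aligned positions, three structures, bins 0 (rigid) to 2 (flexible).
toy_columns = [[0, 0, 0], [0, 0, 1], [2, 2, 2], [1, 2, 2]]
flosum_like = log_odds_matrix(toy_columns, n_bins=3)
```

Positive entries mark bin pairs that co-occur more often than chance would predict; in the toy input a rigid position stays rigid more readily than it becomes moderately flexible, and a rigid-to-flexible pair is never observed at all.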
Encouraged by structural fluctuations being differentially conserved in closely related kinases (Chapter 5) and conservatively substituted across kinases (Chapter 6), we extended this principle to the sequences of STY kinases in Chapter 7. This chapter reports the development of a method to predict the sites of functional specialisation in kinases, which differentiate one kinase from another, and applies it to all known STY kinase families. These sites are correlates of kinase-specific functional and regulatory attributes like specific protein-protein interactions, cognate substrate recognition and response to specific signals. Two cardinal properties of family-specific functional sites, viz., differential conservation and discriminatory ability, were used to identify them. We systematically compiled a data set of 5488 kinase catalytic domain sequences belonging to 107 families. After aligning them into a single multiple sequence alignment, we comparatively analysed the amino acid distributions in topologically equivalent positions of different families. Based on three different analytical measures, physicochemical property, Shannon’s entropy and random probability, we scored the differential conservation of every alignment position in each family. By maximising the discriminability between the kinase families, we integrated the results of the three measures and devised a single unified scoring scheme called ID score. This integrated scoring method could distinguish the 107 families from one another with an accuracy of 99.2%.
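To make the entropy component concrete, the following minimal sketch scores one alignment position by contrasting Shannon's entropy within a family against the entropy across all sequences. It is only an illustration of the idea of differential conservation, not the ID score itself: the function names, the normalisation by log2(20), and the simple difference used as the score are assumptions, and the physicochemical and random-probability measures are not modelled.

```python
import math
from collections import Counter

MAX_ENTROPY = math.log2(20)  # entropy of a uniform distribution over 20 amino acids


def shannon_entropy(residues):
    """Shannon entropy (in bits) of the residues observed in one alignment column."""
    counts = Counter(residues)
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def differential_conservation(family_col, background_col):
    """Hypothetical score in [0, 1] for one alignment position.

    High when the position is conserved within the family (low entropy)
    but variable across the background set of all families (high entropy).
    Assumes both columns contain only the 20 standard amino acids.
    """
    h_family = shannon_entropy(family_col) / MAX_ENTROPY
    h_background = shannon_entropy(background_col) / MAX_ENTROPY
    return min(1.0, max(0.0, h_background - h_family))


# Toy column: the family always has Lys, while the full alignment mixes Lys, Arg and Glu.
score = differential_conservation(list("KKKKKK"), list("KKKRRREEE"))
```

A position that is equally variable inside and outside the family scores zero under this scheme, which is the behaviour one would want from a filter for non-discriminatory sites.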
Each site in every STY kinase family was given a score in the range 0 to 1 by the ID score, with 0 indicating no functional specialisation and 1 indicating maximum functional specialisation. Several validations of the method were carried out to assess its competence. First, we selected those residue positions which have consistently high ID scores across most families. Using these hotspot alignment positions that render specificity to the kinase, we clustered the kinase sequences into groups and families. We found that the ID score predicted sites clustered the kinases better than the traditional clustering using the entire alignment. Despite the reduction in information, the increase in accuracy of clustering is feasible because of efficient filtering of non-discriminatory sites by ID score. Second, a linear discriminant classifier was observed to predict the kinase family, based on the ID score predicted sites, better than traditional methods. Third, family-specific protein-protein interaction sites in CDK and substrate recognising distal sites in MAPK were scored significantly higher than other residues by ID score (two-tailed unpaired t-test, p value < 0.05). Fourth, family-specific oncogenic driver mutation sites in 8 different kinase families were identified confidently by the ID score. Finally, we demonstrate one feasible application of the ID score method in the prediction of specific protein-protein interaction sites. In summary, we developed an integrated discriminatory method to identify regions of functional specialisation in all known kinases, validated the results for known cases and elucidated a potential application of the method.
The learning from the entire thesis work is summarised in Chapter 8, which positions the work in the larger context of functioning of the kinase domain and the use of dynamics to interpret protein functions. The validity of the simple, yet useful, NMA of proteins and complementary MD simulations to understand basic mechanistic and dynamic properties of proteins is highlighted. Similar to sequence and structure, dynamics is now recognised as a crucial feature holding information about protein function. The main learning of the thesis, that the flexibility and mobility of STY kinases are conserved and conservatively substituted at different levels of hierarchy (different functional forms within a kinase, across kinase families and across the entire STY kinase superfamily), is discussed. The contributions of the work in furthering the knowledge of specificity determinants in kinases, which dictate precise regulatory and control mechanisms, are presented.
Supplementary information that is helpful in understanding the results of individual chapters, but could not be printed in the thesis due to its length, is provided on an optical disk attached to the thesis. The material on the optical disk is referred to at appropriate places in the individual chapters.
|
140 |
Studies on ribosomal oxygenases. Sekirnik, Rok, January 2014
The 2OG oxygenases comprise a superfamily of ferrous iron dependent dioxygenases with multiple biological roles, including hypoxia sensing, transcriptional control, and splicing control. It was recently proposed that 2OG oxygenases catalyse the hydroxylation of ribosomal proteins in prokaryotes (ycfD) and in humans (NO66 and MINA53), raising the possibility that 2OG oxygenases also control translation. The work described in this thesis concerned investigations on the biochemical and functional aspects of prokaryotic and mammalian ribosomal protein hydroxylases (ROX) in vitro and in cells. An efficient chromatographic system linked to mass spectrometric analysis (LC-MS) was developed for studying the masses of individual ribosomal proteins (>90% coverage of ribosomal proteome) to ±1 Da accuracy. It was demonstrated that ycfD catalyses the hydroxylation of R81 on L16 in E. coli, in a manner dependent on atmospheric oxygen levels. YcfD deletion results in a growth phenotype at low temperatures and in minimal medium, and in decreased global translation rates in minimal medium; ycfD deletion does not affect translational accuracy or ribosome assembly. Furthermore, ycfD deletion results in increased sensitivity to the antibiotics chloramphenicol and lincomycin. Consistent with a 2OG-oxygenase mediated mechanism of antibiotic resistance, chloramphenicol sensitivity of the E. coli wild-type strain could be increased by inhibiting the activity of ycfD by removing co-factors required for catalytic activity (Fe(II) and O2), and, at least in part, by using a ycfD inhibitor, IOX1, which inhibits ycfD with an IC<sub>50</sub> of 38 μM in vitro. The therapeutic potential of a post-translational modification mediating antibiotic resistance provides an opportunity for medicinal targeting of ribosome-modifying enzymes, for example ycfD, which may be more ‘druggable’ than the ribosome itself.
In co-treatment with an existing antibiotic, such as chloramphenicol, a small molecule inhibitor would achieve a potentiated antibiotic effect. Structural aspects of ROX hydroxylation were pursued by characterising a thermophilic ROX-substrate complex; a ycfD homologue was identified in the thermophilic bacterium Rhodothermus marinus and shown to be a thermophilic 2OG oxygenase, ycfD<sub>RM</sub>, acting on R82 of ribosomal protein L16<sub>RM</sub>. The activity of ycfD<sub>RM</sub> in cells was limited at high growth temperature and oxygen solubility was demonstrated as a likely limiting factor of ycfD<sub>RM</sub> activity, thus identifying a potential 2OG oxygenase oxygen sensor in prokaryotes. A crystal structure of ycfD<sub>RM</sub> in complex with an L16<sub>RM</sub> substrate fragment was determined to 3.0 Å resolution. Structural analyses suggested that ycfD<sub>RM</sub> contains 30% more hydrophobic interactions and 100% more salt-bridge interactions than ycfD<sub>EC</sub>, suggesting that these interactions are important for thermal stabilisation of ycfD<sub>RM</sub>. The structures reveal key interactions required for binding of ribosomal proteins. Substantial structural changes were observed in the presence of the substrate fragment, which implies induced-fit binding of the L16<sub>RM</sub> substrate. The work has informed further structural studies on the evolutionarily related human ROX, NO66 and MINA53, for which substrate structures have been obtained since the completion of the work. The LC-MS analysis of ribosomal proteins was extended to mouse and human cells to demonstrate that the human ROX homologue of ycfD, MINA53, hydroxylates the 60S ribosomal protein rpL27a in cells. It was demonstrated that rpL27a hydroxylation is widespread and found in all mouse organs analysed, as well as in cancer cell lines and in clinical cancer tissues.
A partial or complete reduction of rpL27a hydroxylation was observed in a number of clinically identified MINA53 mutations from the COSMIC database of cancer mutations. Structural analysis suggested that mutations occur more frequently at structurally important regions of MINA53, including the βIV-βV insert in the core fold of MINA53. The identification of inhibiting clinical mutations suggests that rpL27a hydroxylation level could be used as a cancer marker, and in the future for selective inhibition by ribosomal antibiotics. The work presented in this thesis demonstrates that it is possible to selectively inhibit modified ribosomes; an inhibitor of unhydroxylated rpL27a could therefore, at least in principle, be active against the sub-set of tumours with inactivating mutation(s) of MINA53, but not normal tissue. Future work should therefore focus on identifying a selective inhibitor of unhydroxylated eukaryotic ribosomes which could be applied for treatment of cancers harbouring deactivating MINA53 mutations. The same approach could be applied to other ribosome modifications (to rRNA, ribosomal proteins, and ribosome-associated factors) that are different in cancer compared to normal cells.
|