41 |
Computational Molecular Engineering Nucleic Acid Binding Proteins and Enzymes
Reza, Faisal, January 2010
<p>Interactions between nucleic acid substrates and the proteins and enzymes that bind and catalyze them are ubiquitous and essential for reading, writing, replicating, repairing, and regulating the genomic code by the proteomic machinery. In this dissertation, computational molecular engineering furthered the elucidation of spatial-temporal interactions of natural nucleic acid binding proteins and enzymes and the creation of synthetic counterparts with structure-function interactions at predictive proficiency. We examined spatial-temporal interactions to study how natural proteins can process signals and substrates. The signals, propagated by spatial interactions between genes and proteins, can encode and decode information in the temporal domain. Natural proteins evolved through facilitating signaling, limiting crosstalk, and overcoming noise locally and globally. Findings indicate that fidelity and speed of frequency signal transmission in cellular noise was coordinated by a critical frequency, beyond which interactions may degrade or fail. The substrates, bound to their corresponding proteins, present structural information that is precisely recognized and acted upon in the spatial domain. Natural proteins evolved by coordinating substrate features with their own. Findings highlight the importance of accurate structural modeling. We explored structure-function interactions to study how synthetic proteins can complex with substrates. These complexes, composed of nucleic acid containing substrates and amino acid containing enzymes, can recognize and catalyze information in the spatial and temporal domains. Natural proteins evolved by balancing stability, solubility, substrate affinity, specificity, and catalytic activity. Accurate computational modeling of mutants with desirable properties for nucleic acids while maintaining such balances extended molecular redesign approaches. 
Findings demonstrate that binding and catalyzing proteins redesigned by single-conformation and multiple-conformation approaches maintained this balance and functioned, often as well as or better than those found in nature. We enabled access to computational molecular engineering of these interactions through open-source practices. We examined the applications and issues of engineering nucleic acid binding proteins and enzymes for nanotechnology and therapeutics, and in the ethical, legal, and social dimensions. Findings suggest that such access and applications can make engineering biology more widely adopted, easier, more effective, and safer.</p> / Dissertation
|
42 |
Protein surface charge of trypsinogen changes its activation pattern
Buettner, Karin, Kreisig, Thomas, Sträter, Norbert, Züchner, Thole, 21 January 2015
Background: Trypsinogen is the inactive precursor of trypsin, a serine protease that cleaves proteins and peptides after arginine and lysine residues. In this study, human trypsinogen was used as a model protein to study the influence of electrostatic forces on protein–protein interactions. Trypsinogen becomes active only after its eight-amino-acid-long activation peptide has been cleaved off by another protease, enteropeptidase. Trypsinogen can also be autoactivated without the involvement of enteropeptidase. This autoactivation occurs when a trypsinogen molecule is activated by a trypsin molecule and is therefore based on a protein–protein interaction. Results: Following a rational protein design guided by autoactivation-defective guinea pig trypsinogen, several amino acid residues, all located far away from the active site, were changed to modify the surface charge of human trypsinogen. The influence of the surface charge on the activation pattern of trypsinogen was investigated. The autoactivation properties of mutant trypsinogen were characterized in comparison to the recombinant wild-type enzyme. Surface-charged trypsinogen showed practically no autoactivation compared to the wild-type but could still be activated by enteropeptidase to the fully active trypsin. The kinetic parameters of surface-charged trypsinogen were comparable to those of the recombinant wild-type enzyme. Conclusion: The variant with a modified surface charge showed a completely different activation pattern from the wild-type enzyme. Our study provides an example of how directed modification of the protein surface charge can be utilized for the regulation of functional protein–protein interactions, as shown here for human trypsinogen.
|
43 |
Catalysis and Site-Specific Modification of Glutathione Transferases Enabled by Rational Design
Håkansson Hederos, Sofia, January 2005
This thesis describes the rational design of a novel enzyme, a thiolester hydrolase, derived from human glutathione transferase (GST) A1-1 by the introduction of a single histidine residue. The first section of the thesis describes the design and the determination of the reaction mechanism. The design was based on the crystal structure of human GST A1-1 complexed with S-benzylglutathione. The resulting enzyme, A216H, catalyzed the hydrolysis of the non-natural substrate GSB, a thiolester of glutathione and benzoic acid. The reaction followed saturation kinetics with a kcat of 0.00078 min⁻¹ and a KM of 5 μM. The rate constant ratio, (kcat/KM)/kuncat, was found to be more than 10⁷ M⁻¹. The introduction of a single His residue in position 216 opened up a novel reaction pathway in human GST A1-1 and is an example of catalytic promiscuity. The substrate requirements were investigated, and A216H was found to be selective: only two out of 18 GS-thiolesters tested were substrates for A216H. The reaction mechanism of the A216H-catalyzed hydrolysis of GSB was determined and found to proceed via an acyl intermediate at Y9. The hydrolysis was catalyzed by H216, which acts as a general base, and the deacylation was found to be the rate-determining step. The Y9 intermediate could be selectively trapped by oxygen nucleophiles; primary alcohols, in particular 1-propanol and trifluoroethanol, were the most efficient. In addition, saturation kinetics was obtained in the acyl transfer reaction with 1-propanol, indicating the presence of a second binding site in A216H. The second section of this thesis describes the site-specific covalent modification of human GST A1-1. The addition of GSB to the wild-type protein results in a site-specific benzoylation of only one tyrosine residue, Y9, out of ten present in the protein (one out of 51 nucleophiles in total). 
The reaction was tested with five GST classes (Alpha, Mu, Pi, Theta and Omega) and found to be specific for the Alpha class isoenzymes. The covalent modification reaction was further refined to target a single lysine residue, K216, providing a more stable linkage in the form of an amide bond. The reaction was found to be versatile and approximately 50% of the GS-thiolesters tested acylated K216, including a fluorophore. / <p>On the day of the public defence the status of article II was: Submitted and article IV was: In press.</p>
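The kinetic constants quoted for A216H can be turned into a catalytic efficiency with a short calculation. The sketch below is illustrative only: it assumes the lower bound on the rate constant ratio is 10^7 M^-1 (the exponent is an interpretation of the extracted text), and the function names are hypothetical, not taken from the thesis.

```python
# Kinetic constants quoted in the abstract for A216H-catalyzed hydrolysis of GSB.
K_CAT_PER_MIN = 0.00078  # kcat in min^-1
K_M_MOLAR = 5e-6         # KM of 5 uM expressed in mol/L

# Assumption: the quoted lower bound on (kcat/KM)/kuncat is 10^7 M^-1.
MIN_RATIO_PER_MOLAR = 1e7


def catalytic_efficiency(k_cat: float, k_m: float) -> float:
    """Return kcat/KM in M^-1 min^-1."""
    return k_cat / k_m


def max_uncatalyzed_rate(efficiency: float, min_ratio: float) -> float:
    """Upper bound on kuncat (min^-1) implied by (kcat/KM)/kuncat > min_ratio."""
    return efficiency / min_ratio


efficiency = catalytic_efficiency(K_CAT_PER_MIN, K_M_MOLAR)
k_uncat_bound = max_uncatalyzed_rate(efficiency, MIN_RATIO_PER_MOLAR)
print(f"kcat/KM = {efficiency:.0f} M^-1 min^-1")
print(f"kuncat  < {k_uncat_bound:.2e} min^-1")
```

Under these assumptions kcat/KM is about 156 M^-1 min^-1, so the ratio implies an uncatalyzed rate constant below roughly 1.6 × 10^-5 min^-1.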
|
44 |
Robotics-inspired methods to enhance protein design / Méthodes inspirées de la robotique pour l’aide à la conception de protéines
Denarie, Laurent, 12 April 2017
Designing proteins with specific properties is a major challenge for pharmacology and biotechnology. Despite progress in the computer-aided design methods developed for protein design, a major limitation of existing techniques is the difficulty of accounting for the mobility of the protein backbone, which is needed to better capture the full set of properties of candidate proteins and to guarantee the stability of the chosen protein in the desired conformation. Moreover, although multi-state design methods have been proposed, they cannot guarantee the existence of a realistic trajectory between these states. As a result, designing proteins that must transition between several states remains out of reach of current methods. This thesis explores how robotics-inspired algorithms can be used to explore the conformational space efficiently, in order to improve protein design methods by taking backbone flexibility into account more thoroughly. This work also lays a first milestone towards a design method suited to realizing a protein motion. / The ability to design proteins with specific properties would yield great progress in pharmacology and biotechnology. Methods to design proteins have been developed for a few decades, and some notable achievements have been made, including de novo protein design. Yet current approaches suffer from some serious limitations. By not taking protein backbone motions into account, they fail to capture some of the properties of the candidate design and cannot guarantee that the solution will in fact be stable in the goal conformation. 
Besides, although multi-state design methods have been proposed, they do not guarantee that a feasible trajectory between those states exists, which means that design problems involving state transitions are out of reach of current methods. This thesis investigates how robotics-inspired algorithms can be used to efficiently explore the conformational landscape of a protein, aiming to enhance protein design methods by introducing additional backbone flexibility. This work also provides first milestones towards protein motion design.
|
45 |
Prediction of designer-recombinases for DNA editing with generative deep learning
Schmitt, Lukas Theo, Paszkowski-Rogacz, Maciej, Jug, Florian, Buchholz, Frank, 04 June 2024
Site-specific tyrosine-type recombinases are effective tools for genome engineering, with the first engineered variants having demonstrated therapeutic potential. So far, adaptation to new DNA target site selectivity of designer-recombinases has been achieved mostly through iterative cycles of directed molecular evolution. While effective, directed molecular evolution methods are laborious and time consuming. Here we present RecGen (Recombinase Generator), an algorithm for the intelligent generation of designer-recombinases. We gather the sequence information of over one million Cre-like recombinase sequences evolved for 89 different target sites, with which we train Conditional Variational Autoencoders for recombinase generation. Experimental validation demonstrates that the algorithm can predict recombinase sequences with activity on novel target sites, indicating that RecGen is useful to accelerate the development of future designer-recombinases.
|
46 |
Computational De Novo Design of Peptide Binders for Modulating TAR DNA-Binding Protein 43 Aggregation in Neurodegenerative Diseases
Huang, Jinling, January 2024
The human TAR DNA-binding protein 43 (TDP-43) is crucial for regulating cellular processes such as transcription, RNA splicing, and mRNA transport and translation. However, its abnormal cytoplasmic aggregation disrupts these functions and contributes to the development of many neurodegenerative diseases. The aim of this study is to design and evaluate cyclic peptide binders capable of modulating TDP-43 aggregation. Using the AfDesign tool, we generated 84 cyclic peptide binders of varying lengths (10 to 13 amino acids). Analysis revealed 17 binders with pLDDT scores above 70, indicating a stable conformation, and 43 binders with pAE scores below 10, suggesting strong binding affinity to the target protein. Further binding affinity analysis with PRODIGY confirmed these results, identifying binders with low dissociation constants. Molecular dynamics (MD) simulations indicated that peptide binders effectively delayed TDP-43 aggregation. However, permeability studies using Steered Molecular Dynamics (SMD) simulation showed that the designed binders had a low permeability coefficient (2.18 × 10⁻¹⁵ cm/s), significantly lower than the benchmark for effective permeability. This highlights the need for further optimization to enhance the binders’ permeability. Future work will involve scaling up the design process, improving screening techniques, and employing advanced simulation methods to achieve better insights and more effective peptide binders.
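The two screening thresholds mentioned in this abstract (pLDDT above 70, pAE below 10) can be expressed as a simple filter. The sketch below is a minimal illustration, not the study's pipeline: the `Binder` record, the peptide names, sequences and scores are all made up, and combining both cutoffs into one shortlist is an assumption.

```python
from dataclasses import dataclass

PLDDT_MIN = 70.0  # threshold quoted in the abstract: pLDDT above 70
PAE_MAX = 10.0    # threshold quoted in the abstract: pAE below 10


@dataclass(frozen=True)
class Binder:
    """Hypothetical record for one designed cyclic peptide."""
    name: str
    sequence: str
    plddt: float
    pae: float


def passes_filters(binder: Binder) -> bool:
    """True when the binder clears both confidence cutoffs."""
    return binder.plddt > PLDDT_MIN and binder.pae < PAE_MAX


# Made-up candidates for illustration only.
candidates = [
    Binder("pep_01", "CKWLDRAGFYC", plddt=82.4, pae=6.1),
    Binder("pep_02", "CGSTNQPEHVC", plddt=64.9, pae=7.8),
    Binder("pep_03", "CRIYWLKDAMFC", plddt=75.0, pae=12.3),
    Binder("pep_04", "CFLRWYDKGAC", plddt=71.2, pae=9.9),
]

# Keep binders that pass both cutoffs, lowest predicted alignment error first.
shortlist = sorted((b for b in candidates if passes_filters(b)), key=lambda b: b.pae)
print([b.name for b in shortlist])
```

Sorting by pAE is one arbitrary ranking choice; a real screen would likely rank on the downstream affinity estimates instead.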
|
47 |
Molecular Simulation of Mutation Effects on Protein Folding and Function
Novack, Dylan, 0000-0003-1434-0316 06 1900
The amino acid sequence of a protein encodes its folding, the reaction by which a peptide self-assembles into its native functional shape. A folded protein will then go on to carry out biological functions such as ligand binding, signaling, mechanical functions, or biomolecular catalysis. While much experimental work has been done to elucidate protein structure and conformational dynamics, Molecular Dynamics (MD) simulations have been necessary to provide atomic-level details of biomolecular dynamics. A challenge in using MD is the large computational cost required to reach biologically relevant timescales when integration steps are typically limited to a few femtoseconds. To address this challenge, specialized hardware such as the ANTON supercomputer or distributed computing platforms like Folding@home can be used to collect milliseconds of aggregate trajectory data. From these datasets, kinetic network models called Markov state models (MSMs) can be constructed to infer long-timescale dynamics from ensembles of short trajectories. These models can be analyzed in a human-interpretable way and make physics-based connections to experimental observables. This dissertation describes how we have used MD simulations and MSMs to model protein folding reactions and protein-protein binding reactions to better understand mutation effects on our systems of interest.
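The idea of inferring slow dynamics from many short trajectories can be illustrated with a toy two-state MSM. The sketch below is a simplification under stated assumptions: the trajectories are invented, states are already discretized (0 = unfolded, 1 = folded), and the transition matrix is estimated by plain row normalization of counts rather than by the reversible estimators that dedicated MSM software typically uses.

```python
import math
from typing import List, Sequence


def count_transitions(trajs: Sequence[Sequence[int]], n_states: int, lag: int) -> List[List[int]]:
    """Count i -> j transitions observed at the given lag across all trajectories."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajs:
        for i, j in zip(traj, traj[lag:]):
            counts[i][j] += 1
    return counts


def row_normalize(counts: List[List[int]]) -> List[List[float]]:
    """Turn a count matrix into a row-stochastic transition matrix."""
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a state was never observed as a transition origin")
        matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(T: List[List[float]], n_iter: int = 1000) -> List[float]:
    """Approximate the stationary distribution by repeated left-multiplication."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pi


def two_state_implied_timescale(T: List[List[float]], lag: int) -> float:
    """For a 2x2 stochastic matrix the non-unit eigenvalue is T00 + T11 - 1."""
    second_eigenvalue = T[0][0] + T[1][1] - 1.0
    return -lag / math.log(second_eigenvalue)


# Invented short trajectories standing in for many independent simulations.
trajectories = [
    [0, 0, 0, 1, 1, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
]
LAG = 1
counts = count_transitions(trajectories, n_states=2, lag=LAG)
T = row_normalize(counts)
pi = stationary_distribution(T)
timescale = two_state_implied_timescale(T, LAG)
print(counts, pi, timescale)
```

The implied timescale is in units of the trajectory frame spacing; with real data it would be checked for convergence across several lag times before being interpreted.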
The first chapter of this dissertation describes MD simulations of FOXO1 FKH domain folding, which we characterized at atomic resolution using MSMs. To predict how mutations found in diffuse large B-cell lymphoma (DLBCL) cell lines affect protein stability, we developed an MSM-based hydrophobic free energy of transfer (HT) model to estimate mutation effects. Our HT model results agree better with experiment than other state-of-the-art computational methods. Chapter two describes how we have used approximately 43000 relative binding free energy calculations via the expanded ensemble (EE) method to perform in silico site saturation mutagenesis (SSM) on miniprotein binders to the highly conserved influenza A H1 hemagglutinin stem region (HA2), designed de novo by the Baker lab [66]. These miniproteins were selected through an exhaustive design process with iterations of computational design, high-throughput affinity screens, and site saturation mutagenesis. We compare our EE SSM method with inferred relative affinities from Chevalier et al. [66], as well as with the state-of-the-art Rosetta method Flex ddG. While Flex ddG predictions are more accurate on average, they are highly conservative. In contrast, EE predictions can better classify stabilizing and destabilizing mutations. We also use a Shannon-entropy-based method to identify residue positions that are more susceptible to mutation. This work suggests that simulation-based free energy methods can provide predictive information for in silico affinity maturation of designed miniproteins, with many feasible improvements to the efficiency and accuracy within reach. In the final chapter, we attempt to model the complete binding reactions of the 6 miniproteins mentioned above. We used unbiased simulations to build standard MSMs and, in combination with biased simulations, multiensemble Markov models (MEMMs) of binding for each wild-type and affinity-matured pair. 
The unbiased MSMs show that the affinity-matured miniproteins prefer different bound states than the wild-type miniproteins. Additionally, they provide physically realistic k_on values and a macroscopic three-state pathway through an encounter complex. We characterize each of those states and use a contact-map-based structural similarity index measure (SSIM) and a residue-wise Kullback-Leibler divergence method to better understand the differences in the bound states between affinity-matured and wild-type constructs. Interestingly, while our biased simulations do see unbinding transitions, in estimating the MEMM they overweight the unbinding reaction and unbound state, leading to models that do not make physical sense. This demonstrates that more sensitive enhanced sampling techniques may be necessary for building MEMMs. The final two chapters of this dissertation present new methodologies for computational protein design, making great strides towards a dynamic understanding of how proteins bind their targets and how mutations affect those reactions. / Chemistry
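The abstract names two information-theoretic quantities, Shannon entropy and Kullback-Leibler divergence, without giving their exact inputs. The sketch below only shows the standard definitions applied to generic discrete distributions; which per-residue distributions the dissertation actually compares is not stated here, so the example vectors are purely illustrative.

```python
import math
from typing import Sequence


def shannon_entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits of a discrete distribution (zero terms are skipped)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL divergence D(p || q) in nats; q is floored at eps to avoid division by zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Illustrative distributions, e.g. over outcomes at one residue position.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [1.0, 0.0, 0.0, 0.0]
print(shannon_entropy(uniform))            # high entropy: many outcomes equally likely
print(shannon_entropy(peaked))             # zero entropy: one outcome dominates
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

A position with high entropy tolerates many outcomes, while a large per-residue KL divergence flags a residue whose behaviour differs between two ensembles.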
|
48 |
Inferences on Structure and Function of Proteins from Sequence Data : Development of Methods and Applications
Mudgal, Richa, January 2015
Structural and functional annotation of sequences of putative proteins encoded in newly sequenced genomes poses an important challenge. While much progress has been made towards high-throughput experimental techniques for structure determination and functional assignment of proteins, most current genome-wide annotation systems rely on computational methods to derive cues on structure and function based on relationships with related proteins of known structure and/or function. Evolutionary pressure on proteins forces the retention of sequence features that are important for structure and function. Thus, if it can be established that two proteins have descended from a common ancestor, it can be inferred that the structural fold and biological function of the two proteins would be similar. Homology-based information transfer from one protein to another has played a central role in the understanding of the evolution of protein structures, functions and interactions. Many algorithmic improvements have been developed over the past two decades to recognize homologues of a protein from sequence-based searches alone, but there are still a large number of proteins without any functional annotation. The sensitivity of the available methods can be further enhanced by indirect comparisons with the help of intermediately-related sequences which link related families. However, sequence-based homology searches in the current protein sequence space are often restricted to the family members, due to the paucity of natural intermediate sequences that can act as linkers in detecting remote homologues. Thus a major goal of this thesis is to develop computational methods to fill up the sparse regions in the protein sequence space with computationally designed protein-like sequences and thereby create a continuum of protein sequences, which could aid in detecting remote homologues. Such designed sequences are further assessed for their effectiveness in detection of distant evolutionary relationships and functional annotation of proteins with unknown structure and function. Another important aspect in structural bioinformatics is to gain a good understanding of the protein sequence-structure-function paradigm. Functional annotations by comparisons of protein sequences can be further strengthened with the addition of structural information; however, instances of functional divergence and convergence may lead to functional mis-annotations. Therefore, a systematic analysis is performed on the fold–function associations using binding site information and their inter-relationships using binding site similarity networks.
Chapter 1 provides a background on proteins, their evolution, classification and structural and functional features. This chapter also describes various methods for detection of remote similarities and the role of protein sequence design methods in detection of distant relatives for protein annotation. Pitfalls in prediction of protein function from sequence and structure are also discussed, followed by an outline of the thesis.
Chapter 2 addresses the paucity of available protein sequences that can act as linkers between distantly related proteins/families and help in detection of distant evolutionary relationships. Previous efforts in protein sequence design for remote homology detection and design of sequences corresponding to specific protein families are discussed. This chapter describes a novel methodology to computationally design intermediately-related protein sequences between two related families and thus fill in the gaps in the sequence space between the related families. Protein families as defined in the SCOP database are represented as position-specific scoring matrices (PSSMs), and these profiles of related protein families within a fold are aligned using AlignHUSH, a profile-profile alignment method. Guided by this alignment, the frequency distributions of the amino acids in the two families are combined, and for each aligned position a residue is selected based on its combined probability of occurring at that alignment position in the two families. Each computationally designed sequence is then subjected to RPS-BLAST searches against an all-profile pool representing all protein families. Artificial sequences that detect both parent profiles with no hits corresponding to other folds qualify as ‘designed intermediate sequences’. Various scoring schemes and divergence levels for the design of protein-like sequences are investigated such that these designed sequences intersperse between two related families, thereby creating a continuum in sequence space. The method is then applied on a large scale for all folds with two or more families and resulted in the design of 3,611,010 intermediately-related sequences for 27,882 profile-profile alignments corresponding to 374 folds. Such designed sequences are generic in nature and can be added to any sequence database of natural protein sequences. 
Such enriched databases can then be queried using any sequence-based remote homology detection method to detect distant relatives.
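The residue-selection step described for Chapter 2 (combine the amino acid frequencies of two aligned family profiles, then pick a residue per aligned position from the combined probability) can be sketched as follows. This is an illustrative toy under several assumptions: the profiles are invented three-column examples already in alignment, the distributions are combined by a weighted average (`weight_a` is a hypothetical parameter), and the RPS-BLAST acceptance filter is not reproduced.

```python
import random
from typing import Dict, List

Column = Dict[str, float]   # residue -> frequency at one aligned position
Profile = List[Column]      # one column per aligned position


def combine_columns(col_a: Column, col_b: Column, weight_a: float = 0.5) -> Column:
    """Weighted average of two residue-frequency columns, renormalized to sum to 1."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    residues = set(col_a) | set(col_b)
    mixed = {r: weight_a * col_a.get(r, 0.0) + (1.0 - weight_a) * col_b.get(r, 0.0)
             for r in residues}
    total = sum(mixed.values())
    if total <= 0:
        raise ValueError("combined column has no probability mass")
    return {r: v / total for r, v in mixed.items()}


def design_sequence(profile_a: Profile, profile_b: Profile,
                    rng: random.Random, weight_a: float = 0.5) -> str:
    """Sample one residue per aligned column from the combined distribution."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must already be aligned to equal length")
    sequence = []
    for col_a, col_b in zip(profile_a, profile_b):
        mixed = combine_columns(col_a, col_b, weight_a)
        residues = sorted(mixed)  # fixed order keeps sampling reproducible
        weights = [mixed[r] for r in residues]
        sequence.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(sequence)


# Invented toy profiles for two related families (three aligned positions).
profile_a = [{"G": 1.0}, {"A": 0.7, "S": 0.3}, {"L": 0.6, "I": 0.4}]
profile_b = [{"G": 1.0}, {"S": 0.8, "T": 0.2}, {"V": 0.5, "I": 0.5}]

designed = design_sequence(profile_a, profile_b, random.Random(0))
print(designed)
```

In the workflow described above, each sampled sequence would then be searched against the profile pool and kept only if it detects both parent families and no other fold.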
The next chapter (Chapter 3) explores the ability of these designed intermediate sequences to act as linkers of two related families and aid in detection of remote homologues. To assess the applicability of these designed sequences, two types of databases have been generated: a CONTROL database containing protein sequences from natural sequence databases and an AUGMENTED database in which designed sequences are added to the database of natural sequences. Detailed assessments of the utility of such designed sequences using traditional sequence-based searches in the AUGMENTED database showed enhanced detection of remote homologues for almost 74% of the folds. For over 3,000 queries, it is demonstrated that designed sequences are positioned as suitable linkers, which mediate connections between distantly related proteins. Using examples from known distant evolutionary relationships, we demonstrate that homology searches in augmented databases show an increase of up to 22% in the number of correct evolutionary relationships "discovered". Such connections are reported with high sensitivities and very low false positive rates. Interestingly, they fill in void and sparse regions in sequence space and relate distant proteins not only through multiple routes but also through
SCOP-NrichD database, SUPFAM+ database, SUPERFAMILY database, protein domain library queried by pDomTHREADER and HHsearch against HMM library of SCOP families. This approach detected evolutionary relationships for almost 20% of all the families with no known structure or function. A detailed report of predictions for 614 DUFs, with their fold and species distributions, is provided in this chapter. These predictions are then enriched with GO terms and enzyme information wherever available. A detailed discussion is provided for a few of the interesting assignments: DUF1636, DUF1572 and DUF2092, which are functionally annotated as thioredoxin-like 2Fe-2S ferredoxin, putative metalloenzyme and lipoprotein localization factors, respectively. These 614 novel structure-function relationships, of which 193 are supported by consensus between at least two of the five methods, can be accessed from http://proline.biochem.iisc.ernet.in/RHD_DUFS/.
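The consensus criterion mentioned above (support from at least two of the five methods) can be sketched as a simple vote count. The example below is hypothetical throughout: the family and fold labels are placeholders, the per-method predictions are invented, and the thesis may define agreement differently (for example at superfamily rather than fold level).

```python
from collections import Counter
from typing import Dict, Optional

# The five search routes named in the text.
METHODS = ("SCOP-NrichD", "SUPFAM+", "SUPERFAMILY", "pDomTHREADER", "HHsearch")


def consensus_fold(predictions: Dict[str, Optional[str]], min_support: int = 2) -> Optional[str]:
    """Return the most frequently predicted fold if at least min_support methods agree."""
    votes = Counter(fold for fold in predictions.values() if fold is not None)
    if not votes:
        return None
    fold, support = votes.most_common(1)[0]
    return fold if support >= min_support else None


# Placeholder predictions for two hypothetical families of unknown function.
family_x = {"SCOP-NrichD": "fold_1", "SUPFAM+": None, "SUPERFAMILY": "fold_1",
            "pDomTHREADER": "fold_2", "HHsearch": None}
family_y = {"SCOP-NrichD": "fold_3", "SUPFAM+": None, "SUPERFAMILY": None,
            "pDomTHREADER": "fold_4", "HHsearch": None}

print(consensus_fold(family_x))  # two methods agree on the same fold
print(consensus_fold(family_y))  # no fold reaches the support threshold
```

Ties between equally supported folds are broken by first appearance here; a production pipeline would need an explicit rule for that case.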
Protein functions can be appreciated better in the light of evolutionary information from their structures. Chapter 6 describes a database of evolutionary relationships identified between Pfam families. The grouping of Pfam families is important to obtain a better understanding of evolutionary relationships and to obtain clues to functions of proteins in families of yet unknown function. Many structural genomics initiative projects have made considerable efforts in solving structures and bridging the growing gap between protein sequences and their structures. The results of such experiments suggest that a structure newly solved by X-ray crystallography or NMR methods often has structural similarity to a protein of already known structure. These relationships often remain undetected due to the unavailability of structural information. Therefore, the SUPFAM+ database aims to detect such distant relationships between Pfam families by mapping the Pfam families to SCOP domain families. The work presented in this chapter describes the generation of the SUPFAM+ database using the sensitive AlignHUSH method to uncover hidden relationships. Firstly, Pfam families are queried against a profile database of SCOP families to derive Pfam-SCOP associations, and then Pfam families are queried against the Pfam database to derive Pfam-Pfam relationships. Pfam families that remain without a mapping to a SCOP family are mapped indirectly to a SCOP family by identifying relationships between such Pfam families and other Pfam families that are already mapped to a SCOP family. The criteria are kept stringent for these mappings to minimize the rate of false positives. In case of a Pfam family mapping to two or more SCOP superfamilies, a decision tree is implemented to assign the Pfam family to a single SCOP superfamily. Using these direct and indirect evolutionary relationships present in the SCOP database, associations between Pfam families are derived. 
Therefore, relationship between two Pfam families that do not have significant sequence similarity can be identified if both are related to same SCOP superfamily. Almost 36% of the Pfam families could be mapped to SCOP families through direct or indirect association. These Pfam-SCOP associations are grouped into 1,646 different superfamilies and cataloguing changes that occur in the binding sites between two functions, which are analysed in this study to trace possible routes between different functions in evolutionarily related enzymes.
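The direct and indirect mapping logic described for SUPFAM+ can be sketched with plain dictionaries. This is a simplified illustration, not the database's implementation: all identifiers are placeholders, the inputs are assumed to be pre-filtered hits, and families that would link to more than one superfamily are simply left unmapped instead of being resolved by the decision tree mentioned in the text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def map_pfam_to_scop(direct: Dict[str, str],
                     pfam_links: Dict[str, List[str]]) -> Dict[str, Tuple[str, str]]:
    """Map Pfam families to SCOP superfamilies directly, or indirectly via Pfam neighbours."""
    mapping = {pfam: (scop, "direct") for pfam, scop in direct.items()}
    for pfam, neighbours in pfam_links.items():
        if pfam in mapping:
            continue
        candidates = sorted({direct[n] for n in neighbours if n in direct})
        if len(candidates) == 1:  # ambiguous cases are skipped in this sketch
            mapping[pfam] = (candidates[0], "indirect")
    return mapping


def group_by_superfamily(mapping: Dict[str, Tuple[str, str]]) -> Dict[str, List[str]]:
    """Pfam families sharing a superfamily are treated as evolutionarily related."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for pfam, (scop, _route) in mapping.items():
        groups[scop].append(pfam)
    return {scop: sorted(members) for scop, members in groups.items()}


# Placeholder data: direct Pfam-SCOP hits and Pfam-Pfam relationships.
direct_hits = {"PF_A": "SF_1", "PF_B": "SF_1", "PF_C": "SF_2"}
pfam_pfam = {"PF_D": ["PF_A"], "PF_E": ["PF_B", "PF_C"], "PF_F": ["PF_G"]}

mapping = map_pfam_to_scop(direct_hits, pfam_pfam)
groups = group_by_superfamily(mapping)
print(mapping)
print(groups)
```

Here `PF_D` inherits `SF_1` through its neighbour, `PF_E` stays unmapped because its neighbours disagree, and `PF_F` has no mapped neighbour at all.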
The main conclusions of the entire thesis are summarized in Chapter 8, covering contributions to remote homology detection from sequence information alone and to understanding the ‘sequence-structure-function’ paradigm from a binding site perspective. The chapter illustrates the importance of the work presented here in the post-genomic era. It highlights the development of the algorithm for the design of ‘intermediately-related sequences’ that could serve as effective linkers in remote homology detection, its subsequent large-scale assessment, and the fact that such sequences can be added to any protein sequence database and explored by any sequence-based search method. Databases in the NrichD resource are made available in the public domain along with a portal to design artificial sequences for or between protein families. This thesis also provides useful and meaningful predictions for protein families with yet unknown structure and function using the NrichD database as well as four other state-of-the-art sequence-based remote homology detection methods. A different aspect addressed in this thesis provides a fundamental understanding of the relationships between protein structure and function. Evolutionary relationships between functional families are identified using the inherent structural information for these families, and fold-function relationships are studied from the perspective of similarities in their binding sites. Such studies help in the areas of functional annotation, polypharmacology and protein engineering.
Chapter 2 addresses the problem of paucity of available protein sequences that can act as linkers between distantly related proteins/families and help in detection of distant evolutionary relationships. Previous efforts in protein sequence design for remote homology detection and design of sequences corresponding to specific protein families are discussed. This chapter describes a novel methodology to computationally design intermediately-related protein sequences between two related families and thus fill-in the gaps in the sequence space between the related families. Protein families as defined in SCOP database are represented as position specific scoring matrices (PSSMs) and these profiles of related protein families within a fold are aligned using AlignHUSH -a profile-profile alignment method. Guided by this alignment, the frequency distribution of the amino acids in the two families are combined and for each aligned position a residue is selected based on the combined probability to occur in the alignment positions of two families. Each computationally designed sequence is then subjected to RPS-BLAST searches against an all profile pool representing all protein families. Artificial sequences that detect both the parent profiles with no hits corresponding to other folds qualify as ‘designed intermediate sequences’. Various scoring schemes and divergence levels for the design of protein-like sequences are investigated such that these designed sequences intersperse between two related families, thereby creating a continuum in sequence space. The method is then applied on a large scale for all folds with two or more families and resulted in the design of 3,611,010 intermediately-related sequences for 27,882 profile-profile alignments corresponding to 374 folds. Such designed sequences are generic in nature and can be augmented in any sequence database of natural protein sequences. 
Such enriched databases can then be queried using any sequence-based remote homology detection method to detect distant relatives.
The next chapter (Chapter 3) explores the ability of these designed intermediate sequences to act as linkers of two related families and aid in detection of remote homologues. To assess the applicability of these designed sequences two types of databases have been generated, namely a CONTROL database containing protein sequences from natural sequence databases and an AUGMENTED database in which designed sequences are included in the database of natural sequences. Detailed assessments of the utility of such designed sequences using traditional sequence-based searches in the AUGMENTED database showed an enhanced detection of remote homologues for almost 74% of the folds. For over 3,000 queries, it is demonstrated that designed sequences are positioned as suitable linkers, which mediate connections between distantly related proteins. Using examples from known distant evolutionary relationships, we demonstrate that homology searches in augmented databases show an increase of up to 22% in the number of /correct evolutionary relationships "discovered". Such connections are reported with high sensitivities and very low false positive rates. Interestingly, they fill-in void and sparse regions in sequence space and relate distant proteins not only through multiple routes but also through
SCOP-NrichD database, SUPFAM+ database, SUPERFAMILY database, protein domain library queried by pDomTHREADER and HHsearch against HMM library of SCOP families. This approach detected evolutionary relationships for almost 20% of all the families with no known structure or function. Detailed report of predictions for 614 DUFs, their fold and species distribution are provided in this chapter. These predictions are then enriched with GO terms and enzyme information wherever available. A detailed discussion is provided for few of the interesting assignments: DUF1636, DUF1572 and DUF2092 which are functionally annotated as thioredoxin-like 2Fe-2S ferredoxin, putative metalloenzyme and lipoprotein localization factors respectively. These 614 novel structure-function relationships of which 193 are supported by consensus between at least two of the five methods, can be accessed from http://proline.biochem.iisc.ernet.in/RHD_DUFS/.
Protein functions can be appreciated better in the light of evolutionary information from their structures. Chapter 6 describes a database of evolutionary relationships identified between Pfam families. The grouping of Pfam families is important for a better understanding of evolutionary relationships and for obtaining clues to the functions of proteins in families of yet unknown function. Many structural genomics initiative projects have made considerable efforts in solving structures and bridging the growing gap between protein sequences and their structures. The results of such experiments suggest that a structure newly solved by X-ray crystallography or NMR methods often has structural similarity to a protein of already known structure. These relationships often remain undetected due to the unavailability of structural information. Therefore, the SUPFAM+ database aims to detect such distant relationships between Pfam families by mapping the Pfam families to SCOP domain families. The work presented in this chapter describes the generation of the SUPFAM+ database using the sensitive AlignHUSH method to uncover hidden relationships. Firstly, Pfam families are queried against a profile database of SCOP families to derive Pfam-SCOP associations, and then Pfam families are queried against the Pfam database to derive Pfam-Pfam relationships. Pfam families that remain without a mapping to a SCOP family are mapped indirectly to a SCOP family by identifying relationships between such Pfam families and other Pfam families that are already mapped to a SCOP family. The criteria are kept stringent for these mappings to minimize the rate of false positives. In case of a Pfam family mapping to two or more SCOP superfamilies, a decision tree is implemented to assign the Pfam family to a single SCOP superfamily. Using these direct and indirect evolutionary relationships present in the SCOP database, associations between Pfam families are derived.
Therefore, a relationship between two Pfam families that do not have significant sequence similarity can be identified if both are related to the same SCOP superfamily. Almost 36% of the Pfam families could be mapped to SCOP families through direct or indirect association. These Pfam-SCOP associations are grouped into 1,646 different superfamilies and cataloguing changes that occur in the binding sites between two functions, which are analysed in this study to trace possible routes between different functions in evolutionarily related enzymes.
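The direct and indirect Pfam-to-SCOP mapping described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the inputs are taken to be already-filtered hits, indirect mapping is applied for one step only, the decision tree for families hitting several superfamilies is not modelled (the first mapping found is kept), and all function names and identifiers are hypothetical.

```python
from collections import defaultdict

def map_pfam_to_scop(direct, pfam_links):
    """Extend direct Pfam->SCOP superfamily hits with one-step indirect mappings.

    direct:     dict of Pfam family -> SCOP superfamily (direct profile hits)
    pfam_links: iterable of (pfam_a, pfam_b) Pfam-Pfam relationships
    """
    mapping = dict(direct)
    for a, b in pfam_links:
        for unmapped, partner in ((a, b), (b, a)):
            # Only families lacking a direct hit inherit a partner's superfamily.
            if unmapped not in direct and partner in direct:
                mapping.setdefault(unmapped, direct[partner])
    return mapping

def group_by_superfamily(mapping):
    """Group Pfam families that share a SCOP superfamily."""
    groups = defaultdict(set)
    for pfam, superfamily in mapping.items():
        groups[superfamily].add(pfam)
    return dict(groups)
```

Any two Pfam families that land in the same group are then candidate relatives, even when they share no detectable sequence similarity with each other.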
The main conclusions of the entire thesis are summarized in Chapter 8. The thesis contributes to the area of remote homology detection from sequence information alone and to understanding the ‘sequence-structure-function’ paradigm from a binding site perspective. The chapter illustrates the importance of the work presented here in the post-genomic era. It highlights the development of the algorithm for the design of ‘intermediately-related sequences’ that could serve as effective linkers in remote homology detection, its subsequent large-scale assessment, and the fact that the designed sequences can be added to any protein sequence database and explored by any sequence-based search method. Databases in the NrichD resource are made available in the public domain along with a portal to design artificial sequences for or between protein families. This thesis also provides useful and meaningful predictions for protein families with yet unknown structure and function using the NrichD database as well as four other state-of-the-art sequence-based remote homology detection methods. A different aspect addressed in this thesis provides a fundamental understanding of the relationships between protein structure and function. Evolutionary relationships between functional families are identified using the inherent structural information for these families, and fold-function relationships are studied from the perspective of similarities in their binding sites. Such studies help in the areas of functional annotation, polypharmacology and protein engineering.
|
49 |
Topology-based Sequence Design For Proteins Structures And Statistical Potentials Sensitive To Local Environments Jha, Anupam Nath 11 1900 (has links) (PDF)
Proteins, which regulate most biological activities, perform their functions through their unique three-dimensional structures. The process by which this three-dimensional structure folds from a one-dimensional sequence is not well understood. The available evidence suggests that protein structures are mostly conserved while sequences are more tolerant to mutations, i.e. a number of sequences can adopt the same fold. The search for optimal sequences for a chosen conformation is known as inverse protein folding, and this thesis takes this approach to the enigmatic folding problem.
This thesis presents a protein sequence design method based on the native state topology of protein structure. The structural importance of the amino acid positions has been converted into a topological parameter of the protein conformation. This scheme for extracting the topology of structures has been successfully applied to three-dimensional lattice structures, and in turn sequences with minimum energy for a given structure are obtained. This technique, along with a reduced amino acid alphabet (any clustering of the twenty amino acids based on some measure of their relative similarity), has been applied to protein structures to design optimal amino acid sequences for a given structure. These designed sequences are energetically much better than the native amino acid sequence. The utility of this method is further confirmed by showing the similarity between naturally occurring and designed sequences. In summary, a computationally efficient method of designing optimal sequences for a given structure is given.
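A toy version of topology-driven design on a lattice is sketched below. The abstract does not specify the topological parameter, so the number of non-bonded lattice contacts per position is used here as a stand-in, and a two-letter hydrophobic/polar (H/P) alphabet stands in for the reduced alphabet. Both choices, and all function names, are assumptions for illustration rather than the method of the thesis.

```python
def contact_numbers(coords):
    """Count non-bonded nearest-neighbour contacts for each chain position.

    coords: list of (x, y, z) integer lattice sites in chain order.
    """
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 2, n):  # skip i+1: chain neighbours are bonded
            distance = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
            if distance == 1:  # adjacent lattice sites
                counts[i] += 1
                counts[j] += 1
    return counts

def design_hp_sequence(coords, n_hydrophobic):
    """Place H at the most highly contacted positions and P elsewhere."""
    counts = contact_numbers(coords)
    # Rank by descending contact number; ties go to the earlier position.
    ranked = sorted(range(len(coords)), key=lambda i: (-counts[i], i))
    hydrophobic = set(ranked[:n_hydrophobic])
    return "".join("H" if i in hydrophobic else "P" for i in range(len(coords)))
```

The design step needs only a single pass over the contact map, which is the kind of shortcut that makes topology-based design cheap compared with searching sequence space directly.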
The physical interaction energy between amino acids is an important part of the study of protein-protein interaction, structure prediction, modeling, docking, etc. The local environment of amino acids distinguishes otherwise identical amino acid pairs in the protein structure, and so the pair-wise interaction energy of amino acid residues should depend on their respective environments. A local-environment-dependent, knowledge-based potential energy function is developed in this thesis. Two different environments have been considered: the local degree (number of contacts) and the secondary structural element of the amino acids. The investigations have shown that the environment-based interaction preferences of amino acids are able to provide good potential energy functions, which perform exceedingly well in discriminating the native structure from structures with random interactions.
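A common way to build a knowledge-based potential is the inverse Boltzmann relation, e = -ln(observed / expected), evaluated separately per environment. The sketch below takes that route as an assumption; the abstract does not give the functional form or reference state used in the thesis. For simplicity each contact carries a single environment label, and the expected count comes from residue composition within that environment.

```python
import math
from collections import Counter

def environment_potential(contacts):
    """Estimate pair energies per environment from observed contacts.

    contacts: iterable of (residue_a, residue_b, environment) tuples.
    Returns {((res1, res2), environment): energy}, pairs sorted alphabetically.
    """
    pair_counts = Counter()
    residue_counts = Counter()
    env_totals = Counter()
    for a, b, env in contacts:
        pair_counts[(tuple(sorted((a, b))), env)] += 1
        residue_counts[(a, env)] += 1
        residue_counts[(b, env)] += 1
        env_totals[env] += 1

    potential = {}
    for (pair, env), observed in pair_counts.items():
        a, b = pair
        total = env_totals[env]
        freq_a = residue_counts[(a, env)] / (2 * total)
        freq_b = residue_counts[(b, env)] / (2 * total)
        # Random-mixing expectation; unlike pairs can occur in two orders.
        expected = total * freq_a * freq_b * (1 if a == b else 2)
        potential[(pair, env)] = -math.log(observed / expected)
    return potential
```

Negative values mark pairs seen more often than random mixing predicts in that environment. Real data sets would also need pseudocounts for sparsely observed pairs, which this sketch omits.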
Further, membrane proteins are located in a completely different physico-chemical environment, with a different amino acid composition, than water-soluble proteins. This work provides reliable potential energy functions that account for this different environment when investigating (modeling or predicting) the structure of helical membrane proteins. Three different environments, namely the directions parallel and perpendicular to the lipid bilayer and the number of amino acid contacts, are explored to analyze the environmental effects on the potential functions. These environment-dependent scoring functions perform exceedingly well in discriminating the native sequence from a set of random sequences.
Hydrophobicity of amino acids is a measure of buriedness or exposure to the aqueous environment. The lack of uniformity within the protein environment gives rise to different hydrophobicity values for the same amino acid, which depend entirely on its location inside the protein. The contact-based, environment-dependent hydrophobicity values of all amino acids, separately for globular and membrane proteins, have also been evaluated in this thesis.
Apart from developing scoring functions, the packing of helices in membrane proteins is investigated by an approach based on the local backbone geometry and side chain atom-atom contacts of amino acids. A parameter defined in this study is able to capture the essential features of inter-helical packing, which may prove to be useful in modeling of helical membrane proteins.
In conclusion, this thesis has described a novel technique to design energetically minimized amino acid sequences that can fold into a given conformation. The environment-dependent interaction preference of amino acids in globular proteins is also captured in an efficient manner. In particular, the environment-dependent scoring function for helical membrane proteins is a first successful attempt in this direction.
|
50 |
Characterization of the Protein Lysine Methyltransferase SMYD2 Lanouette, Sylvain January 2015 (has links)
Our understanding of protein lysine methyltransferases and their substrates remains limited despite their importance as regulators of the proteome. The SMYD (SET and MYND domain) methyltransferase family plays pivotal roles in various cellular processes, including transcriptional regulation and embryonic development. Among them, SMYD2 is associated with oesophageal squamous cell carcinoma, bladder cancer and leukemia as well as with embryonic development. Initially identified as a histone methyltransferase, SMYD2 was later reported to methylate p53, the retinoblastoma protein pRb and the estrogen receptor ERalpha and to regulate their activity. Our proteomic and biochemical analyses demonstrated that SMYD2 also methylates the molecular chaperone HSP90 on K209 and K615. We also showed that HSP90 methylation is regulated by HSP90 co-chaperones, pH, and the demethylase LSD1. Further methyltransferase assays demonstrated that SMYD2 methylates lysine K* in proteins which include the sequence [LFM]-₁-K*-[AFYMSHRK]+₁-[LYK]+₂. This motif allowed us to show that SMYD2 methylates the transcriptional co-repressor SIN3B, the RNA helicase DHX15 and the myogenic transcription factors SIX1 and SIX2. Finally, muscle cell models suggest that SMYD2 methyltransferase activity plays a role in preventing premature myogenic differentiation of proliferating myoblasts by repressing muscle-specific genes. Our work thus shows that SMYD2 methyltransferase activity targets a broad array of substrates in vitro and in situ and is regulated by intricate mechanisms.
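The reported motif, [LFM] at position -1, the target lysine K*, [AFYMSHRK] at +1 and [LYK] at +2, translates directly into a regular expression. The sketch below scans a protein sequence for lysines matching it. The function name and the example peptide are hypothetical, and a motif match only flags a candidate site; it is not evidence of methylation.

```python
import re

# Lookbehind and lookahead keep the match on the lysine itself, so
# overlapping motif occurrences are all reported.
SMYD2_MOTIF = re.compile(r"(?<=[LFM])K(?=[AFYMSHRK][LYK])")

def candidate_lysines(sequence):
    """Return 1-based positions of lysines that fit the SMYD2 motif."""
    return [match.start() + 1 for match in SMYD2_MOTIF.finditer(sequence.upper())]
```

For the made-up peptide `AALKALGG`, `candidate_lysines` reports position 4, since that lysine is preceded by L and followed by A then L.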
|