  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Computational Molecular Engineering Nucleic Acid Binding Proteins and Enzymes

Reza, Faisal January 2010
<p>Interactions between nucleic acid substrates and the proteins and enzymes that bind and catalyze them are ubiquitous and essential for reading, writing, replicating, repairing, and regulating the genomic code by the proteomic machinery. In this dissertation, computational molecular engineering furthered the elucidation of spatial-temporal interactions of natural nucleic acid binding proteins and enzymes and the creation of synthetic counterparts with structure-function interactions at predictive proficiency. We examined spatial-temporal interactions to study how natural proteins can process signals and substrates. The signals, propagated by spatial interactions between genes and proteins, can encode and decode information in the temporal domain. Natural proteins evolved through facilitating signaling, limiting crosstalk, and overcoming noise locally and globally. Findings indicate that fidelity and speed of frequency signal transmission in cellular noise was coordinated by a critical frequency, beyond which interactions may degrade or fail. The substrates, bound to their corresponding proteins, present structural information that is precisely recognized and acted upon in the spatial domain. Natural proteins evolved by coordinating substrate features with their own. Findings highlight the importance of accurate structural modeling. We explored structure-function interactions to study how synthetic proteins can complex with substrates. These complexes, composed of nucleic acid containing substrates and amino acid containing enzymes, can recognize and catalyze information in the spatial and temporal domains. Natural proteins evolved by balancing stability, solubility, substrate affinity, specificity, and catalytic activity. Accurate computational modeling of mutants with desirable properties for nucleic acids while maintaining such balances extended molecular redesign approaches. 
Findings demonstrate that binding and catalyzing proteins redesigned by single-conformation and multiple-conformation approaches maintained this balance and functioned, often as well as or better than those found in nature. We enabled access to computational molecular engineering of these interactions through open-source practices. We examined the applications and issues of engineering nucleic acid binding proteins and enzymes for nanotechnology and therapeutics, and in the ethical, legal, and social dimensions. Findings suggest that such access and applications can make engineering biology more widely adopted, easier, more effective, and safer.</p> / Dissertation
42

Protein surface charge of trypsinogen changes its activation pattern

Buettner, Karin, Kreisig, Thomas, Sträter, Norbert, Züchner, Thole 21 January 2015
Background: Trypsinogen is the inactive precursor of trypsin, a serine protease that cleaves proteins and peptides after arginine and lysine residues. In this study, human trypsinogen was used as a model protein to study the influence of electrostatic forces on protein–protein interactions. Trypsinogen becomes active only after its eight-amino-acid-long activation peptide has been cleaved off by another protease, enteropeptidase. Trypsinogen can also be autoactivated without the involvement of enteropeptidase. This autoactivation occurs when a trypsinogen molecule is activated by a trypsin molecule and is therefore based on a protein–protein interaction. Results: Using a rational protein design modeled on autoactivation-defective guinea pig trypsinogen, several amino acid residues, all located far away from the active site, were changed to modify the surface charge of human trypsinogen. The influence of the surface charge on the activation pattern of trypsinogen was investigated. The autoactivation properties of mutant trypsinogen were characterized in comparison to the recombinant wild-type enzyme. Surface-charged trypsinogen showed practically no autoactivation compared to the wild-type but could still be activated by enteropeptidase to the fully active trypsin. The kinetic parameters of surface-charged trypsinogen were comparable to those of the recombinant wild-type enzyme. Conclusion: The variant with a modified surface charge showed a completely different activation pattern from the wild-type enzyme. Our study provides an example of how directed modification of the protein surface charge can be utilized for the regulation of functional protein–protein interactions, as shown here for human trypsinogen.
43

Catalysis and Site-Specific Modification of Glutathione Transferases Enabled by Rational Design

Håkansson Hederos, Sofia January 2005
This thesis describes the rational design of a novel enzyme, a thiolester hydrolase, derived from human glutathione transferase (GST) A1-1 by the introduction of a single histidine residue. The first section of the thesis describes the design and the determination of the reaction mechanism. The design was based on the crystal structure of human GST A1-1 complexed with S-benzylglutathione. The resulting enzyme, A216H, catalyzed the hydrolysis of the non-natural substrate GSB, a thiolester of glutathione and benzoic acid. The reaction followed saturation kinetics with a kcat of 0.00078 min⁻¹ and a KM of 5 μM. The rate constant ratio, (kcat/KM)/kuncat, was found to be more than 10⁷ M⁻¹. The introduction of a single His residue in position 216 opened up a novel reaction pathway in human GST A1-1 and is a nice example of catalytic promiscuity. The substrate requirements were investigated and A216H was found to be selective, since only two out of 18 GS-thiolesters tested were substrates for A216H. The reaction mechanism of the A216H-catalyzed hydrolysis of GSB was determined and found to proceed via an acyl intermediate at Y9. The hydrolysis was catalyzed by H216, which acts as a general base, and the deacylation was found to be the rate-determining step. The Y9-intermediate could be selectively trapped by oxygen nucleophiles; primary alcohols, in particular 1-propanol and trifluoroethanol, were the most efficient. In addition, saturation kinetics was obtained in the acyl transfer reaction with 1-propanol, indicating the presence of a second binding site in A216H. The second section of this thesis describes the site-specific covalent modification of human GST A1-1. The addition of GSB to the wild-type protein results in a site-specific benzoylation of only one tyrosine residue, Y9, out of ten present in the protein (one out of a total of 51 nucleophiles). 
The reaction was tested with five GST classes (Alpha, Mu, Pi, Theta and Omega) and found to be specific for the Alpha class isoenzymes. The covalent modification reaction was further refined to target a single lysine residue, K216, providing a more stable linkage in the form of an amide bond. The reaction was found to be versatile and approximately 50% of the GS-thiolesters tested acylated K216, including a fluorophore. / <p>On the day of the public defence the status of article II was: Submitted and article IV was: In press.</p>
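The kinetic figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it takes kcat and KM as stated, reads the reported rate-constant ratio as 10^7 M^-1 (the exponent is an assumption about the extracted text), and derives the catalytic efficiency and the upper bound on kuncat that those numbers imply. The function and constant names are hypothetical.

```python
def catalytic_efficiency(kcat_per_min: float, km_molar: float) -> float:
    """Return kcat/KM in M^-1 min^-1."""
    return kcat_per_min / km_molar


def max_uncatalyzed_rate(efficiency: float, ratio_lower_bound: float) -> float:
    """Upper bound on kuncat (min^-1) implied by (kcat/KM)/kuncat > ratio_lower_bound."""
    return efficiency / ratio_lower_bound


KCAT = 0.00078   # min^-1, as quoted in the abstract
KM = 5e-6        # M (5 uM), as quoted in the abstract
RATIO = 1e7      # M^-1, assumed reading of the reported rate-constant ratio

efficiency = catalytic_efficiency(KCAT, KM)
kuncat_bound = max_uncatalyzed_rate(efficiency, RATIO)

print(f"kcat/KM  = {efficiency:.3g} M^-1 min^-1")
print(f"kuncat  < {kuncat_bound:.3g} min^-1")
```

Under these assumptions kcat/KM comes out at roughly 156 M^-1 min^-1, so a ratio above 10^7 M^-1 corresponds to an uncatalyzed rate constant below about 1.6 × 10^-5 min^-1.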
44

Robotics-inspired methods to enhance protein design / Méthodes inspirées de la robotique pour l’aide à la conception de protéines

Denarie, Laurent 12 April 2017
The ability to design proteins with specific properties would yield great progress in pharmacology and biotechnology. Methods for designing proteins have been developed over the past few decades, and some notable achievements have been made, including de novo protein design. Yet current approaches suffer from serious limitations. By not taking the motions of the protein backbone into account, they fail to capture some of the properties of the candidate design and cannot guarantee that the solution will in fact be stable in the goal conformation. 
Besides, although multi-state design methods have been proposed, they do not guarantee that a feasible trajectory between those states exists, which means that design problems involving state transitions are out of reach of current methods. This thesis investigates how robotics-inspired algorithms can be used to efficiently explore the conformational landscape of a protein, with the aim of enhancing protein design methods by introducing additional backbone flexibility. This work also provides first milestones towards protein motion design.
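The abstract does not name the robotics algorithms it uses. As a purely illustrative sketch, the code below shows one widely used sampling-based planner from robotics, a rapidly-exploring random tree (RRT), searching a toy two-dimensional torsion-angle space for a feasible trajectory between two states. The forbidden region, start and goal states, step size and all names are hypothetical. Angle periodicity is ignored and only nodes (not edges) are checked for validity, both simplifications.

```python
import math
import random


def rrt(start, goal, is_valid, step=10.0, max_iter=5000, goal_tol=10.0, seed=0):
    """Grow a rapidly-exploring random tree in a 2D torsion space (degrees).

    Returns a list of states from start to a node within goal_tol of goal,
    or None if no such path was found within max_iter iterations.
    """
    rng = random.Random(seed)
    parents = {start: None}
    for _ in range(max_iter):
        # Bias 10% of the samples toward the goal to speed up convergence.
        if rng.random() < 0.1:
            sample = goal
        else:
            sample = (rng.uniform(-180, 180), rng.uniform(-180, 180))
        nearest = min(parents, key=lambda node: math.dist(node, sample))
        d = math.dist(nearest, sample)
        if d == 0:
            continue
        # Move at most `step` degrees from the nearest node toward the sample.
        scale = min(step, d) / d
        new = (nearest[0] + scale * (sample[0] - nearest[0]),
               nearest[1] + scale * (sample[1] - nearest[1]))
        if new in parents or not is_valid(new):
            continue
        parents[new] = nearest
        if math.dist(new, goal) <= goal_tol:
            path = [new]
            while parents[path[-1]] is not None:
                path.append(parents[path[-1]])
            return path[::-1]
    return None


def is_valid(state):
    """Hypothetical clash region: a wall at |phi| <= 20 for psi <= 100."""
    phi, psi = state
    return not (-20.0 <= phi <= 20.0 and psi <= 100.0)


START = (-120.0, -60.0)   # hypothetical initial conformation
GOAL = (120.0, -60.0)     # hypothetical target conformation

path = rrt(START, GOAL, is_valid)
print("path found" if path else "no path", len(path) if path else 0)
```

The point of the sketch is the guarantee discussed in the abstract: a returned path is a chain of valid intermediate states, so it demonstrates that a transition between the two end states exists in the toy model.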
45

Inferences on Structure and Function of Proteins from Sequence Data : Development of Methods and Applications

Mudgal, Richa January 2015
Structural and functional annotation of sequences of putative proteins encoded in newly sequenced genomes poses an important challenge. While much progress has been made towards high throughput experimental techniques for structure determination and functional assignment to proteins, most of the current genome-wide annotation systems rely on computational methods to derive cues on structure and function based on relationship with related proteins of known structure and/or function. Evolutionary pressure on proteins forces the retention of sequence features that are important for structure and function. Thus, if it can be established that two proteins have descended from a common ancestor, then it can be inferred that the structural fold and biological function of the two proteins would be similar. Homology based information transfer from one protein to another has played a central role in the understanding of evolution of protein structures, functions and interactions. Many algorithmic improvements have been developed over the past two decades to recognize homologues of a protein from sequence-based searches alone, but there are still a large number of proteins without any functional annotation. The sensitivity of the available methods can be further enhanced by indirect comparisons with the help of intermediately-related sequences which link related families. However, sequence-based homology searches in the current protein sequence space are often restricted to the family members, due to the paucity of natural intermediate sequences that can act as linkers in detecting remote homologues. Thus a major goal of this thesis is to develop computational methods to fill up the sparse regions in the protein sequence space with computationally designed protein-like sequences and thereby create a continuum of protein sequences, which could aid in detecting remote homologues. 
Such designed sequences are further assessed for their effectiveness in detection of distant evolutionary relationships and functional annotation of proteins with unknown structure and function. Another important aspect in structural bioinformatics is to gain a good understanding of protein sequence - structure - function paradigm. Functional annotations by comparisons of protein sequences can be further strengthened with the addition of structural information; however, instances of functional divergence and convergence may lead to functional mis-annotations. Therefore, a systematic analysis is performed on the fold–function associations using binding site information and their inter-relationships using binding site similarity networks. Chapter 1 provides a background on proteins, their evolution, classification and structural and functional features. This chapter also describes various methods for detection of remote similarities and the role of protein sequence design methods in detection of distant relatives for protein annotation. Pitfalls in prediction of protein function from sequence and structure are also discussed followed by an outline of the thesis. Chapter 2 addresses the problem of paucity of available protein sequences that can act as linkers between distantly related proteins/families and help in detection of distant evolutionary relationships. Previous efforts in protein sequence design for remote homology detection and design of sequences corresponding to specific protein families are discussed. This chapter describes a novel methodology to computationally design intermediately-related protein sequences between two related families and thus fill-in the gaps in the sequence space between the related families. Protein families as defined in SCOP database are represented as position specific scoring matrices (PSSMs) and these profiles of related protein families within a fold are aligned using AlignHUSH -a profile-profile alignment method. 
Guided by this alignment, the frequency distribution of the amino acids in the two families are combined and for each aligned position a residue is selected based on the combined probability to occur in the alignment positions of two families. Each computationally designed sequence is then subjected to RPS-BLAST searches against an all profile pool representing all protein families. Artificial sequences that detect both the parent profiles with no hits corresponding to other folds qualify as ‘designed intermediate sequences’. Various scoring schemes and divergence levels for the design of protein-like sequences are investigated such that these designed sequences intersperse between two related families, thereby creating a continuum in sequence space. The method is then applied on a large scale for all folds with two or more families and resulted in the design of 3,611,010 intermediately-related sequences for 27,882 profile-profile alignments corresponding to 374 folds. Such designed sequences are generic in nature and can be augmented in any sequence database of natural protein sequences. Such enriched databases can then be queried using any sequence-based remote homology detection method to detect distant relatives. The next chapter (Chapter 3) explores the ability of these designed intermediate sequences to act as linkers of two related families and aid in detection of remote homologues. To assess the applicability of these designed sequences two types of databases have been generated, namely a CONTROL database containing protein sequences from natural sequence databases and an AUGMENTED database in which designed sequences are included in the database of natural sequences. Detailed assessments of the utility of such designed sequences using traditional sequence-based searches in the AUGMENTED database showed an enhanced detection of remote homologues for almost 74% of the folds. 
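The design step described above (combine the amino acid frequency distributions of two aligned family profiles and select a residue at each aligned position according to the combined probability) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the profiles are toy frequency dictionaries rather than real PSSMs, the alignment is given as index pairs, residues are drawn by weighted sampling from an equal-weight mixture, and the subsequent RPS-BLAST filtering is not modelled. All names are hypothetical.

```python
import random


def design_intermediate(profile_a, profile_b, alignment, weight_a=0.5, seed=0):
    """Sample one sequence from the position-wise mixture of two family profiles.

    profile_a, profile_b: lists of dicts mapping residue -> frequency, one per column.
    alignment: list of (i, j) pairs giving aligned column indices in the two profiles.
    weight_a: contribution of family A to the mixture (family B gets 1 - weight_a).
    """
    rng = random.Random(seed)
    designed = []
    for i, j in alignment:
        residues = sorted(set(profile_a[i]) | set(profile_b[j]))
        weights = [
            weight_a * profile_a[i].get(res, 0.0)
            + (1.0 - weight_a) * profile_b[j].get(res, 0.0)
            for res in residues
        ]
        designed.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(designed)


# Toy three-column profiles for two hypothetical related families.
FAMILY_A = [{"A": 0.7, "G": 0.3}, {"L": 0.6, "I": 0.4}, {"K": 1.0}]
FAMILY_B = [{"A": 0.5, "S": 0.5}, {"V": 0.8, "I": 0.2}, {"R": 0.9, "K": 0.1}]
ALIGNMENT = [(0, 0), (1, 1), (2, 2)]

designed = [design_intermediate(FAMILY_A, FAMILY_B, ALIGNMENT, seed=s) for s in range(5)]
print(designed)
```

Changing `weight_a` moves the sampled sequences closer to one parent family or the other, which is one simple way to picture the different divergence levels mentioned in the abstract.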
For over 3,000 queries, it is demonstrated that designed sequences are positioned as suitable linkers, which mediate connections between distantly related proteins. Using examples from known distant evolutionary relationships, we demonstrate that homology searches in augmented databases show an increase of up to 22% in the number of correct evolutionary relationships "discovered". Such connections are reported with high sensitivities and very low false positive rates. Interestingly, they fill in void and sparse regions in sequence space and relate distant proteins not only through multiple routes but also through SCOP-NrichD database, SUPFAM+ database, SUPERFAMILY database, protein domain library queried by pDomTHREADER and HHsearch against HMM library of SCOP families. This approach detected evolutionary relationships for almost 20% of all the families with no known structure or function. A detailed report of predictions for 614 DUFs, with their fold and species distribution, is provided in this chapter. These predictions are then enriched with GO terms and enzyme information wherever available. A detailed discussion is provided for a few of the interesting assignments: DUF1636, DUF1572 and DUF2092, which are functionally annotated as thioredoxin-like 2Fe-2S ferredoxin, putative metalloenzyme and lipoprotein localization factors respectively. These 614 novel structure-function relationships, of which 193 are supported by consensus between at least two of the five methods, can be accessed from http://proline.biochem.iisc.ernet.in/RHD_DUFS/. Protein functions can be appreciated better in the light of evolutionary information from their structures. Chapter 6 describes a database of evolutionary relationships identified between Pfam families. The grouping of Pfam families is important to obtain a better understanding of evolutionary relationships and in obtaining clues to functions of proteins in families of yet unknown function. 
Many structural genomics initiative projects have made considerable efforts in solving structures and bridging the growing gap between protein sequences and their structures. The results of such experiments suggest that often the newly solved structure using X-ray crystallography or NMR methods has structural similarity to a protein with already known structure. These relationships often remain undetected due to unavailability of structural information. Therefore, the SUPFAM+ database aims to detect such distant relationships between Pfam families by mapping the Pfam families and SCOP domain families. The work presented in this chapter describes the generation of the SUPFAM+ database using the sensitive AlignHUSH method to uncover hidden relationships. Firstly, Pfam families are queried against a profile database of SCOP families to derive Pfam-SCOP associations, and then Pfam families are queried against the Pfam database to derive Pfam-Pfam relationships. Pfam families that remain without a mapping to a SCOP family are mapped indirectly to a SCOP family by identifying relationships between such Pfam families and other Pfam families that are already mapped to a SCOP family. The criteria are kept stringent for these mappings to minimize the rate of false positives. In case of a Pfam family mapping to two or more SCOP superfamilies, a decision tree is implemented to assign the Pfam family to a single SCOP superfamily. Using these direct and indirect evolutionary relationships present in the SCOP database, associations between Pfam families are derived. Therefore, a relationship between two Pfam families that do not have significant sequence similarity can be identified if both are related to the same SCOP superfamily. Almost 36% of the Pfam families could be mapped to SCOP families through direct or indirect association. 
These Pfam-SCOP associations are grouped into 1,646 different superfamilies and cataloguing changes that occur in the binding sites between two functions, which are analysed in this study to trace possible routes between different functions in evolutionarily related enzymes. The main conclusions of the entire thesis are summarized in Chapter 8, contributing in the area of remote homology detection from sequence information alone and understanding the ‘sequence-structure-function’ paradigm from a binding site perspective. The chapter illustrates the importance of the work presented here in the post-genomic era. The development of the algorithm for the design of ‘intermediately-related sequences’ that could serve as effective linkers in remote homology detection, its subsequent large scale assessment and amenability to be augmented into any protein sequence database and exploration by any sequence-based search method is highlighted. Databases in the NrichD resource are made available in the public domain along with a portal to design artificial sequence for or between protein families. This thesis also provides useful and meaningful predictions for protein families with yet unknown structure and function using NrichD database as well as four other state-of-the-art sequence-based remote homology detection methods. A different aspect addressed in this thesis provides a fundamental understanding of the relationships between protein structure and functions. Evolutionary relationships between functional families are identified using the inherent structural information for these families and fold-function relationships are studied from a perspective of similarities in their binding sites. Such studies help in the area of functional annotation, polypharmacology and protein engineering. 
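The direct and indirect Pfam-to-SCOP mapping described in this abstract can be pictured with a small sketch. It assumes the profile searches have already produced two inputs: direct Pfam-SCOP hits and Pfam-Pfam links. It maps an unmapped family through a single directly mapped partner and then lists Pfam pairs that share a SCOP superfamily. The identifiers are made up, and the decision tree used to resolve families hitting several superfamilies is not modelled.

```python
from itertools import combinations


def map_pfam_to_scop(direct, pfam_links):
    """Return {pfam: (scop_superfamily, 'direct' | 'indirect')}.

    direct: dict of Pfam family -> SCOP superfamily from direct profile hits.
    pfam_links: iterable of (pfam_x, pfam_y) pairs found related to each other.
    """
    mapping = {pfam: (scop, "direct") for pfam, scop in direct.items()}
    for x, y in pfam_links:
        for unmapped, partner in ((x, y), (y, x)):
            # Only inherit from a partner that has a direct SCOP mapping.
            if unmapped not in mapping and partner in direct:
                mapping[unmapped] = (direct[partner], "indirect")
    return mapping


def related_pairs(mapping):
    """Pfam pairs inferred to be related because they share a SCOP superfamily."""
    by_scop = {}
    for pfam, (scop, _) in mapping.items():
        by_scop.setdefault(scop, []).append(pfam)
    pairs = set()
    for members in by_scop.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Hypothetical identifiers, for illustration only.
DIRECT = {"PF_A": "sf1", "PF_B": "sf2"}
LINKS = [("PF_C", "PF_A"), ("PF_D", "PF_E")]

mapping = map_pfam_to_scop(DIRECT, LINKS)
print(mapping)
print(related_pairs(mapping))
```

Here PF_C inherits superfamily sf1 from PF_A, so PF_A and PF_C are reported as related, while PF_D and PF_E stay unmapped because neither has a directly mapped partner.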
46

Topology-based Sequence Design For Proteins Structures And Statistical Potentials Sensitive To Local Environments

Jha, Anupam Nath 11 1900
Proteins, which regulate most biological activities, perform their functions through their unique three-dimensional structures. The folding of this three-dimensional structure from a one-dimensional sequence is not well understood. The available evidence indicates that protein structures are mostly conserved while sequences are more tolerant to mutations, i.e. a number of sequences can adopt the same fold. The search for optimal sequences for a chosen conformation is known as inverse protein folding, and this thesis takes this approach to the problem. This thesis presents a protein sequence design method based on the native state topology of protein structure. The structural importance of the amino acid positions has been converted into a topological parameter of the protein conformation. This scheme for extracting the topology of structures has been successfully applied to three-dimensional lattice structures, yielding sequences with minimum energy for a given structure. This technique, along with a reduced amino acid alphabet (any clustering of the twenty amino acids based on some measure of their relative similarity), has been applied to protein structures to design optimal amino acid sequences for a given structure. These designed sequences are energetically much better than the native amino acid sequence. The utility of this method is further confirmed by showing the similarity between naturally occurring and designed sequences. In summary, a computationally efficient method of designing optimal sequences for a given structure is given. The physical interaction energy between amino acids is an important part of the study of protein-protein interaction, structure prediction, modeling, docking, etc. 
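A reduced amino acid alphabet simply maps each of the twenty amino acids to the symbol of the cluster it belongs to. The sketch below uses a hypothetical six-group clustering by rough physico-chemical similarity. It is not the clustering used in the thesis, which the abstract does not specify.

```python
# Hypothetical clustering of the 20 amino acids into six groups.
GROUPS = {
    "h": "AVLIMC",  # aliphatic / hydrophobic
    "a": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "+": "KR",      # positively charged
    "-": "DE",      # negatively charged
    "s": "GP",      # special backbone geometry
}

# Invert the grouping into a residue -> group-symbol lookup table.
TO_REDUCED = {aa: symbol for symbol, members in GROUPS.items() for aa in members}


def reduce_sequence(sequence: str) -> str:
    """Rewrite a one-letter amino acid sequence in the reduced alphabet."""
    return "".join(TO_REDUCED[aa] for aa in sequence.upper())


print(reduce_sequence("MKTAYIAKQR"))  # -> h+phahh+p+
```

Designing over six symbols instead of twenty shrinks the sequence search space for a chain of length n from 20^n to 6^n, which is the usual motivation for working with a reduced alphabet.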
The local environment of an amino acid distinguishes otherwise identical amino acid pairs in a protein structure, so the pairwise interaction energy of amino acid residues should depend on their respective environments. A local-environment-dependent, knowledge-based potential energy function is developed in this thesis. Two different environments have been considered: the local degree (number of contacts) and the secondary structural element of the amino acids. The investigations show that environment-based interaction preferences for amino acids provide good potential energy functions, which perform exceedingly well in discriminating the native structure from structures with random interactions. Further, membrane proteins are located in a completely different physico-chemical environment, with a different amino acid composition, than water-soluble proteins. This work provides reliable potential energy functions that account for these different environments for the investigation (modeling and prediction) of the structure of helical membrane proteins. Three different environments, parallel and perpendicular to the lipid bilayer and the number of amino acid contacts, are explored to analyze the environmental effects on the potential functions. These environment-dependent scoring functions perform exceedingly well in discriminating the native sequence from a set of random sequences. The hydrophobicity of an amino acid is a measure of its buriedness or exposure to the aqueous environment. The lack of uniformity within the protein environment gives rise to different hydrophobicity values for the same amino acid, depending on its location inside the protein. The contact-based, environment-dependent hydrophobicity values of all amino acids, separately for globular and membrane proteins, have also been evaluated in this thesis.
Apart from developing scoring functions, the packing of helices in membrane proteins is investigated by an approach based on the local backbone geometry and the side chain atom-atom contacts of amino acids. A parameter defined in this study captures the essential features of inter-helical packing and may prove useful in modeling helical membrane proteins. In conclusion, this thesis describes a novel technique to design energetically minimized amino acid sequences that can fold into a given conformation. The environment-dependent interaction preferences of amino acids in globular proteins are also captured in an efficient manner. In particular, the environment-dependent scoring function for helical membrane proteins is a first successful attempt in this direction.
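As an illustration of the "local degree" environment mentioned in this abstract, the following minimal sketch counts residue contacts from one representative coordinate per residue. The 6.5 Å cutoff, the minimum sequence separation, and the function name are assumptions made for this example; the abstract does not state the contact definition used in the thesis.

```python
import math


def local_degree(coords, cutoff=6.5, min_separation=3):
    """Count contacts per residue (hypothetical contact definition).

    coords: list of (x, y, z) tuples, one representative atom per residue.
    cutoff: distance threshold in angstroms (assumed value, not from the thesis).
    min_separation: residues closer than this along the chain are ignored,
        so that trivial chain neighbours are not counted as contacts.
    """
    n = len(coords)
    degree = [0] * n
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                degree[i] += 1
                degree[j] += 1
    return degree
```

A residue's degree could then be binned (for example low versus high) so that pair statistics are collected separately per environment class; that binning step is likewise an assumption about how such a potential might be organised.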
47

Characterization of the Protein Lysine Methyltransferase SMYD2

Lanouette, Sylvain January 2015
Our understanding of protein lysine methyltransferases and their substrates remains limited despite their importance as regulators of the proteome. The SMYD (SET and MYND domain) methyltransferase family plays pivotal roles in various cellular processes, including transcriptional regulation and embryonic development. Among them, SMYD2 is associated with oesophageal squamous cell carcinoma, bladder cancer and leukemia as well as with embryonic development. Initially identified as a histone methyltransferase, SMYD2 was later reported to methylate p53, the retinoblastoma protein pRb and the estrogen receptor ERalpha and to regulate their activity. Our proteomic and biochemical analyses demonstrated that SMYD2 also methylates the molecular chaperone HSP90 on K209 and K615. We also showed that HSP90 methylation is regulated by HSP90 co-chaperones, pH, and the demethylase LSD1. Further methyltransferase assays demonstrated that SMYD2 methylates lysine K* in proteins which include the sequence [LFM]₋₁-K*-[AFYMSHRK]₊₁-[LYK]₊₂. This motif allowed us to show that SMYD2 methylates the transcriptional co-repressor SIN3B, the RNA helicase DHX15 and the myogenic transcription factors SIX1 and SIX2. Finally, muscle cell models suggest that SMYD2 methyltransferase activity plays a role in preventing premature myogenic differentiation of proliferating myoblasts by repressing muscle-specific genes. Our work thus shows that SMYD2 methyltransferase activity targets a broad array of substrates in vitro and in situ and is regulated by intricate mechanisms.
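The recognition motif quoted in this abstract can be expressed as a regular expression. The sketch below scans a one-letter protein sequence for lysines whose neighbours at positions -1, +1 and +2 fit the motif; the function name and the 1-based position convention are choices made for this example. A match only marks a candidate site and says nothing about whether the lysine is methylated in practice.

```python
import re

# Lysine preceded by L/F/M (position -1) and followed by one of
# A/F/Y/M/S/H/R/K (position +1) and then L/Y/K (position +2).
# Lookarounds are used so that overlapping candidate sites are all reported.
SMYD2_MOTIF = re.compile(r"(?<=[LFM])K(?=[AFYMSHRK][LYK])")


def candidate_sites(sequence):
    """Return 1-based positions of lysines matching the motif (hypothetical helper)."""
    return [m.start() + 1 for m in SMYD2_MOTIF.finditer(sequence.upper())]
```

For example, `candidate_sites("ALKAL")` reports position 3, since the lysine is flanked by L at -1, A at +1 and L at +2.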
48

Protein surface charge of trypsinogen changes its activation pattern

Buettner, Karin, Kreisig, Thomas, Sträter, Norbert, Züchner, Thole January 2014
Background: Trypsinogen is the inactive precursor of trypsin, a serine protease that cleaves proteins and peptides after arginine and lysine residues. In this study, human trypsinogen was used as a model protein to study the influence of electrostatic forces on protein–protein interactions. Trypsinogen is active only after its eight-amino-acid-long activation peptide has been cleaved off by another protease, enteropeptidase. Trypsinogen can also be autoactivated without the involvement of enteropeptidase. This autoactivation process can occur if a trypsinogen molecule is activated by another trypsin molecule and is therefore based on a protein–protein interaction. Results: Using a rational protein design based on autoactivation-defective guinea pig trypsinogen, several amino acid residues, all located far away from the active site, were changed to modify the surface charge of human trypsinogen. The influence of the surface charge on the activation pattern of trypsinogen was investigated. The autoactivation properties of mutant trypsinogen were characterized in comparison to the recombinant wild-type enzyme. Surface-charged trypsinogen showed practically no autoactivation compared to the wild-type but could still be activated by enteropeptidase to the fully active trypsin. The kinetic parameters of surface-charged trypsinogen were comparable to those of the recombinant wild-type enzyme. Conclusion: The variant with a modified surface charge showed a completely different activation pattern from the wild-type enzyme. Our study provides an example of how directed modification of the protein surface charge can be utilized for the regulation of functional protein–protein interactions, as shown here for human trypsinogen.
49

Computational protein design : un outil pour l'ingénierie des protéines et la biologie synthétique / Computational protein design : a tool for protein engineering and synthetic biology

Mignon, David 20 December 2017
Computational protein design, or CPD, is the search for amino acid sequences compatible with a targeted protein structure. The goal is to design a new function and/or add a new behavior. CPD has been under development in our laboratory for several years, with the Proteus software, which has several successes to its credit. Our approach uses a physics-based energy model and relies on the energy difference between the folded and unfolded states of the protein. During this thesis, we enriched Proteus on several points, including the addition of a Replica Exchange Monte Carlo (REMC) exploration method. We extensively compared three stochastic methods for exploring sequence space: REMC, plain Monte Carlo, and a heuristic designed for CPD, Multistart Steepest Descent or MSD. These comparisons concerned nine proteins from three structural families: SH2, SH3 and PDZ. Using these exploration techniques, we were able to identify the Global Minimum Energy Conformation, or GMEC, for nearly all the test cases where up to 10 positions of the polypeptide chain were free to mutate (the others retaining their native types). For the tests where 20 positions were free to mutate, the GMEC was identified in 2/3 of the cases. Overall, REMC and MSD give very good sequences in terms of energy, often identical or very close to the GMEC. MSD performed best in the tests with 30 mutating positions. REMC with eight replicas and optimized parameters often gave the best result when all positions could mutate. Moreover, compared to an exact enumeration of the low-energy sequences, REMC provided a sample of sequences with high diversity.
In the second part of this work, we tested our CPD model for PDZ domain design. For the folded state, we used two variants of a generalized Born (GB) solvent model. The first used a mean, effective protein/solvent dielectric boundary; the second, more rigorous, used an exact boundary that fluctuated over the MC trajectory. To characterize the unfolded state, we used a set of amino acid chemical potentials, or reference energies. These reference energies were determined by maximizing a likelihood function so as to reproduce the amino acid frequencies in natural PDZ domains. The sequences designed by Proteus were compared to the natural sequences. Our sequences are globally similar to the Pfam sequences, in the sense of BLOSUM40 scores, with especially high scores for the residues in the core of the protein. The more rigorous GB variant always gives sequences similar to moderately distant natural homologues and perfect recognition by the Superfamily fold recognition tool. Our sequences were also compared to those produced by the Rosetta software. The quality, according to the same criteria as before, was very similar, but the Rosetta sequences exhibit fewer mutations than the Proteus sequences.
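The replica exchange Monte Carlo method named in this abstract rests on two standard acceptance tests, sketched below in generic form. This is the textbook criterion, not a description of how Proteus implements it; the function names are hypothetical, and energies are assumed to be expressed in the same units as the thermal energy kT.

```python
import math
import random


def metropolis_accept(delta_e, kt, rng=random):
    """Metropolis test for a single move (e.g. a mutation) within one replica.

    delta_e: energy change of the proposed move.
    kt: thermal energy of the replica, in the same units as delta_e.
    """
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / kt)


def swap_probability(e_i, e_j, kt_i, kt_j):
    """Probability of exchanging the states of replicas i and j.

    Standard criterion: min(1, exp[(1/kt_i - 1/kt_j) * (e_i - e_j)]).
    """
    delta = (1.0 / kt_i - 1.0 / kt_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(e_i, e_j, kt_i, kt_j, rng=random):
    """Return True if the exchange between replicas i and j is accepted."""
    return rng.random() < swap_probability(e_i, e_j, kt_i, kt_j)
```

When the colder replica holds the higher-energy state, the swap is always accepted, which is how low-energy sequences found at high temperature migrate toward the cold replicas.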
50

Étude computationnelle du domaine PDZ de Tiam1 / Computational study of the Tiam1 PDZ domain

Panel, Nicolas 07 November 2017
Small protein domains often direct protein-protein interactions and regulate eukaryotic signalling pathways. PDZ domains are among the most widespread and best studied. They specifically recognize the 4-10 C-terminal amino acids of target proteins. Tiam1 is a Rac GTP exchange factor that helps control cell migration and proliferation and whose PDZ domain binds the proteins syndecan-1 (Sdc1), Caspr4, and Neurexin. Short peptides and peptidomimetics can potentially inhibit or modulate its action and act as bioreagents or therapeutics. We used computational protein design (CPD) and molecular dynamics (MD) free energy simulations to understand and engineer its peptide specificity. CPD uses a structural model and an energy function to explore the space of sequences and structures and identify stable and functional protein or peptide variants. We used our in-house Proteus CPD package to completely redesign the Tiam1 PDZ domain. The designed sequences were similar to natural PDZ domains, with similarity and fold recognition scores comparable to those of the widely used Rosetta CPD package. Selected sequences, containing around 60 mutated positions out of 90, were tested by microsecond MD simulations and biophysical experiments. Four of five sequences tested experimentally (by our collaborators) displayed reversible unfolding around 50°C. Proteus also accurately scored the binding specificity of several protein and peptide variants.
As a more refined model for specificity, we parameterized a semi-empirical free energy model of the Poisson-Boltzmann Linear Interaction Energy, or PB/LIE, form, which scores conformations extracted from explicit-solvent MD simulations of PDZ:peptide complexes. With three adjustable parameters, the model accurately reproduced the experimental binding affinities of 41 variants, with a mean unsigned error of just 0.4 kcal/mol, and gave predictions for 10 new variants. The PB/LIE model was tested further by comparison with non-empirical, alchemical MD free energy simulations, which have no adjustable parameters and were found to give chemical accuracy for 12 Tiam1:peptide complexes. The tools and insights obtained should help discover new tight-binding peptides or peptidomimetics and have broad implications for engineering PDZ:peptide interactions.
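To make the "three adjustable parameters" and "mean unsigned error" of this abstract concrete, here is a minimal sketch. The linear form below (a weighted van der Waals term, a weighted electrostatic term and a constant) is only one possible three-parameter linear interaction energy model and is an assumption of this example; the abstract does not give the terms actually used in the thesis. All names are hypothetical.

```python
def lie_binding_free_energy(d_vdw, d_elec, alpha, beta, gamma):
    """Hypothetical three-parameter linear interaction energy score.

    d_vdw, d_elec: average interaction energy differences (bound minus free),
        e.g. in kcal/mol, taken from simulation snapshots.
    alpha, beta, gamma: adjustable parameters fitted to experimental affinities.
    """
    return alpha * d_vdw + beta * d_elec + gamma


def mean_unsigned_error(predicted, experimental):
    """Average absolute deviation between predicted and measured values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - e) for p, e in zip(predicted, experimental))
    return total / len(predicted)
```

In a fitting workflow, the three parameters would be chosen to minimise the error over the training variants, and the mean unsigned error would then be reported on the fitted or held-out set.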
