51 |
Protein Structure Prediction Based on Neural Networks
Zhao, Jing January 2013
Proteins are the basic building blocks of biological organisms, and are responsible for a variety of functions within them. Proteins are composed of unique amino acid sequences. Some have only one sequence, while others contain several sequences that are combined together. These combined amino acid sequences fold to form a unique three-dimensional (3D) shape. Although the sequences may fold proteins into different 3D shapes in diverse environments, proteins with similar amino acid sequences typically have similar 3D shapes and functions. Knowledge of the 3D shape of a protein is important in both protein function analysis and drug design, for example when assessing the toxicity reduction associated with a given drug. Due to the complexity of protein 3D shapes and the close relationship between shapes and functions, the prediction of protein 3D shapes has become an important topic in bioinformatics.
This research introduces a new approach to predict proteins’ 3D shapes, utilizing a multilayer artificial neural network. Our novel solution allows one to learn and predict the representations of the 3D shape associated with a protein by starting directly from its amino acid sequence descriptors. The input of the artificial neural network is a set of amino acid sequence descriptors we created based on a set of probability density functions. In our algorithm, the probability density functions are calculated by the correlation between the constituent amino acids, according to the substitution matrix. The output layer of the network is formed by 3D shape descriptors provided by an information retrieval system, called CAPRI. This system contains the pose invariant 3D shape descriptors, and retrieves proteins having the closest structures. The network is trained by proteins with known amino acid sequences and 3D shapes. Once the network has been trained, it is able to predict the 3D shape descriptors of the query protein. Based on the predicted 3D shape descriptors, the CAPRI system allows the retrieval of known proteins with 3D shapes closest to the query protein. These retrieved proteins may be verified as to whether they are in the same family as the query protein, since proteins in the same family generally have similar 3D shapes.
The search for similar 3D shapes is done against a database of more than 45,000 known proteins. We present the results of evaluating our approach against a number of protein families of various sizes. Further, we consider a number of different neural network architectures and optimization algorithms. When the neural network is trained with proteins from large families whose members have similar amino acid sequences, the accuracy for finding proteins from the same family is 100%. When we employ proteins whose family members have dissimilar amino acid sequences, or proteins from a small family, neural networks with one hidden layer produce more promising results than networks with two hidden layers, and the performance of the one-hidden-layer networks may be improved by increasing the number of hidden nodes.
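The retrieval step described above can be illustrated with a minimal sketch. It assumes that shape descriptors are plain numeric vectors compared by Euclidean distance; the abstract does not state the metric that CAPRI uses, and the function names, protein names, family labels and descriptor values below are all hypothetical.

```python
import math


def retrieve_closest(predicted, database, k=3):
    """Return the names of the k database entries closest to `predicted`.

    Euclidean distance is an assumption made for this sketch only.
    """
    ranked = sorted(database, key=lambda name: math.dist(predicted, database[name]))
    return ranked[:k]


def family_hit_rate(retrieved, families, query_family):
    """Fraction of retrieved proteins annotated with the query's family."""
    if not retrieved:
        return 0.0
    hits = sum(1 for name in retrieved if families[name] == query_family)
    return hits / len(retrieved)


# Hypothetical 3-component shape descriptors for four known proteins.
database = {
    "protA": [0.10, 0.90, 0.30],
    "protB": [0.12, 0.88, 0.33],
    "protC": [0.80, 0.10, 0.50],
    "protD": [0.11, 0.85, 0.29],
}
families = {"protA": "fam1", "protB": "fam1", "protC": "fam2", "protD": "fam1"}

# Stand-in for the descriptor a trained network would output for a query.
predicted = [0.11, 0.89, 0.31]
top = retrieve_closest(predicted, database, k=3)
print(top, family_hit_rate(top, families, "fam1"))
```

The hit rate computed here mirrors the family-membership check mentioned in the abstract: the retrieved proteins are compared against the family of the query.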
|
52 |
Expression and purification of the novel protein domain DWNN
Lutya, Portia Thandokazi January 2002
Magister Scientiae - MSC (Biochemistry) / Proteins play an important role in cells, as the morphology, function and activities of a cell depend on the proteins it expresses. The key to understanding how different proteins function lies in an understanding of their molecular structure. The overall aim of this thesis was the determination of the structure of DWNN domains. This thesis describes the preparation of samples of human DWNN suitable for structural analysis by nuclear magnetic resonance spectroscopy (NMR), as well as the NMR analysis. / South Africa
|
53 |
Machine Learning Applications in Proteins: Interaction Prediction and Structure Prediction
Sun, Mengzhen January 2021
This thesis covers two research projects that apply machine learning techniques to protein-related problems. The first project uses protein sequences and the interaction graph to address the protein-protein interaction prediction problem. The second project leverages the sequences of protein loops within and beyond homologs to predict protein loop structures. In the protein-protein interaction prediction project, we applied pretrained language models, which were trained on large sets of protein sequences, as one of the protein feature extraction methods. The other feature extraction method is graph learning on the protein interaction graph. The graph learning embeddings and the language model embeddings were fed into classification models to predict whether two proteins interact. We trained and tested our methods on the S. cerevisiae dataset and the human dataset. Our results are comparable to or better than those of other state-of-the-art methods, with the advantages that our method is faster at the sample preparation step and has a larger application scope because it requires only protein sequences. We also ran experiments on the human dataset with different similarity cutoffs between the training and test sets, and our method showed effective prediction ability even with a strict similarity cutoff.
In the protein loop prediction project, we used attention-based encoder-decoder language models to predict protein loop inter-residue distances from the protein loop sequences. We fed the model the loop sequences and received arrays of numbers representing the distances between each C_α pair in the loops. We used two different strategies to reconstruct the loops from the predicted distances. The first was to calculate the C_α coordinates from the predicted distances and then apply a fast full-atom reconstruction method starting from the C_α coordinates to build the local loop structures. The local loop structure predictions of this method are very competitive, with low local RMSDs and, in particular, the lowest standard deviations. The second was to integrate the predicted inter-residue distances as constraints into the de novo loop prediction method PLOP (Jacobson et al. 2004). We tested the loop reconstruction process on the 8-res and 12-res loop benchmark sets. This method has the best performance compared to other state-of-the-art methods, and incorporating this machine learning step decreased the computing time relative to the standalone PLOP program.
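To make the first reconstruction strategy concrete, the sketch below recovers 3D coordinates from a complete pairwise distance matrix by simple trilateration. This is not necessarily the procedure used in the thesis, which does not name its method; it assumes exact, mutually consistent distances and a non-degenerate choice of the first four points, whereas real predicted distances are noisy and would call for a least-squares approach. The function name and the example coordinates are hypothetical.

```python
import math


def coords_from_distances(d):
    """Embed points in 3D from a full pairwise distance matrix `d`.

    Assumes at least four points, the first three not collinear and the
    first four not coplanar. The result is defined only up to rotation,
    translation and mirror image.
    """
    n = len(d)
    x1 = d[0][1]
    x2 = (d[0][2] ** 2 + x1 ** 2 - d[1][2] ** 2) / (2 * x1)
    y2 = math.sqrt(max(d[0][2] ** 2 - x2 ** 2, 0.0))
    # Point 0 at the origin, point 1 on the x-axis, point 2 in the xy-plane.
    pts = [(0.0, 0.0, 0.0), (x1, 0.0, 0.0), (x2, y2, 0.0)]
    for k in range(3, n):
        x = (d[0][k] ** 2 - d[1][k] ** 2 + x1 ** 2) / (2 * x1)
        y = (d[0][k] ** 2 - d[2][k] ** 2 + x2 ** 2 + y2 ** 2 - 2 * x * x2) / (2 * y2)
        z = math.sqrt(max(d[0][k] ** 2 - x ** 2 - y ** 2, 0.0))
        if k > 3:
            # Choose the side of the xy-plane that best matches the
            # known distance to point 3.
            up, down = (x, y, z), (x, y, -z)
            if abs(math.dist(down, pts[3]) - d[3][k]) < abs(math.dist(up, pts[3]) - d[3][k]):
                z = -z
        pts.append((x, y, z))
    return pts


# Hypothetical C-alpha positions (in angstroms) used only to build a test matrix.
true_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.1, 3.5, 0.5),
    (8.0, 4.2, 2.9), (9.5, 1.0, 4.4), (12.0, 2.0, 1.5),
]
dist = [[math.dist(a, b) for b in true_coords] for a in true_coords]
rebuilt = coords_from_distances(dist)
max_error = max(
    abs(math.dist(rebuilt[i], rebuilt[j]) - dist[i][j])
    for i in range(len(dist)) for j in range(len(dist))
)
print(max_error)
```

Because a distance matrix carries no handedness, the rebuilt structure may be the mirror image of the original; a full-atom reconstruction step would need additional information to settle chirality.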
|
54 |
Molecular Cloning and Characterization of Mouse Mast Cell Chymases
Chu, Wei, Johnson, David A., Musich, Phillip R. 22 May 1992
Mouse mast cell chymases are granule-associated serine proteinases with chymotrypsin-like substrate specificities. cDNAs for two new chymases were isolated from a cDNA library constructed using mRNA from ABFTL-6 mouse mast cells by screening with a rat mast cell proteinase cDNA. The deduced amino acid sequence of mouse chymase 1 consists of a 226 amino acid catalytic portion and a 21 amino acid preprosequence. Chymase 1 is unusual in that an Asn occurs in the substrate binding pocket, a feature that has not been observed in any other serine proteinase. Also, chymase 1 is expected to have a large positive charge (+13) at physiological pH. A partial cDNA for chymase 2 encodes 177 residues of the carboxy terminal portion of a second proteinase distinct from chymase 1. Chymase 2 cDNA contains a highly conserved intron/exon junction, a high positive charge (+17) and a novel, second potential N-glycosylation site. Transcripts for both chymases are found in ABFTL-6 mast cells, but only chymase 2 mRNA is in mouse connective tissue mast cells. These data suggest that these chymases have distinct enzymatic properties and tissue-specific patterns of gene expression.
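A net charge at physiological pH is commonly approximated from a deduced sequence by counting basic and acidic residues. The sketch below uses that simple rule, treating Lys and Arg as +1, Asp and Glu as -1, and histidine and the chain termini as neutral. The abstract does not say how its values were calculated, so this is an assumption, and the peptide shown is hypothetical rather than a chymase sequence.

```python
def approximate_net_charge(sequence):
    """Estimate net charge near pH 7 from one-letter amino acid codes.

    Counts Lys/Arg as +1 and Asp/Glu as -1; His and the termini are
    treated as neutral, which is a simplification.
    """
    sequence = sequence.upper()
    positive = sum(sequence.count(residue) for residue in "KR")
    negative = sum(sequence.count(residue) for residue in "DE")
    return positive - negative


# Hypothetical peptide used only to exercise the function.
example = "MKRKDELLRK"
print(approximate_net_charge(example))
```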
|
55 |
Finding motif pairs from protein interaction networks
Siu, Man-hung., 蕭文鴻. January 2008
published_or_final_version / Computer Science / Master / Master of Philosophy
|
56 |
Expression and purification of the novel protein domain DWNN.
Lutya, Portia Thandokazi January 2002
Proteins play an important role in cells, as the morphology, function and activities of a cell depend on the proteins it expresses. The key to understanding how different proteins function lies in an understanding of their molecular structure. The overall aim of this thesis was the determination of the structure of DWNN domains. This thesis describes the preparation of samples of human DWNN suitable for structural analysis by nuclear magnetic resonance spectroscopy (NMR), as well as the NMR analysis.
|
57 |
Estrutura molecular comparativa dos isoinibidores de α-Amilase do feijão (Phaseolus vulgaris) / Molecular structure of alpha-amylase isoinhibitors from beans (Phaseolus vulgaris): a comparative
Finardi Filho, Flavio 03 April 1991
Two α-amylase isoinhibitors (AIs) were purified from black bean, Phaseolus vulgaris cv. Rico 23, by aqueous extraction, fractional precipitation with ammonium sulfate, dialysis against distilled water, and column chromatography by ion exchange, hydrophobic interaction and molecular sieving. The chemical characteristics of the inhibitors (I-1 and I-2) were analyzed, showing 9.9 and 12.1% carbohydrate, molecular weights of 58 and 51 kDa, isoelectric points of 4.70 and 4.65, and surface hydrophobicities of 13.4 and 36.7, respectively. The amino acid content and the tryptic maps were also analyzed, as were the conditions for molecular dissociation and the conformational changes induced by chemical agents and temperature, assessed by SDS-PAGE, differential calorimetry and fluorescence emission. The peptide subunits were separated by chromatography on DEAE-cellulose in 6 M urea, followed by partial sequencing of three isolated peptides. Comparison of the sequences obtained with the sequence from var. Greensleaves revealed homologies of up to 59%. However, marked differences between I-1 and I-2 indicate that these proteins must be synthesized from distinct DNAs. The isolated proteins are demonstrably isoinhibitors belonging to a new family of α-amylase inhibitors specific to Phaseolus vulgaris. / Two α-amylase isoinhibitors, I-1 and I-2, were purified from black beans, Phaseolus vulgaris cv. Rico 23. Both are glycoproteins acting on mammalian and insect α-amylases, having similar isoelectric points, 4.70 and 4.65, and amino acid composition, but showing different molecular weights, 58 and 51 kDa, and surface hydrophobicity, 13.4 and 36.7, respectively. Conformational changes due to dissociating agents and temperature were detected by SDS-PAGE, DSC and by following the fluorescence emission. The peptides dissociated by urea-SDS-β-mercaptoethanol were isolated on a urea-DEAE-cellulose column. Three of the isolated peptides were sequenced, showing up to 59% homology with the deduced sequence of the inhibitor from var. Greensleaves. In spite of the similarity, they seem to be synthesized from different DNAs.
|
58 |
NMR characterization guides the design of beta hairpins and sheets while providing insights into folding cooperativity and dynamics
Hudson, Frederick Michael Lewis. January 2006
Thesis (Ph. D.)--University of Washington, 2006. / Vita. Includes bibliographical references (leaves 143-156).
|
59 |
Dependence of secondary structure of biopolymers on environment : a circular dichroism study of equivocal amino acid sequences in proteins and of left-handed DNA
Zhong, Lingxiu 07 April 1992
Graduation date: 1992
|
60 |
Non-repetitive Structures In Proteins : Effects Of Side-chain And Solvent Interactions With The Backbone
Narayanan, Eswar 04 1900
The work presented in this thesis deals with the analysis of protein crystal structures with an emphasis on the stereochemical aspects of the folded conformation of proteins. The various analyses described have been performed on a data-set of 250 high resolution and non-homologous protein structures derived from the Protein Data Bank. The overall objective of the work has been to analyse conformational features of the non-secondary structural regions in proteins and identify structural motifs present therein. The results can be useful in the three-dimensional modelling of proteins, altering the stability of proteins, design of peptide mimics and in understanding the structural rules that guide protein folding.
The contents of this thesis can be broadly classified into three parts: (a) conformational preferences of amino acid residues to occur in the partially allowed regions of the Ramachandran map, (b) conformational features of structural motifs formed by side-chain/main-chain hydrogen bonds of polar residues and (c) analysis and characteristic features of isolated β-strands.
Chapter 1 of the thesis gives an introduction, briefly discussing the conformation of polypeptide chains, structural features of globular proteins and applications of protein structural analysis etc.
Chapter 2 describes the occurrence of left-handed α-helical conformation in protein structures. A data-set of 250 high resolution (< 2.0 Å) non-homologous protein crystal structures derived from the Protein Data Bank (PDB) has been analysed for occurrences of left-handed α-helical (αL) conformations. A total of 2,573 αL residues were identified from the data-set. About 59% of the observed examples of αL conformations were found to be glycyl residues and about 41% non-glycyl. Continuous long stretches of αL residues are seldom found in protein structures. They are most commonly found as singlets, which represent 78% of the observed αL examples. The doublets, triplets and quadruplets account for a very minor fraction of the observed examples. There is only a single example of a stretch of four contiguous αL residues, from the protein thermolysin, which forms a single turn of a left-handed α-helix.
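The singlet, doublet and longer-run statistics above can be derived from a list of backbone (Φ,Ψ) angles, as in the following sketch. The rectangular αL region used here (Φ from 30° to 100°, Ψ from 0° to 80°) is a hypothetical boundary chosen for illustration; the thesis abstract does not give its definition, and the angle list is made up.

```python
from collections import Counter


def is_alpha_l(phi, psi):
    """Return True if (phi, psi) falls in an assumed left-handed helical box."""
    return 30.0 <= phi <= 100.0 and 0.0 <= psi <= 80.0


def alpha_l_run_lengths(phi_psi):
    """Count contiguous runs of alpha-L residues, keyed by run length."""
    runs = Counter()
    current = 0
    for phi, psi in phi_psi:
        if is_alpha_l(phi, psi):
            current += 1
        else:
            if current:
                runs[current] += 1
            current = 0
    if current:  # close a run that reaches the end of the chain
        runs[current] += 1
    return runs


# Hypothetical backbone angles (degrees) for a six-residue stretch.
angles = [(-60, -45), (60, 40), (-65, -40), (55, 45), (65, 35), (-120, 130)]
print(alpha_l_run_lengths(angles))
```

A run length of 1 corresponds to a singlet, 2 to a doublet, and so on, so the resulting counter maps directly onto the categories reported in the chapter.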
A majority of the αL residues are nevertheless part of well-defined substructures in proteins. They play singular roles as part of β-turns and helix termination sites in maintaining the characteristic main-chain hydrogen bonds needed for the stability of these structures. They are also found to be effective in the termination of β-strands. The stereo-chemistry and sequence environment around such structures are discussed. The analysis of the side-chain torsion angles of αL residues indicates that the g+ rotamer is highly unfavourable due to stereo-chemical violations posed by the atoms of the side-chain with those of the backbone. The αL residues are highly conserved by residue type as well as conformation among related proteins, indicating their vital importance in protein structures.
Chapter 3 provides an explanation for the unusual preference of glycyl residues to occur in the bridge regions of the Ramachandran map. The Ramachandran steric map and energy diagrams for the glycyl residue are fully symmetric. Though a plot of the (Φ,Ψ) angles of glycyl residues derived from a data-set of 250 non-homologous and high-resolution protein structures is also largely symmetric, there is a clear aberration in the symmetry. While there is a cluster of points corresponding to the right-handed α-helical region, the "equivalent" cluster is shifted to centre around the (Φ,Ψ) values of (90°, 0°) instead of being centred at the left-handed α-helical region of (60°, 40°).
An analysis of glycyl conformations in small peptide structures and in "coil" proteins, which are largely devoid of helical and sheet regions, shows that glycyl residues prefer to adopt conformations around (±90°, 0°) instead of the right- and left-handed α-helical regions. Using theoretical calculations, such conformations are shown to have the highest solvent accessibility in a system of two linked peptide units with a glycyl residue at the central Cα atom. This is found to be consistent with the observations from 250 non-homologous protein structures, where glycyl residues with conformations close to (±90°, 0°) are seen to have high solvent accessibility. Analysis of a sub-set of non-homologous structures with very high resolution (1.5 Å or better) shows that water molecules are indeed present at distances suitable for hydrogen bond interaction with glycyl residues possessing conformations close to (±90°, 0°). It is concluded that water molecules play a key role in determining and stabilising these conformations of glycyl residues, which explains the aberration in the symmetry of glycyl conformations in proteins.
Chapter 4 discusses an analysis of backbone mimicry performed by polar side-chains in protein structures. Backbone mimicry by the formation of closed loop C7, C10, C13 (mimics of γ-, β- and α-turns) conformations through side-chain main-chain hydrogen bonds by polar groups is found to be a frequent observation in protein structures. A data-set of 250 non-homologous and high-resolution protein structures was used to analyse these conformations for their characteristic features. Seven out of the nine polar residues (Ser, Thr, Asn, Asp, Gln, Glu and His) have hydrogen bonding groups in their side-chains which can participate in such mimicry, and as many as 15% of all these polar residues engage in such conformations. The distributions of dihedral angles of these mimics indicate that only certain combinations of the involved dihedral angles aid the formation of these mimics. The observed examples have been categorised into various classes based on these combinations, resulting in well-defined motifs. Asn and Asp residues show a very high capability to perform such backbone secondary structural mimicry. The most highly mimicked backbone structure is the C10 conformation, formed by the Asx residues. The mimics formed by His, Ser, Thr and Glx residues are also discussed. The role of such conformations in initiating the formation of regular secondary structures during the course of protein folding seems significant.
Chapter 5 presents a description of deterministic features of side-chain main-chain hydrogen bonds as observed in protein structures. A total of 19,835 polar residues from the data set of 250 non-homologous and highly resolved protein crystal structures were used to identify side-chain main-chain (SC-MC) hydrogen bonds. The ratio of the total number of polar residues to the number of SC-MC hydrogen bonds is close to 2:1, indicating the ubiquitous nature of such hydrogen bonds. Close to 56% of the SC-MC hydrogen bonds are local involving side-chain acceptor/donor (‘i’) and a main-chain donor/acceptor within the window i-5 to i+5. These short-range hydrogen bonds form well defined conformational motifs characterised by specific combinations of backbone and side-chain torsion angles.
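The local versus non-local distinction above depends only on sequence separation, so it can be sketched in a few lines. The example assumes each hydrogen bond is recorded as a pair of residue indices within a single chain (side-chain residue i, main-chain residue j); the function names and the bond list are hypothetical.

```python
def is_local(side_chain_index, main_chain_index, window=5):
    """True if the main-chain partner lies within i-window to i+window."""
    return abs(side_chain_index - main_chain_index) <= window


def local_fraction(bonds, window=5):
    """Fraction of (side-chain i, main-chain j) bonds that are local."""
    if not bonds:
        return 0.0
    local = sum(1 for i, j in bonds if is_local(i, j, window))
    return local / len(bonds)


# Hypothetical SC-MC hydrogen bonds given as (i, j) residue index pairs.
bonds = [(10, 6), (10, 13), (10, 40), (7, 7)]
print(local_fraction(bonds))
```

A bond between the side-chain and main-chain of the same residue has a separation of zero and therefore counts as local under this window.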
Some of the salient features of such hydrogen bonds are as follows: (a) The Ser/Thr residues show the greatest preference in forming intra-helical hydrogen bonds between the atoms Oγ(i) and O(i-4). Such hydrogen bonds form motifs of the form αRαRαRαR(g−) and are most commonly observed at the middle of α-helices. (b) These residues also show great preference to form hydrogen bonds between Oγ(i) and O(i-3), which are closely related to the previous type and, though intra-helical, are more often found at the C-termini of helices than at the middle. The motif represented by αRαRαRαR(g+) is most preferred in these cases. (c) The Ser, Thr and Glu (between the side-chain and main-chain of the same residue), (d) The side-chain acceptor atoms of Asn/Asp and Ser/Thr residues show high preference to form hydrogen bonds with main-chain donors two residues ahead in the chain, which are characterised by the motifs β(tt’)αR and β(t)αR, respectively. These hydrogen bonded segments, referred to as Asx turns, are known to provide stability to type I and type I’ β-turns. (e) Ser/Thr residues often form a combination of SC-MC hydrogen bonds, with the side-chain donor hydrogen bonded to the carbonyl oxygen of its own peptide backbone and the side-chain acceptor hydrogen bonded to an amide hydrogen three residues ahead in the sequence. Such motifs, characterised by the β(g+)αRαR motif, are quite often seen at the beginning of α-helices.
A remarkable majority of all these hydrogen bonds are buried from the protein surface, away from the surrounding solvent. This strongly indicates the possibility of side-chains playing the role of the backbone, in the protein interiors, to satisfy the potential hydrogen bonding sites and maintaining the network of hydrogen bonds which is crucial to the structure of the protein.
Chapter 6 provides a detailed characterisation of isolated β-strands. The formation of β-strands in proteins is often associated with the formation of β-sheets. However, β-strands that are not part of β-sheets commonly occur in proteins. This raises questions about the structural role and stability of such isolated β-strands. Using a data set consisting of 250 proteins, 518 isolated β-strands have been identified from 187 proteins. The two important features that distinguish isolated β-strands from β-strands occurring in β-sheets are (i) the high preponderance of prolyl residues in isolated β-strands and (ii) their high solvent exposure. It is shown that the high propensity for proline residues to occur in isolated β-strands is not due to the occurrence of polyproline type segments in the data-set. The propensities of other amino acids to occur in isolated β-strands follow the same trend as those for β-sheet forming β-strands. Isolated β-strands are often characterised by their main-chain amide and carbonyl groups being involved in hydrogen bonding with polar side-chains or water. They are often flanked by irregular loop structures, indicating that they are part of long loops. Analysis of the conservation of such strands among families of homologous protein structures indicates that a sizeable fraction of them are highly conserved. It is suggested that though the formation of isolated β-strands is driven by the intrinsic preferences of amino acid residues, they have many characteristics of loop segments but with repetitive (Φ,Ψ) values falling within the β-region of the Ramachandran map.
In addition to the material described in the six chapters above, the thesis also contains the details of work carried out on an aspect slightly different from the main theme of the thesis. This pertains to the comparative analysis of the members of a family of cytokine receptors to derive information for modelling new members of the family. The three-dimensional modelling of the leptin receptor has been used as a case study and the details are included as an appendix.
The appendix describes the 3-dimensional model of the satiety factor receptor (the leptin receptor) built using the principles of homology modelling. Recessive mutations in the mouse obese (ob) and diabetes (db) genes result in obesity and diabetes in a syndrome resembling human obesity. Data from parabiosis (cross circulation) experiments suggested that the ob gene coded for, and was responsible for the generation of, a circulating factor called leptin which regulated energy balance, and that the db gene encoded the receptor for this factor. While the structure of leptin has been determined, that of its cognate receptor is as yet unknown. The leptin receptor shows low but clear sequence similarity to the members of the interleukin type 6 family of receptors. The structures of the members of this family are characterised by two β-sandwich like domains connected by a short 4-residue helical linker. The 3-dimensional models for the N- and C-terminal domains of the leptin receptor were generated using the corresponding structures of the signal transducing component gp130, the erythropoietin receptor and the prolactin receptor. Further, using the evidence that leptin binds to its receptor with a stoichiometry of 1:1, the relative orientation of the two domains was modelled based on the structure of the human growth hormone receptor, which also binds its ligand with similar stoichiometry. The complex of leptin with its receptor was also modelled based on the structure of the human growth hormone/receptor complex. The final energy minimised model of the complex elucidates the mode of interaction between leptin and its receptor.
|