Global ETD Search

891	Development And Applications Of Computational Methods To Aid Recognition Of Protein Functions And Interactions
Krishnadev, O 03 1900

Protein homology detection has played a central role in understanding the evolution of protein structures, functions and interactions. Many developments in protein bioinformatics can be traced back to an initial step of homology detection. It is not surprising, then, that extending the reach of remote homology detection has attracted much attention in the recent past. The explosive growth of genome sequence data and the slow pace of experimental techniques have thrust computational analyses into the limelight, and many traditionally experimental problems, such as gene expression analysis, function recognition and 3-D structure recognition, have been addressed effectively by computational approaches. Homology-based bioinformatics rests on the fact that hereditary mechanisms ensure that a parent generation gives rise to a very similar offspring generation. Since the biological functions of an organism's proteins are a product of the expression of its genetic material, its genes should be conserved from one generation to the next (with very few mutations, if parent and offspring are to be nearly identical). Thus, if two proteins can be shown to have descended from a common ancestor, it can be inferred that their biological functions are likely to be very similar. Homology-based transfer of information from one protein to another has therefore become a commonly used procedure in protein bioinformatics. The ability to recognize homologues of a protein solely from its amino acid sequence has steadily improved over the last two decades. However, a large number of proteins of known amino acid sequence still have no known function.
Thus, a major goal of current computational work is to extend the limits of remote homology detection so that proteins of unknown function can be characterized. Since proteins do not work in isolation in a cell, it has become essential to understand the in vivo context of a protein's function, which requires knowing all the molecules that interact with it. Another major area of bioinformatics has therefore been the integration of biological information with protein-protein interaction data to better understand molecular processes. Such attempts have been made successfully for the interaction networks of proteins within an organism, and extending interaction network analysis to a host-pathogen scenario can yield useful insights into the pathophysiology of diseases. The work done as part of this thesis explores both of these ideas: extending the limits of remote homology detection, and predicting protein-protein interactions between a pathogen and its host. Since the work falls logically into two related but distinct areas, the thesis is organized in two parts. The first part (Chapters 2, 3, 4 and 5) describes the development and application of remote homology detection tools for function and structure annotation. The second part (Chapters 6, 7, 8 and 9) describes the development and application of a homology-based procedure for detecting host-pathogen protein-protein interactions. Chapter 1 provides background and a literature survey on homology detection and the prediction of protein-protein interactions. It argues that homology-based information transfer is currently an important tool for predicting and recognizing protein structures, functions and interactions.
The development of remote homology detection methods and their effect on function recognition are highlighted. Recent work on predicting protein-protein interactions from homology to known interaction templates is described, and this is argued to be a successful approach for genome-scale prediction of protein-protein interactions. The importance of further improvements in remote homology detection (as pursued in the first part of the thesis) is emphasized for the annotation of proteins in newly sequenced genomes. The application of homology detection methods to predicting protein-protein interactions between host and pathogen organisms is also explored. Chapter 2 analyzes the performance of PSI-BLAST, one of the best-known and most effective approaches for recognizing related proteins, in remote homology detection. The chapter describes the working of the PSI-BLAST algorithm in detail and focuses on three parameters that determine the time required to search a large database and also set a ceiling on the sensitivity of the search. The parameters analyzed are the window size for the two-hit method, the score threshold for extending an initial word hit to dynamic programming, and the degree of dependence on the query in the profile generation step. The analysis uses a large database of known evolutionary relationships (the SCOP database) and runs PSI-BLAST at different values of the three parameters to measure the effect on sensitivity (defined as the normalized number of correct SCOP superfamily relationships found in a search) and on the time required to complete the search. To demonstrate the effect of query dependence, a multiple sequence alignment (MSA) of a SCOP family (generated from all family sequences using ClustalW) was used with multiple queries to derive profiles in PSI-BLAST runs.
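As a rough illustration of the two PSI-BLAST parameters discussed above, the Python sketch below finds word hits whose score reaches a threshold `T` and triggers an extension only when a second, non-overlapping hit falls on the same diagonal within a window `A`. It is a simplified sketch, not PSI-BLAST's implementation: the toy match/mismatch scores are an assumption standing in for BLOSUM62, the function names are hypothetical, and the defaults (`w = 3`, `T = 11`, `A = 40`) follow common BLASTP defaults.

```python
# Illustrative sketch of the BLAST two-hit seeding rule; toy scoring is an assumption.


def word_score(a: str, b: str, match: int = 5, mismatch: int = -1) -> int:
    """Score two equal-length words with a toy match/mismatch scheme (stand-in for BLOSUM62)."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def two_hit_seeds(query: str, subject: str, w: int = 3, T: int = 11, A: int = 40):
    """Return (query_pos, subject_pos) word hits that would trigger an extension.

    A word pair is a hit when its score is >= T. A hit triggers extension only if an
    earlier, non-overlapping hit lies on the same diagonal no more than A residues away.
    """
    last_hit = {}  # diagonal (subject_pos - query_pos) -> query position of latest hit
    seeds = []
    for i in range(len(query) - w + 1):
        for j in range(len(subject) - w + 1):
            if word_score(query[i:i + w], subject[j:j + w]) < T:
                continue
            diagonal = j - i
            prev = last_hit.get(diagonal)
            if prev is not None and i - prev < w:
                continue  # overlaps the previous hit on this diagonal: ignored
            if prev is not None and i - prev <= A:
                seeds.append((i, j))  # second hit within the window: extend
            last_hit[diagonal] = i
    return seeds


print(two_hit_seeds("ACDEFGHIK", "ACDEFGHIK"))       # [(3, 3), (6, 6)]
print(two_hit_seeds("ACDEFGHIK", "ACDEFGHIK", A=2))  # []
```

With these toy scores only identical triplets reach `T = 11`; lowering `T` would admit near-identical words (more seeds, more time), while shrinking `A` discards seeds, which is the sensitivity/time trade-off the chapter measures.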
The increase in sensitivity and in the time required to complete each search were then monitored. Changing the two internal PSI-BLAST parameters, the score threshold for extension of word hits and the window size for the two-hit method, does not significantly increase sensitivity. Because PSI-BLAST uses the amino acid residues present in the query sequence to derive the Position Specific Scoring Matrix (PSSM), the sensitivity of each PSSM depends strongly on the query. Using multiple PSSMs derived from a single MSA can thus help overcome this query dependence and increase sensitivity. In this chapter, such an approach, named MulPSSM, is shown to be more sensitive than the single-profile approach (by up to twofold) on a benchmark dataset of 100 randomly chosen SCOP folds. Strategies to balance sensitivity against search time with MulPSSM have been explored, and using a non-redundant set of queries to generate the MulPSSM profiles was found to reduce the time required for each search without greatly affecting sensitivity. The application of MulPSSM to function annotation of proteins in completely sequenced genomes was explored by searching genomic sequences against a MulPSSM database of Pfam families, and the resulting functional associations were compared with those obtained from a database with a single profile per family. Across a comprehensive set of 291 prokaryotic, 44 eukaryotic and 40 archaeal genomes, MulPSSM identifies evolutionary relationships for, on average, 10% more proteins per genome than the single-profile approach. Such an enhancement in the recognition of evolutionary relationships, which provides clues to function, can help in the more efficient exploration of newly sequenced genomes.
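The MulPSSM idea can be sketched in Python under simplifying assumptions: each chosen query row of the MSA yields a query-anchored log-odds profile over the columns where that query has a residue, queries are first reduced to a non-redundant set by pairwise identity, and a target is scored by its best ungapped placement against any of the profiles. The uniform background, the pseudocount of 1, the 90% identity cutoff and all function names are illustrative assumptions, not the thesis's settings; real PSI-BLAST profiles also use sequence weighting and substitution-matrix-based pseudocounts.

```python
# Illustrative MulPSSM-style sketch; background, pseudocounts and cutoffs are assumptions.
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency


def query_anchored_pssm(msa, query_index, pseudocount=1.0):
    """Log-odds profile (bits) over the MSA columns where the chosen query has a residue."""
    profile = []
    for col, residue in enumerate(msa[query_index]):
        if residue == "-":
            continue  # columns gapped in the query are dropped, so each query gives a different profile
        counts = Counter(seq[col] for seq in msa if seq[col] != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: math.log2((counts[aa] + pseudocount) / total / BACKGROUND)
                        for aa in AMINO_ACIDS})
    return profile


def identity(a, b):
    """Fraction of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0


def nonredundant_queries(msa, max_identity=0.9):
    """Greedily keep rows that are below max_identity to every row already kept."""
    kept = []
    for i, seq in enumerate(msa):
        if all(identity(seq, msa[k]) < max_identity for k in kept):
            kept.append(i)
    return kept


def build_mulpssm(msa, max_identity=0.9):
    return [query_anchored_pssm(msa, i) for i in nonredundant_queries(msa, max_identity)]


def best_ungapped_score(profile, target):
    """Best score of the profile placed without gaps entirely inside the target."""
    if len(target) < len(profile):
        return float("-inf")
    return max(sum(profile[k].get(target[s + k], 0.0) for k in range(len(profile)))
               for s in range(len(target) - len(profile) + 1))


def mulpssm_score(profiles, target):
    """A target is scored by whichever of the multiple profiles fits it best."""
    return max(best_ungapped_score(p, target) for p in profiles)


msa = ["ACDE-", "ACDEY", "W-KE-"]
profiles = build_mulpssm(msa)
print(nonredundant_queries(msa), [len(p) for p in profiles])  # [0, 2] [4, 3]
```

Row 1 is dropped as redundant with row 0, and the two surviving queries produce profiles of different lengths, which is the query dependence that MulPSSM exploits.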
The multiple-profile search approach discussed in this chapter made it possible to identify evolutionary relationships for some proteins of M. tuberculosis and M. leprae. The examples of annotations provided include enzymes involved in the synthesis of glycolipids, which are vital for the survival of these pathogens inside the host, and such annotations can help expand our knowledge of these processes. Chapter 3 describes the development and assessment of a sensitive remote homology detection method. The sensitivity of remote homology detection methods has increased steadily over the past decade, and profile analysis has become a mainstay of such efforts. A profile is a probabilistic model of the substitutions allowed at each position in a sequence family and hence captures the essential features of the family. Aligning two profiles is therefore considered more sensitive and accurate than aligning two sequences. HMMs (Hidden Markov Models) have been shown to perform better than PSSMs (Position Specific Scoring Matrices), so profile-profile alignment using HMMs can in principle give the best possible sensitivity in remote homology detection. Several investigators have incorporated residue conservation and secondary structure information when aligning two HMMs, and such additional information has been shown to improve sensitivity in remote homology detection (for instance, in the HHSearch program). The work presented in Chapter 3 extends this idea by incorporating explicit hydrophobicity information, along with conservation and predicted secondary structure, computed over a window of Multiple Sequence Alignment (MSA) columns when aligning HMMs. The new algorithm is named AlignHUSH (Alignment of HMMs Using Secondary structure and Hydrophobicity).
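A rough Python sketch of a windowed column-pair score in the spirit of AlignHUSH follows. Each profile column is modeled as amino acid frequencies plus predicted secondary structure probabilities (H, E, C); conservation, secondary structure and hydrophobicity terms are combined with weights that default to the optimal values the abstract reports (18.0, 1.5 and 1.0), and the combined score is averaged over a window of neighbouring column pairs. The exact form of each term (a log-odds frequency overlap against a uniform background, the probability of matching secondary structure states, and a penalty on the difference in mean Kyte-Doolittle hydropathy) is an assumption, not the AlignHUSH formula, and the helper names are hypothetical.

```python
# Illustrative windowed profile-column score; the form of each term is an assumption.
import math

# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def one_hot_column(aa, ss):
    """Hypothetical helper: a column fully conserved for one residue, with SS probabilities (H, E, C)."""
    return ({aa: 1.0}, ss)


def column_score(a, b, w_cons=18.0, w_ss=1.5, w_hyd=1.0):
    (freq_a, ss_a), (freq_b, ss_b) = a, b
    overlap = sum(p * freq_b.get(aa, 0.0) for aa, p in freq_a.items())
    cons = math.log2((overlap + 1e-3) * 20)  # odds vs. a uniform 1/20 background; 1e-3 avoids log(0)
    ss = sum(x * y for x, y in zip(ss_a, ss_b))  # probability that both columns share a state
    hyd_a = sum(p * KYTE_DOOLITTLE[aa] for aa, p in freq_a.items())
    hyd_b = sum(p * KYTE_DOOLITTLE[aa] for aa, p in freq_b.items())
    hyd = -abs(hyd_a - hyd_b) / 9.0  # 9.0 is the span of the Kyte-Doolittle scale
    return w_cons * cons + w_ss * ss + w_hyd * hyd


def window_score(A, B, i, j, half_width=1):
    """Average column score over the diagonal window (i+k, j+k), k = -half_width..half_width."""
    scores = [column_score(A[i + k], B[j + k])
              for k in range(-half_width, half_width + 1)
              if 0 <= i + k < len(A) and 0 <= j + k < len(B)]
    return sum(scores) / len(scores)


helix, strand, coil = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
profile = [one_hot_column("L", helix), one_hot_column("D", coil), one_hot_column("K", strand)]
print(window_score(profile, profile, 1, 1) > window_score(profile, profile, 1, 0))  # True
```

Averaging over a window lets neighbouring columns reinforce or penalize a candidate match, which is the intuition behind the reported sensitivity gain from residue windows.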
The HMMs used in this work are derived from structural alignments using the HMMER program and are taken from the publicly available SUPERFAMILY database, which provides HMMs for all SCOP families. In the AlignHUSH algorithm, the HMMs are converted into two-state HMMs by collapsing the 'insert' and 'delete' states into a single 'non-match' state. Two-state HMMs allow the use of dynamic programming methods while keeping the position-specific gap penalties intact, and they can be more readily extended to the alignment of PSSMs. Secondary structure information is incorporated using predictions from the PSIPRED program, and hydrophobicity is calculated from Kyte-Doolittle hydrophobicity values. Each position in the alignment is scored using the values in a window of residues around it. Alignment accuracy is assessed by comparison with the manually curated alignments in the BaliBASE database. A detailed description of the optimization of the weights for each score contribution (conservation, secondary structure and hydrophobicity) is provided. The assessment revealed that a high weight for the conservation score (18.0) and low weights for the secondary structure (1.5) and hydrophobicity (1.0) scores are optimal. The use of residue windows dramatically increases sensitivity (by around 30% on a small dataset comprising 10% of all SCOP domains). In an all-against-all comparison of the SCOP 1.69 database, AlignHUSH is more sensitive than the other HMM-HMM alignment methods HHSearch and PRC (by approximately 10% and 5%, respectively).
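The two-state simplification can be illustrated with a short Python sketch: the probabilities of leaving the match state for an insert or a delete are summed into one 'non-match' probability per position, converted into a position-specific gap penalty (here, its negative base-2 logarithm), and used in a Smith-Waterman-style local alignment over a matrix of column-pair scores. The transition keys (`"MM"`, `"MI"`, `"MD"`), the log-based penalty and the linear (non-affine) gap model are simplifying assumptions, not the AlignHUSH implementation.

```python
# Illustrative two-state HMM collapse and local alignment; penalties and gap model are assumptions.
import math


def collapse_to_two_state(transitions):
    """transitions: per-position dicts with 'MM', 'MI', 'MD' probabilities (hypothetical layout).

    Insert and delete are merged into one non-match probability, turned into a gap penalty.
    """
    return [-math.log2(max(t["MI"] + t["MD"], 1e-6)) for t in transitions]


def local_align(score, gap_a, gap_b):
    """Smith-Waterman over a column-pair score matrix with position-specific linear gap penalties."""
    n, m = len(gap_a), len(gap_b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + float(score[i - 1][j - 1]),  # columns i and j matched
                          H[i - 1][j] - gap_a[i - 1],  # column i of A left unaligned
                          H[i][j - 1] - gap_b[j - 1])  # column j of B left unaligned
            best = max(best, H[i][j])
    return best


print(collapse_to_two_state([{"MM": 0.75, "MI": 0.125, "MD": 0.125}]))  # [2.0]
score = [[3, -5], [-5, -5], [-5, 3]]  # 3 columns of profile A x 2 columns of profile B
print(local_align(score, [10, 1, 10], [10, 10]))   # 5.0: cheap gap at A's column 1 is used
print(local_align(score, [10, 10, 10], [10, 10]))  # 3.0: gap too costly, single match wins
```

Because the gap penalty varies by position, the same pair of profiles can align through a gap at one column but not at another, which is what keeping position-specific gap penalties buys.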
Alignment accuracy, calculated as the ratio of correctly aligned residues to all alignment positions in the BaliBASE alignments, shows that AlignHUSH is comparable to or marginally more accurate than both HHSearch and PRC (25% for AlignHUSH versus roughly 17% for both HHSearch and PRC). A few examples of structural relationships between SCOP families belonging to different folds and/or classes are presented to illustrate the strength of AlignHUSH in detecting very remote relationships. Chapter 4 describes a database of evolutionary relationships identified between Pfam families. Grouping Pfam families is important for a better understanding of evolutionary relationships and for obtaining clues to the functions of proteins in families of unknown function. Various investigators have made considerable efforts to bring many proteins in sequence databases within homology-modeling distance of a protein of known structure, and structural genomics initiatives spend considerable effort on achieving this goal. Results from such initiatives show that in many cases, once a structure has been solved by X-ray crystallography or NMR, the protein turns out to be structurally similar to a protein of already known structure. Thus, the inability to detect such remote relationships severely impairs the efficiency of structural genomics initiatives. The SUPFAM method was developed earlier in the group to detect distant relationships between Pfam families. In the SUPFAM approach, Pfam families are first mapped to SCOP families, and relationships between Pfam families are then inferred from the implicit or explicit evolutionary relationship information present in the SCOP database. The work presented in this chapter improves on this earlier development by using the significantly more sensitive AlignHUSH method to uncover more relationships.
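The accuracy measure can be sketched as follows, under the assumption that it counts the residue pairs of the reference (e.g. BaliBASE) alignment that a test alignment reproduces, divided by the number of aligned pairs in the reference. Each pairwise alignment is given as two gapped strings; the function names are hypothetical.

```python
# Illustrative reference-based alignment accuracy; the exact definition used is an assumption.


def aligned_pairs(row_a: str, row_b: str) -> set:
    """Residue-index pairs (i, j) placed in the same column of a gapped pairwise alignment."""
    pairs, i, j = set(), 0, 0
    for x, y in zip(row_a, row_b):
        if x != "-" and y != "-":
            pairs.add((i, j))
        if x != "-":
            i += 1
        if y != "-":
            j += 1
    return pairs


def alignment_accuracy(test, reference) -> float:
    """Fraction of the reference's aligned residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(*reference)
    return len(aligned_pairs(*test) & ref) / len(ref) if ref else 0.0


print(alignment_accuracy(("ACDE", "A-CE"), ("ACDE", "AC-E")))  # 2 of 3 reference pairs recovered
```

Comparing residue-index pairs rather than column strings makes the measure independent of where gaps are drawn, as long as the same residues end up aligned.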
The new database follows a procedure slightly different from that of the older SUPFAM database and is hence called SUPFAM+. The relative improvement brought by SUPFAM+ is discussed in detail in the chapter. The SUPFAM database is generated by recognizing relationships between Pfam families and SCOP families using PSI-BLAST/RPS-BLAST, whereas for the SUPFAM+ database these relationships are recognized using AlignHUSH. The criteria are kept stringent at this stage to minimize the rate of false positives. When a Pfam family maps to two or more SCOP superfamilies, a semi-automated decision tree is used to assign it to a single SCOP superfamily. Some Pfam families that remain without a mapping are mapped indirectly to a SCOP family by identifying relationships between them and other Pfam families that are already mapped. In the final step, the Pfam families still without a SCOP family mapping are mapped onto one another to form 'Potential New Superfamilies' (PNSFs), which are excellent targets for structural genomics since none of their proteins has a recognizable homologue of known structure. The Pfam families clustered into superfamilies of SCOP version 1.69 were then checked for structures solved after the release of SCOP 1.69. The latest SCOP database reveals that for close to 87 Pfam families a structure has been solved that is related, at best at the SCOP superfamily level, to a family present in SCOP 1.69. An analysis of the mappings provided by the SUPFAM+ database reveals that they are correct at the SCOP superfamily level in 85% of cases. An in-depth analysis revealed that, among the remaining cases, only one can be judged an incorrect mapping.
Many of the inconsistent mappings were found to be due to the absence of the relevant SCOP fold from the SCOP 1.69 release, although, interestingly, the mapping provided by SUPFAM+ shows structural similarity to the fold subsequently found for the Pfam family. A straightforward comparison with a similar database (the Pfam Clans database) reveals that SUPFAM+ suggests four times as many pairwise relationships between Pfam families as Pfam Clans. Since the structural mappings provided in SUPFAM+ are highly accurate, the relationships it contains could help in the function annotation of uncharacterized protein families (explored in Chapter 5). The accuracy of mapping is expected to be similar for the PNSFs, and hence these clusters can be excellent targets for structural genomics initiatives. Chapter 5 describes attempts to obtain clues to the structure and/or function of the DUF (Domain of Unknown Function) families in the Pfam database. DUF families currently make up around 21% of the Pfam database (2260 out of 10340). Although homologues of each protein in these families can be recognized in sequence databases, the homology provides no obvious insight into the function of these proteins. Annotating such difficult targets is a major goal of computational biologists in the post-genomic era. The sensitive profile-profile alignment method developed as part of this thesis provides an excellent opportunity to increase the number of annotations, especially for the DUF families, since profiles for these families exist in the Pfam database.
The method followed is similar to the SUPFAM+ development and involved generating Pfam profiles compatible with the AlignHUSH method. For the analysis presented in the chapter, relationships found between DUF families and SCOP families were analyzed. Benchmarks of AlignHUSH showed that a Z score of 5.0 corresponds to a 10% error rate and a Z score of 7.5 to a 1% error rate, so a minimum Z score cutoff of 7.5 was used. Since very high AlignHUSH Z scores usually occur when sequence identity is also high, a maximum Z score cutoff of 12.0 was used to find DUF families that are difficult to annotate with other profile-based methods (such as PSI-BLAST). For some DUF families, the subsequent structure determination of one of their proteins had been reported in the literature, and these cases were used to assess the accuracy of structural annotation by AlignHUSH. In the other cases, fold recognition was performed with the PHYRE method to check that it corroborates the structure mappings. In all cases, the alignment of the DUF family with the SCOP family was generated and examined for conservation of the active site residues reported for each homologous SCOP family in the CSA (Catalytic Site Atlas) database. For all 8 DUF families whose structure was solved after the SCOP release used in the analysis, AlignHUSH identified the correct structure, and in all eight of these validated cases the active site residues were conserved, pointing to the effectiveness of AlignHUSH and its usefulness in function annotation. For the 27 cases in which no structure is known for any protein in the DUF family, fold recognition corroborates the AlignHUSH assignment in every case.
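The two filters described for Chapter 5 can be sketched as follows: hits are kept only when their AlignHUSH Z score lies between the 7.5 and 12.0 cutoffs, and each retained DUF-SCOP alignment is checked for whether the catalytic residues annotated for the SCOP family (for example, from the CSA) are preserved at the aligned DUF positions. The data layout (an index map from SCOP to DUF residues), the function names and the toy sequences are hypothetical.

```python
# Illustrative Z-score band filter and active-site check; data layout is hypothetical.


def in_z_band(z, low=7.5, high=12.0):
    """7.5 corresponds to ~1% benchmark error; above 12.0 hits tend to share high sequence identity."""
    return low <= z <= high


def filter_hits(hits, low=7.5, high=12.0):
    """hits: iterable of (duf_family, scop_family, z_score) tuples."""
    return [h for h in hits if in_z_band(h[2], low, high)]


def active_site_report(alignment_map, scop_seq, duf_seq, catalytic_positions):
    """For each catalytic SCOP position, report (SCOP residue, aligned DUF residue or None, conserved?)."""
    report = {}
    for pos in catalytic_positions:
        duf_pos = alignment_map.get(pos)
        duf_res = duf_seq[duf_pos] if duf_pos is not None else None
        report[pos] = (scop_seq[pos], duf_res, duf_res == scop_seq[pos])
    return report


hits = [("DUF_x", "scop_a", 5.0), ("DUF_y", "scop_b", 8.2), ("DUF_z", "scop_c", 14.0)]
print(filter_hits(hits))  # [('DUF_y', 'scop_b', 8.2)]
print(active_site_report({0: 0, 1: 1, 2: 2}, "HDSG", "HESG", [0, 1, 2, 3]))
```

A report like this separates the two outcomes the chapter describes: a reliable structural assignment with conserved catalytic residues, which supports a functional inference, versus one where the residues are substituted or unaligned.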
The alignments of each DUF family with its suggested homologous SCOP family reveal that in many cases the active site residues are not conserved or are substituted by different residues. An in-depth analysis of some cases shows that residues are not conserved even between two SCOP families of the same SCOP superfamily. Thus, although structure annotation can be provided reliably for all the DUF families studied, the exact biochemical function could be inferred only where active site residues are conserved even among distantly related families (such as two SCOP families in the same SCOP superfamily). The first part of the thesis thus develops and applies methods for remote homology detection and demonstrates that there is scope for extending their limits. Because the profiles are aligned using sequence-derived information, the procedure is generally applicable, and it has been applied successfully to structure and function recognition in the DUF families. The second part of the thesis presents a method for predicting protein-protein interactions between a host and a pathogen, and its application to three groups of pathogens. Chapter 6 describes the development of a procedure for predicting protein-protein interactions (PPIs) between a pathogen and its host. Previously, PPI prediction had been attempted for the proteins of a single organism, often by identifying proteins of the organism of interest that are homologous to two interacting proteins of another organism. Studies by various groups of the conservation of interactions as a function of sequence identity reveal that homologues sharing more than about 30% sequence identity interact in a similar way.
This fact can be used, together with a high-quality database of protein-protein interactions, to predict interactions between proteins of the same organism. The work in this thesis is one of the first attempts to extend the idea to predicting interactions between two different organisms: if a pathogen protein and a host protein are homologous to two proteins known to interact with each other, the pathogen and host proteins may interact. The feasibility of such an interaction under in vivo conditions needs to be addressed for the predictions to be biologically meaningful, and these issues are dealt with in this part of the thesis. A key step in the PPI prediction procedure is the identification of homologues of pathogen and host proteins among interacting proteins listed in PPI databases. Two template PPI databases have been used in this work. The first, the DIP database, lists interactions derived from genome-scale yeast two-hybrid data or small-scale experiments. The second, the iPfam database, provides interaction templates (Pfam families) based on protein complexes of known structure in the Protein Data Bank (PDB). Together, the two databases are comprehensive and of high quality. Homologues in the DIP database were searched with PSI-BLAST, using stringent cutoffs for various parameters to minimize false positives, and the iPfam database was searched with RPS-BLAST and MulPSSM, also with stringent cutoffs. The cutoffs were fixed based on an assessment of the conservation of putative interacting residues in the host and pathogen proteins relative to the protein complexes of known structure. The predictions were then analyzed manually to assess their importance to the pathogenesis of the disease under consideration.
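The core inference, transferring a known interaction to a pathogen-host protein pair when each protein is homologous to one partner of an interacting template pair, can be sketched as below. Homology search results are assumed to be precomputed (for example, from PSI-BLAST against DIP) as percent identities; the 30% default follows the identity level above which interactions were reported to be conserved, whereas the thesis describes its own cutoffs only as stringent. Identifiers and the function name are hypothetical.

```python
# Illustrative homology-based transfer of template interactions; thresholds and names are assumptions.


def predict_host_pathogen_ppi(templates, pathogen_hits, host_hits, min_identity=30.0):
    """templates: iterable of (template_a, template_b) known interacting pairs.

    pathogen_hits / host_hits: dict template id -> list of (protein_id, percent_identity).
    Returns a sorted list of (pathogen_protein, host_protein, template_pair).
    """
    def homologues(hits, template):
        return [p for p, ident in hits.get(template, []) if ident >= min_identity]

    predictions = set()
    for a, b in templates:
        for t_path, t_host in ((a, b), (b, a)):  # either template partner may match the pathogen protein
            for p in homologues(pathogen_hits, t_path):
                for h in homologues(host_hits, t_host):
                    predictions.add((p, h, (a, b)))
    return sorted(predictions)


templates = [("T1", "T2")]
pathogen_hits = {"T1": [("P1", 45.0), ("P2", 20.0)]}
host_hits = {"T2": [("H1", 60.0)]}
print(predict_host_pathogen_ppi(templates, pathogen_hits, host_hits))
# [('P1', 'H1', ('T1', 'T2'))]
```

Keeping the template pair in each prediction preserves the evidence trail needed for the manual assessment step the chapter describes.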
To gauge the robustness of this approach, PPIs were predicted in this chapter for the phage-bacteria system and the herpes virus-human system, which have been studied extensively by experiment and therefore offer opportunities to compare the "predictions" with experimental results. The phage-bacteria predictions suggest that the gross biological features of pathogenesis have been captured: GO (Gene Ontology) annotations of the bacterial proteins predicted to interact show that they participate in DNA replication and protein synthesis. Many known interactions, such as that between the lambda phage repressor and the bacterial RecA protein, were also ‘predicted’. A few novel interactions were predicted, for example between a tail component protein and YeeJ, a protein of unknown function in E. coli. Comparison of the predicted interactions between Herpes Virus 8 and its human host with a set of experimentally verified interactions reported in the literature showed that close to 50% of the known interactions were ‘predicted’ by the procedure. A few novel interactions between viral proteins and the p53 protein were also predicted, which might help in understanding tumorigenesis by the virus. A comparison between the procedure followed in this thesis and another genome-scale method (proposed by Andrej Sali and coworkers) suggests that although the proteins involved in the interactions predicted by the two methods may differ, the functions of those proteins, as given by their GO annotations, are highly correlated (greater than 98%). The next few chapters describe the prediction of interactions for different host-pathogen systems. Chapter 7 describes the prediction of PPIs between the eukaryotic malarial pathogen P. falciparum and its human host.
The malarial parasite was chosen because of the extensive work on this pathogen reported in the literature in recent years. In addition, the gene expression patterns of the pathogen are highly correlated with human tissue types, with each stage of the pathogen occurring in a distinct tissue type. The biological context of each PPI can therefore be assessed explicitly, which makes this a well-suited case for the procedure described in Chapter 6. The pathogen is important from a medical perspective because of the recent emergence of P. falciparum malaria that does not respond to conventional drugs, and studies of this parasite have thus gained importance in the post-genomic era. The difficulty of identifying homologues for many P. falciparum proteins makes this a challenging case study. PPIs between the malarial parasite and human proteins were predicted as described in Chapter 6, with stringent cutoffs in the homology searches. In this case, however, additional biological data could be used effectively. Tissue-specific expression information for human proteins was obtained from the Atlas of Human transcriptome and the NCBI GEO database, and stage-specific expression data for the pathogen from multiple genome-scale experiments reported in the literature. The subcellular localization of both human and pathogen proteins was predicted rather than measured, and this information was therefore given low weight in the subsequent analysis. The prediction yielded more than 30,000 interactions between the malarial parasite and human that were compatible with in vivo conditions according to the expression data. Incorporating the subcellular localization predictions reduced this set further, to around 2000 interactions.
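The two context filters used for the malaria predictions can be sketched as set operations: a predicted pair is kept if some stage in which the parasite protein is expressed occurs in a tissue where the human protein is expressed, and optionally if the two predicted localizations form an allowed combination. Stage names, tissue names, localization labels, the allowed-pair set and the function names are hypothetical placeholders, not the datasets used in the thesis.

```python
# Illustrative expression and localization filters; all labels are hypothetical placeholders.


def coexpressed(pairs, pathogen_stages, stage_tissue, host_tissues):
    """Keep (pathogen, host) pairs whose expression overlaps in at least one tissue."""
    kept = []
    for p, h in pairs:
        tissues = {stage_tissue[s] for s in pathogen_stages.get(p, ()) if s in stage_tissue}
        if tissues & host_tissues.get(h, set()):
            kept.append((p, h))
    return kept


def colocalized(pairs, pathogen_loc, host_loc, allowed):
    """Keep pairs whose (pathogen location, host location) combination is in the allowed set."""
    return [(p, h) for p, h in pairs if (pathogen_loc.get(p), host_loc.get(h)) in allowed]


pairs = [("P1", "H1"), ("P2", "H2")]
stages = {"P1": {"blood_stage"}, "P2": {"liver_stage"}}
stage_tissue = {"blood_stage": "erythrocyte", "liver_stage": "hepatocyte"}
host_tissues = {"H1": {"erythrocyte"}, "H2": {"brain"}}
in_context = coexpressed(pairs, stages, stage_tissue, host_tissues)
print(in_context)  # [('P1', 'H1')]
print(colocalized(in_context, {"P1": "exported"}, {"H1": "plasma_membrane"},
                  {("exported", "plasma_membrane")}))  # [('P1', 'H1')]
```

Applying the expression filter first and the localization filter second mirrors the order in the abstract, where the weaker, predicted localization data only prunes pairs that already pass the expression check.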
Manual analysis of each of these interactions, aided by the literature on malarial parasites, reveals that many known PPIs were also ‘predicted’, such as the interaction between the P. falciparum SSP2 protein and human ICAMs. For many proteins known to be important for pathogenesis, such as the RESA antigen, novel interactions were predicted that could help in better understanding the pathogen. For some of the novel predicted interactions, such as that between the parasite Plasmepsin and human Spectrin, there is circumstantial experimental evidence. Among many other novel interactions, the procedure predicted interactions for 441 ‘hypothetical proteins’ of unknown function encoded in the pathogen's genome. The comprehensive list of predictions and an exploration of their biological significance can lead to novel hypotheses about the pathogenesis of malaria, and the work presented in this chapter can thus help guide further experimental exploration of the pathogen. The success of the procedure in predicting both known and novel interactions for a eukaryotic pathogen suggests that it is generally applicable. However, for many host-pathogen systems such extensive expression and localization data may not be available, and the large number of predicted interactions then makes the analysis difficult. One such difficult case, the interactions between Mycobacterial species and the human host, is described in the next chapter. Chapter 8 describes the prediction of PPIs between human and M. tuberculosis, as well as three pathogens closely related to M. tuberculosis. Each of these pathogens has re-emerged due to drug resistance and other causes. M. tuberculosis is becoming a global problem because of the limited number of drugs available to treat TB and the emergence of resistance to them.
M. leprae has also shown signs of emerging drug resistance, whereas C. diphtheriae, another pathogen studied in this chapter, is regarded as an emerging pathogen in Eastern Europe and the Indian subcontinent. Nocardial infections have also risen because of the prevalence of AIDS, which increases susceptibility to Nocardia infections. There is thus a need to understand the pathogens of this important group better, in order to direct drug development. An important area for such efforts is mapping the PPIs between the pathogens and the human host, and the procedure developed in this thesis can be used to predict such interactions. The prediction procedure is the same as in Chapter 6 and involves identifying homologues of the pathogen and host proteins among the proteins listed in the two template datasets, DIP and iPfam, using PSI-BLAST and RPS-BLAST (MulPSSM). In addition to homology to proteins involved in PPIs, subcellular localization information and predictions are used to assess the biological significance of each interaction. An experimentally derived dataset of exported M. tuberculosis proteins was used to supplement the predictions from the PSORTb database, which provides subcellular localizations for bacterial proteins. To minimize the number of predictions requiring manual exploration and to maximize the biological relevance of the predicted interactions, predictions were made only for proteins located on the pathogen membrane or exported into the host. Predicting interactions between human proteins and the proteins of the four pathogens revealed that some interactions known from earlier experiments were "predicted" by the present procedure. For example, the exported serine protease of M. leprae is known to interact with Ras-like proteins in the human host, and this interaction was ‘predicted’.
Among other predicted interactions, several novel interactions have been suggested for proteins important for pathogenesis, such as the M. tuberculosis MPT70 protein, which is predicted to interact with TGFβ-associated proteins and could play an important role in the pathogenesis of the disease. Some human proteins, especially the toll-like receptors, are known to play an important role in pathogenesis. A C. diphtheriae protein, Mycosin, is predicted to interact with the toll-like receptors, raising the possibility that Mycosins play an important role in pathogenesis. Several hypothetical proteins of unknown function in the pathogens are predicted to interact with human proteins; a few such cases from M. tuberculosis are described in the thesis, and these proteins are predicted to interact with proteins involved in post-translational modification in the human host. The prediction of both known and novel interactions in four bacterial species thus suggests that the procedure can be used for almost any host-pathogen pair. The next chapter presents the application of the method to three other bacterial species belonging to the Enterobacteriaceae family. Chapter 9 describes the analysis performed on the predicted interactions between human and three pathogens in the Enterobact…

Keywords: Protein Functions; Bioactive Proteins; Computational Biochemistry; Protein-Protein Interactions; Protein Homology Detection - Algorithms; Protein Bioinformatics; Protein Sequence Homology (Biology); Remote Homology Detection; Biochemistry
892	Structural And Mechanistic Studies On Receptor Protein Tyrosine Phosphatases From Drosophila Melanogaster Madan, Lalima Lochan 09 1900 (has links) (PDF) Protein Tyrosine Phosphatases (PTPs) initiate, modulate and terminate key cellular processes by dephosphorylating phosphotyrosine (pY) residues on signaling proteins. The coordinated action of PTPs with their cognate tyrosine kinases is crucial for the maintenance of cellular homeostasis. Five Receptor Protein Tyrosine Phosphatases (RPTPs), DLAR, PTP99A, PTP69D, PTP10D and PTP52F, are involved in axon guidance in the fruit fly Drosophila melanogaster. The receptor (extracellular) regions of these RPTPs consist of Cell Adhesion Molecule (CAM) domains, while the cytosolic region contains the catalytic PTP domains. Extensive studies of the genetic interactions between these RPTPs reveal that the five RPTPs collaborate, compete or are partially redundant in some developmental contexts. While the genetic interactions between these RPTPs are well characterized, the role of domain-domain interactions and the mechanism(s) of substrate recognition are poorly understood. The aim of this study was to understand the molecular basis for these interactions using a combination of biophysical, biochemical and structural biology tools. This thesis is organized as follows: Chapter 1: The introductory chapter highlights mechanistic issues in signal transduction, with an emphasis on the role of the RPTPs in the neurodevelopment of Drosophila melanogaster. The first part of this chapter describes the structural features and the catalytic mechanism of the PTP domain, followed by a description of the mechanisms that modulate the activity of a PTP domain. The latter part of the chapter summarizes the role of these RPTPs in axon guidance in Drosophila melanogaster. The interactions between the RPTPs inferred from genetic data provide a mechanistic hypothesis that could be examined in vitro.
The studies described in the subsequent chapters of this thesis were performed to evaluate this hypothesis. Chapter 2: This chapter reports our observations on the so-called construct dependence of the expression of recombinant PTP domains in Escherichia coli. It details the strategies used to obtain recombinant PTP domains in a soluble form suitable for biochemical and structural studies. This involved substantial optimization of the construct size and of the overexpression strategies to avoid inclusion-body formation. Five strains of E. coli, as well as three variations in purification tags, viz., poly-histidine tags at the N- and C-termini and a Glutathione-S-transferase fusion at the N-terminus, were examined. We observed that inclusion of a 45-residue stretch at the N-terminus was crucial for the overexpression of the PTP domains, influencing both the solubility and the stability of these recombinant proteins. While the addition of negatively charged residues in the N-terminal extension could partially rationalize the improved solubility of these constructs, conventional parameters such as the proportion of order-promoting residues or the aliphatic index did not correlate with the improved biochemical characteristics. The findings in this chapter suggest that including additional parameters, such as secondary structure propensities, alongside rigid domain predictions could be crucial for obtaining a soluble recombinant protein upon expression in E. coli. Chapter 3: This chapter reports the crystal structures of the PTP domain of PTP10D and of PTP10D-substrate/inhibitor complexes. These structural studies revealed aromatic ring stacking interactions that mediate substrate recruitment into the PTP active site, in particular the role of a conserved aromatic residue in Motif 1 (Phenylalanine 76 in the case of PTP10D).
Mutation of Phenylalanine 76 to Leucine (mimicking the substitution found in the inactive distal PTP domains of other bi-domain PTPs) resulted in a sixty-fold decrease in the catalytic efficiency of the enzyme. Fluorescence kinetic measurements of ligand binding showed a three-fold increase in the half-time of enzyme-ligand complex formation. These studies highlight the role of the KNRY loop in substrate recruitment to the active site of the PTP domain and its role in modulating the kinetics of enzyme-substrate complex formation. Chapter 4: This chapter describes a strategy that uses protein-protein interaction data to identify putative peptide substrates for a given protein. This study was performed in collaboration with Shameer Khader and Prof. R. Sowdhamini at the National Center for Biological Sciences (NCBS). This integrated search approach, called ‘PeptideMine’, was developed into a web server for experimental and computational biologists. The PeptideMine strategy combines protein-protein interaction data with sequence searches, using sequence patterns or functional motifs, in the 'interacting sequence space' of a protein. The server also provides a compilation of indices describing the chemical and solubility properties of potential peptide substrates, to facilitate their investigation by in vitro or in silico studies. The biological significance of this design strategy was examined in the context of protein-peptide interactions of the RPTPs of Drosophila melanogaster. Chapter 5: In this chapter, we report an analysis of the influence of the membrane distal (D2) domain on the catalytic activity and substrate specificity of the membrane proximal (D1) domain, using two bi-domain RPTPs as a model system. Biochemical studies reveal contrasting roles for the D2 domains of Drosophila Leukocyte antigen Related (DLAR) and Protein Tyrosine Phosphatase on Drosophila chromosome band 99A (PTP99A).
While D2 lowers the catalytic activity of the D1 domain in DLAR, the D2 domain of PTP99A increases the catalytic activity of its D1 domain. Substrate specificity, on the other hand, is cumulative: the individual specificities of the D1 and D2 domains both contribute to the substrate specificity of these two-domain enzymes. Molecular dynamics simulations on structural models of DLAR and PTP99A provided a conformational rationale for the experimental observations, suggesting that concerted structural changes mediate inter-domain communication and result in either inhibitory or activating effects of the membrane distal PTP domain on the catalytic activity of the membrane proximal PTP domain. Chapter 6: This chapter describes biochemical studies to understand the role of the D2 domain of PTP99A. While the catalytic activity of PTP99A is localized to its membrane proximal (D1) domain, the inactive membrane distal (D2) domain influences the catalytic activity of the D1 domain. Phosphatase activity, monitored using small-molecule as well as peptide substrates, suggested that the D2 domain activates D1. Thermodynamic measurements on the bi-domain (D1-D2) protein as well as single-domain PTP99A constructs suggest that the inactive D2 domain influences the stability of the bi-domain protein. The activation and stabilization of the bi-domain protein by the D2 domain are governed by a few interactions at the inter-domain interface. In particular, mutating Lys990 at the interface attenuates inter-domain communication. This residue is located at a position structurally equivalent to the so-called allosteric site of the canonical PTP, PTP1B. These observations suggest functional optimization in bi-domain RPTPs, wherein the inactive PTP domain modulates the catalytic activity of the bi-domain enzyme.
Chapter 7: This chapter summarizes the experimental and computational studies on the Drosophila melanogaster PTP domains. It highlights the salient features of the experimental data that revealed hitherto uncharacterized sequence-structure relationships in the conserved PTP domain. The latter part of this chapter briefly outlines the scope for future research in this area based on some of the findings reported in this thesis. Appendix: This thesis has a four-part appendix comprising technical details and auxiliary work not included in the main text. Appendix I describes cloning strategies, purification protocols and a list of all recombinant proteins used in this study. Appendix II describes the standardization of the ‘Three Phase Partitioning’ protocol for refolding and solubilization of protein from inclusion bodies. Appendix III includes the immunochemical work performed to elucidate the localization of PTP10D in Drosophila embryos. Appendix IV describes work on a quercetin 2,3-dioxygenase from Bacillus subtilis, with an emphasis on the role of metal ions in modulating catalytic activity in this class of proteins. Drosophila melanogaster Protein (Receptors) Protein Tyrosine Phosphorylation Protein Tyrosine Phosphatase (PTP) Protein Tyrosine Phosphatases (PTPs) PTP99A Biochemistry
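PeptideMine's actual implementation is not described in this abstract. As a hypothetical sketch only, the core idea of searching a protein's 'interacting sequence space' with a pattern and annotating each matching peptide with simple indices might look like the code below. The regular expression stands in for a sequence motif, the partner sequences are invented, and the two indices (Kyte-Doolittle GRAVY and a crude net charge) are assumptions chosen for illustration rather than the server's own index set.

```python
import re

# Kyte-Doolittle hydropathy values, used here as one example solubility-related index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Grand average of hydropathy (mean Kyte-Doolittle value)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def net_charge(peptide):
    """Crude net charge: +1 per K/R, -1 per D/E; ignores His and termini."""
    return (sum(peptide.count(aa) for aa in "KR")
            - sum(peptide.count(aa) for aa in "DE"))


def mine_peptides(interactors, pattern):
    """Scan sequences of known interacting partners for a regex motif.

    interactors: partner name -> amino acid sequence (one-letter codes)
    pattern:     regular expression standing in for a sequence motif
    Returns (partner, 1-based start, peptide, GRAVY, net charge) tuples.
    """
    motif = re.compile(pattern)
    hits = []
    for name, seq in interactors.items():
        for match in motif.finditer(seq):  # non-overlapping matches only
            peptide = match.group()
            hits.append((name, match.start() + 1, peptide,
                         round(gravy(peptide), 2), net_charge(peptide)))
    return hits


# Hypothetical interacting partners of a protein of interest.
partners = {"partnerA": "MKDEYLLK", "partnerB": "GGSGGS"}
print(mine_peptides(partners, r"[DE]Y.."))
```

Note that `re.finditer` reports only non-overlapping matches, and residues outside the 20 standard one-letter codes would raise a `KeyError` in `gravy`; a real tool would need to handle both.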
893	Protein surface charge of trypsinogen changes its activation pattern Buettner, Karin, Kreisig, Thomas, Sträter, Norbert, Züchner, Thole January 2014 (has links) Background: Trypsinogen is the inactive precursor of trypsin, a serine protease that cleaves proteins and peptides after arginine and lysine residues. In this study, human trypsinogen was used as a model protein to study the influence of electrostatic forces on protein–protein interactions. Trypsinogen becomes active only after its eight-amino-acid activation peptide has been cleaved off by another protease, enteropeptidase. Trypsinogen can also be autoactivated without the involvement of enteropeptidase. This autoactivation occurs when a trypsinogen molecule is activated by a trypsin molecule and is therefore based on a protein–protein interaction. Results: Using a rational protein design approach based on autoactivation-defective guinea pig trypsinogen, several amino acid residues, all located far from the active site, were changed to modify the surface charge of human trypsinogen. The influence of the surface charge on the activation pattern of trypsinogen was investigated, and the autoactivation properties of the mutant trypsinogen were characterized in comparison to the recombinant wild-type enzyme. Surface-charged trypsinogen showed practically no autoactivation compared to the wild type but could still be activated by enteropeptidase to fully active trypsin. The kinetic parameters of surface-charged trypsinogen were comparable to those of the recombinant wild-type enzyme. Conclusion: Compared to the wild-type enzyme, the variant with a modified surface charge showed a completely different activation pattern. Our study provides an example of how directed modification of the protein surface charge can be used to regulate functional protein–protein interactions, as shown here for human trypsinogen. info:eu-repo/classification/ddc/572 ddc:572
894	Functional Consequences of Conjugating Polymers to Protein and Study of Biomarkers for Cell Death Pathway Rahman, Monica Sharfin 14 July 2022 (has links) No description available. Polymer Chemistry Biochemistry Analytical Chemistry Protein-polymer bioconjugation Protein activity Protein stability Protein-protein interaction inhibition Spike protein capture Cell death Biomarker
895	Graph-based protein-protein interaction prediction in Saccharomyces cerevisiae Paradesi, Martin Samuel Rao January 1900 (has links) Master of Science / Department of Computing and Information Sciences / Doina Caragea / William H. Hsu / The term 'protein-protein interaction (PPI)' refers to associations between proteins as manifested through biochemical processes such as the formation of structures, signal transduction, transport, and phosphorylation. PPIs play an important role in the study of biological processes. Many PPIs have been discovered over the years, and several databases have been created to store information about these interactions. von Mering (2002) states that about 80,000 interactions between yeast proteins are currently available from various high-throughput interaction detection methods. Determining PPIs using high-throughput methods is not only expensive and time-consuming but also generates many false positives and false negatives. Therefore, computational approaches are needed to help identify real protein interactions. Several machine learning methods have been designed to predict protein-protein interactions. Most of them use features extracted from protein sequences (e.g., amino acid composition) or associated directly with protein sequences (e.g., GO annotation). Others use relational and structural features extracted from the PPI network along with sequence-related features. When the PPI network is used to design features, several node and topological features can be extracted directly from the associated graph. In this thesis, important graph features of a protein interaction network that help predict protein interactions are identified. Two previously published datasets are used in this study, and a third dataset has been created by combining three PPI databases.
Several classifiers are applied to the graph attributes extracted from the protein interaction networks of these three datasets. A detailed study was performed to determine whether graph attributes extracted from a protein interaction network are more predictive than biological features of protein interactions. The results indicate that performance measures (such as sensitivity, specificity and AUC score) improve when graph features are combined with biological features. Protein-protein interactions Machine Learning Bioinformatics Computer Science (0984)
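The abstract does not list which graph features were found to be predictive. As an assumption for illustration, the sketch below computes a few standard link-prediction features for a candidate protein pair: node degrees, shared neighbours, the Jaccard coefficient and the Adamic-Adar index. The function names and the toy network are hypothetical and not taken from the thesis.

```python
import math


def build_adjacency(edges):
    """Build undirected adjacency sets from (protein, protein) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pair_graph_features(adj, u, v):
    """Compute topological features for a candidate pair (u, v).

    The u-v edge itself is excluded so that a known interaction does not
    leak into its own features when the pair is a training example.
    """
    if u == v:
        raise ValueError("expected two distinct proteins")
    nu = adj.get(u, set()) - {v}
    nv = adj.get(v, set()) - {u}
    common = nu & nv
    union = nu | nv
    return {
        "min_degree": min(len(nu), len(nv)),
        "max_degree": max(len(nu), len(nv)),
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar index: shared neighbours with fewer partners weigh more.
        # Each shared neighbour touches both u and v, so its degree is >= 2.
        "adamic_adar": sum(1.0 / math.log(len(adj[w])) for w in common),
    }


# Toy interaction network (invented).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "D")]
network = build_adjacency(edges)
print(pair_graph_features(network, "A", "B"))
```

Such a feature dictionary could be concatenated with biological features (for example GO-annotation similarity) before training a classifier, which is the kind of combination the abstract reports as improving sensitivity, specificity and AUC.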
896	Proteomic investigation of the MDM2 interactome and linear motif interactions Nicholson, Judith January 2011 (has links) The oncoprotein MDM2 has an integral role in cancer development via multiple signalling pathways. Two proteomic mass spectrometry screens, label-free with spectral counting quantitation and 8-plex iTRAQ, were used to identify proteins up- or downregulated over time by the MDM2-targeting drug Nutlin. A subset of previously identified MDM2 binding partners was found to be altered after Nutlin treatment, along with proteins that have not yet been linked to MDM2 or p53. Proteins altered two hours after Nutlin treatment were screened for sequence similarity to an MDM2-binding consensus motif based on the BOX-I region of p53. Peptides corresponding to this motif were validated for MDM2 binding, and their mode of binding was investigated using competition ELISA and thermal denaturation assays. Known MDM2 ligands such as Nutlin were shown to have a range of effects on the binding of these newly identified MDM2 peptides, which may be attributed to allosteric regulation of MDM2. The effects of Nutlin on two full-length proteins identified by the MS screens, CypB and NPM, were confirmed in vivo. In vitro binding of MDM2 to CypB and PK, which contain BOX-I-like motifs, was also demonstrated, validating proteomic mass spectrometry screens as a method to identify new protein-protein interactions. To further investigate the potential of linear motifs to modulate protein-protein interactions, a peptide aptamer targeting the protein AGR2 was tested for its effect on AGR2 and p53 in a cancer cell line. 572
897	Systematic analysis of protein-protein interactions of oncogenic Human Papilloma Virus Gundurao, Ramya Mavinkaihalli January 2013 (has links) Human papilloma virus (HPV) is a ubiquitous virus implicated in a growing list of cancers, particularly cervical cancer, the second most common cancer among women worldwide. Although persistent infection with high-risk oncogenic HPVs such as types 16 or 18 is necessary, additional factors such as co-infection with other viruses can play a role in cancer progression. Protein-protein interactions play a central role in the infection, survival and proliferation of the virus in the host. Although some interactions of HPV proteins are well characterised, it is essential to discover other key viral interactions to further improve our understanding of the virus and to use this knowledge to develop new biomarkers and therapeutics. The aim of this study was to systematically analyse the interactions of HPV-16 proteins using the yeast two-hybrid (Y2H) system. To achieve this, a clone collection of the viral proteome was generated by recombinatorial cloning and three independent Y2H screens were performed: (i) an intra-viral screen to identify interactions among the HPV-16 proteins; (ii) an inter-viral screen to identify interactions with proteins of Herpes Simplex Virus (HSV), which has been suggested to be a co-factor; and (iii) a virus-host screen to identify novel cellular binding partners. The intra-viral Y2H screen confirmed some previously known interactions and also identified binding between the E1 and E7 proteins. Deletion mutagenesis was performed to map the interaction domains to the amino-terminal 92 amino acids of E1 and the carboxy-terminal CxxC domain of E7. Replication assays suggest a possible repression of E1-mediated episomal replication by direct binding of E7.
The inter-viral Y2H screen identified interactions of HPV proteins with seventeen HSV-1 proteins, including the transcriptional regulator ICP4 and the neurovirulence factor ICP34.5. The biological relevance of these interactions in the context of co-infection is discussed. The virus-host screen performed against a human cDNA library identified 54 interactions, a subset of which was validated by biochemical pull-down assays. The functional relevance of an interaction between E7 and the proto-oncogene spermatogenic leucine zipper protein (SPZ1) was further investigated, suggesting a role for SPZ1 in E7-mediated cell proliferation. The work presented in this thesis identifies several novel interactions of HPV proteins. Future work will involve in-depth elucidation of the biological relevance of these interactions. In particular, the interactions of E7 with E1 and SPZ1 are of great interest for improving our understanding of the life cycle and pathogenesis of the virus, which can inform improved strategies for the prevention and treatment of malignancies caused by HPV. 616.99
898	Study of the N-terminal domains of MDM2 and MDM4, and their potential for targeting by small-molecule drugs Sanchez Perez, Maria Concepcion January 2011 (has links) The MDM2 and MDM4 oncoproteins are both involved in regulating the tumour suppressor p53. While the MDM2–p53 interface is structurally and biophysically well characterised, the MDM4–p53 interaction has only recently attracted researchers’ attention. The goal of this project was to establish structural and chemical ground rules for disrupting the interactions between the N-terminal domains of MDM2/4 and p53, which is an attractive anticancer strategy. In the current work, recombinant production and purification protocols were established for the N-terminal domains of both MDM2 (MDM2-N, residues 11-118) and MDM4 (MDM4-N, residues 14-111), yielding protein of sufficient quantity and quality for analysis by nuclear magnetic resonance (NMR) spectroscopy. Two screening strategies were employed to identify small-molecule antagonists of the MDM2-N:p53 interaction. First, a virtual screening exercise identified several compounds that were shown by NMR to bind to MDM2-N with KD values in the μM range. Docking studies supported by NMR chemical shift perturbation analysis suggested possible binding modes. The results are discussed in relation to the previously reported binding to MDM2-N of well-characterised inhibitors of the MDM2:p53 interaction such as Nutlin-3. Second, a fragment-based library was screened against MDM2-N using TROSY-type NMR spectra to monitor binding. Several hits were identified, and the results are discussed with regard to the “druggability” of the MDM2-N:p53 interaction. To better understand the p53-binding groove of MDM4-N, multidimensional NMR was used to investigate the structure and backbone dynamics of double-isotopically labelled samples of MDM4-N, both free (apo-MDM4-N) and in complex with a p53-derived peptide or Nutlin-3.
Apo-MDM4-N is more conformationally dynamic than MDM2, since it contains unstructured regions. These regions appear to become structured upon ligand binding. MDM4 appears to bind its ligands through conformational selection and/or an induced-fit mechanism involving reorganization of key sub-sites within the binding groove. This study highlighted differences between Nutlin-3 and peptide binding that could guide the rational design of specific inhibitors of the MDM4:p53 interaction. 572
899	Identification and characterization of lysine-rich proteins and starch biosynthesis genes in the opaque2 mutant by transcriptional and proteomic analysis Jia, Mo, Wu, Hao, Clay, Kasi, Jung, Rudolf, Larkins, Brian, Gibbon, Bryan January 2013 (has links) BACKGROUND: The opaque2 mutant is valuable for producing maize varieties with enhanced nutritional value. However, the exact mechanisms by which it improves protein quality and creates a soft endosperm texture are unclear. Given the importance of improving nutritional quality in grain crops, a better understanding of the physiological basis for these traits is necessary. RESULTS: In this study, we combined transcript profiling and proteomic analysis to better understand which genes and proteins are altered by opaque2 in the W64A inbred line. These analyses showed that the accumulation of some lysine-rich proteins, such as sorbitol dehydrogenase and glyceraldehyde-3-phosphate dehydrogenase, was increased in mature kernels and may contribute substantially to the lysine content of opaque2 endosperm. Some defense proteins, such as beta-glucosidase aggregating factor, were strongly downregulated and may be regulated directly by opaque2. The mutant also had altered expression of a number of starch biosynthesis genes, and this was associated with a more highly crystalline starch. CONCLUSIONS: The results of these studies revealed specific target genes that can be investigated to further improve the nutritional quality and agronomic performance of high-lysine maize lines, particularly those based on the opaque2 mutation. Alteration of amylopectin branching patterns in opaque2 starch could contribute to the generation of the soft, starchy endosperm. Opaque endosperm Opaque2 Quality protein maize Starch biosynthesis Protein quality
900	A Study on the interaction between Gadd153 mRNA and HuR protein in HeLa cells upon treatment with 4HPR Leung, Mei-chi, 梁美姿. January 2008 (has links) published_or_final_version / Biological Sciences / Master / Master of Philosophy Retinoids. HeLa cells. Gene expression. Cytogenetics. Protein-protein interactions.
