Global ETD Search

21	Protein fold evolution on completed genomes: distinguishing between young and old folds
Abeln, Sanne. January 2007
We review fold usage on completed genomes in order to explore protein structure evolution and assess the evolutionary relevance of current structural classification systems (SCOP and CATH). We assign folds on a set of 150 completed genomes using fold recognition methods (PSI-BLAST, SUPERFAMILY and Gene3D). The patterns of presence or absence of folds on genomes give us insights into the relationships between folds and into how we have arrived at the set of folds we see today. In particular, we develop a technique to estimate the relative ages of protein folds based on their genomic occurrence patterns in a phylogeny. We find that SCOP's 'alpha/beta' class has relatively fewer distinct folds on large genomes, and that folds of this class tend to be older; folds of SCOP's 'small protein' class follow the opposite trends. Usage patterns show that folds with many copies on a genome are generally old, but that old folds do not necessarily have many copies. In addition, longer domains tend to be older, and hydrophobic amino acids have high propensities for older folds, whereas polar but non-charged amino acids are associated with younger folds. Generally, domains with stabilising features tend to be older. We also show that the reliability of fold recognition methods may be assessed using occurrence patterns: we develop a method that detects false positives by identifying isolated occurrences in a phylogeny of species and that is able to improve genome-wide fold recognition assignment sets. We use a structural fragment library to investigate evolutionary links between protein folds, and show that 'older' folds have relatively more such links than 'younger' folds. This correlation becomes stronger for longer fragment lengths, suggesting that such links may reflect evolutionary relatedness.
572.633
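The abstract does not spell out the age estimator. One simple way to turn occurrence patterns into a relative age, sketched below under simplifying assumptions, is to place a fold's origin at the most recent common ancestor (MRCA) of all genomes in which it is detected: the closer that node lies to the root, the older the fold. The toy tree, the species labels and the helper names (`mrca`, `relative_fold_age`) are hypothetical, and the thesis's method may well differ, for example by modelling gene loss or horizontal transfer.

```python
from typing import Dict, List, Optional, Set

# Hypothetical species tree stored as child -> parent links ("root" has none).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Archaea_Eukaryota": "root",
    "Archaea": "Archaea_Eukaryota",
    "Eukaryota": "Archaea_Eukaryota",
    "E_coli": "Bacteria",
    "B_subtilis": "Bacteria",
    "M_jannaschii": "Archaea",
    "S_cerevisiae": "Eukaryota",
    "H_sapiens": "Eukaryota",
}


def path_to_root(node: str) -> List[str]:
    """Return [node, parent, grandparent, ..., root]."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


# Depth of the deepest node; used to turn an MRCA depth into an "age" score.
TREE_HEIGHT = max(len(path_to_root(n)) - 1 for n in PARENT)


def mrca(species: Set[str]) -> str:
    """Most recent common ancestor of a non-empty set of leaves."""
    paths = [path_to_root(s) for s in species]
    shared = set(paths[0]).intersection(*(set(p) for p in paths[1:]))
    # Walking up from any leaf, the first node shared by all paths is the MRCA.
    return next(node for node in paths[0] if node in shared)


def relative_fold_age(present_in: Set[str]) -> int:
    """Larger value = older fold (origin placed closer to the root)."""
    depth = len(path_to_root(mrca(present_in))) - 1
    return TREE_HEIGHT - depth
```

With this tree, a fold seen in bacteria and eukaryotes gets the maximum age of 3, while a fold confined to the two eukaryotes gets 1.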
22	Development And Applications Of Computational Methods To Aid Recognition Of Protein Functions And Interactions
Krishnadev, O. 03 1900
Protein homology detection has played a central role in understanding the evolution of protein structures, functions and interactions. Many of the developments in protein bioinformatics can be traced back to an initial step of homology detection. It is not surprising, then, that extending remote homology detection has attracted a lot of attention in the recent past. The explosive growth of genome sequences and the slow pace of experimental techniques have thrust computational analyses into the limelight, and many traditionally experimental areas, such as gene expression analysis, recognition of function and recognition of 3-D structure, have been tackled effectively by computational approaches. Homology-based bioinformatics rests on the fact that hereditary mechanisms ensure that a parent generation gives rise to a very similar offspring generation. Since the biological functions of an organism's proteins are a product of the expression of its genetic material, the genes of an organism should be conserved from one generation to the next (with very few mutations if parent and offspring generations are to be nearly identical). Thus, if two proteins can be shown to have descended from a common ancestor, it can be inferred that their biological functions could be very similar, and homology-based information transfer from one protein to another has become a commonly used procedure in protein bioinformatics. The ability to recognize homologs of a protein solely from its amino acid sequence has improved steadily over the last two decades. However, a large number of proteins of known amino acid sequence still have unknown function.
Thus, a major goal of current computational work is to extend the limits of remote homology detection so that proteins of unknown function can be characterized. Since proteins do not work in isolation in a cell, it has become essential to understand the in vivo context of a protein's function, which requires an understanding of all the molecules that interact with that protein. Hence, another major area of bioinformatics integrates biological information with protein-protein interactions to better understand molecular processes. Such attempts have been made successfully for the interaction networks of proteins within an organism, and extending interaction network analysis to a host-pathogen scenario can give useful insights into the pathophysiology of diseases. The work in this thesis explores both of these ideas, namely extending the limits of remote homology detection and predicting protein-protein interactions between a pathogen and its host. Since the work falls logically into two related areas, the thesis is organized in two parts. The first part (Chapters 2, 3, 4 and 5) describes the development and application of remote homology detection tools for function/structure annotation. The second part (Chapters 6, 7, 8 and 9) describes the development and application of a homology-based procedure for detecting host-pathogen protein-protein interactions. Chapter 1 provides background and a literature survey on homology detection and the prediction of protein-protein interactions. It argues that homology-based information transfer is currently an important tool for predicting and recognizing protein structures, functions and interactions.
The development of remote homology detection methods and its effect on function recognition are highlighted. Recent work on predicting protein-protein interactions using homology to known interaction templates is described and argued to be a successful approach for genome-scale prediction. The importance of further improvements in remote homology detection (as pursued in the first part of the thesis) for annotating proteins in newly sequenced genomes is emphasized, and the application of homology detection methods to predicting protein-protein interactions between host and pathogen organisms is also explored. Chapter 2 analyzes the performance of PSI-BLAST, a well-known and very effective approach for recognizing related proteins, in remote homology detection. The chapter describes the working of the PSI-BLAST algorithm in detail and focuses on three parameters that determine the time required to search a large database and also cap the sensitivity of the search. These parameters are the window size for the two-hit method, the threshold for extending an initial hit to dynamic programming, and the extent of dependence on the query in the profile generation step. The analysis uses a large database of known evolutionary relationships (the SCOP database) and runs PSI-BLAST at different values of the three parameters to measure the effect on sensitivity (defined as the normalized number of correct SCOP superfamily relationships found in a search) and on the time required to complete the search. To demonstrate the effect of query dependence, a multiple sequence alignment (MSA) of a SCOP family (generated from all family sequences using ClustalW) was used with multiple queries to derive profiles in PSI-BLAST runs.
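The two-hit seeding step whose window size is varied in Chapter 2 can be illustrated with a minimal sketch. Assumptions: the toy substitution scores stand in for a real matrix such as BLOSUM62, word hits are found by brute force rather than through a neighborhood-word lookup table, and the overlap rule is simplified; none of this is the NCBI implementation. A hit triggers extension only when an earlier, non-overlapping hit lies on the same diagonal within `window` residues.

```python
from typing import Dict, List, Tuple


def pair_score(a: str, b: str) -> int:
    # Toy scores (hypothetical) standing in for a matrix such as BLOSUM62.
    return 4 if a == b else -1


def word_hits(query: str, subject: str, w: int, t: int) -> List[Tuple[int, int]]:
    """All (query_pos, subject_pos) pairs whose length-w words score >= t,
    ordered by subject position."""
    hits = []
    for j in range(len(subject) - w + 1):
        for i in range(len(query) - w + 1):
            score = sum(pair_score(query[i + k], subject[j + k]) for k in range(w))
            if score >= t:
                hits.append((i, j))
    return hits


def two_hit_seeds(query: str, subject: str, w: int = 3, t: int = 11,
                  window: int = 40) -> List[Tuple[int, int]]:
    """Hits that would trigger an extension: each needs an earlier,
    non-overlapping hit on the same diagonal at most `window` residues away."""
    last_on_diagonal: Dict[int, int] = {}
    seeds = []
    for i, j in word_hits(query, subject, w, t):
        diagonal = j - i
        prev = last_on_diagonal.get(diagonal)
        if prev is not None and j - prev < w:
            continue  # overlaps the previous hit on this diagonal; ignore it
        if prev is not None and j - prev <= window:
            seeds.append((i, j))
        last_on_diagonal[diagonal] = j
    return seeds
```

Shrinking `window` or raising `t` reduces the number of seeds, and hence the time spent in extension, at the risk of missing weak similarities; this is the sensitivity/time trade-off the chapter measures.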
The increase in sensitivity and the increase in the time required to complete each search were then monitored. Changing the two internal PSI-BLAST parameters, the score threshold for extending word hits and the window size for the two-hit method, does not significantly increase sensitivity. Since PSI-BLAST uses the amino acid residues present in the query sequence to derive the Position Specific Scoring Matrix (PSSM), the sensitivity of each PSSM depends strongly on the query. Using multiple PSSMs derived from a single MSA can thus help overcome this query dependence and increase sensitivity. In this chapter such an approach, named MulPSSM, is shown to be more sensitive than the single-profile approach (by up to a factor of two) on a benchmark dataset of 100 randomly chosen SCOP folds. Strategies to balance sensitivity against search time in MulPSSM were explored, and using a non-redundant set of queries to generate the MulPSSM profiles was found to reduce the time required for each search without greatly affecting sensitivity. The application of MulPSSM to function annotation of proteins in completely sequenced genomes was explored by searching genomic sequences against a MulPSSM database of Pfam families, and function assignments were compared between a database with a single profile per family and a MulPSSM database. Across a comprehensive set of 291 prokaryotic, 44 eukaryotic and 40 archaeal genomes, MulPSSM identifies evolutionary relationships for, on average, 10% more proteins per genome than the single-profile approach. Such an enhancement in the recognition of evolutionary relationships, which can provide clues to function, can help in the more efficient exploration of newly sequenced genomes.
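The idea behind MulPSSM, several query-anchored PSSMs built from one MSA with the best score taken over all of them, can be sketched as follows. This is a simplification, not PSI-BLAST's profile construction: it assumes uniform background frequencies, a flat pseudocount instead of sequence weighting, and ungapped scoring; the function names are hypothetical.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background


def pssm_for_query(msa: List[str], q: int, pseudo: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds PSSM over the columns where row q (the query) has a residue."""
    pssm = []
    for col in range(len(msa[0])):
        if msa[q][col] == "-":
            continue  # query-anchored: columns gapped in the query are dropped
        residues = [row[col] for row in msa if row[col] != "-"]
        total = len(residues) + pseudo * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((residues.count(aa) + pseudo) / total / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_ungapped_score(pssm: List[Dict[str, float]], target: str) -> float:
    """Best ungapped placement of the PSSM on the target (-inf if it does not fit)."""
    best = float("-inf")
    for start in range(len(target) - len(pssm) + 1):
        score = sum(pssm[k][target[start + k]] for k in range(len(pssm)))
        best = max(best, score)
    return best


def mulpssm_score(msa: List[str], target: str) -> float:
    """MulPSSM-style score: one PSSM per MSA row as query, keep the best hit."""
    return max(best_ungapped_score(pssm_for_query(msa, q), target)
               for q in range(len(msa)))
```

In the toy MSA `["ACD-E", "AC-WE", "ACDWE"]`, the PSSM anchored on the second row scores the target `ACWE` better than the one anchored on the first row, which is the query-dependence effect that MulPSSM exploits.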
The multiple-profile search approach discussed in this chapter made it possible to identify evolutionary relationships for some proteins of M. tuberculosis and M. leprae. The example annotations provided in the chapter include enzymes involved in glycolipid synthesis, which is vital for the survival of the pathogens inside the host; such annotations can help expand our knowledge of these processes. Chapter 3 describes the development and assessment of a sensitive remote homology detection method. The sensitivity of remote homology detection methods has increased steadily over the past decade, and profile analysis has become a mainstay of such efforts. A profile is a probabilistic model of the substitutions allowed at each position in a sequence family, and hence captures the essential features of a family. Aligning two such profiles is therefore considered more sensitive and accurate than aligning two sequences. HMMs (Hidden Markov Models) have been shown to perform better than PSSMs (Position Specific Scoring Matrices), so profile-profile alignment using HMMs can in principle give the best possible sensitivity in remote homology detection. Many investigators have incorporated residue conservation and secondary structure information when aligning two HMMs, and such additional information has been shown to improve sensitivity in remote homology detection (for instance in the HHSearch program). The work presented in Chapter 3 extends this idea by incorporating explicit hydrophobicity information, together with conservation and predicted secondary structure, over a window of Multiple Sequence Alignment (MSA) columns when aligning HMMs. The new algorithm is named AlignHUSH (Alignment of HMMs Using Secondary structure and Hydrophobicity).
The HMMs used in this work are derived from structural alignments using the HMMER program and are taken from the publicly available SUPERFAMILY database, which provides HMMs for all SCOP families. In the AlignHUSH algorithm the HMMs are converted into two-state HMMs by collapsing the 'insert' and 'delete' states into a single 'non-match' state. Two-state HMMs permit the use of dynamic programming while keeping the position-specific gap penalties intact, and they can be extended more readily to the alignment of PSSMs. Secondary structure information is incorporated using predictions from the PSIPRED program, and hydrophobicity is calculated using Kyte-Doolittle hydrophobicity values. Each position in the alignment is scored using the values in a window of residues. Alignment accuracy is assessed by comparison with the manually curated alignments in the BAliBASE database. A detailed description is given of the optimization used to obtain the weight of each score contribution (conservation, secondary structure and hydrophobicity). The assessment showed that a high weight for the conservation score (18.0) and low weights for the secondary structure (1.5) and hydrophobicity (1.0) scores are optimal. The use of residue windows in alignment was shown to increase sensitivity dramatically (by around 30% on a small dataset comprising 10% of all SCOP domains). In an all-against-all comparison of the SCOP 1.69 database, AlignHUSH is more sensitive than the other HMM-HMM alignment methods HHSearch and PRC (by approximately 10% and 5% respectively).
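A minimal sketch of window-based profile-profile scoring in the spirit of AlignHUSH is given below. Only the three weights (18.0, 1.5, 1.0) and the Kyte-Doolittle scale come from the description above; the form of each score term, the window half-width, the constant gap penalty (the thesis keeps position-specific penalties from the two-state HMMs) and the helper names are assumptions, and the one-sequence "profiles" merely keep the demo small.

```python
from typing import Dict, List, Tuple

# Kyte-Doolittle hydropathy values.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

# Weights for conservation, secondary structure and hydrophobicity.
W_CONS, W_SS, W_HYD = 18.0, 1.5, 1.0

# A profile column: (residue frequencies, predicted SS state, mean hydropathy).
Column = Tuple[Dict[str, float], str, float]


def profile_from_sequence(seq: str, ss: str) -> List[Column]:
    """Hypothetical helper: a one-sequence 'profile' for demonstration."""
    return [({aa: 1.0}, state, KD[aa]) for aa, state in zip(seq, ss)]


def column_score(x: Column, y: Column) -> float:
    # Assumed term forms: frequency overlap minus a uniform expectation,
    # SS agreement, and a hydropathy-difference penalty scaled to [-1, 0].
    cons = sum(f * y[0].get(aa, 0.0) for aa, f in x[0].items()) - 1.0 / 20
    ss = 1.0 if x[1] == y[1] else 0.0
    hyd = -abs(x[2] - y[2]) / 9.0  # the KD scale spans -4.5 to 4.5
    return W_CONS * cons + W_SS * ss + W_HYD * hyd


def window_score(p: List[Column], q: List[Column], i: int, j: int, half: int) -> float:
    """Mean column score over a diagonal window of columns centred on (i, j)."""
    scores = [column_score(p[i + k], q[j + k])
              for k in range(-half, half + 1)
              if 0 <= i + k < len(p) and 0 <= j + k < len(q)]
    return sum(scores) / len(scores)


def align_score(p: List[Column], q: List[Column], gap: float = 4.0, half: int = 2) -> float:
    """Global (Needleman-Wunsch) alignment score with a constant gap penalty."""
    n, m = len(p), len(q)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = -gap * i
    for j in range(1, m + 1):
        dp[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + window_score(p, q, i - 1, j - 1, half),
                           dp[i - 1][j] - gap,
                           dp[i][j - 1] - gap)
    return dp[n][m]
```

Averaging over a window lets a column pair with weak conservation still score well when its neighbours agree in secondary structure and hydropathy, which is the intuition behind the windowed scoring described above.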
Alignment accuracy, calculated as the ratio of correctly aligned residues to all alignment positions in the BAliBASE alignments, shows that AlignHUSH is comparable to or marginally more accurate than both HHSearch and PRC (25% for AlignHUSH and roughly 17% for both HHSearch and PRC). A few examples of structural relationships between SCOP families belonging to different folds and/or classes are presented in the chapter to illustrate the strength of AlignHUSH in detecting very remote relationships. Chapter 4 describes a database of evolutionary relationships identified between Pfam families. Grouping Pfam families is important for better understanding evolutionary relationships and for obtaining clues to the functions of proteins in families of as yet unknown function. Various investigators have invested much effort in bringing many proteins in the sequence databases within homology-modeling distance of a protein of known structure, and structural genomics initiatives spend considerable effort on this goal. The results of such initiatives suggest that, in many cases, once the structure has been solved by X-ray crystallography or NMR, the protein turns out to be structurally similar to a protein of already known structure. Thus, an inability to detect such remote relationships severely impairs the efficiency of structural genomics initiatives. The SUPFAM method was developed earlier in the group to detect distant relationships between Pfam families. In the SUPFAM approach, Pfam families are mapped to SCOP families, and relationships between Pfam families are then detected using the implicit or explicit evolutionary relationship information in the SCOP database. The work presented in this chapter improves on this earlier development by using the significantly more sensitive AlignHUSH method to uncover more relationships.
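Alignment accuracy as described above can be computed by comparing the residue pairs that a test alignment places in the same column with those of the reference alignment. The sketch assumes that the denominator is the number of aligned residue pairs in the reference (BAliBASE-style) alignment; the function names are hypothetical.

```python
from typing import Set, Tuple


def aligned_pairs(row_a: str, row_b: str) -> Set[Tuple[int, int]]:
    """Residue-index pairs (i, j) that share a column in a gapped pairwise alignment."""
    pairs: Set[Tuple[int, int]] = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def alignment_accuracy(predicted: Tuple[str, str], reference: Tuple[str, str]) -> float:
    """Fraction of the reference's aligned residue pairs reproduced by the prediction."""
    ref_pairs = aligned_pairs(*reference)
    pred_pairs = aligned_pairs(*predicted)
    return len(pred_pairs & ref_pairs) / len(ref_pairs)
```

For example, aligning `ACDEF` with `ACEGF` without gaps recovers three of the four residue pairs in the reference alignment `ACDE-F` / `AC-EGF`, giving an accuracy of 0.75.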
The new database follows a slightly different procedure from the older SUPFAM database and is hence called SUPFAM+. The relative improvement brought by SUPFAM+ is discussed in detail in the chapter. The SUPFAM database is first generated by recognizing relationships between Pfam families and SCOP families using PSI-BLAST/RPS-BLAST; for the SUPFAM+ database, these relationships are recognized using AlignHUSH. The criteria are kept stringent at this stage to minimize the rate of false positives. When a Pfam family maps to two or more SCOP superfamilies, a semi-automated decision tree is used to assign it to a single SCOP superfamily. Some Pfam families that remain without a mapping are mapped indirectly to a SCOP family by identifying relationships between them and other Pfam families already mapped to a SCOP family. In the final step, the Pfam families still without a SCOP family mapping are mapped onto one another to form 'Potential New Superfamilies' (PNSFs), which are excellent targets for structural genomics since none of the proteins in a PNSF has a recognizable homologue of known structure. The Pfam families clustered into SCOP 1.69 superfamilies were then checked to see whether a structure had been solved for them after the release of SCOP 1.69. The latest SCOP database shows that structures were solved for close to 87 Pfam families that are related to families in SCOP 1.69 only at the superfamily level or more distantly. Analysis of these cases shows that the SUPFAM+ mappings are correct at the SCOP superfamily level in 85% of the cases, and an in-depth analysis revealed that, among the remaining cases, only one can be judged an incorrect mapping.
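The indirect-mapping and PNSF steps can be pictured as label propagation followed by connected components on a graph whose nodes are Pfam families and whose edges are detected Pfam-Pfam relationships. The sketch below is a rough illustration, not the SUPFAM+ procedure: it assumes the direct Pfam-to-superfamily mappings are already resolved (the semi-automated decision tree is not modelled), resolves conflicting propagations by whichever link is seen first, and reports only groups of two or more unmapped families as PNSF-like clusters. The function name and family labels are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def supfam_like_mapping(
    families: List[str],
    direct: Dict[str, str],             # Pfam family -> SCOP superfamily (already unique)
    pfam_links: List[Tuple[str, str]],  # Pfam-Pfam relationships from profile searches
) -> Tuple[Dict[str, str], List[Set[str]]]:
    mapping = dict(direct)

    # Indirect mapping: propagate superfamilies across links until nothing changes.
    changed = True
    while changed:
        changed = False
        for a, b in pfam_links:
            for src, dst in ((a, b), (b, a)):
                if src in mapping and dst not in mapping:
                    mapping[dst] = mapping[src]
                    changed = True

    # Remaining families: union-find over links between unmapped families.
    parent = {f: f for f in families if f not in mapping}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pfam_links:
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for f in parent:
        groups.setdefault(find(f), set()).add(f)
    pnsfs = [g for g in groups.values() if len(g) > 1]
    return mapping, pnsfs
```

Here a family linked, directly or through a chain, to a mapped family inherits that superfamily, while linked families with no route to any mapped family end up together in one candidate cluster.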
Many of the inconsistent mappings were found to be due to the absence of the relevant SCOP fold from the SCOP 1.69 release, although, interestingly, the fold assigned by SUPFAM+ is structurally similar to the fold subsequently found for the Pfam family. A straightforward comparison with a similar database (Pfam Clans) shows that SUPFAM+ suggests four times as many pairwise relationships between Pfam families as the Pfam Clans database. Thus, since the structural mappings provided in SUPFAM+ are very accurate, the relationships it contains could help in function annotation of uncharacterized protein families (explored in Chapter 5). The mapping accuracy is expected to be similar for the PNSFs, and hence these clusters can be excellent targets for structural genomics initiatives. Chapter 5 describes attempts to obtain clues to the structure and/or function of the DUF (Domain of Unknown Function) families in the Pfam database. Currently, DUF families make up about 22% of the Pfam database (2260 out of 10340). Although homologues of each protein in these families can be recognized in sequence databases, the homology gives no obvious insight into the function of these proteins. Annotating such difficult targets is a major goal of computational biologists in the post-genomic era. The sensitive profile-profile alignment method developed in this thesis provides an excellent opportunity to increase the number of protein annotations, especially in the DUF families, since a profile for each of these families exists in the Pfam database.
The method is similar to the SUPFAM+ development and involved generating Pfam profiles compatible with the AlignHUSH method. For the analysis presented in the chapter, relationships found between DUF families and SCOP families were examined. In benchmarks of AlignHUSH, a Z-score of 5.0 gave a 10% error rate and a Z-score of 7.5 gave a 1% error rate, so a minimum Z-score cutoff of 7.5 was used. A very high Z-score in AlignHUSH is usually seen when sequence identity is also high, so a maximum Z-score cutoff of 12.0 was used to find DUF families that are difficult to annotate with other profile-based methods (such as PSI-BLAST). For some DUF families, the structure of one of the proteins had subsequently been determined and reported in the literature, and these cases were used to assess the accuracy of structural annotation by AlignHUSH. In other cases, fold recognition was performed with the PHYRE method to check that it corroborated the structure mappings. In all cases studied, the alignment of the DUF family with the SCOP family was generated and examined for conservation of the active site residues reported for each homologous SCOP family in the CSA (Catalytic Site Atlas) database. For the 8 DUF families whose structure was solved after the SCOP release used in the analysis, AlignHUSH identified the correct structure in every case, and in all eight validated cases the active site residues were conserved, pointing to the effectiveness of AlignHUSH and its usefulness for function annotation. For the 27 DUF families in which no protein has a known structure, fold recognition corroborated the AlignHUSH assignment in all cases.
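The Z-score window and the active-site check described above amount to two simple operations, sketched here with hypothetical names and data: keep hits with 7.5 ≤ Z ≤ 12.0 (inclusive bounds are an assumption), then read off which DUF residue is aligned to each catalytic position. The catalytic positions would come from the CSA; here they are supplied by hand.

```python
from typing import Dict, List, Tuple

Z_MIN, Z_MAX = 7.5, 12.0  # cutoffs quoted in the abstract

Hit = Tuple[str, str, float]  # (DUF family, SCOP family, Z-score)


def select_hits(hits: List[Hit]) -> List[Hit]:
    """Keep hits that are reliable (Z >= 7.5) but not trivially close (Z <= 12.0)."""
    return [h for h in hits if Z_MIN <= h[2] <= Z_MAX]


def aligned_site_residues(scop_row: str, duf_row: str,
                          sites: List[int]) -> Dict[int, Tuple[str, str]]:
    """Map each catalytic position (1-based, ungapped SCOP sequence) to the
    (SCOP residue, aligned DUF residue) pair; '-' marks a gap in the DUF."""
    result: Dict[int, Tuple[str, str]] = {}
    pos = 0
    for s, d in zip(scop_row, duf_row):
        if s == "-":
            continue  # insertion in the DUF relative to the SCOP sequence
        pos += 1
        if pos in sites:
            result[pos] = (s, d)
    return result


def active_site_conserved(pairs: Dict[int, Tuple[str, str]]) -> bool:
    """True only if every catalytic residue is identical in the DUF family."""
    return all(s == d for s, d in pairs.values())
```

A substitution at any catalytic position (for instance a serine aligned to an alanine) makes `active_site_conserved` return `False`, which corresponds to the cases where a structural assignment does not translate into a specific biochemical function.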
The alignments of the DUF families with their suggested homologous SCOP families reveal that in many cases the active site residues are not conserved or are substituted by different residues. An in-depth analysis of some cases shows that such non-conservation also occurs between two SCOP families in the same SCOP superfamily. Thus, although a structure annotation can be reliably provided for all the DUF families studied, the exact biochemical function could be inferred only where the active site is conserved even among distantly related families (such as two SCOP families in the same SCOP superfamily). The first part of the thesis thus develops and successfully applies methods for remote homology detection and demonstrates that there is scope for extending its limits. Because the procedure aligns profiles using sequence-derived information, it is generally applicable, and it has been applied successfully to structure/function recognition in the DUF families. The next part of the thesis presents a method for predicting protein-protein interactions between a host and a pathogen organism and its application to three groups of pathogens. Chapter 6 describes the development of a procedure for predicting protein-protein interactions (PPI) between a pathogen and its host organism. In the past, PPI prediction has been attempted for the proteins of a single organism, often by identifying proteins of the organism of interest that are homologous to two interacting proteins of another organism. Studies by various groups of how interactions are conserved as a function of sequence identity have shown that homologues sharing more than about 30% sequence identity interact in a similar way.
This fact, together with a high-quality database of protein-protein interactions, can be used to predict interactions between proteins of the same organism. The work in this thesis is one of the first attempts to extend the idea to predicting interactions between two different organisms: if a pathogen protein and a host protein are homologous to two proteins known to interact with each other, the pathogen and host proteins may also interact. For biologically meaningful predictions, whether such an interaction can occur under in vivo conditions also needs to be addressed; these issues are dealt with in this part of the thesis. A main step in the procedure is identifying homologues of pathogen and host proteins among the interacting proteins listed in PPI databases. Two template PPI databases were used. The first is the DIP database, which lists interactions based on genome-scale yeast two-hybrid data or small-scale experiments. The second is the iPfam database, which provides interaction templates (Pfam families) based on protein complexes of known structure in the Protein Data Bank (PDB). Both databases are comprehensive and of high quality. Homologues in the DIP database were searched for with PSI-BLAST using stringent cutoffs on various parameters to minimize false positives, and the iPfam database was searched with RPS-BLAST and MulPSSM, also with stringent cutoffs. The cutoffs were fixed based on an assessment of the conservation of putative interacting residues in the host and pathogen proteins compared with the protein complexes of known structure. The predictions were then analyzed manually to assess their importance for the pathogenesis of the disease under consideration.
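The core transfer rule, predicting that a pathogen protein and a host protein interact when their homologues form a known template pair, can be sketched as below. The data structures, names and the simple per-cent identity cutoff are assumptions for illustration; the thesis uses PSI-BLAST, RPS-BLAST and MulPSSM searches with further stringent cutoffs and residue-conservation checks that are not modelled here.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

MIN_IDENTITY = 30.0  # per cent; the ~30% figure quoted in the abstract

Hits = Dict[str, List[Tuple[str, float]]]  # query protein -> [(template, % identity)]


def predict_host_pathogen_ppi(templates: Set[Tuple[str, str]],
                              pathogen_hits: Hits,
                              host_hits: Hits) -> Set[Tuple[str, str]]:
    """Return (pathogen protein, host protein) pairs whose homologues interact."""

    def strong(hits: Hits) -> Dict[str, Set[str]]:
        # Keep only template homologues above the identity cutoff.
        return {q: {t for t, ident in h if ident > MIN_IDENTITY} for q, h in hits.items()}

    pathogen, host = strong(pathogen_hits), strong(host_hits)
    predicted: Set[Tuple[str, str]] = set()
    for p, h in product(pathogen, host):
        for a, b in product(pathogen[p], host[h]):
            # Template interactions are undirected, so check both orientations.
            if (a, b) in templates or (b, a) in templates:
                predicted.add((p, h))
    return predicted
```

Because template pairs are treated as undirected, a pathogen protein may match either partner of a template interaction.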
In this chapter, to assess the robustness of the approach, PPI predictions were made for the phage-bacteria system and the herpes virus-human system, which have been studied extensively by experiment and hence allow the “predictions” to be compared with experimental results. The phage-bacteria predictions suggest that the gross biological features of the pathogenesis have been captured. The GO (Gene Ontology) annotations of the bacterial proteins predicted to interact show that the predictions involve proteins participating in DNA replication and protein synthesis. Many known interactions, such as that between the lambda phage repressor and the bacterial RecA protein, were also ‘predicted’ in the analysis. A few novel interactions were predicted, for example between a tail component protein and YeeJ, a protein of unknown function in E. coli. For Herpes Virus 8 and its human host, comparison with a set of experimentally verified interactions reported in the literature showed that close to 50% of the known interactions were ‘predicted’ by the procedure. A few novel interactions between viral proteins and the p53 protein were also predicted, which might help in understanding the tumorigenesis of the viral disease. A comparison between the procedure followed in this thesis and another genome-scale method (proposed by Andrej Sali and coworkers) suggests that, although the proteins involved in the interactions predicted by the two methods may differ, the functions of the proteins concerned, as suggested by GO annotations, are highly correlated (greater than 98%). The next few chapters describe the prediction of interactions for different host-pathogen systems. Chapter 7 describes the prediction of PPI between the eukaryotic malarial pathogen P. falciparum and its human host.
The malarial parasite was chosen because of the extensive work reported in the literature on this pathogen in recent years. In addition, the gene expression patterns of the pathogen are highly correlated with human tissue types, each stage of the pathogen occurring in a distinct tissue type. The biological context of the PPI can therefore be assessed explicitly, which makes this a well-suited case for the procedure described in Chapter 6. The pathogen is important from a medical perspective because of the recent emergence of P. falciparum malaria that is unresponsive to conventional drugs, and studies of this parasite have therefore gained importance in the post-genomic era. The difficulty of identifying homologues for many P. falciparum proteins makes this a challenging case study. PPI between the malarial parasite and human proteins were predicted as described in Chapter 6, with stringent cutoffs in the homology searches. In this case, however, additional biological data could be used effectively. Tissue-specific expression information for human proteins was obtained from the Atlas of Human transcriptome and the NCBI GEO database, and stage-specific expression data for the pathogen were obtained from multiple genome-scale experiments reported in the literature. The subcellular localization of both human and pathogen proteins was predicted rather than measured, and this information was therefore given low weight in the subsequent analysis. The prediction of PPI between the malarial parasite and human resulted in more than 30,000 interactions compatible with in vivo conditions according to the expression data. Incorporating the subcellular localization predictions reduced this set further, to around 2000 interactions.
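The two filtering stages for the malaria predictions, expression context followed by subcellular localization, can be sketched as set filters. Everything here is hypothetical: the stage-to-tissue map, the compartment labels, the function name, and the rule that a pair is kept when the parasite protein is expressed in a stage that occupies a tissue where the human protein is expressed. The thesis's actual criteria may be more detailed.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (parasite protein, human protein)


def filter_by_context(predicted: Set[Pair],
                      parasite_stages: Dict[str, Set[str]],
                      stage_tissue: Dict[str, str],
                      human_tissues: Dict[str, Set[str]],
                      localization: Dict[str, str],
                      compatible: Set[Tuple[str, str]]) -> Tuple[Set[Pair], Set[Pair]]:
    """Return (pairs passing the expression filter, pairs also passing localization)."""
    # Stage 1: the parasite protein must be expressed in a stage whose host tissue
    # also expresses the human partner.
    expressed = {
        (p, h) for p, h in predicted
        if any(stage_tissue.get(s) in human_tissues.get(h, set())
               for s in parasite_stages.get(p, set()))
    }
    # Stage 2: the predicted compartments must form an allowed combination.
    localized = {
        (p, h) for p, h in expressed
        if (localization.get(p), localization.get(h)) in compatible
    }
    return expressed, localized
```

Returning both sets mirrors the two reported counts (interactions passing the expression filter, then the smaller set after localization).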
Manual analysis of each of these interactions, aided by the literature on malarial parasites, reveals that many known PPI are also ‘predicted’ in the analysis, such as the interaction between the P. falciparum SSP2 protein and human ICAMs. For many proteins known to be important for pathogenesis, such as the RESA antigen, novel interactions were predicted that could help in better understanding the pathogen. For some of the novel predicted interactions, such as that between the parasite Plasmepsin and human Spectrin, there is circumstantial experimental evidence of interaction. Among many other novel interactions, the procedure predicted interactions for 441 ‘hypothetical proteins’ of unknown function encoded in the pathogen genome. The comprehensive list of predictions and an exploration of their biological significance can lead to novel hypotheses about the pathogenesis of malaria, and hence the work presented in this chapter can help further experimental exploration of the pathogen. The success of the procedure in predicting both known and novel interactions in a eukaryotic pathogen suggests that it is generally applicable. However, in many host-pathogen systems such extensive expression and localization data may not be available, which makes the analysis difficult because of the large number of interactions predicted. One such difficult case, the interactions between mycobacterial species and the human host, is described in the next chapter. Chapter 8 describes the prediction of PPI between human and M. tuberculosis as well as three pathogens closely related to M. tuberculosis. Each of these pathogens has re-emerged because of drug resistance and other causes. M. tuberculosis is becoming a global problem because only a limited number of drugs are available to treat TB, and resistance to these drugs can develop.
M. leprae has also shown signs of emerging drug resistance, whereas C. diphtheriae, another pathogen studied in this chapter, is seen as an emerging pathogen in Eastern Europe and the Indian subcontinent. Nocardial infections have also risen with the prevalence of AIDS, which increases susceptibility to Nocardia infections. Thus, the pathogens in this important family need to be understood better in order to direct drug development. An important area for such endeavors is mapping the PPI between the pathogens and the human host, and the procedure developed in this thesis can be used to predict such interactions. As in Chapter 6, the procedure involves identifying homologues of the pathogen and host proteins among the proteins listed in the two template datasets, DIP and iPfam, using PSI-BLAST and RPS-BLAST (MulPSSM). In addition to homology to the proteins involved in PPI, information on, or predictions of, subcellular localization is used to assess the biological significance of each interaction. An experimentally derived dataset of exported M. tuberculosis proteins was used to supplement the predictions from the PSORTb database, which provides subcellular localizations for bacterial proteins. To minimize the number of predictions to be explored manually and to maximize the biological relevance of the predicted interactions, predictions were made only for proteins located on the pathogen membrane or exported into the host. The predicted interactions between human proteins and the proteins of the four pathogens studied include some interactions known from earlier experiments, which were thus “predicted” by the present procedure. For example, the exported serine protease of M. leprae is known to interact with Ras-like proteins in the human host, and this interaction was ‘predicted’.
Among other predicted interactions, several novel interactions have been suggested for proteins important for pathogenesis, such as the MPT70 protein of M. tuberculosis, which is predicted to interact with TGFβ-associated proteins, an interaction that could play an important role in the pathogenesis of the disease. Some human proteins, especially the toll-like receptors, are known to play an important role in pathogenesis. A C. diphtheriae protein, Mycosin, has been predicted to interact with the toll-like receptors, raising the possibility that Mycosins may play an important role in pathogenesis. Several hypothetical proteins of unknown function in the pathogens have been predicted to interact with human proteins. A few such cases from M. tuberculosis are described in the thesis; these proteins are predicted to interact with proteins involved in post-translational modification in the human host. The prediction of novel interactions along with known interactions in four bacterial species thus suggests that the procedure can be used for almost any host-pathogen pair. In the next chapter, the application of the method to three other bacterial species belonging to the Enterobacteriaceae family is presented. Chapter 9 describes the analysis performed on the predicted interactions between human and three pathogens in the Enterobact Protein Functions Bioactive Proteins Computational Biochemistry Protein-Protein Interactions Protein Homology Detection - Algorithms Protein Bioinformatics Protein Sequence Homology (Biology) Remote Homology Detection Biochemistry
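The prediction procedure in the abstract above (transfer of known interactions through homologues, then filtering by subcellular localization) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's implementation: the homologue assignments, template interaction pairs and localization labels are toy inputs standing in for PSI-BLAST/RPS-BLAST hits against DIP and iPfam and for PSORTb predictions, and the function name, protein identifiers and compartment labels are hypothetical.

```python
from itertools import product


def predict_host_pathogen_ppi(pathogen_hits, host_hits, template_pairs,
                              localization, allowed=("membrane", "exported")):
    """Hypothetical sketch of homology-based transfer of interactions.

    pathogen_hits, host_hits: protein id -> set of template proteins it matches
        (standing in for PSI-BLAST / RPS-BLAST hits against a template dataset)
    template_pairs: set of frozensets, each a known interacting template pair
    localization: pathogen protein id -> predicted compartment label
    Returns a sorted list of (pathogen_protein, host_protein) pairs.
    """
    predictions = set()
    for p_prot, h_prot in product(pathogen_hits, host_hits):
        # Keep only pathogen proteins that could physically contact host proteins.
        if localization.get(p_prot) not in allowed:
            continue
        # Predict an interaction if any pair of their templates is known to interact.
        for t1, t2 in product(pathogen_hits[p_prot], host_hits[h_prot]):
            if frozenset((t1, t2)) in template_pairs:
                predictions.add((p_prot, h_prot))
                break
    return sorted(predictions)


if __name__ == "__main__":
    pathogen_hits = {"pA": {"T1"}, "pB": {"T2"}}
    host_hits = {"hX": {"T3"}, "hY": {"T4"}}
    template_pairs = {frozenset(("T1", "T3")), frozenset(("T2", "T4"))}
    localization = {"pA": "exported", "pB": "cytoplasm"}
    print(predict_host_pathogen_ppi(pathogen_hits, host_hits,
                                    template_pairs, localization))
    # [('pA', 'hX')]  (pB is dropped because it is predicted to be cytoplasmic)
```

Applying the localization filter before the template lookup mirrors the abstract's stated aim of keeping only membrane-bound or exported pathogen proteins, which shrinks the set of candidate pairs that must be inspected manually.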
23	Computational studies of signalling at the cell membrane Lumb, Craig Nicholas January 2012 (has links) In order to associate with the cytoplasmic leaflet of the plasma membrane, many cytosolic signalling proteins possess a distinct lipid binding domain as part of their overall fold. Here, a multiscale simulation approach has been used to investigate three membrane-binding proteins involved in cellular processes such as growth and proliferation. The pleckstrin homology (PH) domain from the general receptor for phosphoinositides 1 (GRP1-PH) binds phosphatidylinositol (3,4,5)-trisphosphate (PI(3,4,5)P₃) with high affinity and specificity. To investigate how this peripheral protein is able to locate its target lipid in the complex membrane environment, Brownian dynamics (BD) simulations were employed to explore association pathways for GRP1-PH binding to PI(3,4,5)P₃ embedded in membranes with different surface charge densities and distributions. The results indicated that non-PI(3,4,5)P₃ lipids can act as decoys to disrupt PI(3,4,5)P₃ binding, but that at approximately physiological anionic lipid concentrations steering towards PI(3,4,5)P₃ is actually enhanced. Atomistic molecular dynamics (MD) simulations revealed substantial membrane penetration of membrane-bound GRP1-PH, evident when non-equilibrium, steered MD simulations were used to forcibly dissociate the protein from the membrane surface. Atomistic and coarse grained (CG) MD simulations of the phosphatase and tensin homologue deleted on chromosome ten (PTEN) tumour suppressor, which also binds PI(3,4,5)P₃, detected numerous non-specific protein-lipid contacts and anionic lipid clustering around PTEN that can be modulated by selective in silico mutagenesis. These results suggested a dual recognition model of membrane binding, with non-specific membrane interactions complementing the protein-ligand interaction. 
Molecular docking and MD simulations were used to characterise the lipid binding properties of kindlin-1 PH. Simulations demonstrated that a dynamic salt bridge was responsible for controlling the accessibility of the binding site. Electrostatics calculations applied to a variety of PH domains suggested that their molecular dipole moments are typically aligned with their ligand binding sites, which has implications for steering and ligand electrostatic funnelling. 572
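The dipole-alignment observation above rests on the molecular dipole moment, p = Σ qᵢ(rᵢ − r₀), and its angle to the binding site. A minimal sketch follows, assuming partial charges and atomic coordinates are already available and that the binding site is represented by a single hypothetical centroid; the origin r₀ is taken as the centre of geometry, which matters because the dipole of a net-charged protein depends on the chosen origin. The function names are illustrative only.

```python
import math


def centre_of_geometry(coords):
    """Mean position of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(r[k] for r in coords) / n for k in range(3))


def dipole_moment(coords, charges, origin):
    """p = sum_i q_i (r_i - origin), in units of charge x length."""
    return tuple(sum(q * (r[k] - origin[k]) for r, q in zip(coords, charges))
                 for k in range(3))


def dipole_site_angle(coords, charges, site_centroid):
    """Angle (degrees) between the dipole and the vector from the centre
    of geometry towards a (hypothetical) binding-site centroid."""
    origin = centre_of_geometry(coords)
    p = dipole_moment(coords, charges, origin)
    v = tuple(site_centroid[k] - origin[k] for k in range(3))
    dot = sum(a * b for a, b in zip(p, v))
    cos_angle = dot / (math.hypot(*p) * math.hypot(*v))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    charges = [-1.0, 1.0]
    print(dipole_site_angle(coords, charges, (5.0, 0.0, 0.0)))  # 0.0 (aligned)
```

An angle near 0° would correspond to the alignment reported for many PH domains, and an angle near 180° to a dipole pointing away from the site.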
24	Computational studies of ligand-water mediated interactions in ionotropic glutamate receptors Sahai, Michelle Asha January 2011 (has links) Careful treatment of water molecules in ligand-protein interactions is often required if the correct binding pose is to be identified in molecular docking. Water can form complex bridging networks and can play a critical role in dictating the binding mode of ligands. A particularly striking example of this can be found in the ionotropic glutamate receptors (iGluRs), a family of ligand-gated ion channels that are responsible for the majority of fast synaptic neurotransmission in the central nervous system and are thought to be essential for memory and learning. Thus, pharmacological intervention at these neuronal receptors is a valuable therapeutic strategy. This thesis relies on various computational studies and X-ray crystallography to investigate the role of ligand-water mediated interactions in iGluRs bound to glutamate and α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA). Comparative molecular dynamics (MD) simulations of each subtype of iGluRs bound to glutamate revealed that crystal water positions were reproduced and that all but one water molecule in the binding site, W5, can be rearranged or replaced with water molecules from the bulk. Further density functional theory (DFT) calculations have been used to confirm the MD results and to characterize the energetics of W5 and of another water molecule implicated in influencing the dynamics of a proposed switch in these receptors. Additional comparative studies on the AMPA subtypes of iGluRs show that each step of the calculation must be considered carefully if the results are to be meaningful. Crystal structures of GluA2, an AMPA subtype of iGluRs, bound to two ligands, glutamate and AMPA, revealed two distinct binding modes. The difference is related to the position of water molecules within the binding pocket. 
DFT calculations of the interaction energies and polarisation effects predicted the correct binding mode for glutamate. For AMPA, alternative binding modes have similar interaction energies because AMPA has a higher internal energy than glutamate. A combined MD and X-ray crystallographic study investigated the binding of the ligand AMPA in the AMPA receptor subtypes. Analysis of the binding pocket shows that AMPA does not remain in its crystallographic binding mode and can instead adopt an alternative one. This involves the displacement of a key water molecule, followed by AMPA adopting the pose seen for glutamate. Thus, this thesis makes use of various studies to assess the energetics and dynamics of water molecules in iGluRs. The resulting data provide additional information on the importance of water molecules in mediating ligand interactions and identify key water molecules that can be useful in the de novo design of new selective drugs against iGluRs. 571.4
25	K+ channels : gating mechanisms and lipid interactions Schmidt, Matthias Rene January 2013 (has links) Computational methods, including homology modelling, in silico docking and molecular dynamics simulations, have been used to study the functional dynamics and interactions of K⁺ channels. Molecular models were built of the inwardly rectifying K⁺ channel Kir2.2, the bacterial homolog KirBac3.1, and the twin pore (K2P) K⁺ channels TREK-1 and TRESK. To investigate the electrostatic energy profile of K⁺ permeating through these homology models, continuum electrostatic calculations were performed. The primary mechanism of KirBac3.1 gating is believed to involve an opening at the helix bundle crossing (HBC). However, simulations of Kir channels have not yet revealed opening at the HBC. Here, in simulations of the new KirBac3.1-S129R X-ray crystal structure, in which the HBC was trapped open by the S129R mutation in the inner pore-lining helix (TM2), the HBC was found to exhibit considerable mobility. In simulations of the new KirBac3.1-S129R-S205L double mutant structure in which the S129R and S205L mutations were reverted to the wild-type serine, the HBC closed faster than in the simulations of the KirBac3.1-S129R single mutant structure. The double mutant structure KirBac3.1-S129R-S205L therefore likely represents a higher-energy state than the single mutant KirBac3.1-S129R structure, and these simulations indicate a staged pathway of gating in KirBac channels. Molecular modelling and MD simulations of the Kir2.2 channel structure demonstrated that the HBC tends to open if the C-linker between the transmembrane and cytoplasmic domains is modelled as helical. The electrostatic energy barrier for K⁺ permeation at the helix bundle crossing was found to be sensitive to subtle structural changes in the C-linker. 
Charge neutralization or charge reversal of the PIP2-binding residue R186 on the C-linker decreased the electrostatic barrier for K⁺ permeation through the HBC, suggesting an electrostatic contribution to the PIP2-dependent gating mechanism. Multi-scale simulations determined the PIP2 binding site in Kir2.2, in good agreement with crystallographic predictions. A TREK-1 homology model was built, based on the TRAAK structure. Two PIP2 binding sites were found in this TREK-1 model: one at the C-terminal end, in line with existing functional data, and one between transmembrane helices TM2 and TM3. The TM2-TM3 site is in reasonably good agreement with electron density attributed to an acyl tail in a recently deposited TREK-2 structure. 572
26	Large-scale layered systems and synthetic biology : model reduction and decomposition Prescott, Thomas Paul January 2014 (has links) This thesis is concerned with large-scale systems of Ordinary Differential Equations that model Biomolecular Reaction Networks (BRNs) in Systems and Synthetic Biology. It addresses the strategies of model reduction and decomposition used to overcome the challenges posed by the high dimensionality and stiffness typical of these models. A number of developments of these strategies are identified, and their implementation on various BRN models is demonstrated. The goal of model reduction is to construct a simplified ODE system that closely approximates a large-scale system. The error estimation problem seeks to quantify the approximation error; this is an example of the trajectory comparison problem. The first part of this thesis applies semi-definite programming (SDP) and dissipativity theory to this problem, producing a single a priori upper bound on the difference between two models in the presence of parameter uncertainty and for a range of initial conditions, for which exhaustive simulation is impractical. The second part of this thesis is concerned with the BRN decomposition problem of expressing a network as an interconnection of subnetworks. A novel framework, called layered decomposition, is introduced and compared with established modular techniques. Fundamental properties of layered decompositions are investigated, providing basic criteria for choosing an appropriate layered decomposition. Further aspects of the layering framework are considered: we illustrate the relationship between decomposition and scale separation by constructing singularly perturbed BRN models using layered decomposition; and we reveal the inter-layer signal propagation structure by decomposing the steady state response to parametric perturbations. 
Finally, we consider large-scale SDP problems, for which existing SDP techniques fail to certify a system’s dissipativity. We describe the framework of Structured Storage Functions (SSF), defined for systems that admit a cascaded decomposition, and demonstrate a significant resulting speed-up of large-scale dissipativity problems, with applications to the trajectory comparison technique discussed above. 572
27	Cell fate mechanisms in colorectal cancer Kay, Sophie Kate January 2014 (has links) Colorectal cancer (CRC) arises in part from the dysregulation of cellular proliferation, associated with the canonical Wnt pathway, and differentiation, effected by the Notch signalling network. In this thesis, we develop a mathematical model of ordinary differential equations (ODEs) for the coupled interaction of the Notch and Wnt pathways in cells of the human intestinal epithelium. Our central aim is to understand the role of such crosstalk in the genesis and treatment of CRC. An embedding of this model in cells of a simulated colonic tissue enables computational exploration of the cell fate response to spatially inhomogeneous growth cues in the healthy intestinal epithelium. We also examine an alternative, rule-based model from the literature, which employs a simple binary approach to pathway activity, in which the Notch and Wnt pathways are constitutively on or off. Comparison of the two models demonstrates the substantial advantages of the equation-based paradigm, through its delivery of stable and robust cell fate patterning, and its versatility for exploring the multiscale consequences of a variety of subcellular phenomena. Extension of the ODE-based model to include mutant cells facilitates the study of Notch-mediated therapeutic approaches to CRC. We find a marked synergy between the application of γ-secretase inhibitors and Hath1 stabilisers in the treatment of early-stage intestinal polyps. This combined treatment is an efficient means of inducing mitotic arrest in the cell population of the intestinal epithelium through enforced conversion to a secretory phenotype and is highlighted as a viable route for further theoretical, experimental and clinical study. 616.99
28	Methods, rules and limits of successful self-assembly Williamson, Alexander James January 2011 (has links) The self-assembly of structured particles into monodisperse clusters is a challenge on the nano-, micro- and even macro-scale. While biological systems are able to self-assemble with comparative ease, many aspects of this self-assembly are not fully understood. In this thesis, we look at the strategies and rules that can be applied to encourage the formation of monodisperse clusters. Though much of the inspiration is biological in nature, the simulations use a minimal patchy particle model and are thus applicable to a wide range of systems. The topics that this thesis addresses include: Encapsulation: We show how clusters can be used to encapsulate objects and demonstrate that such ‘templates’ can be used to control the assembly mechanisms and enhance the formation of more complex objects. Hierarchical self-assembly: We investigate the use of hierarchical mechanisms in enhancing the formation of clusters. We find that, while a hierarchical assembly pathway extends the ranges where we see successful assembly, it does not straightforwardly provide a route to increasing the complexity of structures that can be formed. Pore formation: We use our simple model to investigate a particular biological example, namely the self-assembly and formation of heptameric alpha-haemolysin pores, and show that pore insertion is key to rationalising experimental results on this system. Phase re-entrance: Using a variety of techniques, we compute equilibrium phase diagrams for self-assembling systems, focusing particularly on the possible presence of an unusual liquid-vapour phase re-entrance that has been suggested by dynamical simulations. 541.2
29	Stratagems for effective function evaluation in computational chemistry Skone, Gwyn S. January 2010 (has links) In recent years, the potential benefits of high-throughput virtual screening to the drug discovery community have been recognized, bringing an increase in the number of tools developed for this purpose. These programs have to process large quantities of data, searching for an optimal solution in a vast combinatorial space. This is particularly the case for protein-ligand docking, since proteins are complex structures with complicated interactions, during which either molecule might reshape itself. Even the very limited flexibility model considered here, using ligand conformation ensembles, requires six dimensions of exploration (three translations and three rotations) per rigid conformation. The functions for evaluating pose suitability can also be complex to calculate. Consequently, the programs written for these biochemical simulations are extremely resource-intensive. This work introduces a pure computer science approach to the field, developing techniques to improve the effectiveness of such tools. Their architecture is generalized to an abstract pattern of nested layers for discussion, covering scoring functions, search methods, and screening overall. Based on this, new stratagems for molecular docking software design are described, including lazy or partial evaluation, geometric analysis, and parallel processing implementation. In addition, a range of novel algorithms are presented for applications such as active site detection with linear complexity (PIES) and small molecule shape description (PASTRY) for pre-alignment of ligands. The various stratagems are assessed individually and in combination, using several modified versions of an existing docking program, to demonstrate their benefit to virtual screening in practical contexts. In particular, the importance of appropriate precision in calculations is highlighted. 502.85
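Lazy or partial evaluation, one of the stratagems listed above, can be illustrated with a toy screening loop: if every remaining term of a score is known to be non-negative, evaluation of a pose can stop as soon as its running total exceeds the best score found so far. This is a minimal sketch under that assumption, which holds for, e.g., a purely repulsive clash penalty but not for a general docking scoring function; the penalty term, pose representation and function names are hypothetical and are not the thesis's scoring function.

```python
import math


def clash_penalty(a, b, cutoff=3.0):
    """Hypothetical non-negative pairwise term: rises as two atoms
    come closer than `cutoff` (same length units as the coordinates)."""
    d = math.dist(a, b)
    return 0.0 if d >= cutoff else (cutoff - d) ** 2


def score_pose_lazy(ligand, protein, best_so_far):
    """Return (score, pairs_evaluated). Because every term is >= 0,
    the pose is abandoned once its partial score exceeds best_so_far."""
    total, evaluated = 0.0, 0
    for a in ligand:
        for b in protein:
            total += clash_penalty(a, b)
            evaluated += 1
            if total > best_so_far:
                return math.inf, evaluated  # cannot beat the current best
    return total, evaluated


def screen(poses, protein):
    """Return (index of best pose, its score, total pair evaluations)."""
    best, best_idx, work = math.inf, None, 0
    for i, pose in enumerate(poses):
        score, n = score_pose_lazy(pose, protein, best)
        work += n
        if score < best:
            best, best_idx = score, i
    return best_idx, best, work


if __name__ == "__main__":
    protein = [(0.0, 0.0, 0.0)]
    poses = [[(5.0, 0.0, 0.0)], [(1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]]
    print(screen(poses, protein))  # (0, 0.0, 2): second pose pruned after 1 pair
```

Evaluation order matters for how much work is saved: scoring the most likely clashes first, or visiting promising poses early so that the bound tightens quickly, prunes more pairs.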
30	Inferring structural properties of protein-DNA binding using high-throughput sequencing : the paradigm of GATA1, KLF1 and their complexes GATA1/FOG1 and GATA1/KLF1 : insights into the transcriptional regulation of the erythroid cell lineage Oikonomopoulos, Spyridon January 2014 (has links) GATA1 and KLF1 are transcription factors that regulate genes important for the development of erythroid cells. The GATA1 transcriptional co-factor FOG1 has been shown to be essential in a wide range of GATA1-dependent cellular functions. Here we sought to understand the diverse mechanisms by which GATA1 and KLF1 recognize their binding sites, how the GATA1 recognition mechanisms are affected by complexation with either FOG1 or KLF1, and how the GATA1 recognition mechanisms affect the transcriptional regulation of erythroid differentiation. We profiled the DNA binding specificities/affinities of a GATA1 fragment (mGATA1NC), which contains only the two DNA binding domains (N-terminal and C-terminal Zn finger), and of a KLF1 fragment (mKLF1257-358), which contains the three DNA binding domains, using a novel methodology that combines EMSA with high-throughput sequencing (EMSA-seq (Wong et al., 2011a)). We also profiled the DNA binding specificities of the C-terminal Zn finger of GATA1 alone (mGATA1C), the wt-mGATA1, the wt-mGATA1/wt-mFOG1 complex and the mGATA1NC/mKLF1257-358 complex. First, we confirmed that the N-terminal Zn finger of GATA1 has a strong preference for the “GATC” motif, whereas the C-terminal Zn finger of GATA1 has a strong preference for the “GATA” motif. Next, we found that in mGATA1NC, both DNA binding domains can simultaneously bind a wide range of different positional combinations of co-occurring “GATA” and “GATC” motifs on the same DNA sequence. 
The wt-mGATA1 did not bind the same co-occurring motifs, implying that the non-DNA-binding domains of the protein help regulate its DNA binding specificities. In contrast, complexation of wt-mGATA1 with wt-mFOG1 partially restored its ability to bind co-occurring “GATA” and “GATC” motifs on the same DNA sequence, although only in a more limited range of positional combinations. Similar observations were made for other pairs of GATA1 N-terminal and C-terminal Zn finger specific motifs. We then projected the GATA1 DNA binding specificities/affinities in vivo and classified the GATA1 ChIP-seq peaks as low, medium or high affinity based on the number of GATA1 motifs. We noticed that high affinity GATA1 ChIP-seq peaks tend to appear in regions with low nucleosome occupancy. We also noticed that GATA1 ChIP-seq peaks in enhancer regions are usually high affinity, whereas GATA1 ChIP-seq peaks in proximal promoter regions are usually low affinity. Additionally, we observed that high affinity GATA1 ChIP-seq peaks are usually found in regions with increased levels of H3K4me2 and are associated with a larger decrease in H3K4me3 levels at the TSS of nearby genes. None of these GATA1-related in vivo observations were found for the KLF1 ChIP-seq positions. These findings significantly advance our understanding of the DNA binding properties of GATA1, KLF1 and their complexes and give insight into the importance of GATA1 DNA binding affinities in the regulation of the erythroid transcriptional program. Overall, the work establishes an experimental and analytical framework to investigate how transcriptional co-factors can change the DNA binding specificities of specific transcription factors and how integrating transcription factor DNA binding affinities with in vivo data can give novel insights into transcriptional regulation. 572.8
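The in vivo classification in the abstract above assigns each GATA1 ChIP-seq peak to a low, medium or high affinity class according to its number of GATA1 motifs. A minimal sketch follows, assuming that a motif is simply an exact occurrence of the core “GATA” (C-terminal finger) or “GATC” (N-terminal finger) on either strand, and using hypothetical class thresholds; the abstract does not state the motif definitions or cut-offs actually used.

```python
import re

# Assumed motif set: GATA and its reverse complement TATC, plus GATC
# (palindromic, so one pattern covers both strands). The lookahead lets
# overlapping occurrences be counted.
MOTIF = re.compile(r"(?=(GATA|TATC|GATC))")


def count_gata1_motifs(seq):
    """Number of (possibly overlapping) assumed GATA1 motif occurrences."""
    return len(MOTIF.findall(seq.upper()))


def classify_peak(seq, medium_at=2, high_at=4):
    """Hypothetical thresholds: <2 motifs low, 2-3 medium, >=4 high."""
    n = count_gata1_motifs(seq)
    if n >= high_at:
        return "high"
    if n >= medium_at:
        return "medium"
    return "low"


if __name__ == "__main__":
    peaks = {"p1": "CCCCGGGG", "p2": "AGATAAGATCA", "p3": "GATA" * 4}
    for name, seq in peaks.items():
        print(name, count_gata1_motifs(seq), classify_peak(seq))
    # p1 0 low
    # p2 2 medium
    # p3 4 high
```

Note that a palindromic site such as GATATC is counted twice under this scheme (once as GATA and once as TATC), which may or may not match the counting convention intended in the thesis.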
