11 |
Robust learning to rank models and their biomedical applications
Sotudian, Shahabeddin 24 May 2023
There exist many real-world applications such as recommendation systems, document retrieval, and computational biology where the correct ordering of instances is of equal or greater importance than predicting the exact value of some discrete or continuous outcome. Learning-to-Rank (LTR) refers to a group of algorithms that apply machine learning techniques to tackle these ranking problems. Despite their empirical success, most existing LTR models are not built to be robust to errors in labeling or annotation, distributional data shift, or adversarial data perturbations. To fill this gap, we develop four LTR frameworks that are robust to various types of perturbations. First, Pairwise Elastic Net Regression Ranking (PENRR) is an
elastic-net-based regression method for drug sensitivity prediction. PENRR infers robust predictors of drug responses from patient genomic information. The special design of this model (comparing each drug with other drugs in the same cell line and comparing that drug with itself in other cell lines) significantly enhances the accuracy of the drug prediction model under limited data. This approach is also able to solve the problem of fitting on the insensitive drugs that is commonly encountered in regression-based models. Second, Regression-based Ranking by Pairwise Cluster Comparisons (RRPCC) is a ridge-regression-based method for ranking clusters of similar protein complex conformations generated by an underlying docking program (i.e., ClusPro). Rather than using regression to predict scores, which would equally penalize deviations for both low-quality and high-quality clusters, we seek to predict the difference of scores for any pair of clusters corresponding to the same complex. RRPCC combines these pairwise assessments to form a ranked list of clusters, from higher to lower quality. We apply RRPCC to clusters produced by the automated docking server ClusPro and, depending on the training/validation strategy, we show improvement by 24%–100% in ranking acceptable or better quality clusters first, and by 15%–100% in ranking medium or better quality clusters first. Third, Distributionally Robust Multi-Output Regression Ranking (DRMRR) is a listwise LTR model that induces robustness into LTR problems using the Distributionally Robust Optimization (DRO) framework. In contrast to existing methods, the scoring function of DRMRR was designed as a multivariate mapping from a feature vector to a vector of deviation scores, which captures local context information and cross-document interactions. DRMRR employs ranking metrics (i.e., NDCG) in its output. In particular, we used the notion of position deviation to define a vector of relevance scores instead of a scalar one.
We then adopted the DRO framework to minimize a worst-case expected multi-output loss function over a probabilistic ambiguity set that is defined by the Wasserstein metric. We also presented an equivalent convex reformulation of the DRO problem, which is shown to be tighter than the ones proposed by the previous studies. Fourth, Inversion Transformer-based Neural Ranking (ITNR) is a Transformer-based model to predict drug responses using RNAseq gene expression profiles, drug descriptors, and drug fingerprints. It utilizes a Context-Aware-Transformer architecture as its scoring function that ensures the modeling of inter-item dependencies. We also introduced a new loss function using the concept of Inversion and approximate permutation matrices. The accuracy and robustness of these LTR models are verified through three medical applications, namely cluster ranking in protein-protein docking, medical document retrieval, and drug response prediction.
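As a rough illustration of how pairwise assessments can be turned into a ranked list, the sketch below sums the predicted score differences for each cluster and sorts by the total. The aggregation rule, the function name `rank_from_pairwise`, and the toy numbers are assumptions made for illustration; the abstract does not state how RRPCC combines its pairwise predictions.

```python
from typing import Dict, List, Tuple


def rank_from_pairwise(
    clusters: List[str],
    predicted_diff: Dict[Tuple[str, str], float],
) -> List[str]:
    """Rank clusters from a table of predicted pairwise score differences.

    predicted_diff[(a, b)] is the predicted quality of a minus that of b.
    Summing the differences per cluster is an assumed aggregation rule.
    """
    totals = {c: 0.0 for c in clusters}
    for (a, b), diff in predicted_diff.items():
        totals[a] += diff  # a gains when it is predicted to beat b
        totals[b] -= diff  # b loses by the same amount
    # Higher total means higher predicted quality, so sort descending.
    return sorted(clusters, key=lambda c: totals[c], reverse=True)


# Hypothetical predictions for three clusters of one complex.
toy_diffs = {("c1", "c2"): -0.4, ("c1", "c3"): 0.3, ("c2", "c3"): 0.9}
ranking = rank_from_pairwise(["c1", "c2", "c3"], toy_diffs)
```

With these toy differences, `c2` accumulates the largest total and `c3` the smallest, so the list comes out ordered from higher to lower predicted quality.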
|
12 |
Structural and Functional Aspects of Evolutionarily Conserved Signature Indels in Protein Sequences.
Khadka, Bijendra January 2019
Analysis of genome sequences is enabling identification of numerous novel characteristics that provide valuable means for genetic and biochemical studies. Of these characteristics, Conserved Signature Indels (CSIs) in proteins that are specific for a given group of organisms have proven particularly useful for evolutionary and biochemical studies. My research work focused on using comparative genomics techniques to identify a large number of CSIs which are distinctive characteristics of fungi and other important groups of organisms. These CSIs were utilized to understand the evolutionary relationships among different proteins (species), as well as their structural features and functional significance. Based on multiple CSIs that I have identified for the PIP4K/PIP5K family of proteins, different isozymes of these proteins and also their subfamilies can now be reliably distinguished in molecular terms. Further, based on the species distribution of CSIs in the PIP4K/PIP5K proteins and phylogenetic analyses of these protein sequences, my work provides important insights into the evolutionary history of this protein family. The functional significance of one of the CSIs in the PIP5K proteins, specific for the Saccharomycetaceae family of fungi, was also investigated. The results from structural analysis and molecular dynamics (MD) simulation studies show that this 8 aa CSI plays an important role in facilitating the binding of fungal PIP5K protein to the membrane surface. In other work, we identified multiple highly specific CSIs in the phosphoketolase (PK) proteins, which clearly distinguish the bifunctional form of PK found in bifidobacteria from its monofunctional homologs found in other organisms. Structural analyses and docking studies with these proteins indicate that the CSIs in bifidobacterial PK, which are located on the subunit interface, play a role in the formation/stabilization of the protein dimer.
We have also identified 2 large CSIs in SecA proteins that are uniquely found in thermophilic species from two different phyla of bacteria. Detailed bioinformatics analyses on one of these CSIs show that a number of residues from this CSI, through their interaction with a conserved network of water molecules, play a role in stabilizing the binding of ADP/ATP to the SecA protein at high temperature. My work also involved developing an integrated software pipeline for homology modeling of proteins and analyzing the location of CSIs in protein structures. Overall, my thesis work establishes the usefulness of CSIs in protein sequences as valuable means for genetic, biochemical, structural and evolutionary studies. / Dissertation / Doctor of Philosophy (PhD)
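The idea of a group-specific insertion can be sketched on a toy alignment: look for columns where every sequence in the group of interest has a residue and every other sequence has a gap. The function name, the sequence labels, and the sequences themselves are hypothetical, and the check for conserved flanking regions that a real CSI analysis would require is omitted here.

```python
from typing import Dict, List


def group_specific_insert_columns(
    alignment: Dict[str, str], in_group: List[str]
) -> List[int]:
    """Return alignment columns where every in-group sequence has a residue
    and every other sequence has a gap ('-')."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    members = set(in_group)
    columns = []
    for col in range(length):
        inside = [alignment[n][col] for n in names if n in members]
        outside = [alignment[n][col] for n in names if n not in members]
        if (
            inside
            and outside
            and all(r != "-" for r in inside)
            and all(r == "-" for r in outside)
        ):
            columns.append(col)
    return columns


# Hypothetical aligned sequences; the labels are placeholders, not real taxa.
toy_alignment = {
    "fungus_A": "MKTLGGSAAPQW",
    "fungus_B": "MKTLGGSAVPQW",
    "other_C": "MKTL----APQW",
    "other_D": "MRTL----APQW",
}
csi_columns = group_specific_insert_columns(toy_alignment, ["fungus_A", "fungus_B"])
```

Here the four-residue insert occupies zero-based columns 4 to 7, which is what the function reports.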
|
13 |
Improving protein docking with binding site prediction
Huang, Bingding 17 July 2008
Protein-protein and protein-ligand interactions are fundamental as many proteins mediate their biological function through these interactions. Many important applications follow directly from the identification of residues in the interfaces between protein-protein and protein-ligand interactions, such as drug design, protein mimetic engineering, elucidation of molecular pathways, and understanding of disease mechanisms. The identification of interface residues can also guide the docking process to build the structural model of protein-protein complexes. This dissertation focuses on developing computational approaches for protein-ligand and protein-protein binding site prediction and applying these predictions to improve protein-protein docking. First, we develop an automated approach, LIGSITEcs, to predict protein-ligand binding sites, based on the notion of surface-solvent-surface events and the degree of conservation of the involved surface residues. We compare our algorithm to four other approaches, LIGSITE, CAST, PASS, and SURFNET, and evaluate all on a dataset of 48 unbound/bound structures and 210 bound structures. LIGSITEcs performs slightly better than the other tools and achieves success rates of 71% and 75%, respectively. Second, for protein-protein binding sites, we develop metaPPI, a meta server for interface prediction. MetaPPI combines results from a number of tools, such as PPI_Pred, PPISP, PINUP, Promate, and SPPIDER, which predict enzyme-inhibitor interfaces with success rates of 23% to 55% and other interfaces with 10% to 28% on a benchmark dataset of 62 complexes. After refinement, metaPPI significantly improves prediction success rates to 70% for enzyme-inhibitor and 44% for other interfaces. Third, for protein-protein docking, we develop an FFT-based docking algorithm and system, BDOCK, which includes specific scoring functions for specific types of complexes. 
BDOCK uses family-based residue interface propensities as a scoring function and obtains improvement factors of 4-30 for enzyme-inhibitor and 4-11 for antibody-antigen complexes in two specific SCOP families. Furthermore, the degrees of buriedness of surface residues are integrated into BDOCK, which improves the shape discriminator for enzyme-inhibitor complexes. The predicted interfaces from metaPPI are integrated as well, either during docking or after docking. The evaluation results show that reliable interface predictions improve the discrimination between near-native solutions and false positives. Finally, we propose an implicit method to deal with the flexibility of proteins by softening the surface, to improve docking for non-enzyme-inhibitor complexes.
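FFT-based docking rests on the fact that the score of every rigid translation of one grid against another is a cross-correlation, which the Fourier transform evaluates for all shifts at once. The one-dimensional sketch below shows that mechanism only; the grid values, the function names, and the small hand-written radix-2 FFT are illustrative assumptions and say nothing about how BDOCK itself builds or scores its grids.

```python
import cmath
from typing import List, Sequence


def fft(values: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 FFT (unnormalized); the length must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def translation_scores(
    receptor: Sequence[float], ligand: Sequence[float]
) -> List[float]:
    """Score every circular shift t as sum_x receptor[x] * ligand[x - t]."""
    n = len(receptor)
    r_hat = fft(receptor)
    l_hat = fft(ligand)
    # Correlation theorem: multiply one spectrum by the conjugate of the other.
    product = [r * l.conjugate() for r, l in zip(r_hat, l_hat)]
    # The inverse transform above is unnormalized, so divide by n here.
    return [(v / n).real for v in fft(product, inverse=True)]


# Toy 1D "shape" grids of equal power-of-two length (made-up values).
receptor_grid = [0, 0, 0, 1, 2, 1, 0, 0]
ligand_grid = [1, 2, 1, 0, 0, 0, 0, 0]
scores = translation_scores(receptor_grid, ligand_grid)
best_shift = max(range(len(scores)), key=lambda t: scores[t])
```

In this toy case the ligand profile overlaps the receptor profile best when shifted by three grid points. A real docking code would use 3D grids, repeat the search over many rotations, and rely on an optimized FFT library rather than this recursive version.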
|
14 |
Scoring functions for protein docking and drug design
Viswanath, Shruthi 26 June 2014
Predicting the structure of complexes formed by two interacting proteins is an important problem in computational structural biology. Proteins perform many of their functions by binding to other proteins. The structure of protein-protein complexes provides atomic details about protein function and biochemical pathways, and can help in designing drugs that inhibit binding. Docking computationally models the structure of protein-protein complexes, given three-dimensional structures of the individual chains. Protein docking methods have two phases. In the first phase, a comprehensive, coarse search is performed for optimally docked models. In the second phase, refinement and reranking, the models from the first phase are refined and reranked, with the expectation of extracting a small set of accurate models from the pool of thousands of models obtained from the first phase. In this thesis, new algorithms are developed for the refinement and reranking phase of docking. New scoring functions, or potentials, that rank models are developed. These potentials are learnt using large-scale machine learning methods based on mathematical programming. The procedure for learning these potentials involves examining hundreds of thousands of correct and incorrect models. In this thesis, hierarchical constraints were introduced into the learning algorithm. First, an atomic potential was developed using this learning procedure. A refinement procedure involving side-chain remodeling and conjugate gradient-based minimization was introduced. The refinement procedure combined with the atomic potential was shown to improve docking accuracy significantly. Second, a hydrogen bond potential was developed. Molecular dynamics-based sampling combined with the hydrogen bond potential improved docking predictions. Third, mathematical programming compared favorably to SVMs and neural networks in terms of accuracy, training and test time for the task of designing potentials to rank docking models.
The methods described in this thesis are implemented in the docking package DOCK/PIERR. DOCK/PIERR was shown to be among the best automated docking methods in community wide assessments. Finally, DOCK/PIERR was extended to predict membrane protein complexes. A membrane-based score was added to the reranking phase, and shown to improve the accuracy of docking. This docking algorithm for membrane proteins was used to study the dimers of amyloid precursor protein, implicated in Alzheimer's disease. / text
|
15 |
Hydropathic Interactions and Protein Structure: Utilizing the HINT Force Field in Structure Prediction and Protein‐Protein Docking.
Ahmed, Mostafa H. 01 January 2014
Protein structure prediction is a field of computational molecular modeling with an enormous potential for improvement. Side-chain geometry prediction is a critical component of this process, crucial both for computational protein structure prediction and for crystallographers refining experimentally determined protein crystal structures. The cornerstone of side-chain geometry prediction is the side-chain rotamer library, usually obtained through exhaustive statistical analysis of existing protein structures. Little is known, however, about the driving forces leading to the preference or suitability of one rotamer over another. Construction of 3D hydropathic interaction maps for nearly 30,000 tyrosines extracted from the PDB reveals their environments, in terms of hydrophobic and polar (collectively “hydropathic”) interactions. Using a unique 3D similarity metric, these environments were clustered with k-means. In the ϕ, ψ region (–200° < ϕ < –155°; –205° < ψ < –160°) representing 631 tyrosines, clustering reduced the set to 14 unique hydropathic environments, with most diversity arising from favorable hydrophobic interactions. Polar interactions for tyrosine include ubiquitous hydrogen bonding with the phenolic OH and a handful of unique environments surrounding the backbone. The memberships of all but one of the 14 environments are dominated by a single χ1/χ2 rotamer. Each tyrosine residue attempts to fulfill its hydropathic valence. Structural water molecules are thus used in a variety of roles throughout protein structure. A second project involves elucidating the 3D structure of CRIP1a, a cannabinoid 1 receptor (CB1R) binding protein that could provide information for designing small molecules targeting the CRIP1a-CB1R interaction. The CRIP1a protein was produced in high purity. Crystallization experiments failed, both with and without the last 9 or 12 amino acid peptide of the CB1R C-terminus.
Attempts were made to use NMR for structure determination; however, the protein precipitated out during data acquisition. A model was thus built computationally to which the CB1R C-terminus peptide was docked. HINT was used in selecting optimum models and analyzing interactions involved in the CRIP1a-CB1R complex. The final model demonstrated key putative interactions between CRIP1a and CB1R while also predicting highly flexible areas of the CRIP1a possibly contributing to the difficulties faced during crystallization.
|
16 |
Gβγ acts at an inter-subunit cleft to activate GIRK1 channels
Mahajan, Rahul 09 October 2012
Heterotrimeric guanine nucleotide-binding proteins (G-proteins) consist of an alpha subunit (Gα) and the dimeric beta-gamma subunit (Gβγ). The first example of direct cell signaling by Gβγ was the discovery of its role in activating G-protein regulated inwardly rectifying K+ (GIRK) channels which underlie the acetylcholine-induced K+ current responsible for vagal inhibition of heart rate. Published crystal structures have provided important insights into the structures of the G-protein subunits and GIRK channels separately, but co-crystals of the channel and Gβγ together remain elusive and no specific reciprocal residue interactions between the two proteins are currently known. Given the absence of direct structural evidence, we attempted to identify these functionally important channel-Gβγ interactions using a computational approach. We developed a multistage computational docking algorithm that combines several known methods in protein-protein docking. Application of the docking protocol to previously published structures of Gβγ and GIRK1 homomeric channels produced a clear signal of a favored binding mode. Analysis of this binding mode suggested a mechanism by which Gβγ promotes the open state of the channel. The channel-Gβγ interactions predicted by the model in silico could be disrupted in vitro by mutation of one protein and rescued by additional mutation of reciprocal residues in the other protein. These interactions were found to extend to agonist induced activation of the channels as well as to activation of the native heteromeric channels. Currently, the structural mechanism by which Gβγ regulates the functional conformations of GIRK channels or of any of its membrane-associated effector proteins is not known. This work shows the first evidence for specific reciprocal interactions between Gβγ and a GIRK channel and places these interactions in the context of a general model of intracellular regulation of GIRK gating.
|
17 |
An Isometry-Invariant Spectral Approach for Macro-Molecular Docking
De Youngster, Dela 26 November 2013
Proteins and the formation of large protein complexes are essential parts of living organisms. Proteins are present in all aspects of life processes, performing a multitude of various functions ranging from being structural components of cells, to facilitating the passage of certain molecules between various regions of cells. The 'protein docking problem' refers to the computational method of predicting the appropriate matching pair of a protein (receptor) with respect to another protein (ligand), when attempting to bind to one another to form a stable complex.
Research shows that matching the three-dimensional (3D) geometric structures of candidate proteins plays a key role in determining a so-called docking pair, which is one of the key aspects of the Computer Aided Drug Design process. However, the active sites which are responsible for binding do not always present a rigid-body shape matching problem. Rather, they may undergo sufficient deformation when docking occurs, which complicates the problem of finding a match.
To address this issue, we present an isometry-invariant and topologically robust partial shape matching method for finding complementary protein binding sites, which we call the ProtoDock algorithm. The ProtoDock algorithm comes in two variations. The first version performs a partial shape complementarity matching by initially segmenting the underlying protein object mesh into smaller portions using a spectral mesh segmentation approach. The Heat Kernel Signature (HKS), the underlying basis of our shape descriptor, is subsequently computed for the obtained segments. A final descriptor vector is constructed from the Heat Kernel Signatures and used as the basis for the segment matching. The three different descriptor methods employed are the accepted Bag of Features (BoF) technique and our two novel approaches, Closest Medoid Set (CMS) and Medoid Set Average (MSA).
The second variation of our ProtoDock algorithm aims to perform the partial matching by utilizing the pointwise HKS descriptors. The use of the pointwise HKS is mainly motivated by the suggestion that, at adequate times, the Heat Kernel Signature of a point on a surface sufficiently describes its neighbourhood. Hence, the HKS of a point may serve as the representative descriptor of its given region of which it forms a part. We propose three (3) sampling methods---Uniform, Random, and Segment-based Random sampling---for selecting these points for the partial matching. Random and Segment-based Random sampling both prove superior to the Uniform sampling method.
Our experimental results, run against the Protein-Protein Benchmark 4.0, demonstrate the viability of our approach, in that it successfully returns known binding segments for known pairing proteins. Furthermore, our ProtoDock-1 algorithm still yields good results for low-resolution protein meshes. This results in even faster processing and matching times with sufficiently reduced computational requirements when obtaining the HKS.
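For readers unfamiliar with the Heat Kernel Signature, its standard definition is HKS(x, t) = Σ_i exp(−λ_i t) φ_i(x)², where λ_i and φ_i are the eigenvalues and eigenfunctions of the surface Laplacian. The sketch below evaluates that formula on a toy example; the three-vertex path graph stands in for a protein mesh, and the function name and time samples are assumptions for illustration, not part of ProtoDock.

```python
import math
from typing import Sequence


def heat_kernel_signature(
    eigenvalues: Sequence[float],
    eigenvectors: Sequence[Sequence[float]],
    vertex: int,
    t: float,
) -> float:
    """HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)**2."""
    return sum(
        math.exp(-lam * t) * phi[vertex] ** 2
        for lam, phi in zip(eigenvalues, eigenvectors)
    )


# Toy stand-in for a mesh: the graph Laplacian of a 3-vertex path (0 - 1 - 2),
# whose orthonormal eigenpairs are known in closed form.
path_eigenvalues = [0.0, 1.0, 3.0]
path_eigenvectors = [
    [1 / math.sqrt(3)] * 3,
    [1 / math.sqrt(2), 0.0, -1 / math.sqrt(2)],
    [1 / math.sqrt(6), -2 / math.sqrt(6), 1 / math.sqrt(6)],
]
times = [0.1, 1.0, 10.0]  # assumed time samples forming the descriptor vector
signatures = [
    [heat_kernel_signature(path_eigenvalues, path_eigenvectors, v, t) for t in times]
    for v in range(3)
]
```

The two end vertices receive identical signatures because a symmetry of the graph maps one onto the other, while the middle vertex differs at short times. This is the sense in which the descriptor is invariant to isometries yet still sensitive to the local neighbourhood.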
|
18 |
Computer-aided design of novel antithrombotic agents / Conception des nouveaux agents anti-thrombiques assistée par ordinateur
Khristova, Tetiana 15 November 2013
Thrombosis is the most important pathological process underlying many cardiovascular diseases, which are responsible for high mortality worldwide. In this thesis, computer-aided design was applied to develop new antithrombotic agents able to inhibit two types of receptors located on the surface of platelets. The first, αIIbβ3, is responsible for the interaction of activated platelets with fibrinogen to form clots, whereas the second, the thromboxane A2 receptor, is responsible for platelet activation by one of the agonists excreted by activated platelets. To achieve this, different types of models were developed using the available experimental information and the structures of protein-ligand complexes: QSAR models, structure-based and ligand-based 3D pharmacophore models, 2D pharmacophore models, and shape-based and molecular field-based models. The ensemble of developed models was used in virtual screening. This study resulted in the suggestion of new potential antagonists of the αIIbβ3 and thromboxane A2 receptors. The suggested αIIbβ3 antagonists, able to bind either the open or the closed form of the receptor, were synthesized and tested experimentally. Experiments show that they display high activity; moreover, some of the theoretically designed compounds are more effective than Tirofiban, a commercialized drug. The recommended antagonists of the thromboxane A2 receptor have already been synthesized, but the biological tests have not yet been completed.
|
19 |
An Isometry-Invariant Spectral Approach for Macro-Molecular Docking
De Youngster, Dela January 2013
|
20 |
Improving protein docking with binding site prediction
Huang, Bingding 10 July 2008
|