Global ETD Search

1	Recognizing Table Formatting From Text Files Rajendran, Venkatprabhu 11 December 2006 (has links) No description available. Computer Science Pattern Matching Techniques Entities Templates Evaluation Scoring Function
2	A Combined Motif Discovery Method Lu, Daming 06 August 2009 (has links) A central problem in bioinformatics is finding the binding sites of regulatory motifs. This challenging problem provides a platform for applying a variety of data mining methods. In the work described here, a combined motif discovery method that uses mutual information and Gibbs sampling was developed. A new scoring scheme was introduced that incorporates mutual information and joint information content. Simulated tempering was embedded into classic Gibbs sampling to avoid local optima. The method was applied to 18 DNA sequences containing CRP binding sites validated by Stormo, and the results were compared with those of Bioprospector. The results indicate that the new scoring scheme overcomes a limitation of the basic position weight matrix (PWM) model, which captures only single-position information. Simulated tempering proved to be an adaptive adjustment of the search strategy and showed much greater resistance to local optima. Transcription Factor Binding Site Gibbs Sampling Mutual Information Information Content Simulated Tempering Scoring Function
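The two quantities named in the abstract above can be illustrated on a toy alignment. The following is a minimal sketch, not the thesis's scoring scheme: it assumes a uniform background frequency of 0.25 per base, uses no pseudocounts, and the function names and the four example sites are hypothetical. Per-column information content is what a plain PWM captures; mutual information between two columns measures the pairwise dependence that a PWM ignores.

```python
import math
from collections import Counter


def column_information_content(column, background=0.25):
    """Information content (bits) of one alignment column versus a uniform background."""
    n = len(column)
    counts = Counter(column)
    return sum((c / n) * math.log2((c / n) / background) for c in counts.values())


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two alignment columns."""
    n = len(col_a)
    p_a, p_b = Counter(col_a), Counter(col_b)
    p_ab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
        for (a, b), c in p_ab.items()
    )


# Hypothetical aligned binding sites (one string per site).
sites = ["ACGT", "ACGA", "TGGT", "TGGA"]
columns = list(zip(*sites))  # transpose: one tuple per alignment position

for i, col in enumerate(columns):
    print(f"position {i}: IC = {column_information_content(col):.3f} bits")

# Positions 0 and 1 always co-vary (A with C, T with G); positions 0 and 3 do not.
print("MI(0, 1) =", mutual_information(columns[0], columns[1]))
print("MI(0, 3) =", mutual_information(columns[0], columns[3]))
```

In this toy data, positions 0 and 1 each look only moderately conserved on their own, yet they are perfectly correlated with each other, which is the kind of signal a single-position model cannot represent.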
3	Improving rapid affinity calculations for drug-protein interactions Ross, Gregory A. January 2013 (has links) The rationalisation of drug potency using three-dimensional structures of protein-ligand complexes is a central paradigm in medicinal research. For over two decades, a major goal has been to find the rules that accurately relate the structure of any protein-ligand complex to its affinity. Addressing this problem is of great concern to the pharmaceutical industry, which uses virtual screens to computationally assay up to many millions of compounds against a protein target. A fast and trustworthy affinity estimator could potentially streamline the drug discovery process, reducing reliance on expensive wet lab experiments, speeding up the discovery of new hits and aiding lead optimization. Water plays a critical role in drug-protein interactions. To address the often ambiguous nature of water in binding sites, a water placement method was developed and found to be in good agreement with X-ray crystallography, neutron diffraction data and molecular dynamics simulations. The method is fast and has facilitated a large scale study of the statistics of water in ligand binding sites, as well as the creation of models pertaining to water binding free energies and displacement propensities, which are of particular interest to medicinal chemistry. Structure-based scoring functions employing the explicit water models were developed. Surprisingly, these attempts were no more accurate than the current state of the art, and the models suffered from the same inadequacies which have plagued all previous scoring functions. This suggests a unifying cause behind scoring function inaccuracy. Accordingly, mathematical analyses on the fundamental uncertainties in structure-based modelling were conducted. Using statistical learning theory and information theory, the existence of inherent errors in empirical scoring functions was proven. 
Among other results, it was found that even the very best generalised structure-based model is significantly limited in its accuracy, and protein-specific models are always likely to be better. The theoretical framework developed herein hints at modelling strategies that operate at the leading edge of achievable accuracy. 615.1
4	Functional Norm Regularization for Margin-Based Ranking on Temporal Data Stojkovic, Ivan January 2018 (has links) Quantifying properties of interest is an important problem in many domains, e.g., assessing the condition of a patient, estimating the risk of an investment, or judging the relevance of a search result. However, the properties of interest are often latent and hard to assess directly, which makes it difficult to obtain the classification or regression labels needed to learn predictive models from observable features. In such cases, it is typically much easier to obtain a relative comparison of two instances, i.e., to assess which one is more intense with respect to the property of interest. One framework able to learn from this kind of supervision is the ranking SVM, which forms the basis of our approach. Applications to biomedical datasets typically pose specific additional challenges. The first and major one is the limited number of examples, due to expensive measuring technology and/or the infrequency of the conditions of interest. Such a limited number of examples makes both the identification of patterns/models and their validation less useful and reliable. Second, repeated samples from the same subject are collected on multiple occasions over time, which breaks the IID sample assumption and introduces a dependency structure that needs to be taken into account appropriately. Third, feature vectors are high-dimensional, typically with far more features than samples, making models less useful and their learning less efficient. The hypothesis of this dissertation is that functional norm regularization can help alleviate these challenges by improving the generalization ability and/or learning efficiency of predictive models, specifically of approaches based on the ranking SVM framework. 
The temporal nature of the data was addressed with a loss that fosters temporal smoothness of the functional mapping, reflecting the assumption that temporally proximate samples are more correlated. The large number of feature variables was handled using the sparsity-inducing L1 norm, so that most features have zero effect in the learned functional mapping. The proposed sparse (temporal) ranking objective is convex but non-differentiable; therefore, a smooth dual form was derived, taking the form of a quadratic function with box constraints, which allows efficient optimization. For the case of multiple similar tasks, a joint learning approach based on matrix norm regularization, using the trace norm L* and the sparse-row L21 norm, was also proposed. An alternating minimization scheme with a proximal optimization algorithm was developed to solve this multi-task objective. The generalization potential of the proposed high-dimensional and multi-task ranking formulations was assessed in a series of evaluations on synthetically generated and real datasets. The high-dimensional approach was applied to learning disease severity scores from gene expression data in human influenza cases and compared against several alternative approaches. The application resulted in a scoring function with improved predictive performance, as measured by the fraction of correctly ordered testing pairs, and a set of selected features of high robustness according to three similarity measures. The multi-task approach was applied to three human viral infection problems and to learning exam scores in Math and English. The proposed formulation with a mixed matrix norm was overall more accurate than formulations with single-norm regularization. / Computer and Information Science Computer Science Artificial Intelligence Information Science Functional Norm Regularization Optimization Proximal Algorithms Scoring Function Learning Svm Ranking Temporal Data
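To make the ranking setup in the abstract above concrete, here is a small sketch of a linear scoring function evaluated on ordered pairs. It assumes a plain pairwise hinge loss with an L1 penalty and omits the temporal-smoothness term and the dual formulation described in the dissertation; the function names, the weights, and the three example pairs are all hypothetical. The second function computes the evaluation measure mentioned in the abstract, the fraction of correctly ordered pairs.

```python
def score(w, x):
    """Linear scoring function f(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pairwise_hinge_loss(w, pairs, l1=0.0):
    """Sum of hinge losses over (higher, lower) pairs plus an L1 penalty on w."""
    loss = sum(max(0.0, 1.0 - (score(w, hi) - score(w, lo))) for hi, lo in pairs)
    return loss + l1 * sum(abs(wi) for wi in w)


def pairwise_accuracy(w, pairs):
    """Fraction of pairs in which the 'higher' instance receives the larger score."""
    correct = sum(1 for hi, lo in pairs if score(w, hi) > score(w, lo))
    return correct / len(pairs)


# Hypothetical data: each pair is (instance that should rank higher, instance that should rank lower).
pairs = [
    ([2.0, 5.0], [0.0, 9.0]),
    ([1.0, 0.0], [0.5, 0.0]),
    ([0.0, 1.0], [1.0, 1.0]),
]
w = [1.0, 0.0]  # sparse weights: only the first feature contributes to the score

print("loss:", pairwise_hinge_loss(w, pairs, l1=0.1))
print("fraction of correctly ordered pairs:", pairwise_accuracy(w, pairs))
```

With these weights the second pair is ordered correctly but still incurs a loss because its margin is below 1, while the third pair is ordered incorrectly.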
5	Ranking ligands in structure-based virtual screening using siamese neural networks Santos, Alan Diego dos 29 March 2017 (has links) Structure-based virtual screening (SBVS) of compound databases is widely applied in the early stages of drug discovery for drug targets with a known 3D structure. In SBVS, computational approaches usually "dock" small molecules into the binding site of the drug target and "score" their binding affinity. However, the cost of applying docking algorithms to huge compound databases is prohibitive, owing to the computational resources this operation requires. In this context, machine learning strategies can be applied to rank ligands by binding affinity and thereby reduce the number of compounds to be tested. In this work, we propose a deep learning energy-based model that uses siamese neural networks to rank ligands. The model takes as input grids of biochemical properties of ligands and receptors and calculates their compatibility. We show that the model can learn to identify important biochemical interactions between ligands and receptors. We also demonstrate that the compatibility score is computed based only on the conformation of the small molecule, independent of its position and orientation relative to the receptor. The proposed model was trained using known ligands and decoys in a Fully Flexible Receptor (FFR) model of the InhA-NADH complex of Mycobacterium tuberculosis (PDB ID: 1ENY) and achieved outstanding results. Virtual Screening Siamese Neural Network Scoring Function Molecular Docking
6	Recherche d'inhibiteurs de l'interaction Lutheran-Laminine par des techniques de modélisation et de simulation moléculaires / Investigation of Lutheran-Laminin Interaction Inhibitors Using Molecular Modeling and Simulation Techniques Madeleine, Noelly 28 September 2017 (has links) Drepanocytosis (sickle cell disease) is a genetic blood disorder characterized by red blood cells that assume an abnormal sickle shape. In the pathogenesis of the vaso-occlusive crises of sickle cell disease, red blood cells adhere to the vascular endothelium and promote vaso-occlusion. The Lutheran (Lu) protein, overexpressed at the surface of these sickle red blood cells, binds strongly to Laminin (Ln) 511/521, which is expressed by the inflamed vascular endothelium. The aim of this study was to identify a protein-protein interaction (PPI) inhibitor with a high probability of binding to Lu in order to inhibit the Lu-Ln 511/521 interaction. A virtual screening of 1,295,678 compounds targeting Lu was performed. A scoring protocol was first validated on the protein CD80, because it has a binding site with topological and physico-chemical characteristics similar to those of the binding site predicted on Lu, as well as a series of ligands with known affinity constants. The protocol consisted of multiple filtering steps based on calculated affinities (scores), molecular dynamics simulations, and molecular properties. A robust scoring protocol was validated on CD80 with the docking program DOCK6, the scoring functions XSCORE and MM-PBSA, and the FMO method. Applying this protocol to Lu yielded two compounds that were validated by in vitro studies and are the subject of a patent application. Nine other compounds were identified by the scoring function XSCORE and appear to be promising candidates for inhibiting the Lu-Ln 511/521 interaction. Drepanocytosis Lutheran protein Protein-Protein interaction Molecular docking Molecular dynamics simulations Scoring function
7	Text-Based Information Retrieval Using Relevance Feedback Krishnan, Sharenya January 2011 (has links) Europeana, a freely accessible digital library intended to make Europe's cultural and scientific heritage available to the public, was founded by the European Commission in 2008. The goal was to deliver semantically enriched digital content with multilingual access. Although the amount of content grew, retrieving information from it in unstructured form became a growing problem. To complement the Europeana portal services, ASSETS (Advanced Search Service and Enhanced Technological Solutions) was introduced, with services that sought to improve the usability and accessibility of Europeana. My contribution is to study different text-based information retrieval models and their relevance feedback techniques, and to implement one simple model. The thesis gives a detailed overview of the information retrieval process, along with the implementation of the chosen relevance feedback strategy, which generates automatic query expansion. Finally, the thesis concludes with an analysis of the results obtained using relevance feedback, a discussion of the implemented model, and an assessment of the future use of this model, both as a continuation of my work and within ASSETS. Information Retrieval Relevance Feedback Query Expansion Rocchio classification Probabilistic model Lucene Similarity scoring function Kullback-Leibler Divergence (KLD) Engineering and Technology
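The abstract above does not say which relevance feedback model was implemented, but Rocchio appears among its keywords. As an illustration only, the sketch below applies the textbook Rocchio update to sparse term-weight dictionaries. The parameter values (alpha, beta, gamma), the function name, and the toy query and documents are assumptions and are not taken from the thesis.

```python
from collections import defaultdict


def rocchio_expand(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Textbook Rocchio update: alpha*q + beta*mean(relevant) - gamma*mean(non_relevant).

    All vectors are dicts mapping term -> weight. Terms whose updated weight is
    not positive are dropped, a common convention for query expansion.
    """
    updated = defaultdict(float)
    for term, weight in query.items():
        updated[term] += alpha * weight
    for doc in relevant:
        for term, weight in doc.items():
            updated[term] += beta * weight / len(relevant)
    for doc in non_relevant:
        for term, weight in doc.items():
            updated[term] -= gamma * weight / len(non_relevant)
    return {term: weight for term, weight in updated.items() if weight > 0}


# Hypothetical query and judged documents.
query = {"painting": 1.0}
relevant = [
    {"painting": 1.0, "renaissance": 1.0},
    {"renaissance": 1.0, "fresco": 1.0},
]
non_relevant = [{"painting": 1.0, "house": 1.0}]

expanded = rocchio_expand(query, relevant, non_relevant)
for term, weight in sorted(expanded.items(), key=lambda item: -item[1]):
    print(f"{term}: {weight:.3f}")
```

The expanded query gains terms from the relevant documents ("renaissance", "fresco"), while "house", which occurs only in the non-relevant document, ends up with a negative weight and is dropped.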
8	Protein-drug binding affinity prediction with machine learning : Assessing the impact of features from molecular dynamic simulations Guttormsson, Guðmundur Andri, Le Gallo, Léa January 2024 (has links) The development of medicine is generally a long and costly process, and one major factor is estimating the affinity of protein-drug binding. Leveraging machine learning in this field is a promising approach, as it can streamline the prediction process and reduce the need for expensive experimental methods. Machine learning methods have already enabled significant advances in predicting protein-drug binding affinity, yet there remains room for improvement. The primary challenge is the quality of the data used for these machine learning models. In this work, two ensemble machine learning models, Random Forest and Extreme Gradient Boosting Machine, were tested and compared using a recent database of protein-ligand complex features calculated from molecular dynamics simulations. Additional features were also extracted from the PDB database through PLIP (Protein-Ligand Interaction Profiler), with the aim of improving the predictions further. The results indicate that while the features from the PDB database provided strong predictive power, including features from molecular dynamics simulations did not improve the models' performance. machine learning ensemble models binding affinity molecular dynamics simulations scoring function Bioinformatics (Computational Biology) Biochemistry and Molecular Biology Computer Sciences
9	Nouvelles méthodes de calcul pour la prédiction des interactions protéine-protéine au niveau structural / Novel computational methods to predict protein-protein interactions on the structural level Popov, Petr 28 January 2015 (has links) Molecular docking is a method that predicts the orientation of one molecule with respect to another when they form a complex. The first computational method of molecular docking was applied in 1990 to find new candidates against HIV-1 protease. Since then, the use of docking pipelines has become standard practice in drug discovery. Typically, a docking protocol comprises several phases. It requires exhaustive sampling of the binding site under a rigid-body approximation of the docking subunits. Clustering algorithms are used to group similar binding candidates. Refinement methods are applied to account for the flexibility of the molecular complex and to eliminate possible docking artefacts. Finally, scoring algorithms are employed to select the best binding candidates. This thesis presents novel algorithms for docking protocols that facilitate structure prediction of protein complexes, which belong to one of the most important target classes in structure-based drug design. First, DockTrina, a new algorithm to predict conformations of triangular protein trimers (i.e., trimers with pair-wise contacts between all three pairs of proteins), is presented. The method takes as input pair-wise contact predictions from a rigid-body docking program. It then scans and scores all possible combinations of pairs of monomers using a very fast root mean square deviation (RMSD) test. Being fast and efficient, DockTrina outperforms state-of-the-art computational methods for predicting the structure of protein oligomers on the collected benchmark of protein trimers. 
Second, RigidRMSD, a C++ library that computes in constant time the RMSDs between molecular poses corresponding to rigid-body transformations, is presented. The library is useful in practice for clustering docking poses, yielding a tenfold speed-up compared to standard RMSD-based clustering algorithms. Third, KSENIA, a novel knowledge-based scoring function for protein-protein interactions, is developed. The problem of scoring function reconstruction is formulated and solved as a convex optimization problem. As a result, KSENIA is a smooth function and is thus suitable for gradient-based refinement of molecular structures. Remarkably, it is shown that native interfaces of protein complexes provide sufficient information to reconstruct a well-discriminating scoring function. Fourth, CARBON, a new algorithm for the rigid-body refinement of docking candidates, is proposed. The rigid-body optimization problem is viewed as the calculation of quasi-static trajectories of rigid bodies influenced by the energy function. To circumvent the typical problem of incorrect step sizes for rotational and translational movements of molecular complexes, the concept of controlled advancement is introduced. CARBON works well in combination with both a classical force field and a knowledge-based scoring function. CARBON is also suitable for refining molecular complexes with moderate to large steric clashes between their subunits. Finally, a novel method to evaluate the predictive capability of scoring functions is introduced. It allows rigorous assessment of the performance of a scoring function of interest on benchmarks of molecular complexes. The method operates on score distributions rather than on the scores of particular conformations, which makes it advantageous compared to the standard hit-rate criteria. The methods described in the thesis are tested and validated on various protein-protein benchmarks. 
The implemented algorithms were successfully used in the CAPRI contest for structure prediction of protein-protein complexes. The developed methodology can easily be adapted to the recognition of other types of molecular interactions involving ligands, polysaccharides, RNAs, etc. The C++ versions of the presented algorithms will be made available as SAMSON Elements for the SAMSON software platform at http://www.samson-connect.net or at http://nano-d.inrialpes.fr/software. Protein-protein interactions Molecular docking Scoring function Rigid-body minimization Convex optimization Root mean square deviation 510 004
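The constant-time RMSD idea mentioned in this record can be illustrated with a short identity. This is a sketch of one way such a computation can work, not the RigidRMSD library's actual API or implementation. Assume the reference coordinates x are centered at the origin and C is their second-moment matrix, the mean of x x^T. For two rigid poses (R1, t1) and (R2, t2), RMSD^2 = trace((R1 - R2) C (R1 - R2)^T) + |t1 - t2|^2, so after a one-time pass over the atoms each pose comparison costs the same regardless of molecule size. All names and the toy coordinates below are hypothetical.

```python
import math


def rot_z(theta):
    """Rotation matrix about the z axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transform(R, t, p):
    """Apply the rigid transformation p -> R p + t."""
    return [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]


def second_moment(points):
    """Mean of x x^T over all points; the points must be centered at the origin."""
    n = len(points)
    return [[sum(p[i] * p[j] for p in points) / n for j in range(3)] for i in range(3)]


def rmsd_brute_force(points, R1, t1, R2, t2):
    """Reference computation: loops over every atom, O(N) per pose pair."""
    total = 0.0
    for p in points:
        a, b = transform(R1, t1, p), transform(R2, t2, p)
        total += sum((a[i] - b[i]) ** 2 for i in range(3))
    return math.sqrt(total / len(points))


def rmsd_constant_time(C, R1, t1, R2, t2):
    """Uses only the precomputed 3x3 matrix C; cost is independent of atom count."""
    D = [[R1[i][j] - R2[i][j] for j in range(3)] for i in range(3)]
    # trace(D C D^T) = sum over i, j, k of D[i][j] * C[j][k] * D[i][k]
    trace = sum(D[i][j] * C[j][k] * D[i][k]
                for i in range(3) for j in range(3) for k in range(3))
    shift = sum((t1[i] - t2[i]) ** 2 for i in range(3))
    return math.sqrt(max(0.0, trace + shift))  # guard against tiny negative round-off


# Hypothetical centered coordinates (centroid at the origin).
points = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 2.0, 0.0],
          [0.0, -2.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
C = second_moment(points)  # one-time O(N) precomputation

R1, t1 = rot_z(0.3), [1.0, 0.0, 0.5]
R2, t2 = rot_z(1.2), [0.0, 2.0, 0.0]

brute = rmsd_brute_force(points, R1, t1, R2, t2)
fast = rmsd_constant_time(C, R1, t1, R2, t2)
print(brute, fast)
```

The two printed values should agree up to floating-point round-off. The cross term between rotation and translation vanishes only because the coordinates are centered, so a real implementation would subtract the centroid first.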
