21 |
Functional characterization of proteins involved in cell cycle by structure-based computational methods
Sontheimer, Jana 16 April 2012
In recent years, a rapidly increasing amount of experimental data has been generated by high-throughput technologies. Despite these large quantities of protein-related data and the development of computational prediction methods, the function of many proteins is still unknown. In the human proteome, at least 20% of the annotated proteins are not characterized. Thus, the question of how to predict a protein's function from its amino acid sequence remains unanswered for many proteins. Classical bioinformatics approaches to function prediction infer function from well-characterized homologs, which are identified by sequence similarity. However, these methods fail to identify distant homologs with low sequence similarity. As protein structure is more conserved than sequence within protein families, structure-based methods (e.g. fold recognition) may recognize structural similarities even at low sequence similarity and therefore provide information for function inference. Fold recognition methods have already proven successful for individual proteins, but automating them for high-throughput application is difficult, mainly because of their high false positive rate. Automated identification of remote homologs by fold recognition would allow a significant improvement in the functional annotation of proteins. My approach was to combine structure-based computational prediction methods with experimental data from genome-wide RNAi screens to support the establishment of functional hypotheses by improving the analysis of protein structure prediction results.
In the first part of my thesis, I characterized proteins from the Ska complex by computational methods. I showed the benefit of including experimental information to identify remote homologs: integrating functional data helped to reduce the number of false positives in fold recognition results and made it possible to establish interesting functional hypotheses based on high-confidence structural predictions. Based on the structural hypothesis of a GLEBS motif in c13orf3 (Ska3), I could derive a potential molecular mechanism that could explain the observed phenotype.
In the second part of my thesis, my goal was to develop computational tools and automated analysis techniques to be able to perform structure-based functional annotation in a high-throughput way. I designed and implemented key tools that were successfully integrated into a computational platform, called StrAnno, which I set up together with my colleagues. These novel computational modules include a domain prediction algorithm and a graphical overview that facilitates and accelerates the analysis of results.
StrAnno can be seen as a first step towards automatic functional annotation of proteins by structure-based methods. First, the analysis of long hit lists to identify promising candidates for further analysis is substantially facilitated by integrating and combining various sequence-based computational tools and data from functional databases. Second, the developed post-processing tools accelerate the evaluation of structural and functional hypotheses. False positives are removed from the threading result lists by various filters, and analysis of the possible true positives is greatly enhanced by the graphical overview. With these two essential benefits, fold recognition techniques become applicable to large-scale approaches. By applying this methodology to hits from a genome-wide cell cycle RNAi screen and evaluating structural hypotheses by molecular modeling techniques, I aimed to associate biological functions with human proteins and link the RNAi phenotype to a molecular function. For two selected human proteins, c20orf43 and HJURP, I could establish interesting structural and functional hypotheses. These predictions were based on templates with low sequence identity (10-20%). The uncharacterized human protein c20orf43 might be an E3 SUMO ligase that could be involved either in DNA repair or in rRNA regulatory processes. Based on the structural hypotheses for two domains of HJURP, I predicted a potential link to ubiquitylation processes and direct DNA binding. In addition, I substantiated the cell cycle arrest phenotype of these two genes upon RNAi knockdown.
Fold recognition methods are a promising alternative for functional annotation of proteins that escape sequence-based annotation due to their low sequence identity to well-characterized protein families. The structural and functional hypotheses I established in my thesis open the door to investigate the molecular mechanisms of previously uncharacterized proteins, which may provide new insights into cellular mechanisms.
|
22 |
Applications of Structural Bioinformatics for the Structural Genomics Era
Novotny, Marian January 2007
Structural bioinformatics deals with the analysis, classification and prediction of three-dimensional structures of biomacromolecules. It is becoming increasingly important as the number of structures is growing rapidly. This thesis describes three studies concerned with protein-function prediction and two studies about protein structure validation.
New protein structures are often compared to known structures to find out if they have a known fold, which may provide hints about their function. The functionality and performance of eleven fold-comparison servers were evaluated. None of the tested servers achieved perfect recall, so in practice a combination of servers should be used.
If fold comparison does not provide any hints about the function of a protein, structural motif searches can be employed. A survey of left-handed helices in known protein structures was carried out. The results show that left-handed helices are rare motifs, but most of them occur in active or ligand-binding sites. Their identification can therefore help to pinpoint potentially important residues.
Sometimes all available methods fail to provide hints about the function of a protein. Therefore, the potential of using docking techniques to predict which ligands are likely to bind to a particular protein has been investigated. Initial results show that it will be difficult to build a reliable automated docking protocol that will suit all proteins.
The effect of various phenomena on the precision of accessible surface area calculations was also investigated. The results suggest that it is prudent to report such values with a precision of 50 to 100 Å².
Finally, a survey of register shifts in known protein structures was carried out. The identified potential register shifts were analysed and classified. A machine-learning approach ("rough sets") was used in an attempt to diagnose register errors in structures.
|
24 |
Modifying a Protein-Protein Interaction Identifier with a Topology and Sequence-Order Independent Structural Comparison Method
Johansson, Joakim January 2018
Using computational methods to identify protein-protein interactions (PPIs) supports experimental techniques by requiring less time and fewer resources. PPIs can be identified through a template-based approach, which describes how unstudied proteins interact by aligning them to a common structural template that exists for both interacting proteins. One pipeline that uses this approach is InterPred, which combines homology modelling and massive template comparison to construct coarse interaction models. These models are reviewed by a machine learning classifier that identifies models showing traits of true interactions, which can be further refined with a docking technique. However, InterPred depends on structural information for the whole complex, which might not be available for unstudied proteins, while it has been suggested that PPIs depend on the shape and interface of the proteins. A method that aligns structures based on interface attributes is InterComp, which uses topology- and sequence-order-independent structural comparison. Implementing this method in InterPred restricts the structural information to the protein interface, which could lead to the discovery of previously undetected PPI models. The results showed that the modified pipeline did not match the original in receiver operating characteristic (ROC) performance. However, the modified pipeline could identify new potential PPIs that were undetected by InterPred.
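The ROC comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes each candidate interaction model has a classifier score and a true/false label; the scores and labels below are invented for illustration and are not taken from InterPred or the thesis.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    positive is scored above a randomly chosen negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores for four interaction models (1 = true PPI).
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [1, 0, 1, 0]
example_auc = roc_auc(example_scores, example_labels)
print(example_auc)
```

Computing this value for the original and the modified pipeline on the same benchmark gives one number per pipeline to compare; a pipeline can score lower here and still surface interactions the other one misses.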
|
25 |
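Bioinformatics analysis of the non-structural protein 5A of hepatitis C virus genotypes 1 and 3 in pre-treatment samples [Análise por ferramentas de bioinformática da proteína não-estrutural 5A do vírus da hepatite C genótipo 1 e 3 em amostras pré-tratamento]
Yamasaki, Lílian Hiromi Tomonari [UNESP] 14 April 2010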
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) / Hepatitis C virus (HCV) infects almost 3% of people worldwide and is considered the main cause of chronic liver disease and liver transplants; infection has been regarded as a major public health problem since the virus was discovered in 1989. To date there is no effective vaccine, and the most widely used therapy, based on Peginterferon, succeeds in only about 50% of patients infected with genotype 1. Although the mechanisms behind this treatment resistance remain unclear, both host and viral factors are thought to contribute. The non-structural 5A (NS5A) protein is involved in several cellular and viral processes and is an essential component of HCV, but its structure and function are still poorly understood. The aims of the present study were therefore to build a theoretical model of NS5A and to investigate its structural and functional properties using in silico tools. We analyzed 345 NS5A sequences from 23 patients infected with HCV genotype 1 or 3. The amino acid and secondary structure composition of the sequences differed between genotypes, which may indicate differences in protein-protein interactions between genotypes and could be related to their different rates of treatment resistance. In addition, some of the amino acids that varied between genotypes lie in regions that previous studies have associated with virus persistence. Functional analysis with ProtFun suggested that NS5A is involved in central intermediary metabolism, translation, growth, transport, binding and hormone functions in the cell. These functions varied between domains, supporting the hypothesis that NS5A is a multifunctional protein. A PROSITE motif search indicated many glycosylation, phosphorylation and myristoylation sites, which are highly conserved and may play an important role in stabilizing structure and function, making them possible targets for new antivirals. Some of them lie in regions related to treatment response...
|
26 |
Structural bioinformatics studies and tool development related to drug discovery
Hatherley, Rowan January 2016
This thesis is divided into two distinct sections which can be combined under the broad umbrella of structural bioinformatics studies related to drug discovery. The first section involves the establishment of an online South African natural products database. Natural products (NPs) are chemical entities synthesised in nature and are unrivalled in their structural complexity, chemical diversity, and biological specificity, which has long made them crucial to the drug discovery process. South Africa is rich in both plant and marine biodiversity and a great deal of research has gone into isolating compounds from organisms found in this country. However, there is no official database containing this information, making it difficult to access for research purposes. This information was extracted manually from literature to create a database of South African natural products. In order to make the information accessible to the general research community, a website, named “SANCDB”, was built to enable compounds to be quickly and easily searched for and downloaded in a number of different chemical formats. The content of the database was assessed and compared to other established natural product databases. Currently, SANCDB is the only database of natural products in Africa with an online interface. The second section of the thesis was aimed at performing structural characterisation of proteins with the potential to be targeted for antimalarial drug therapy. This looked specifically at 1) The interactions between an exported heat shock protein (Hsp) from Plasmodium falciparum (P. falciparum), PfHsp70-x and various host and exported parasite J proteins, as well as 2) The interface between PfHsp90 and the heat shock organising protein (PfHop). The PfHsp70-x:J protein study provided additional insight into how these two proteins potentially interact. 
Analysis of the PfHsp90:PfHop also provided a structural insight into the interaction interface between these two proteins and identified residues that could be targeted due to their contribution to the stability of the Hsp90:Hop binding complex and differences between parasite and human proteins. These studies inspired the development of a homology modelling tool, which can be used to assist researchers with homology modelling, while providing them with step-by-step control over the entire process. This thesis presents the establishment of a South African NP database and the development of a homology modelling tool, inspired by protein structural studies. When combined, these two applications have the potential to contribute greatly towards in silico drug discovery research.
|
27 |
Protein Model Quality Assessment: A Machine Learning Approach
Uziela, Karolis January 2017
Many protein structure prediction programs exist, and they can efficiently generate a number of protein models of varying quality. One of the problems is that it is difficult to know which model is the best one for a given target sequence. Selecting the best model is one of the major tasks of Model Quality Assessment Programs (MQAPs). These programs are able to predict model accuracy before the native structure is determined. The accuracy estimation can be divided into two parts: global (the accuracy of the whole model) and local (the accuracy of each residue). ProQ2 is one of the most successful MQAPs for prediction of both local and global model accuracy and is based on a machine learning approach. In this thesis, I present my own contribution to Model Quality Assessment (MQA) and the newest developments of the ProQ program series. Firstly, I describe a new ProQ2 implementation in the protein modelling software package Rosetta. This new implementation allows ProQ2 to be used as a scoring function for conformational sampling inside Rosetta, which was not possible before. Moreover, I present two new methods, ProQ3 and ProQ3D, that both outperform their predecessor. ProQ3 introduces new training features that are calculated from Rosetta energy functions, and ProQ3D introduces a new machine learning approach based on deep learning. The ProQ3 program participated in the 12th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP12) and was one of the best methods in the MQA category. Finally, an important issue in model quality assessment is how to select the target function that the predictor is trying to learn. In the fourth manuscript, I show that MQA results can be improved by selecting a contact-based target function instead of the more conventional superposition-based functions. / At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 3: Manuscript.
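To make the distinction between local and global accuracy, and the idea of a contact-based target, more concrete, here is a minimal sketch. It is not the target function used in the thesis: it assumes one representative atom per residue, an 8 Å contact cutoff (a hypothetical choice), and invented coordinates. The local score of a residue is the fraction of its native contacts preserved in the model, and the global score is the mean of the local scores. No superposition of model onto native is needed.

```python
import math

def contacts(coords, cutoff=8.0):
    """Set of residue index pairs (i, j), i < j, closer than `cutoff`."""
    n = len(coords)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(coords[i], coords[j]) <= cutoff}

def local_contact_scores(native, model, cutoff=8.0):
    """Per-residue fraction of native contacts that the model preserves."""
    native_contacts = contacts(native, cutoff)
    model_contacts = contacts(model, cutoff)
    scores = []
    for i in range(len(native)):
        mine = {pair for pair in native_contacts if i in pair}
        # A residue with no native contacts cannot lose any; score it 1.0.
        scores.append(len(mine & model_contacts) / len(mine) if mine else 1.0)
    return scores

# Hypothetical 4-residue native structure and a model with the last residue misplaced.
native = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 20.0, 0.0)]

local_scores = local_contact_scores(native, model)
global_score = sum(local_scores) / len(local_scores)
print(local_scores, global_score)
```

In this toy case only the residues that touch the misplaced one are penalised, which is the property that makes contact-based targets attractive as local labels.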
|
28 |
Développements algorithmiques pour l'analyse et la prédiction de la structure des protéines / Novel computational developments for protein structure analysis and prediction
Pages, Guillaume 12 September 2019
Proteins are ubiquitous in virtually all biological processes.
Identifying their role helps to understand and potentially control these processes. However, even though protein sequence determination is now a routine procedure, it is often very difficult to use this information to extract relevant functional knowledge about the system under study. Indeed, the function of a protein relies on a combination of its chemical and mechanical properties, which are defined by its structure. Thus, understanding, analysis and prediction of protein structure are key challenges in molecular biology. Prediction and analysis of individual protein folds is the central topic of this thesis. However, many proteins are organized in higher-level assemblies, which are symmetric in most cases, and some proteins contain internal repetitions. In many cases, designing a fold with repetitions or a symmetric protein assembly is the simplest way for evolution to achieve a specific function, because the number of combinatorial possibilities in the interactions of designed folds is reduced exponentially in the symmetric cases. This motivated us to develop specific methods for symmetric protein assemblies and for individual proteins with internal repeats. Another motivation behind this thesis was to explore and advance the emerging field of deep neural networks applied to atomistic 3-dimensional (3D) data. This thesis can be logically split into two parts. In the first part, we propose algorithms to analyse structures of protein assemblies, and more specifically putative structural symmetries. We start with a definition of a symmetry measure based on 3D Euclidean distance, and describe an algorithm to efficiently compute this measure and to determine the axes of symmetry of protein assemblies. This algorithm is able to deal with all point groups, which include cyclic, dihedral, tetrahedral, octahedral and icosahedral symmetries, thanks to a robust heuristic that perceives the correspondence between asymmetric subunits.
We then extend the boundaries of the problem, and propose a method applicable to atomistic structures without atom correspondence, internal symmetries, and repetitions in raw density maps. We tackle this problem using a deep neural network (DNN), and we propose a method that predicts the symmetry order and a 3D symmetry axis. Then, we extend the DNN architecture to assess the folding quality of 3D protein models. We trained the DNN using as input the local geometry around each residue in a protein model represented as a density map, and we predicted the CAD-scores of these residues. The DNN was specifically conceived to be invariant with respect to the orientation of the input model. We also designed some parts of the network to automatically recognise atom properties and robustly select features. Finally, we provide an analysis of the features learned by the DNN. We show that our architecture correctly learns atomic, amino acid, and also higher-level molecular descriptors. Some of them are rather complex, but well understood from the biophysical point of view. These include atom partial charges, atom chemical elements, properties of amino acids, protein secondary structure and atom solvent exposure. We also demonstrate that our network learns novel structural features. This study introduces novel tools for structural biology. Some of them are already used in the community, for example, by the PDBe database and CASP assessors. It also demonstrates the power of deep learning in the representation of protein structure and shows the applicability of DNNs to computational tasks that involve 3D data.
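A symmetry measure based on 3D Euclidean distance can be sketched for the simplest case. The example below is a strong simplification and not the algorithm from the thesis: it assumes a cyclic assembly whose axis is already known to be the z axis, subunits listed in order around the axis, and atoms matched by index. The function names and coordinates are hypothetical. The measure is the RMSD between each subunit rotated by 2π/n and the next subunit; it is zero for a perfectly symmetric assembly.

```python
import math

def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def cyclic_symmetry_rmsd(subunits, n):
    """RMSD between every subunit rotated by 2*pi/n and its successor."""
    angle = 2.0 * math.pi / n
    total, count = 0.0, 0
    for i, subunit in enumerate(subunits):
        successor = subunits[(i + 1) % len(subunits)]
        for atom, partner in zip(subunit, successor):
            total += math.dist(rotate_z(atom, angle), partner) ** 2
            count += 1
    return math.sqrt(total / count)

# Build an ideal C3 trimer from a hypothetical two-atom subunit.
base = [(5.0, 0.0, 0.0), (6.0, 1.0, 2.0)]
trimer = [[rotate_z(p, k * 2.0 * math.pi / 3) for p in base] for k in range(3)]
perfect = cyclic_symmetry_rmsd(trimer, 3)

# Shift one atom by 0.3 along z to break the symmetry slightly.
distorted = [list(subunit) for subunit in trimer]
x, y, z = distorted[0][1]
distorted[0][1] = (x, y, z + 0.3)
perturbed = cyclic_symmetry_rmsd(distorted, 3)
print(perfect, perturbed)
```

Finding the axis and the subunit correspondence, which this sketch takes as given, is the hard part that the abstract attributes to its heuristic.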
|
29 |
Clustering approaches for extracting structural determinants of enzyme active sites
Stamatelou, Ismini - Christina January 2020
The study of enzyme binding sites is an essential but demanding process, made more complex by the fact that the amino acids lining these areas are not rigid. At the same time, minimizing side effects and achieving specificity of new ligands is a great challenge in structure-based drug design. Using glycogen phosphorylase, a validated target for the development of new antidiabetic agents, as a case study, this project examines the side-chain conformations of amino acids that play a key role in the catalytic site of the enzyme. Specifically, different rotamers of each amino acid were collected to build a dataset of different conformations of the catalytic site. The rotamers were filtered by their probability of occurrence, and all rotamers that create steric clashes were then rejected. These conformations were then clustered based on their similarity. Three different clustering algorithms and multiple numbers of clusters were tested, using silhouette scores to evaluate the clustering. Similarity was measured with the Euclidean metric, which, because the coordinates correspond between conformations, is very similar to the cRMSD metric. Two-level clustering was applied to the dataset for more in-depth observations. According to the clustering results, specific amino acids with major geometrical variations in their rotamers play the most important role in the separation of the clusters. Additionally, all rotamers of an amino acid can be grouped based on their structure, which was confirmed using the "Chimera" software as a visualization tool. The ultimate aim of this study is to examine whether clustering the conformations produces clusters whose points are geometrically similar to each other, in order to identify near neighbors, i.e. conformations that are quite similar in structure but do not play a determinant role in the function, and those that are quite diverse and could be further exploited.
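The relation between the Euclidean metric and cRMSD noted in this abstract can be made explicit with a short sketch. It assumes that every conformation lists the same atoms in the same order and in the same coordinate frame, so that no superposition is applied; under that assumption the Euclidean distance between the flattened coordinate vectors equals cRMSD times the square root of the number of atoms, and both metrics rank neighbours identically. The coordinates are invented for illustration.

```python
import math

def euclidean(conf_a, conf_b):
    """Euclidean distance between two conformations flattened to 3N numbers."""
    return math.sqrt(sum((p - q) ** 2
                         for atom_a, atom_b in zip(conf_a, conf_b)
                         for p, q in zip(atom_a, atom_b)))

def crmsd(conf_a, conf_b):
    """Coordinate RMSD with atoms matched by index and no superposition."""
    return euclidean(conf_a, conf_b) / math.sqrt(len(conf_a))

# Hypothetical 4-atom conformations of one side chain.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (3.0, 1.5, 0.0)]
rotamer_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (3.0, 1.5, 1.0)]
rotamer_2 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 2.0), (3.0, 1.5, 2.0)]

for name, conf in (("rotamer_1", rotamer_1), ("rotamer_2", rotamer_2)):
    print(name, euclidean(reference, conf), crmsd(reference, conf))
```

Because the two quantities differ only by a constant factor for a fixed set of atoms, a clustering algorithm driven by pairwise distances would form the same clusters with either one.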
|
30 |
Statistical Analysis of Biological Interactions from Homologous Proteins
Xu, Qifang January 2008
Information fusion aims to develop intelligent approaches to integrating information from complementary sources, so that a more comprehensive basis is obtained for data analysis and knowledge discovery. Our Protein Biological Unit (ProtBuD) database is the first database to integrate the biological unit information from the Protein Data Bank (PDB), the Protein Quaternary Structure (PQS) server and the Protein Interfaces, Surfaces and Assemblies (PISA) server, and to compare the three biological units side by side. The statistical analyses show that the inconsistency within these databases and between them is significant. In order to reduce the inconsistency, we studied interfaces across different PDB entries in a protein family, assuming that interfaces shared by different crystal forms are likely to be biologically relevant. A novel computational method is proposed to achieve this goal. First, redundant data were removed by clustering similar crystal structures, and a representative entry was used for each cluster. Then a modified k-d tree algorithm was applied to speed up the identification of interfaces in crystals. The interface similarity functions were derived from Gaussian distributions fit to the data. Hierarchical clustering was used to cluster interfaces, and likely biological interfaces were defined by the number of crystal forms in a cluster. Benchmark data sets were used to determine whether the presence or absence of interfaces across multiple crystal forms can be used to predict whether a protein is an oligomer. The probability that a common interface is biological is given. An interface shared in two different crystal forms by divergent proteins is very likely to be biologically important. The interface data not only provide new interaction templates for computational modeling, but also provide more accurate data for training and testing sets in data-mining research to predict protein-protein interactions.
In summary, we developed a framework which is based on databases where different biological unit information is integrated and new interface data are stored. In order for users from the biology community to use the data, a stand-alone software program, a web site with a user-friendly graphical interface, and a web service are provided. / Computer and Information Science
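The use of a k-d tree to find interfaces can be sketched as follows. This is a plain textbook k-d tree, not the modified algorithm from the thesis, and the 5 Å cutoff, function names and coordinates are assumptions made for illustration. Atoms of one chain are indexed in the tree, and each atom of the other chain is queried for neighbours within the cutoff, which avoids comparing every pair of atoms.

```python
import math

def build_kdtree(points, depth=0):
    """Build a k-d tree over 3D points; each node is (point, left, right)."""
    if not points:
        return None
    axis = depth % 3
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def within_radius(node, query, radius, depth=0, found=None):
    """Collect all tree points within `radius` of `query`."""
    if found is None:
        found = []
    if node is None:
        return found
    point, left, right = node
    axis = depth % 3
    if math.dist(point, query) <= radius:
        found.append(point)
    delta = query[axis] - point[axis]
    # Always search the side containing the query; search the other side
    # only if the splitting plane is no farther away than the radius.
    near, far = (left, right) if delta < 0 else (right, left)
    within_radius(near, query, radius, depth + 1, found)
    if abs(delta) <= radius:
        within_radius(far, query, radius, depth + 1, found)
    return found

def interface_atoms(chain_a, chain_b, cutoff=5.0):
    """Atoms of chain_a that have at least one chain_b atom within `cutoff`."""
    tree = build_kdtree(chain_b)
    return [atom for atom in chain_a if within_radius(tree, atom, cutoff)]

# Hypothetical atom coordinates for two chains in a crystal.
chain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
chain_b = [(0.0, 4.0, 0.0), (10.0, 9.0, 0.0), (40.0, 0.0, 0.0)]
print(interface_atoms(chain_a, chain_b))
```

In a crystal, the same query would be repeated against symmetry-related copies of the chains, which is where the speed-up over all-pairs comparison matters most.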
|