  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Computational bioinformatics on three-dimensional structures of ribosomes using multiresolutional analysis

Hsiao, Chiaolong 25 August 2008
RNA is amazing. We found that, without changing its backbone connectivity, RNA can conserve its 3D structure through topology switches at the level of a single residue. I developed a method for representing RNA structure at multiple resolutions, called the PBR approach (P stands for phosphate, B for base, and R for ribose). In this method, structural data are viewed through a series of resolutions from finest to coarsest. At single-nucleotide (fine) resolution, RNA structure is intricate, featuring structural insertions/deletions, strand clips, and 3,2-switches. The compilation of these structural deviations, called DevLS (Deviations of Local Structure), provides a new descriptive language for RNA structure that allows one to systematize and investigate it. Using PBR analysis, a total of 103 tetraloops were found and classified within the crystal structures of the 23S rRNA of H. marismortui and the 70S rRNA of T. thermophilus. Combining them, I constructed a 'tetraloop family tree', using a tree formalism, to unify and redefine the tetraloop motif and to represent relationships between tetraloops as grouped by DevLS. To date, structural alignment of very large RNAs remains a challenge because of their size, intricate backbone choreography, and tertiary interactions. To overcome these obstacles, I developed the concept of structural anchors, together with a 'divide and conquer' strategy, for superimposing 23S rRNAs. The alignment and superimposition of the 23S rRNAs of T. thermophilus and H. marismortui give an overall RMSD of atomic positions of 1.2 Å, using 73% of the RNA backbone atoms (~2,129 residues). Applying principles of inorganic chemistry together with the structural alignment technique described above revealed a recurrent magnesium-binding motif in large RNAs. Through their locations, topologies, and coordination geometries, these magnesium-binding motifs play a critical role in the framework of the ribosomal peptidyl transferase center (PTC). Common features of these magnesium microclusters (Mg2+-mc's) include direct phosphate chelation of two magnesium ions in the form Mg2+(i)-(O1P-P-O2P)-Mg2+(j), phosphate groups of adjacent RNA residues serving as ligands of a given Mg2+, and undulating RNA surfaces with unpaired and unstacked bases.
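The 1.2 Å figure above is an RMSD of atomic positions after superimposition. As a rough illustration of that quantity (not of the anchor-based 'divide and conquer' procedure itself), the sketch below computes the minimal RMSD between two already-paired atom lists using Horn's quaternion method, a standard closed-form approach. The function names are hypothetical, and the atom correspondence is assumed to be given, which in the thesis is exactly the hard part addressed by structural anchors.

```python
import math


def _centered(coords):
    """Translate a list of (x, y, z) tuples so that its centroid is at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def _jacobi_eigenvalues(a, sweeps=50):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    scale = sum(v * v for row in a for v in row)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-30 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P: update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A: update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def superposed_rmsd(moving, reference):
    """Minimal RMSD between paired 3D point sets after optimal rigid superposition.

    Horn's method: the best rotation corresponds to the largest eigenvalue of a
    4x4 symmetric matrix built from the cross-covariance of the centered sets.
    """
    if len(moving) != len(reference) or not moving:
        raise ValueError("point sets must be non-empty and paired one-to-one")
    x = _centered(moving)
    y = _centered(reference)
    # Cross-covariance terms S[i][j] = sum over atoms of x[i] * y[j]
    S = [[sum(p[i] * q[j] for p, q in zip(x, y)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam_max = max(_jacobi_eigenvalues(N))
    e0 = sum(c * c for p in x for c in p) + sum(c * c for q in y for c in q)
    # Minimal sum of squared deviations is e0 - 2 * lam_max; clamp rounding noise.
    return math.sqrt(max(0.0, (e0 - 2.0 * lam_max) / len(x)))
```

A rotated and translated copy of a structure gives an RMSD of essentially zero, while a non-rigid distortion does not. The closed form avoids iterative search, so cost grows only linearly with the number of paired atoms.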
12

The Structural and Functional Identity of the Protein Kinase Superfamily

Knight, James D R January 2011
The human protein kinase superfamily consists of over 500 members that individually control specific aspects of cell behavior and collectively control the complete range of cellular processes. That such a large group of proteins is able to diversify and establish unique individual identities while retaining a common enzymatic function and significant sequence/structural conservation is remarkable. How this is achieved is poorly understood, and we have begun to examine the issue by performing a comparative analysis of the catalytic domain of protein kinases. A novel approach to protein structural alignment has revealed a high degree of similarity across the kinase superfamily, with variability confined largely to a single region thought to be involved in substrate binding. The detected similarity is not limited to amino acids but includes a group of conserved water molecules that play important structural roles in stabilizing critical residues and the fold of the kinase domain. A novel technique developed for identifying kinase substrates on a large scale directly from cell lysate has revealed that substrate specificity is not what discriminates the closely related p38α and p38β mitogen-activated protein kinases. Instead, cellular localization appears to be their distinguishing characteristic, at least during myoblast differentiation. Together, these results highlight the extent of conservation, as well as the minimal variability, found in the catalytic domain of all protein kinase superfamily members; they also suggest that while distantly related kinases may be distinguished by substrate specificity, closely related kinases are likely to be distinguished by other factors. Although these results focus on representative members of the kinase superfamily, they give insight into how all protein kinases likely diversified and established unique, non-redundant identities.
In addition, the novel techniques developed and presented here for structural alignment and substrate discovery offer new tools for studying molecular biology and cell signaling.
13

Modifying a Protein-Protein Interaction Identifier with a Topology and Sequence-Order Independent Structural Comparison Method

Johansson, Joakim January 2018
Using computational methods to identify protein-protein interactions (PPIs) supports experimental techniques while requiring less time and fewer resources. PPIs can be identified through a template-based approach, which describes how unstudied proteins interact by aligning them to a structural template common to both interacting proteins. One pipeline that uses this approach is InterPred, which combines homology modelling and massive template comparison to construct coarse interaction models. These models are evaluated by a machine learning classifier that identifies the models showing traits of true interactions, and these can be further refined with a docking technique. However, InterPred depends on complex structural information that might not be available for unstudied proteins, whereas it has been suggested that PPIs depend on the shape and interface of the proteins. InterComp is a method that aligns structures based on interface attributes, using a topology- and sequence-order-independent structural comparison. Implementing this method in InterPred restricts the structural information to the protein interface, which could lead to the discovery of previously undetected PPI models. The results showed that the modified pipeline did not reach performance comparable to InterPred, as measured by receiver operating characteristic (ROC) analysis. However, the modified pipeline could identify new potential PPIs that were undetected by InterPred.
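The comparison above is framed in terms of ROC performance. A common single-number summary of an ROC curve is its area (AUC); whether the thesis reports AUC specifically is an assumption here. The sketch computes AUC from classifier scores through the equivalent Mann-Whitney rank statistic; `roc_auc` and its inputs are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: classifier outputs (higher means more likely a true interaction)
    labels: 1 for true PPI models, 0 for false ones
    Tied scores receive average ranks, so a tied positive/negative pair
    counts as half a correct ordering.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores pairs[i..j].
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        n_pos_in_run = sum(1 for k in range(i, j + 1) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * n_pos_in_run
        i = j + 1
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

For example, `roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns 0.75: three of the four positive/negative pairs are ranked correctly. The rank formulation avoids building the curve point by point.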
14

PePIP : a Pipeline for Peptide-Protein Interaction-site Prediction / PePIP : en Pipeline for Förutsägelse av Peptid-Protein Bindnings-site

Johansson-Åkhe, Isak January 2017
Protein-peptide interactions play a major role in several biological processes, such as cell proliferation and cancer cell life cycles. Accurate computational methods for predicting protein-protein interactions exist, but few of these methods can be extended to predicting interactions between a protein and a particularly small or intrinsically disordered peptide. In this thesis, PePIP is presented. PePIP is a pipeline for predicting where on a given protein a given peptide will most probably bind. The pipeline uses structural alignment to search the Protein Data Bank for possible templates for the interaction to be predicted, using the larger chain as the query. The possible templates are then evaluated with a Random Forest classifier as to whether they can represent the query protein and peptide, and the best templates are found by combining the Random Forest evaluation with hierarchical clustering. These final templates are then combined to give a prediction of the binding site. PePIP proved highly accurate when tested on a set of 502 experimentally determined protein-peptide structures, suggesting a binding site on the correct part of the protein surface roughly 4 out of 5 times.
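The abstract does not spell out how the final templates are combined into one binding-site prediction. A minimal sketch, assuming a score-weighted residue vote in which each template contributes its classifier probability to the query residues it maps onto the peptide interface; the clustering step is omitted, and `consensus_binding_site`, its inputs, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict


def consensus_binding_site(templates, threshold=0.5):
    """Combine template-based predictions into one binding-site prediction.

    templates: list of (score, residues) pairs, where score is a classifier
        confidence for the template (e.g. a Random Forest probability) and
        residues is the set of query-protein residue numbers that the
        template maps onto the peptide interface.
    Returns the residues whose score-weighted support is at least `threshold`.
    """
    total = sum(score for score, _ in templates)
    if total <= 0:
        return set()
    support = defaultdict(float)
    for score, residues in templates:
        for res in residues:
            support[res] += score
    # Keep residues backed by at least `threshold` of the total template weight.
    return {res for res, s in support.items() if s / total >= threshold}
```

With two confident templates agreeing on residues 11 and 12 and a weak outlier elsewhere, only 11 and 12 survive the vote; weighting by confidence lets strong templates outvote many weak ones.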
15

Computational Protein Structure Analysis : Kernel And Spectral Methods

Bhattacharya, Sourangshu 08 1900
The focus of this thesis is to develop computational techniques for the analysis of protein structures. We model protein structures as points in 3-dimensional space, which are in turn modeled as weighted graphs. The problem of protein structure comparison is posed as a weighted graph matching problem, and an algorithm motivated by spectral graph matching techniques is developed. The thesis also proposes novel similarity measures by deriving kernel functions. These kernel functions allow the data to be mapped to a suitably defined reproducing kernel Hilbert space (RKHS), paving the way for efficient algorithms for protein structure classification. Protein structure comparison (structure alignment) is a classical method of determining the overall similarity between two protein structures. This problem can be posed as the approximate weighted subgraph matching problem, which is a well-known NP-hard problem. Spectral graph matching techniques provide efficient heuristic solutions to the weighted graph matching problem using eigenvectors of the adjacency matrices of the graphs. We propose a novel and efficient algorithm for protein structure comparison using the notion of neighborhood preserving projections (NPP), motivated by spectral graph matching. Empirically, we demonstrate that comparing the NPPs of two protein structures gives the correct equivalences when the proteins being compared are of roughly similar size. Also, the resulting algorithm is 3-20 times faster than existing state-of-the-art techniques. This algorithm was used to retrieve protein structures from standard databases with accuracies comparable to the state of the art. A limitation of the above method is that it gives wrong results when the number of unmatched residues, also called insertions and deletions (indels), is very high. This problem was tackled by matching neighborhoods rather than entire structures.
For each pair of neighborhoods, we grow the neighborhood alignments to obtain alignments for the entire structures. This results in a robust method that has outperformed existing state-of-the-art methods on standard benchmark datasets. This method was also implemented using MPI on a cluster for database search. Another important problem in computational biology is the classification of protein structures into classes exhibiting high structural similarity. Many manual and semi-automatic structural classification databases exist. Kernel methods along with support vector machines (SVM) have proved to be a robust and principled tool for classification. We have proposed novel positive semidefinite kernel functions on protein structures based on spatial neighborhoods. The kernels were derived using a general technique called convolution kernels and were shown to be related to the spectral alignment score in a limiting case. These kernels have outperformed existing tools when validated on a well-known manual classification scheme called SCOP. The kernels were designed with the general problem of capturing structural similarity in mind, and have been successfully applied to problems in other domains, e.g. computer vision.
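To illustrate the convolution-kernel idea on spatial neighborhoods, here is a simplified stand-in, not the kernels derived in the thesis: each residue is described by the sorted distances to its k nearest neighbors (invariant to rotation and translation), neighborhoods are compared with a Gaussian kernel, and a structure-level kernel sums that comparison over all residue pairs. Summing a positive semidefinite base kernel over all pairs keeps the result positive semidefinite, so it could be plugged into an SVM. The descriptor, `k`, `sigma`, and all function names are assumptions.

```python
import math


def neighborhood_descriptors(coords, k=3):
    """Sorted distances from each residue to its k nearest other residues.

    Distances do not change under rotation or translation, so neither do
    the descriptors (or the kernel built on them).
    """
    if len(coords) <= k:
        raise ValueError("need more than k residues")
    descriptors = []
    for i, p in enumerate(coords):
        dists = sorted(math.dist(p, q) for j, q in enumerate(coords) if j != i)
        descriptors.append(dists[:k])
    return descriptors


def gaussian(u, v, sigma):
    """Gaussian (RBF) kernel between two equal-length descriptors."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2.0 * sigma ** 2))


def structure_kernel(coords_a, coords_b, k=3, sigma=1.0):
    """Sum of neighborhood kernels over all residue pairs (convolution-style)."""
    da = neighborhood_descriptors(coords_a, k)
    db = neighborhood_descriptors(coords_b, k)
    return sum(gaussian(u, v, sigma) for u in da for v in db)


def normalized_kernel(coords_a, coords_b, k=3, sigma=1.0):
    """Cosine-normalized kernel: 1.0 for identical (or rigidly moved) structures."""
    kab = structure_kernel(coords_a, coords_b, k, sigma)
    kaa = structure_kernel(coords_a, coords_a, k, sigma)
    kbb = structure_kernel(coords_b, coords_b, k, sigma)
    return kab / math.sqrt(kaa * kbb)
```

Normalization keeps scores comparable between proteins of different lengths, and by the Cauchy-Schwarz inequality the normalized value never exceeds 1.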
16

Resolução do problema de alinhamento estrutural entre proteínas via técnicas de otimização global / Resolution of the problem of structural protein alignment by means of global optimization techniques

Gouveia, Paulo Sergio da Silva 17 August 2018
Advisors: Ana Friedlander de Martinez Perez, Roberto Andreani / Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica / Made available in DSpace on 2018-08-17T18:28:36Z (GMT). Previous issue date: 2011 / Resumo: A comparação estrutural entre proteínas é um problema fundamental na Biologia Molecular, pois estruturas similares entre proteínas frequentemente refletem uma funcionalidade ou origem em comum entre as mesmas. No Problema de Alinhamento Estrutural entre Proteínas, buscamos encontrar o melhor alinhamento estrutural entre duas proteínas, ou seja, a melhor sobreposição entre duas estruturas proteicas, uma vez que alinhamentos locais podem levar a conclusões distorcidas sobre as características e funcionalidades das proteínas em estudo. A maioria dos métodos atuais para abordar este problema ou tem um custo computacional muito elevado ou não tem nenhuma garantia de convergência para o melhor alinhamento entre duas proteínas. Neste trabalho, propomos métodos computacionais para o Problema de Alinhamento Estrutural entre Proteínas que tenham boas garantias de encontrar o melhor alinhamento, mas em um tempo computacional razoável, utilizando as mais variadas técnicas de Otimização Global. A análise sobre os desempenhos de cada método tanto em termos quantitativos quanto qualitativos, além de um gráfico de Pareto, são apresentados de forma a facilitar a comparação entre os métodos com respeito à qualidade da solução e ao tempo computacional / Abstract: The structural comparison of proteins is a fundamental problem in Molecular Biology because similar structures often reflect a common origin or functionality. In the Protein Structural Alignment problem, one seeks the best structural alignment between two proteins, i.e., the best overlap between two protein structures.
Merely local alignments can lead to distorted conclusions about the features and functions of the proteins under study. Most methods addressing this problem either have a very high computational cost or are not supported by guarantees of convergence to the best alignment. In this work we describe computational methods for Protein Structural Alignment with good certificates of optimality and reasonable computational execution time, employing several Global Optimization techniques. The performance is visualized by means of profile graphs and Pareto curves in order to take into account both the efficiency and the robustness of the methods simultaneously. / Doctorate / Optimization / Doctor of Applied Mathematics
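The profile graphs mentioned above are most likely performance profiles in the sense of Dolan and Moré, which are widely used to compare optimization methods on a benchmark set; that reading is an assumption. The sketch computes, for each method, the fraction of test problems it solves within a factor tau of the best method's cost. `performance_profile` and the cost data layout are hypothetical, and the Pareto-curve analysis is not covered.

```python
import math


def performance_profile(costs, taus):
    """Performance profiles in the style of Dolan and Moré.

    costs: dict mapping method name -> list of costs (e.g. CPU seconds), one
        entry per test problem in a shared order; use math.inf for a failure.
        Costs are assumed to be positive.
    taus: performance-ratio thresholds, each >= 1.
    Returns dict mapping method name -> list giving, for each tau, the
    fraction of problems on which the method is within a factor tau of the best.
    """
    methods = list(costs)
    n_problems = len(costs[methods[0]])
    best = [min(costs[m][p] for m in methods) for p in range(n_problems)]
    profile = {}
    for m in methods:
        ratios = []
        for p in range(n_problems):
            # A problem that no method solved counts as a failure for every method.
            ratios.append(math.inf if math.isinf(best[p]) else costs[m][p] / best[p])
        profile[m] = [sum(r <= tau for r in ratios) / n_problems for tau in taus]
    return profile
```

The value at tau = 1 is the share of problems on which a method is fastest (efficiency), while the value for large tau approaches the share it solves at all (robustness), which matches the two criteria the abstract weighs together.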
17

Structural bioinformatics tools for the comparison and classification of protein interactions

Garma, L. D. (Leonardo D.) 08 August 2017
Abstract Most proteins carry out their functions through interactions with other molecules. Thus, proteins taking part in similar interactions are likely to carry out related functions. One way to determine whether two proteins do take part in similar interactions is by quantifying the likeness of their structures. This work focuses on the development of methods for the comparison of protein-protein and protein-ligand interactions, as well as their application to structure-based classification schemes. A method based on the MultiMer-align (or MM-align) program was developed and used to compare all known dimeric protein complexes. The results of the comparison demonstrate that the method improves over MM-align in a significant number of cases. The data were employed to classify the complexes, resulting in 1,761 different protein-protein interaction types. Through a statistical model, the number of protein-protein interaction types existing in nature was estimated at around 4,000. The model allowed the establishment of a relationship between the number of quaternary families (sequence-based groups of protein-protein complexes) and quaternary folds (structure-based groups). The interactions between proteins and small organic ligands were studied using sequence-independent methodologies. A new method was introduced to test three similarity metrics. The best of these metrics was subsequently employed, together with five other existing methodologies, to conduct an all-to-all comparison of all the known protein-FAD (flavin adenine dinucleotide) complexes. The results demonstrate that the new methodology best captures the similarities between complexes in terms of protein-ligand contacts. Based on the all-to-all comparison, the protein-FAD complexes were subsequently separated into 237 groups. In the majority of cases, the classification divided the complexes according to their annotated function.
Using a graph-based description of the FAD-binding sites, each group could be further characterized and uniquely described. The study demonstrates that the newly developed methods are superior to the existing ones. The results indicate that both the known protein-protein and the protein-FAD interactions can be classified into a reduced number of types and that in general terms these classifications are consistent with the proteins' functions. / Tiivistelmä Suurin osa proteiinien toiminnasta tapahtuu vuorovaikutuksessa muiden molekyylien kanssa. Proteiinit, jotka osallistuvat samanlaisiin vuorovaikutuksiin todennäköisesti toimivat samalla tavalla. Kahden proteiinin todennäköisyys esiintyä samanlaisissa vuorovaikutustilanteissa voidaan määrittää tutkimalla niiden rakenteellista samankaltaisuutta. Tämä väitöskirjatyö käsittelee proteiini-proteiini- ja proteiini-ligandi -vuorovaikutusten vertailuun käytettyjen menetelmien kehitystä, ja niiden soveltamista rakenteeseen perustuvissa luokittelujärjestelmissä. Tunnettuja dimeerisiä proteiinikomplekseja tutkittiin uudella MultiMer-align-ohjelmaan (MM-align) perustuvalla menetelmällä. Vertailun tulokset osoittavat, että uusi menetelmä suoriutui MM-alignia paremmin merkittävässä osassa tapauksista. Tuloksia käytettiin myös kompleksien luokitteluun, jonka tuloksena oli 1761 erilaista proteiinien välistä vuorovaikutustyyppiä. Luonnossa esiintyvien proteiinien välisten vuorovaikutusten määrän arvioitiin tilastollisen mallin avulla olevan noin 4000. Tilastollisen mallin avulla saatiin vertailtua sekä sekvenssin (”quaternary families”) sekä rakenteen (”quaternary folds”) mukaan ryhmiteltyjen proteiinikompleksien määriä. Proteiinien ja pienien orgaanisten ligandien välisiä vuorovaikutuksia tutkittiin sekvenssistä riippumattomilla menetelmillä. Uudella menetelmällä testattiin kolmea eri samankaltaisuutta mittaavaa metriikkaa. 
Näistä parasta käytettiin viiden muun tunnetun menetelmän kanssa vertailemaan kaikkia tunnettuja proteiini-FAD (Flavin-Adenine-Dinucleotide, flaviiniadeniinidinukleotidi) -komplekseja. Proteiini-ligandikontaktien osalta uusi menetelmä kuvasi kompleksien samankaltaisuutta muita menetelmiä paremmin. Vertailun tuloksia hyödyntäen proteiini-FAD-kompleksit luokiteltiin edelleen 237 ryhmään. Suurimmassa osassa tapauksista luokittelujärjestelmä oli onnistunut jakamaan kompleksit ryhmiin niiden toiminnallisuuden mukaisesti. Ryhmät voitiin määritellä yksikäsitteisesti kuvaamalla FAD:n sitoutumispaikka graafisesti. Väitöskirjatyö osoittaa, että siinä kehitetyt menetelmät ovat parempia kuin aikaisemmin käytetyt menetelmät. Tulokset osoittavat, että sekä proteiinien väliset että proteiini-FAD -vuorovaikutukset voidaan luokitella rajattuun määrään vuorovaikutustyyppejä ja yleisesti luokittelu on yhtenevä proteiinien toiminnan suhteen.
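The abstract names neither the similarity metrics nor the procedure that split the protein-FAD complexes into groups. A minimal sketch under two assumptions: similarity is the Jaccard index of each complex's set of protein-ligand contacts, and groups are the connected components obtained by linking any two complexes whose similarity reaches a threshold (single linkage). The function names, contact encoding, and threshold are hypothetical.

```python
def contact_similarity(contacts_a, contacts_b):
    """Jaccard index between two sets of protein-ligand contacts."""
    union = contacts_a | contacts_b
    return len(contacts_a & contacts_b) / len(union) if union else 1.0


def group_by_similarity(names, similarity, threshold):
    """Single-linkage grouping of complexes from an all-to-all comparison.

    similarity: dict mapping frozenset({name_a, name_b}) -> score.
    Two complexes are linked when their score is >= threshold; the groups
    returned are the connected components of the resulting graph.
    """
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving to keep trees shallow.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for pair, score in similarity.items():
        if score >= threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

Contacts could be encoded, for instance, as (residue type, ligand atom) tuples. Single linkage is the simplest choice; it can chain dissimilar complexes together through intermediates, which is one reason a real classification might prefer average linkage.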
18

Analyse mixte de protéines basée sur la séquence et la structure - applications à l'annotation fonctionnelle / Mixed sequence-structure based analysis of proteins, with applications to functional annotations

Tetley, Romain 21 November 2018
Dans cette thèse, l'emphase est mise sur la réconciliation de l'analyse de structure et de séquence pour les protéines. L'analyse de séquence brille lorsqu'il s'agit de comparer des protéines présentant une forte identité de séquence (≥ 30 %) mais laisse à désirer pour identifier des homologues lointains. L'analyse de structure est une alternative intéressante. Cependant, les méthodes de résolution de structures sont coûteuses et complexes - lorsque toutefois elles produisent des résultats. Ces observations rendent évidente la nécessité de développer des méthodes hybrides, exploitant l'information extraite des structures disponibles pour l'injecter dans des modèles de séquence. Cette thèse produit quatre contributions principales dans ce domaine. Premièrement, nous présentons une nouvelle distance structurale, le RMSDcomb, basée sur des patterns de conservation structurale locale, les motifs structuraux. Deuxièmement, nous avons développé une méthode pour identifier des motifs structuraux entre deux structures exploitant un bootstrap dépendant de filtrations. Notre approche n'est pas un compétiteur direct des aligneurs flexibles mais permet plutôt de produire des analyses multi-échelles de similarités structurales. Troisièmement, nous exploitons les méthodes susmentionnées pour construire des modèles de Markov cachés hybrides biaisés vers des régions mieux conservées structurellement. Nous utilisons un tel modèle pour caractériser les protéines de fusion virales de classe II, une tâche particulièrement ardue du fait de leur faible identité de séquence et leur conservation structurale moyenne. Ce faisant, nous parvenons à trouver un certain nombre d'homologues distants connus des protéines virales, notamment chez la Drosophile. Enfin, en formalisant un sous-problème rencontré lors de la comparaison de filtrations, nous présentons un nouveau problème théorique - le D-family matching - sur lequel nous démontrons des résultats algorithmiques variés.
Nous montrons - d'une façon analogue à la comparaison de régions de deux conformations d'une protéine - comment exploiter ce modèle théorique pour comparer deux clusterings d'un même jeu de données. / In this thesis, the focus is on reconciling the realms of structure and sequence for protein analysis. Sequence analysis tools shine when faced with proteins presenting high sequence identity (≥ 30%), but are lackluster when it comes to remote homolog detection. Structural analysis tools present an interesting alternative, but solving structures, when at all possible, is a tedious and expensive process. These observations make clear the need for hybrid methods, which inject information obtained from available structures into a sequence model. This thesis makes four main contributions toward this goal. First, we present a novel structural measure, the RMSDcomb, based on local structural conservation patterns, the so-called structural motifs. Second, we developed a method to identify structural motifs between two structures using a bootstrap method that relies on filtrations. Our approach is not a direct competitor to flexible aligners but can be useful for performing a multiscale analysis of structural similarities. Third, we build upon the previous methods to design hybrid Hidden Markov Models that are biased towards regions of increased structural conservation between sets of proteins. We test this tool on the class II viral fusion proteins, which are particularly challenging because of their low sequence identity and mild structural homology. We find that we are able to recover known remote homologs of the viral proteins in Drosophila and other organisms. Finally, formalizing a sub-problem encountered when comparing filtrations, we present a new theoretical problem, the D-family matching, for which we present various algorithmic results.
We show, in a manner analogous to comparing parts of two protein conformations, how this theoretical model can be used to compare two clusterings of the same data set.
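The D-family matching model used in the thesis to compare two clusterings is not reproduced here. For contrast, a classical baseline for the same task is the Rand index, which counts the item pairs on which two clusterings agree; the sketch below is that standard measure, with hypothetical names.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair agrees when both clusterings put its two items in the same
    cluster, or both put them in different clusters. Cluster label values
    themselves are irrelevant; only co-membership matters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Relabeled but otherwise identical clusterings score 1.0. Unlike a matching between individual clusters, the Rand index gives only a global score and does not say which cluster corresponds to which.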
19

Multiple sequence analysis in the presence of alignment uncertainty

Herman, Joseph L. January 2014
Sequence alignment is one of the most intensely studied problems in bioinformatics, and is an important step in a wide range of analyses. An issue that has gained much attention in recent years is the fact that downstream analyses are often highly sensitive to the specific choice of alignment. One way to address this is to jointly sample alignments along with other parameters of interest. In order to extend the range of applicability of this approach, the first chapter of this thesis introduces a probabilistic evolutionary model for protein structures on a phylogenetic tree; since protein structures typically diverge much more slowly than sequences, this allows for more reliable detection of remote homologies, improving the accuracy of the resulting alignments and trees, and reducing sensitivity of the results to the choice of dataset. In order to carry out inference under such a model, a number of new Markov chain Monte Carlo approaches are developed, allowing for more efficient convergence and mixing on the high-dimensional parameter space. The second part of the thesis presents a directed acyclic graph (DAG)-based approach for representing a collection of sampled alignments. This DAG representation allows the initial collection of samples to be used to generate a larger set of alignments under the same approximate distribution, enabling posterior alignment probabilities to be estimated reliably from a reasonable number of samples. If desired, summary alignments can then be generated as maximum-weight paths through the DAG, under various types of loss or scoring functions. The acyclic nature of the graph also permits various other types of algorithms to be easily adapted to operate on the entire set of alignments in the DAG. In the final part of this work, methodology is introduced for alignment-DAG-based sequence annotation using hidden Markov models, and RNA secondary structure prediction using stochastic context-free grammars. 
Results on test datasets indicate that the additional information contained within the DAG allows for improved predictions, resulting in substantial gains over simply analysing a set of alignments one by one.
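Extracting a summary alignment as a maximum-weight path through the alignment DAG is a standard dynamic program once the nodes are in topological order. A minimal sketch, assuming each node carries a precomputed score (for example a posterior probability or a loss-derived weight; the thesis's actual scoring functions are not reproduced) and that the DAG has designated start and end nodes; all names are hypothetical.

```python
import math
from collections import defaultdict, deque


def max_weight_path(edges, weight, source, sink):
    """Heaviest source-to-sink path in a directed acyclic graph.

    edges: list of (u, v) pairs meaning u -> v.
    weight: dict giving a score for every node (it also defines the node set).
    Returns (total_weight, path). Nodes are processed in topological order
    (Kahn's algorithm), so each node's best score is final before it is used.
    """
    succ = defaultdict(list)
    indegree = {n: 0 for n in weight}
    for u, v in edges:
        succ[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in weight if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(weight):
        raise ValueError("graph contains a cycle")

    best = {n: -math.inf for n in weight}
    prev = {n: None for n in weight}
    best[source] = weight[source]
    for u in order:
        if best[u] == -math.inf:
            continue  # u is not reachable from the source
        for v in succ[u]:
            if best[u] + weight[v] > best[v]:
                best[v] = best[u] + weight[v]
                prev[v] = u
    if best[sink] == -math.inf:
        raise ValueError("sink is not reachable from source")
    path, node = [], sink
    while node is not None:
        path.append(node)
        node = prev[node]
    return best[sink], path[::-1]
```

Because every edge is relaxed exactly once, the cost is linear in the size of the DAG, which is what makes summarizing a large collection of sampled alignments this way practical.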
