Global ETD Search

31	Computational Protein Structure Analysis : Kernel And Spectral Methods Bhattacharya, Sourangshu 08 1900 The focus of this thesis is to develop computational techniques for analysis of protein structures. We model protein structures as points in 3-dimensional space which in turn are modeled as weighted graphs. The problem of protein structure comparison is posed as a weighted graph matching problem and an algorithm motivated by spectral graph matching techniques is developed. The thesis also proposes novel similarity measures by deriving kernel functions. These kernel functions allow the data to be mapped to a suitably defined reproducing kernel Hilbert space (RKHS), paving the way for efficient algorithms for protein structure classification. Protein structure comparison (structure alignment) is a classical method of determining overall similarity between two protein structures. This problem can be posed as the approximate weighted subgraph matching problem, which is a well-known NP-hard problem. Spectral graph matching techniques provide an efficient heuristic solution for the weighted graph matching problem using eigenvectors of the adjacency matrices of the graphs. We propose a novel and efficient algorithm for protein structure comparison using the notion of neighborhood preserving projections (NPP), motivated by spectral graph matching. Empirically, we demonstrate that comparing the NPPs of two protein structures gives the correct equivalences when the sizes of the proteins being compared are roughly similar. Also, the resulting algorithm is 3 to 20 times faster than the existing state-of-the-art techniques. This algorithm was used for retrieval of protein structures from standard databases with accuracies comparable to the state of the art. A limitation of the above method is that it gives wrong results when the number of unmatched residues, also called insertions and deletions (indels), is very high. 
This problem was tackled by matching neighborhoods, rather than entire structures. For each pair of neighborhoods, we grow the neighborhood alignments to get alignments for entire structures. This results in a robust method that has outperformed the existing state-of-the-art methods on standard benchmark datasets. This method was also implemented using MPI on a cluster for database search. Another important problem in computational biology is classification of protein structures into classes exhibiting high structural similarity. Many manual and semi-automatic structural classification databases exist. Kernel methods along with support vector machines (SVM) have proved to be a robust and principled tool for classification. We have proposed novel positive semidefinite kernel functions on protein structures based on spatial neighborhoods. The kernels were derived using a general technique called convolution kernel, and were shown to be related to the spectral alignment score in a limiting case. These kernels have outperformed the existing tools when validated on a well-known manual classification scheme called SCOP. The kernels were designed keeping the general problem of capturing structural similarity in mind, and have been successfully applied to problems in other domains, e.g. computer vision. Protein - Structure Protein Structure - Data Processing Protein Structure Alignment Kernel Method Structural Bioinformatics Spectral Graph Theory Machine Learning Neighborhood Alignments Structural Alignment Protein Structure Classification Bioinformatics
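The convolution-kernel idea mentioned in the abstract above can be illustrated with a small sketch. This is not the thesis's kernel: it assumes each structure is given as a list of equal-sized neighborhoods of 3D points, describes each neighborhood by its sorted pairwise distances, and sums a Gaussian base kernel over all pairs of neighborhoods. The names `neighborhood_signature`, `base_kernel`, `convolution_kernel`, and the parameter `sigma` are hypothetical.

```python
import math
from itertools import combinations


def neighborhood_signature(points):
    """Sorted pairwise distances of a neighborhood (illustrative descriptor).

    The result does not change under rotation, translation, or reordering
    of the points.
    """
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))


def base_kernel(sig_a, sig_b, sigma=1.0):
    """Gaussian kernel between two equal-length signatures."""
    if len(sig_a) != len(sig_b):
        raise ValueError("neighborhoods must contain the same number of points")
    squared = sum((a - b) ** 2 for a, b in zip(sig_a, sig_b))
    return math.exp(-squared / (2.0 * sigma ** 2))


def convolution_kernel(structure_a, structure_b, sigma=1.0):
    """Sum the base kernel over every pair of neighborhoods.

    Summing a positive semidefinite base kernel over all pairs of parts
    keeps the combined kernel positive semidefinite.
    """
    sigs_a = [neighborhood_signature(n) for n in structure_a]
    sigs_b = [neighborhood_signature(n) for n in structure_b]
    return sum(base_kernel(a, b, sigma) for a in sigs_a for b in sigs_b)
```

A kernel of this shape could be passed to an SVM as a precomputed Gram matrix; the choice of descriptor and base kernel here is an assumption made only to keep the example short.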
32	Loop Modeling in Proteins Using a Database Approach with Multi-Dimensional Scaling Holtby, Daniel James 09 1900 Modeling loops is an often necessary step in protein structure and function determination, even with experimental X-ray and NMR data. It is well known to be difficult. Database techniques have the advantage of producing a higher proportion of predictions with sub-angstrom accuracy when compared with ab initio techniques, but the disadvantage of often being unable to produce usable results, as they depend entirely on the loop already being represented within the database. My contribution is the LoopWeaver protocol, a database method that uses multidimensional scaling to rapidly achieve better clash-free, low-energy placement of loops obtained from a database of protein structures. This maintains the above-mentioned advantage while avoiding the disadvantage by permitting the use of lower quality matches that would not otherwise fit. Test results show that this method achieves significantly better results than all other methods, including Modeler, Loopy, SuperLooper, and Rapper before refinement. With refinement, the results (LoopWeaver and Loopy combined) are better than ROSETTA's, with 0.53Å RMSD on average for 206 loops of length 6, 0.75Å local RMSD for 168 loops of length 7, 0.93Å RMSD for 117 loops of length 8, and 1.13Å RMSD for loops of length 9, while ROSETTA scores 0.66Å, 0.93Å, 1.23Å, and 1.56Å, respectively, at the same average time limit (3 hours on a 2.2 GHz Opteron). When ROSETTA is allowed to run for over a week against LoopWeaver's and Loopy's combined 3 hours, it approaches, but does not surpass, this accuracy. loop modeling protein structure Computer Science
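The accuracy figures in the abstract above are root-mean-square deviations (RMSD) between predicted and native loop coordinates. As a minimal sketch of that metric, assuming both loops are already expressed in the same coordinate frame (no superposition step is performed) and using the hypothetical function name `loop_rmsd`:

```python
import math


def loop_rmsd(predicted, native):
    """RMSD between two equal-length lists of (x, y, z) atom coordinates.

    Assumes the two loops are already in a common frame; no optimal
    superposition is computed here.
    """
    if len(predicted) != len(native) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (p - q) ** 2
        for atom_p, atom_n in zip(predicted, native)
        for p, q in zip(atom_p, atom_n)
    )
    return math.sqrt(squared / len(predicted))
```

For example, two atoms that are each displaced by exactly 1 Å from their native positions give an RMSD of 1.0 Å. How the abstract's "local RMSD" differs from this plain form is not stated in the record.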
33	A coarse-grained Langevin molecular dynamics approach to de novo protein structure prediction Sasai, Masaki, Cetin, Hikmet, Sasaki, Takeshi N. 05 1900 No description available. Fragment assembly Langevin dynamics Protein structure prediction
34	Reconstruction of ancient evolution : protein domains and phylogenies / Cantarel, Brandi Lynn. January 2006 Thesis (Ph. D.)--University of Virginia, 2006. / Includes bibliographical references (leaves 86-104). Also available online through Digital Dissertations.
35	Structural Determination of the ZZ Domain of Cytoplasmic Polyadenylation Element Binding Protein Merkel, Daniel 01 August 2012 Cytoplasmic polyadenylation element binding protein (CPEB) is required for translational regulation in multiple cell types. CPEB is known to play important roles in early germ cell development, in neuronal synaptic plasticity, and in the process of cellular senescence. CPEB is able to control translation by first interacting with a specific sequence of mRNA known as the CPE site. CPEB recognizes a specific sequence of mRNA, called the cytoplasmic polyadenylation element. This is a uracil-rich sequence that is located on the 3' UTR of mRNA. Once CPEB is bound to the CPE site, CPEB can interact with other proteins. CPEB is most notably known for interacting with a cleavage and polyadenylation specificity factor (CPSF), with a poly(A)-specific ribonuclease, and with a poly(A) polymerase in the Gld2 family. This complex of proteins controls polyadenylation on the 3' end of mRNA. By controlling the lengthening of the poly(A) tail, translation can be regulated. CPEB is believed to contain two RNA recognition motifs and a zinc-binding region on the N-terminus. The zinc-binding region contains six cysteine and two histidine residues that bind to two zinc atoms in a tetrahedral geometry. Using NMR spectroscopy, the structure of the zinc-binding region of CPEB1 was determined. This protein was shown to bind to two zinc ions in a cross-braced topology. It was also determined that the correct classification for this zinc finger is a ZZ domain. NMR polyadenylation protein structure translational regulation zinc
36	Studium molekulárních mechanismů regulace signálních proteinů / Study of molecular mechanisms of the signaling proteins regulation Kylarová, Salome January 2018 The aim of this study was to investigate the regulatory mechanisms of two important signaling protein kinases and promising therapeutic targets, ASK1 and CaMKK2. ASK1 kinase is a member of the mitogen-activated protein kinase kinase kinase (MAP3K) family that activates c-JNK kinase and p38 MAP kinase pathways in response to various stress stimuli, including oxidative stress. The function of ASK1 is associated with the activation of apoptosis and thus plays a key role in the pathogenesis of multiple diseases including cancer, neurodegeneration or cardiovascular diseases. The natural inhibitor of ASK1 is a ubiquitous oxidoreductase, thioredoxin, which is probably bound to the N-terminus of ASK1, thus preventing a homophilic interaction and subsequent ASK1 activation. It has been suggested that upon oxidative stress and oxidation of the thioredoxin active site, thioredoxin dissociates from ASK1, but the structural basis of this interaction remains unclear. Calcium/calmodulin-dependent protein kinase kinase 2 (CaMKK2) is a member of the CaM kinase pathway that activates CaMKI, CaMKIV and AMPK, which are involved in gene expression regulation or apoptosis activation. The function of this protein is often associated with neuropathology, carcinogenesis and obesity. CaM kinases are activated via binding Ca2+ sensor protein...
37	Unraveling the Molecular Impact of Missense Variants: Insights into Protein Structure and Disease Associations Alvarez, Ana C. Gonzalez 07 1900 One of the primary challenges in clinical genetics is the interpretation of the numerous genetic variants identified through sequencing applications. Assessing the impact of missense variants, where only one amino acid is substituted, is particularly difficult. In this study, we examined the structural characteristics of amino acids affected by missense substitutions in 26,690 pathogenic variants and compared them to 11,302 common variants found in the general population. This analysis was conducted across 6,747 protein structures. The residues were annotated using 7 protein features with a total of 35 feature subtypes. Subsequently, we assessed the burden of both common and pathogenic missense variants across these features. Additionally, we carried out separate analyses relative to protein function (with variants grouped in 24 protein functional classes) and relative to diseases (with variants grouped in 86 diseases). Through a comprehensive analysis of the entire dataset, we identified 25 pathogenic features that play a crucial role in the overall fitness and stability of proteins. Additionally, when we conducted individual analyses for 24 protein functional classes, we discovered specific features that are relevant to each function. For the disease analysis, we identified 3 main clusters. Type I diseases primarily result from ordered mutations and are mainly affected by charge loss. This cluster is dominated by the transporter protein class and includes diseases linked to the X chromosome. Type II diseases involve hydrolases and are characterized by enriched variants at the protein core, resulting in protein destabilization. Type III diseases involve extracellular matrix proteins (mainly collagen), are predominantly found in disordered regions, and are affected by charge gain and introduction of polar residues. 
Gly variants are particularly relevant in this cluster, as collagen proteins require Gly at every third residue in the collagen triple helix. Considering the structural aspects when interpreting mutations associated with diseases offers valuable insights into their underlying mechanisms. Our work can serve as a resource to delineate and understand variant pathogenicity by mapping a genetic variant into its structural context. missense variants protein structure protein class
38	When a domain is not a domain, and why it is important to properly filter proteins in databases: conflicting definitions and fold classification systems for structural domains make filtering of such databases imperative Towse, Clare-Louise, Daggett, V. 26 October 2012 No / Membership in a protein domain database does not a domain make; a feature we realized when generating a consensus view of protein fold space with our consensus domain dictionary (CDD). This dictionary was used to select representative structures for characterization of the protein dynameome: the Dynameomics initiative. Through this endeavor we rejected a surprising 40% of the 1,695 folds in the CDD as being non-autonomous folding units. Although some of this was due to the challenges of grouping similar fold topologies, the dissonance between the cataloguing and structural qualification of protein domains remains surprising. Another potential factor is previously overlooked intrinsic disorder; predictions suggest that 40% of proteins have either local or global disorder. One thing is clear: filtering a structural database and ensuring a consistent definition for protein domains is crucial, and caution is prescribed when generalizations of globular domains are drawn from unfiltered protein domain datasets. / NIH
39	The dynameomics entropy dictionary: a large-scale assessment of conformational entropy across protein fold space Towse, Clare-Louise, Akke, M., Daggett, V. 04 April 2017 Yes / Molecular dynamics (MD) simulations contain considerable information with regard to the motions and fluctuations of a protein, the magnitude of which can be used to estimate conformational entropy. Here we survey conformational entropy across protein fold space using the Dynameomics database, which represents the largest existing dataset of protein MD simulations for representatives of essentially all known protein folds. We provide an overview of MD-derived entropies accounting for all possible degrees of dihedral freedom on an unprecedented scale. Although different side chains might be expected to impose varying restrictions on the conformational space that the backbone can sample, we found that the backbone entropy and side chain size are not strictly coupled. An outcome of these analyses is the Dynameomics Entropy Dictionary, the contents of which have been compared with entropies derived by other theoretical approaches and experiment. As might be expected, the conformational entropies scale linearly with the number of residues, demonstrating that conformational entropy is an extensive property of proteins. The calculated conformational entropies of folding agree well with previous estimates. Detailed analysis of specific cases identifies deviations in conformational entropy from the average values that highlight how conformational entropy varies with sequence, secondary structure, and tertiary fold. Notably, alpha-helices have lower entropy on average than do beta-sheets, and both are lower than coil regions. / National Institutes of Health, US Department of Energy Office of Biological Research, National Energy Research Scientific Computing Center, Swedish Research Council, Knut and Alice Wallenberg Foundation
40	Adaptive Balancing of Exploitation with Exploration to Improve Protein Structure Prediction Brunette, TJ 13 May 2011 The most significant impediment for protein structure prediction is the inadequacy of conformation space search. Conformation space is too large and the energy landscape too rugged for existing search methods to consistently find near-optimal minima. Conformation space search methods thus have to focus exploration on a small fraction of the search space. The ability to choose appropriate regions, i.e. regions that are highly likely to contain the native state, critically impacts the effectiveness of search. Choosing where to explore requires information, with higher quality information resulting in better choices. Most current search methods are designed to work in as many domains as possible, which leads to less accurate information because of the need for generality. However, most domains provide unique and accurate information. To best utilize domain-specific information, search needs to be customized for each domain. The first contribution of this thesis customizes search for protein structure prediction, resulting in significantly more accurate protein structure predictions. Unless information is perfect, mistakes will be made, and search will focus on regions that do not contain the native state. How search recovers from mistakes is critical to its effectiveness. To recover from mistakes, this thesis introduces the concept of adaptive balancing of exploitation with exploration. Adaptive balancing of exploitation with exploration allows search to use information only to the extent to which it guides exploration toward the native state. Existing methods of protein structure prediction rely on information from known proteins. Currently, this information is from either full-length proteins that share similar sequences, and hence have similar structures (homologs), or from short protein fragments. 
Homologs and fragments represent two extremes on the spectrum of information from known proteins. Significant additional information can be found between these extremes. However, current protein structure prediction methods are unable to use information between fragments and homologs because it is difficult to identify the correct information from the enormous amount of incorrect information. This thesis makes it possible to use information between homologs and fragments by adaptively balancing exploitation with exploration in response to an estimate of template protein quality. My results indicate that integrating the information between homologs and fragments significantly improves protein structure prediction accuracy, resulting in several proteins predicted with <1 Å RMSD resolution. Optimization Protein Structure Prediction Search Computer Sciences
