
Machine Learning Algorithms for Characterization and Prediction of Protein Structural Properties

Proteins are large biomolecules that serve as the functional building blocks of living organisms. The human genome contains about 22,000 protein-coding genes. Each gene encodes a unique protein sequence, typically 100-1000 residues long, built from a 20-letter alphabet of amino acids. Each protein folds into a unique 3D shape that enables it to perform its function. Each protein structure consists of some number of helical segments, extended segments called sheets, and loops that connect these elements. In the last two decades, machine learning methods, coupled with exponentially expanding biological knowledge databases and computational power, have enabled significant progress in the field of computational biology. In this dissertation, I carry out machine learning research on three major interconnected problems to advance protein structural biology as a field. A separate chapter is devoted to each problem. After the three chapters, I conclude this doctoral research with a summary and directions for our future work.
Chapter 1 describes the design, training, and application of a convolutional neural network (SecNet) that achieves 84% accuracy on the 60-year-old problem of predicting protein secondary structure from a protein sequence. This accuracy is 2-3% higher than any previous result; prior accuracy had risen only 5% in the last 20 years. We identified the key factors for successful prediction in a detailed ablation study. A paper submitted for publication includes our secondary-structure prediction software, data set generation, and training and testing protocols [1].
Chapter 2 describes the design and development of a protocol for clustering beta turns, i.e., short structural motifs responsible for U-turns in protein loops. We identified 18 turn types, 11 of which are newly described [2]. We also developed a turn library and cross-platform software for turn assignment in new structures.
In Chapter 3, I build upon the results from these two problems and predict geometries in loops of unknown structure with custom Residual Neural Networks (ResNet). I demonstrate solid results on (a) locating turns and predicting their 18 types and (b) predicting backbone torsion angles in loops. Given the recent progress in machine learning, these two results provide a strong foundation for successful loop modeling and encourage us to develop a new loop structure prediction program, a critical step in protein structure prediction and modeling. / Computer and Information Science

Identifier: oai:union.ndltd.org:TEMPLE/oai:scholarshare.temple.edu:20.500.12613/2356
Date: January 2019
Creators: Shapovalov, Maxim V
Contributors: Vucetic, Slobodan; Vucetic, Slobodan; Obradovic, Zoran; Zhang, Kai; Dunbrack, Roland L.; Carnevale, Vincenzo
Publisher: Temple University. Libraries
Source Sets: Temple University
Language: English
Detected Language: English
Type: Thesis/Dissertation, Text
Format: 164 pages
Rights: IN COPYRIGHT - This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available. http://rightsstatements.org/vocab/InC/1.0/
Relation: http://dx.doi.org/10.34944/dspace/2338, Theses and Dissertations
