Proteins are the basic building blocks of biological organisms, and are responsible for a variety of functions within them. Proteins are composed of unique amino acid sequences. Some have only one sequence, while others contain several sequences combined together. These amino acid sequences fold to form a unique three-dimensional (3D) shape. Although a sequence may fold into different 3D shapes in different environments, proteins with similar amino acid sequences typically have similar 3D shapes and functions. Knowledge of the 3D shape of a protein is important in both protein function analysis and drug design, for example when assessing the toxicity reduction associated with a given drug. Due to the complexity of protein 3D shapes and the close relationship between shapes and functions, the prediction of protein 3D shapes has become an important topic in bioinformatics.
This research introduces a new approach to predicting proteins’ 3D shapes, using a multilayer artificial neural network. Our solution learns and predicts a representation of the 3D shape associated with a protein, starting directly from its amino acid sequence descriptors. The input of the artificial neural network is a set of amino acid sequence descriptors that we created based on a set of probability density functions. In our algorithm, the probability density functions are calculated from the correlation between the constituent amino acids, according to the substitution matrix. The output layer of the network is formed by 3D shape descriptors provided by an information retrieval system called CAPRI. This system contains pose-invariant 3D shape descriptors and retrieves the proteins with the closest structures. The network is trained on proteins with known amino acid sequences and 3D shapes. Once trained, the network is able to predict the 3D shape descriptors of a query protein. Based on the predicted 3D shape descriptors, the CAPRI system retrieves the known proteins whose 3D shapes are closest to that of the query protein. These retrieved proteins can then be checked to see whether they belong to the same family as the query protein, since proteins in the same family generally have similar 3D shapes.
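The pipeline described above (sequence descriptors in, shape descriptors out, then retrieval of the closest known shapes) can be illustrated with a minimal sketch. It is not the thesis implementation. The descriptors here are random placeholder vectors rather than the substitution-matrix-based probability density functions or the CAPRI descriptors. The function names `train_mlp`, `predict` and `retrieve_closest`, the tanh activation, the squared-error loss, plain gradient descent and Euclidean distance are all assumptions made for illustration.

```python
import numpy as np

def train_mlp(X, Y, hidden=16, lr=0.05, epochs=500, seed=0):
    """Fit a one-hidden-layer network mapping sequence descriptors X
    to shape descriptors Y with full-batch gradient descent.
    (Hypothetical helper; the abstract does not name the optimizer used.)"""
    rng = np.random.default_rng(seed)
    n, d_in = X.shape
    d_out = Y.shape[1]
    W1 = rng.normal(0.0, 0.5, (d_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, d_out))
    b2 = np.zeros(d_out)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # hidden activations
        P = H @ W2 + b2                    # predicted shape descriptors
        G = (P - Y) / n                    # gradient of 0.5 * mean squared error w.r.t. P
        dH = (G @ W2.T) * (1.0 - H ** 2)   # backpropagate through tanh
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    """Predict shape descriptors for one or more sequence descriptors."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

def retrieve_closest(pred, database, k=3):
    """Return indices of the k database shape descriptors nearest to pred
    (Euclidean distance is an assumption, standing in for the retrieval system)."""
    dists = np.linalg.norm(database - pred, axis=1)
    return np.argsort(dists)[:k]

# Placeholder data: 30 "proteins", 8-dim sequence descriptors, 5-dim shape descriptors.
rng = np.random.default_rng(1)
seq_desc = rng.normal(size=(30, 8))
shape_desc = rng.normal(size=(30, 5))

params = train_mlp(seq_desc, shape_desc)
query_pred = predict(params, seq_desc[:1])[0]
neighbours = retrieve_closest(query_pred, shape_desc, k=3)
```

The retrieved indices would then be compared against the query protein's family label, as described above.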
The search for similar 3D shapes is performed against a database of more than 45,000 known proteins. We present the results of evaluating our approach on a number of protein families of various sizes. Further, we consider a number of different neural network architectures and optimization algorithms. When the neural network is trained with proteins from large families whose members have similar amino acid sequences, the accuracy for finding proteins from the same family is 100%. When we employ proteins whose family members have dissimilar amino acid sequences, or proteins from a small family, neural networks with one hidden layer produce more promising results than networks with two hidden layers, and the performance of the one-hidden-layer networks may be improved by increasing the number of hidden nodes.
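One plausible way to score "finding proteins from the same family" is the fraction of retrieved proteins whose family label matches the query's. The abstract does not define its accuracy measure, so the function below, `family_hit_rate`, and the family labels used with it are hypothetical and only illustrate the idea.

```python
from typing import Sequence

def family_hit_rate(query_family: str, retrieved_families: Sequence[str]) -> float:
    """Fraction of retrieved proteins sharing the query protein's family.
    (Assumed metric; the thesis may define accuracy differently.)"""
    if not retrieved_families:
        return 0.0
    hits = sum(1 for fam in retrieved_families if fam == query_family)
    return hits / len(retrieved_families)

# Example with made-up family labels: three of the four retrieved proteins match.
rate = family_hit_rate("family_A", ["family_A", "family_A", "family_B", "family_A"])
```

Averaging such a score over many query proteins would give a per-family figure that can be compared across network architectures.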
Identifier | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OOU.#10393/23636 |
Date | 10 January 2013 |
Creators | Zhao, Jing |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | English |
Detected Language | English |
Type | Thèse / Thesis |