Residue coupling in protein families has received much attention as an important aid in predicting protein structures and in revealing functional insight into proteins. Existing coupling methods largely identify pairwise couplings and express them in terms of amino acid combinations, which does not yield a mechanistic explanation. Most of these methods rely on a multiple protein sequence alignment. This is typically a refined alignment that exposes couplings better, obtained by manually tweaking an alignment constructed by a classical alignment algorithm. Classical alignment algorithms primarily focus on capturing conservation and may not fully unveil the couplings in an alignment. In this dissertation, we propose methods for capturing both pairwise and higher-order couplings in protein families. Our methods provide mechanistic explanations for couplings using the physicochemical properties of amino acids and discernibility between coupling orders. We also investigate a method for mining frequent episodes, called coupled patterns, in an alignment produced by a classical protein alignment algorithm, and for exploiting these coupled patterns to improve how well the alignment exposes couplings. We demonstrate the effectiveness of our proposed methods on a large collection of sequence datasets for protein families. / Ph. D. / Proteins are biomolecules composed of amino acids. A chain of amino acids (a.k.a. a protein sequence) forms the primary structure of a protein, and the folding of this chain gives rise to a more complex 3D structure, the natural state of a protein. It is through these structures that proteins perform their various activities. To preserve these activities, evolution allows only those changes in protein sequences that do not disrupt the overall structures and functions of proteins. Coupling is an evolutionary phenomenon that helps proteins preserve their structures and functions.
Two or more amino acid positions are coupled if changes of amino acids at one position are compensated by changes at the other position(s). In this dissertation, we propose a set of probabilistic methods for modeling such couplings between two or more positions. Our methods identify the most probable couplings in a set of protein sequences and express them with probabilistic graphical models (a powerful and interpretable framework), which can be used to answer questions related to protein structure, function, and synthesis. Using this notion of coupling, we also develop a method for improving the quality of multiple protein sequence alignments, a widely used tool for protein sequence analysis. We evaluate our methods on a large collection of sequence datasets for protein families, and the results substantiate the efficacy of our methods.
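The abstract does not give the dissertation's scoring functions, so the sketch below uses a different, classical measure to illustrate the notion of coupling defined above. That measure is mutual information between alignment columns, a common baseline and not the author's probabilistic graphical model method. It assumes a gap-free alignment of equal-length strings. The toy alignment and the helper names `entropy`, `mutual_information`, and `conditional_mi` are hypothetical and were constructed only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy alignment (gap-free, equal-length rows), constructed for
# illustration only; it is not data from the dissertation.
# Columns 1 and 3 swap K/E together, mimicking a compensating charge swap.
# Column 2 depends jointly on columns 0 and 1 but on neither one alone.
toy_alignment = ["AKLE", "GKVE", "AEVK", "GELK"]


def entropy(alignment, cols):
    """Shannon entropy (bits) of the joint residue distribution over `cols`."""
    n = len(alignment)
    counts = Counter(tuple(seq[c] for c in cols) for seq in alignment)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def mutual_information(alignment, i, j):
    """Pairwise coupling score: I(i; j) = H(i) + H(j) - H(i, j)."""
    return (entropy(alignment, [i]) + entropy(alignment, [j])
            - entropy(alignment, [i, j]))


def conditional_mi(alignment, i, j, k):
    """Three-column score: I(i; j | k) = H(i, k) + H(j, k) - H(i, j, k) - H(k)."""
    return (entropy(alignment, [i, k]) + entropy(alignment, [j, k])
            - entropy(alignment, [i, j, k]) - entropy(alignment, [k]))


if __name__ == "__main__":
    print(f"I(1;3)   = {mutual_information(toy_alignment, 1, 3):.2f}")  # 1.00
    print(f"I(0;2)   = {mutual_information(toy_alignment, 0, 2):.2f}")  # 0.00
    print(f"I(0;2|1) = {conditional_mi(toy_alignment, 0, 2, 1):.2f}")  # 1.00
```

Pairwise mutual information flags columns 1 and 3 as coupled, but it scores columns 0 and 2 as independent. In this toy data, column 2 is fully determined once columns 0 and 1 are known. This is the kind of higher-order coupling that pairwise scores alone miss. Raw mutual information is also known to be confounded by phylogeny and small sample sizes, so it serves here only as an illustrative baseline.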
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/83218 |
Date | 16 November 2016 |
Creators | Hossain, K.S.M. Tozammel |
Contributors | Computer Science, Ramakrishnan, Naren, Bailey-Kellogg, Chris, Prakash, B. Aditya, Onufriev, Alexey V., Baker, Nathan A. |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Dissertation |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |