Global ETD Search

1	A Computational Approach to Predicting Distance Maps from Contact Maps Kuo, Tony Chien-Yen 23 May 2012 One approach to protein structure prediction is to first predict, from sequence, a thresholded, binary 2D representation of a protein's topology known as a contact map. The predicted contact map can then be used as distance constraints to construct a 3D structure. We focus on the latter half of the process and aim to obtain a set of non-binary distance constraints from contact maps. This thesis proposes an approach that extends the traditional binary definition of “in contact” by incorporating fuzzy logic to construct fuzzy contact maps from a set of contact maps at different thresholds, providing a vehicle for error handling. Then, a novel template-based similarity search and distance geometry methods were applied to predict distance constraints in the form of a distance map. The three-dimensional coordinates were then calculated from the predicted distance constraints. Experiments were conducted to test our approach at various levels of noise. We also compare the performance of fuzzy contact maps with that of binary contact maps in the framework of our methodology. Our results showed that fuzzy contact map similarity was indicative of distance map similarity. Thus, we were able to retrieve similar distance map regions using fuzzy contact map similarity. The retrieved distance map regions provided a good starting point for adaptation and allowed for the extrapolation of missing distance values. We were thus able to predict distance maps from which the three-dimensional coordinates could be calculated. Testing this framework on binary contact maps revealed that fuzzy contact maps performed better, with or without noise, owing to a stronger correlation between fuzzy contact map similarity and distance map similarity.
Thus, the methodology described in this thesis is able to predict good distance maps from fuzzy contact maps in the presence of noise, and the resulting coordinates were highly correlated with the performance of the predicted distance maps. / Thesis (Ph.D, Computing) -- Queen's University, 2012-05-23 13:59:28.12 Protein Structure Contact Map
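The idea of building a fuzzy contact map from binary contact maps at several thresholds can be illustrated with a minimal sketch. It assumes, purely for illustration, that the fuzzy membership of a residue pair is the fraction of thresholds at which the pair counts as "in contact"; the abstract does not state the membership function actually used, and the function names, coordinates and thresholds below are hypothetical.

```python
import math

def distance_map(coords):
    """Pairwise Euclidean distances between 3D points (e.g. one atom per residue)."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def binary_contact_map(dmap, threshold):
    """1 where the distance is within the threshold, 0 otherwise."""
    return [[1 if d <= threshold else 0 for d in row] for row in dmap]

def fuzzy_contact_map(dmap, thresholds):
    """Assumed membership: fraction of thresholds at which a pair is in contact."""
    maps = [binary_contact_map(dmap, t) for t in thresholds]
    n = len(dmap)
    return [
        [sum(m[i][j] for m in maps) / len(maps) for j in range(n)]
        for i in range(n)
    ]

# Toy coordinates on a line, so the pairwise distances are 3, 6 and 9.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
dmap = distance_map(coords)
fuzzy = fuzzy_contact_map(dmap, thresholds=[4.0, 8.0, 12.0])
```

Under this assumption a pair that is close at every threshold gets membership 1, while a pair that only qualifies at the loosest threshold gets a small non-zero value, which is the extra information a single binary map discards.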
2	The Universal Similarity Metric, Applied to Contact Maps Comparison in A Two-Dimensional Space Rahmati, Sara 27 September 2008 Comparing protein structures based on their contact maps is an important problem in structural proteomics. Building a system for reconstructing protein tertiary structures from their contact maps is one of the motivations for devising novel contact map comparison algorithms. Several methods that address the contact map comparison problem have been designed, and they are briefly discussed in this thesis. However, they suggest scoring schemes that do not satisfy the two characteristics of “metricity” and “universality”. In this research we investigate the applicability of the Universal Similarity Metric (USM) to the contact map comparison problem. The USM is an information-theoretic measure based on the concept of Kolmogorov complexity. The ultimate goal of this research is to use the USM in a case-based reasoning system to predict protein structures from their predicted contact maps. Because the contact maps used in such a system are predicted from protein sequences and are not noise-free, we need to investigate the noise sensitivity of the USM. This is the first attempt to study the noise tolerance of the USM. In our first implementation of the USM, we converted the two-dimensional data structures (contact maps) to one-dimensional data structures (strings). The results of this implementation motivated us to circumvent the dimension reduction in our second implementation of the USM. The method suggested in this thesis has the advantage of yielding a noise-tolerant measure. We assess the effectiveness of this noise tolerance by testing different USM implementation schemes against noise-contaminated versions of distinguished data-sets.
/ Thesis (Master, Computing) -- Queen's University, 2008-09-27 05:53:31.988 bioinformatics proteomics kolmogorov complexity universal similarity metric protein contact map contact map comparison protein structure prediction
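Kolmogorov complexity is not computable, so practical uses of the USM usually substitute the output length of a real compressor, which gives the normalized compression distance (NCD). The sketch below follows the first scheme mentioned in the abstract, flattening a 2D contact map into a 1D string before compressing. The choice of zlib, the row-major flattening and the random toy maps are assumptions made for illustration; the abstract does not say which compressor or encoding the thesis used.

```python
import random
import zlib

def random_contact_map(n, seed, density=0.2):
    """Hypothetical toy data: a symmetric 0/1 map with ones on the diagonal."""
    rng = random.Random(seed)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        cmap[i][i] = 1
        for j in range(i + 1, n):
            if rng.random() < density:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

def flatten(cmap):
    """Row-major flattening of a 2D contact map into a byte string."""
    return bytes(cell for row in cmap for cell in row)

def compressed_size(data):
    """Compressed length, used as a stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

def ncd(x, y):
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = flatten(random_contact_map(40, seed=0))
b = flatten(random_contact_map(40, seed=1))
same = ncd(a, a)       # a map compared with itself
different = ncd(a, b)  # two unrelated maps
```

With a real compressor the distance of a map to itself is small but generally not exactly zero, which is one reason the behaviour of such approximations under noise has to be tested empirically.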
3	Multi-Regional Analysis of Contact Maps for Protein Structure Prediction Ahmed, Hazem Radwan A. 24 April 2009 1D protein sequences, 2D contact maps and 3D structures are three different representational levels of detail for proteins. Predicting protein 3D structures from their 1D sequences remains one of the complex challenges of bioinformatics. The "Divide and Conquer" principle is applied in our research to handle this challenge, by dividing it into two separate yet dependent subproblems, using a Case-Based Reasoning (CBR) approach. Firstly, 2D contact maps are predicted from their 1D protein sequences; secondly, 3D protein structures are then predicted from their predicted 2D contact maps. We focus on the problem of identifying common substructural patterns of protein contact maps, which could potentially be used as building blocks for a bottom-up approach for protein structure prediction. We further demonstrate how to improve identifying these patterns by combining both protein sequence and structural information. We assess the consistency and the efficiency of identifying common substructural patterns by conducting statistical analyses on several subsets of the experimental results with different sequence and structural information. / Thesis (Master, Computing) -- Queen's University, 2009-04-23 22:01:04.528 Protein Structure Contact Map Sequence Similarity Protein Homology
4	Transcription factor binding dynamics and spatial co-localization in human genome Ma, Xiaoyan January 2017 Transcription factor (TF) binding has been studied extensively in relation to binding site affinity and chromosome modifications; however, the relationship between genome spatial organisation and transcription factor binding is not well studied. Using the recently available high-resolution Hi-C contact map of human GM12878 lymphoblastoid cells, we computationally investigated the genome-wide spatial co-localization of transcription factor binding sites, both within the same type and between different types. First, we observed a strong positive correlation between site occupancy and homotypic TF co-localization based on Hi-C contacts, consistent with our predictions from biophysical simulations of TF target search. This trend is more prominent in binding sites with weak binding sequences and within enhancers, suggesting genome spatial organisation plays an essential role in determining binding site occupancy, especially for weak regulatory elements. Furthermore, when investigating spatial co-localization between different TFs, we discovered two distinct co-localization networks of TFs in lymphoblastoid cells, one of which is enriched in lymphocyte-specific pathways and distal enhancer binding. These two TF networks have strong biases for either the A1 or A2 chromosome subcompartment, but nonetheless are still preserved within each, indicating a potential causal link between cell-type-specific transcription factor binding and chromosome subcompartment segregation. We called 40 pairs of significantly co-localized TFs according to the genome-wide Hi-C contact map; these pairs are enriched in previously reported physical interactions, thus linking the TF spatial network to co-functioning.
In addition to the above main project, I also worked on a side project to find computationally efficient ways of scaling binding site strength across different TFs based on Position-Weight-Matrices (PWM). While common bioinformatics tools produce scores that can reflect the binding strength between a specific TF and the DNA, these scores are not directly comparable between different TFs. We provided two approaches to estimating a scaling parameter $\lambda$ for the PWM score of different TFs. The first approach uses a PWM and background genomic sequence as input to estimate $\lambda$ for a specific TF, which we applied to show that $\lambda$ distributions for different TF families correspond with their DNA binding properties. Our second method can reliably convert $\lambda$ between different PWMs of the same TF, which allows us to directly compare PWMs that were generated by different approaches.
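For readers unfamiliar with PWM scores, the sketch below shows the standard log-odds score of a candidate site under a position weight matrix. It illustrates only the unscaled score that the abstract says is not comparable between TFs; it does not implement the estimation of $\lambda$, which the abstract does not describe. The toy matrix, the uniform background and the function name are hypothetical.

```python
import math

def pwm_log_odds(pwm, site, background=None):
    """Sum over positions of log2(P(base | position) / P(base | background))."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the number of PWM columns")
    # Assumed uniform background; real genomes are usually not uniform.
    background = background or {base: 0.25 for base in "ACGT"}
    return sum(
        math.log2(column[base] / background[base])
        for column, base in zip(pwm, site)
    )

# Hypothetical 3-position matrix that prefers the site "AGC".
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
best = pwm_log_odds(toy_pwm, "AGC")   # matches the preferred base at every position
worst = pwm_log_odds(toy_pwm, "TTT")  # matches none of the preferred bases
```

Because the magnitude of such a score depends on the length and information content of each matrix, scores from two different PWMs live on different scales, which is the problem a per-TF scaling parameter is meant to address.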
5	Bayesian models and algorithms for protein secondary structure and beta-sheet prediction Aydin, Zafer 17 September 2008 In this thesis, we developed Bayesian models and machine learning algorithms for protein secondary structure and beta-sheet prediction problems. In protein secondary structure prediction, we developed hidden semi-Markov models, N-best algorithms and training set reduction procedures for proteins in the single-sequence category. We introduced three residue dependency models (both probabilistic and heuristic) incorporating the statistically significant amino acid correlation patterns at structural segment borders. We allowed dependencies on positions outside the segments to relax the condition of segment independence. Another novelty of the models is the dependency on downstream positions, which is important due to asymmetric correlation patterns observed uniformly in structural segments. Among the dataset reduction methods, we showed that the composition-based reduction generated the most accurate results. To incorporate non-local interactions characteristic of beta-sheets, we developed two N-best algorithms and a Bayesian beta-sheet model. In beta-sheet prediction, we developed a Bayesian model to characterize the conformational organization of beta-sheets and efficient algorithms to compute the optimum architecture, which includes beta-strand pairings, interaction types (parallel or anti-parallel) and residue-residue interactions (contact maps). We introduced a Bayesian model for proteins with six or fewer beta-strands, in which we model the conformational features in a probabilistic framework by combining the amino acid pairing potentials with a priori knowledge of beta-strand organizations. To select the optimum beta-sheet architecture, we analyzed the space of possible conformations using efficient heuristics, in which we significantly reduce the search space by enforcing the amino acid pairs that have strong interaction potentials.
For proteins with more than six beta-strands, we first computed beta-strand pairings using the BetaPro method. Then, we computed gapped alignments of the paired beta-strands in parallel and anti-parallel directions and chose the interaction types and beta-residue pairings with maximum alignment scores. Accurate prediction of secondary structure, beta-sheets and non-local contacts should improve the accuracy and quality of the three-dimensional structure prediction. Bayesian models Machine learning Hidden Markov models Contact map prediction Protein beta-sheet prediction Protein secondary structure prediction Molecular biology Amino acid sequence Bioinformatics
