Global ETD Search

21	Application of Graphical Models in Protein-Protein Interactions and Dynamics Vajdi Hoojghan, Amir 30 January 2019 (has links) <p> Every organism contains a few hundred to thousands of proteins. A protein is made of a sequence of molecular building blocks named amino acids, referred to here as residues. Every protein performs one or more functions in the cell. To do its job, a protein must bind properly to its partner proteins. Many genetic diseases such as cancer are caused by mutations (changes) of specific residues that disturb the functions of those proteins. Predicting protein binding sites is a crucial problem in computational biology. A protein is usually made up of 50 to a few thousand residues. A contact site can occur within a protein or with other proteins. With a robust and accurate model for identifying the residues involved in a binding site, scientists can investigate the impact of critical mutations and residues that can cause genetic diseases. </p><p> The main focus of this thesis is to propose a machine learning model for predicting the binding site between two proteins. Extracting structural information from a protein provides additional knowledge of binding sites. This structural information can be converted into a penalty matrix for a graphical model to be learned from the protein sequence. The second part of this thesis focuses on motion planning algorithms for proteins and on simulating protein pathway changes using a Monte Carlo based method. Later, by applying a novel geometry based scoring function, we cluster the intermediate conformations into corresponding subsets that may indicate interesting intermediate states.</p><p>
22	Structure-Preserving Rearrangements: Algorithms for Structural Comparison and Protein Analysis Bliven, Spencer Edward 13 August 2015 (has links) <p> Protein structure is fundamental to a deep understanding of how proteins function. Since structure is highly conserved, structural comparison can provide deep information about the evolution and function of protein families. The Protein Data Bank (PDB) continues to grow rapidly, providing copious opportunities for advancing our understanding of proteins through large-scale searches and structural comparisons. In this work I present several novel structural comparison methods for specific applications, as well as apply structure comparison tools systematically to better understand global properties of protein fold space. </p><p> Circular permutation describes a relationship between two proteins where the N-terminal portion of one protein is related to the C-terminal portion of the other. Proteins that are related by a circular permutation generally share the same structure despite the rearrangement of their primary sequence. This non-sequential relationship makes them difficult for many structure alignment tools to detect. Combinatorial Extension for Circular Permutations (CE-CP) was developed to align proteins that may be related by a circular permutation. It is widely available due to its incorporation into the RCSB PDB website. </p><p> Symmetry and structural repeats are common in protein structures at many levels. The CE-Symm tool was developed in order to detect internal pseudosymmetry within individual polypeptide chains. Such internal symmetry can arise from duplication events, so aligning the individual symmetry units provides insights about conservation and evolution. In many cases, internal symmetry can be shown to be important for a number of functions, including ligand binding, allostery, folding, stability, and evolution. 
</p><p> Structural comparison tools were applied comprehensively across all PDB structures for systematic analysis. Pairwise structural comparisons of all proteins in the PDB have been computed using the Open Science Grid computing infrastructure, and are kept continually up-to-date with the release of new structures. These provide a network-based view of protein fold space. CE-Symm was also applied to systematically survey the PDB for internally symmetric proteins. It is able to detect symmetry in ~20% of all protein families. Such PDB-wide analyses give insights into the complex evolution of protein folds. </p>
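A minimal sketch of the circular-permutation relationship described in entry 22, at the sequence level only. CE-CP itself aligns three-dimensional structures; the toy below works on exact strings, and the doubling trick it uses (searching for one sequence inside two concatenated copies of the other) is an illustrative assumption, not something stated in the abstract. The function name `circular_shift` is hypothetical.

```python
from typing import Optional


def circular_shift(seq_a: str, seq_b: str) -> Optional[int]:
    """Return k such that seq_b == seq_a[k:] + seq_a[:k], or None.

    Toy, exact-match illustration of a circular permutation: the
    N-terminal part of one sequence reappears as the C-terminal part
    of the other. Real proteins need a tolerant structural alignment.
    """
    if len(seq_a) != len(seq_b):
        return None
    # Every rotation of seq_a occurs as a substring of seq_a + seq_a.
    idx = (seq_a + seq_a).find(seq_b)
    return idx if idx != -1 else None


if __name__ == "__main__":
    # "DEFGABC" is "ABCDEFG" with the first three residues moved to the end.
    print(circular_shift("ABCDEFG", "DEFGABC"))
    print(circular_shift("ABC", "ACB"))
```

The doubled string makes a non-sequential relationship look sequential again, which is why a plain substring search suffices in this toy case.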
23	The Paladin Suite: Multifaceted Characterization of Whole Metagenome Shotgun Sequences Westbrook, Anthony 14 March 2018 (has links) <p> Whole metagenome shotgun sequencing is a powerful approach for assaying many aspects of microbial communities, including the functional and symbiotic potential of each contributing community member. The research community currently lacks tools that efficiently align DNA reads against protein references, the technique necessary for constructing functional profiles. This thesis details the creation of PALADIN, a novel modification of the Burrows-Wheeler Aligner that provides orders-of-magnitude improved efficiency by directly mapping in protein space. In addition to performance considerations, utilizing PALADIN and associated tools as the foundation of metagenomic pipelines also allows for novel characterization and downstream analysis. </p><p> The accuracy and efficiency of PALADIN were compared against existing applications that employ nucleotide or protein alignment algorithms. Using both simulated and empirically obtained reads, PALADIN consistently outperformed all compared alignment tools across a variety of metrics, mapping reads nearly 8,000 times faster than the widely utilized protein aligner, BLAST. A variety of analysis techniques were demonstrated using this data, including detecting horizontal gene transfer, performing taxonomic grouping, and generating declustered references.</p><p>
24	Using sequence similarity to predict the function of biological sequences. Jones, Craig E. January 2007 (has links) In this thesis we examine issues surrounding the development of software that predicts the function of biological sequences using sequence similarity. There is a pressing need for high throughput software that can annotate protein or DNA sequences with functional information due to the exponential growth in sequence data. In Chapter 1 we briefly introduce the molecular biology and bioinformatics that is assumed knowledge, and the objectives for the research presented here. In Chapter 2 we discuss the development of a method of comparing competing designs for software annotators, using precision and recall metrics, and a benchmark method referred to as Best BLAST. From this we conclude that data-mining approaches may be useful in the development of annotation algorithms, and that any new annotator should demonstrate its effectiveness against other approaches before being adopted. As any new annotator that utilises sequence similarity to predict the function of a sequence will rely on the quality of existing annotations, we examine the error rate of existing sequence annotations in Chapter 3. We develop a new method that allows for the estimation of annotation error rates. This involves adding annotation errors at known rates to a sample of reference sequence annotations that was found to be similar to query sequences. The precision at each error rate treatment is determined, and linear regression then used to find the error rate at estimated values for the maximum precision possible given assumptions concerning the impact of semantic variation on precision. We found that the error rate of curated annotations based on sequence similarity (ISS) is far higher than those that use other forms of evidence (49% versus 13-18%, respectively). As such we conclude that software annotators should avoid basing predictions on ISS annotations where possible. 
In Chapter 4 we detail the development of GOSLING, Gene Ontology Similarity Listing using Information Graphs, a software annotator with a design based on the principles discovered in previous chapters. Chapter 5 concludes the thesis by discussing the major findings from the research presented. / http://library.adelaide.edu.au/cgi-bin/Pwebrecon.cgi?BBID=1280882 / Thesis (M.Sc.(M&CS)) -- School of Computer Science, 2007 bioinformatics, computer science
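A minimal sketch of the precision and recall metrics that entry 24 uses to compare annotators. It assumes each sequence's annotations can be treated as a plain set of term labels; the labels and the function name `precision_recall` are placeholders, not taken from the thesis, and the thesis may score partial or semantically related matches differently.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Score one sequence's predicted terms against its reference terms.

    precision = correct predictions / all predictions
    recall    = correct predictions / all reference terms
    An empty denominator is scored as 0.0 (a convention chosen here).
    """
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Placeholder term labels, for illustration only.
    predicted = {"term_a", "term_b", "term_c", "term_d"}
    reference = {"term_a", "term_b", "term_e"}
    print(precision_recall(predicted, reference))
```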
25	Algorithms for DNA Sequence Assembly and Motif Search Dinh, Hieu Trung 10 January 2013
26	Analysis and Visualization of Local Phylogenetic Structure within Species Wang, Jeremy R. 03 July 2013 (has links) <p> While it is interesting to examine the evolutionary history and phylogenetic relationship between species, for example, in a sort of "tree of life", there is also a great deal to be learned from examining population structure and relationships within species. A careful description of phylogenetic relationships within species provides insights into causes of phenotypic variation, including disease susceptibility. The better we are able to understand the patterns of genotypic variation within species, the better these populations may be used as models to identify causative variants and possible therapies, for example through targeted genome-wide association studies (GWAS). My thesis describes a model of local phylogenetic structure, how it can be effectively derived under various circumstances, and useful applications and visualizations of this model to aid genetic studies. </p><p> I introduce a method for discovering phylogenetic structure among individuals of a population by partitioning the genome into a minimal set of intervals within which there is no evidence of recombination. I describe two extensions of this basic method. The first allows it to be applied to heterozygous, in addition to homozygous, genotypes and the second makes it more robust to errors in the source genotypes. </p><p> I demonstrate the predictive power of my local phylogeny model using a novel method for genome-wide genotype imputation. This imputation method achieves very high accuracy—on the order of the accuracy rate in the sequencing technology—by imputing genotypes in regions of shared inheritance based on my local phylogenies. </p><p> Comparative genomic analysis within species can be greatly aided by appropriate visualization and analysis tools. 
I developed a framework for web-based visualization and analysis of multiple individuals within a species, with my model of local phylogeny providing the underlying structure. I will describe the utility of these tools and the applications for which they have found widespread use.</p>
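A minimal sketch of partitioning a genome into intervals that show no evidence of recombination, as described in entry 26. The abstract does not say how evidence of recombination is detected; this sketch assumes the classic four-gamete test on biallelic, phased (or homozygous) genotypes, and uses a simple left-to-right greedy scan. The names `four_gametes` and `partition_intervals` are hypothetical.

```python
from typing import List, Tuple


def four_gametes(haps: List[str], i: int, j: int) -> bool:
    """True if sites i and j show all four allele combinations (00, 01, 10, 11)."""
    return len({(h[i], h[j]) for h in haps}) == 4


def partition_intervals(haps: List[str]) -> List[Tuple[int, int]]:
    """Greedily split sites into maximal intervals with no four-gamete conflict.

    haps: equal-length strings over '0'/'1', one per individual.
    Returns inclusive (start, end) site indices for each interval.
    """
    if not haps or not haps[0]:
        return []
    n_sites = len(haps[0])
    intervals = []
    start = 0
    for j in range(1, n_sites):
        # Close the interval as soon as site j conflicts with any site in it.
        if any(four_gametes(haps, i, j) for i in range(start, j)):
            intervals.append((start, j - 1))
            start = j
    intervals.append((start, n_sites - 1))
    return intervals


if __name__ == "__main__":
    # Sites 1 and 2 show all four gametes, so a break falls between them.
    haps = ["0000", "0111", "0100", "0011"]
    print(partition_intervals(haps))
```

Extending each interval as far as possible before breaking keeps the number of intervals small; handling heterozygous genotypes or genotyping errors, the two extensions the abstract mentions, is outside this sketch.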
27	Deciphering human gene regulation using computational and statistical methods Guturu, Harendra 23 July 2014 (has links) <p> It is estimated that at least 10-20% of the mammalian genome is dedicated towards regulating the 1-2% of the genome that codes for proteins. This non-coding, regulatory layer is a necessity for the development of complex organisms, but is poorly understood compared to the genetic code used to translate coding DNA into proteins. In this dissertation, I will discuss methods developed to better understand the gene regulatory layer. I begin, in Chapter 1, with a broad overview of gene regulation, the motivation for studying it, the state of the art in its historical context, and future directions.</p><p> In Chapter 2, I discuss a computational method developed to detect transcription factor (TF) complexes. The method compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid TF complexes. Structural data were integrated to explore overlapping motif arrangements while ensuring physical plausibility of the TF complex. Using this approach, I predicted 422 physically realistic TF complex motifs at 18% false discovery rate (FDR). I found that the set of complexes is enriched in known TF complexes. Additionally, novel complexes were supported by chromatin immunoprecipitation sequencing (ChIP-seq) datasets. Analysis of the structural modeling revealed three cooperativity mechanisms and a tendency of TF pairs to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. The TF complexes and associated binding site predictions are made available as a web resource at http://complex.stanford.edu.</p><p> Next, in Chapter 3, I discuss how gene enrichment analysis can be applied to genome-wide conserved binding sites to successfully infer regulatory functions for a given TF complex. 
A genomic screen predicted 732,568 combinatorial binding sites for 422 TF complex motifs. From these predictions, I inferred 2,440 functional roles, which are consistent with known functional roles of TF complexes. In these functional associations, I found interesting themes such as promiscuous partnering of TFs (such as ETS) in the same functional context (T cells). Additionally, functional enrichment identified two novel TF complex motifs associated with spinal cord patterning genes and mammary gland development genes, respectively. Based on these predictions, I discovered novel spinal cord patterning enhancers (5/9, 56% validation rate) and enhancers active in MCF7 cells (11/19, 53% validation rate). This set replete with thousands of additional predictions will serve as a powerful guide for future studies of regulatory patterns and their functional roles.</p><p> Then, in Chapter 4, I outline a method developed to predict disease susceptibility due to gene mis-regulation. The method interrogates ensembles of conserved binding sites of regulatory factors disrupted by an individual's variants and then looks for their most significant congregation next to a group of functionally related genes. Strikingly, when the method is applied to five different full human genomes, the top enriched function for each is reflective of their very different medical histories. These results suggest that erosion of gene regulation results in function specific mutation loads that manifest as disease predispositions in a familial lineage. Additionally, this aggregate analysis method addresses the problem that although many human diseases have a genetic component involving many loci, the majority of studies are statistically underpowered to isolate the many contributing loci.</p><p> Finally, I conclude in Chapter 5 with a summary of my findings throughout my research and future directions of research based on my findings.</p>
28	NEPIC, a Semi-Automated Tool with a Robust and Extensible Framework that Identifies and Tracks Fluorescent Image Features Parmidge, Amelia J. 19 June 2014 (has links) <p> As fluorescent imaging techniques for biological systems have advanced in recent years, scientists have used fluorescent imaging more and more to capture the state of biological systems at different moments in time. For many researchers, analysis of the fluorescent image data has become the limiting factor of this new technique. Although identification of fluorescing neurons in an image is (seemingly) easily done by the human visual system, manual delineation of the exact pixels comprising these fluorescing regions of interest (or fROIs) in digital images does not scale up well, being time-consuming, reiterative, and error-prone. This thesis introduces NEPIC, the Neuron-to-Environment Pixel Intensity Calculator, which seeks to help resolve this issue. NEPIC is a semi-automated tool for finding and tracking the cell body of a single neuron over an entire movie of grayscale calcium image data. NEPIC also provides a highly extensible, open source framework that could easily support finding and tracking other kinds of fROIs. When tested on calcium image movies of the AWC neuron in <i>C. elegans</i> under highly variant conditions, NEPIC correctly identified the neuronal cell body in 95.48% of the movie frames, and successfully tracked this cell body feature across 98.60% of the frame transitions in the movies. Although support for finding and tracking multiple fROIs has yet to be implemented, NEPIC displays promise as a tool for assisting researchers in the bulk analysis of fluorescent imaging data.</p>
29	Customizing scoring functions in molecular docking Pham, Tuan Anh. January 2007 (has links) Thesis (Ph. D.)--University of California, San Francisco, 2007. / Source: Dissertation Abstracts International, Volume: 68-11, Section: B, page: 7047. Adviser: Ajay N. Jain.
