Global ETD Search

231	RNA Backbone Rotamers and Chiropraxis Murray, Laura Weston 25 July 2007 (has links) RNA backbone is biologically important with many roles in reactions and interactions, but has historically been a challenge in structural determination. It has many atoms and torsions to place, and often there is less data on it than one might wish. This problem leads to both random and systematic error, producing noise in an already high-dimensional and complex distribution to further complicate data-driven analysis. With the advent of the ribosomal subunit structures published in 2000, large RNA structures at good resolution, it became possible to apply the Richardson laboratory's quality-filtering, visualization, and analysis techniques to RNA and develop new tools for RNA as well. A first set of 42 RNA backbone rotamers was identified, developed, and published in 2003; it has since been thoroughly overhauled in conjunction with the backbone group of the RNA Ontology Consortium to combine the strengths of different approaches, incorporate new data, and produce a consensus set of 46 conformers. Meanwhile, extensive work has taken place on developing validation and remodeling tools to correct and improve existing structures as well as to assist in initial fitting. The use of base-phosphate perpendicular distances to identify sugar pucker has proven very useful in both hand-refitting and the semi-automated process of using RNABC (RNA Backbone Correction), a program developed in conjunction with Dr. Jack Snoeyink's laboratory. The guanine riboswitch structure ur0039/1U8D, by Dr. Rob Batey's laboratory, has been collaboratively refit and rerefined as a successful test case of the utility of these tools and techniques. Their testing and development will continue, and they are expected to help to improve RNA structure determination in both ease and quality. 
/ Dissertation Biophysics, General Biology, Bioinformatics Chemistry, Biochemistry RNA RNA backbone rotamers base phosphate perpendiculars conformers RNA structure
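The base-phosphate perpendicular mentioned in the abstract above can be illustrated with a small geometric sketch. It is not the RNABC implementation. The choice of atoms (the 3' phosphorus, and the glycosidic bond line through C1' and the base nitrogen), the 2.9 Å cutoff, and the function names are assumptions added for illustration; the abstract states none of them.

```python
import math


def perpendicular_distance(point, line_a, line_b):
    """Distance from `point` to the infinite line through `line_a` and `line_b`."""
    ab = [b - a for a, b in zip(line_a, line_b)]
    ap = [p - a for a, p in zip(line_a, point)]
    ab_len2 = sum(c * c for c in ab)
    if ab_len2 == 0.0:
        raise ValueError("line_a and line_b must be distinct points")
    # Parameter of the foot of the perpendicular along the line.
    t = sum(x * y for x, y in zip(ap, ab)) / ab_len2
    foot = [a + t * c for a, c in zip(line_a, ab)]
    return math.dist(point, foot)


def guess_pucker(phosphate, c1_prime, base_n, cutoff=2.9):
    """Hypothetical classifier: a long perpendicular is read as C3'-endo.

    The cutoff (in angstroms) and the direction of the test are assumptions.
    """
    d = perpendicular_distance(phosphate, c1_prime, base_n)
    label = "C3'-endo" if d > cutoff else "C2'-endo"
    return label, d
```

For example, `guess_pucker((0.0, 3.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))` measures a 3.0 Å perpendicular and returns the C3'-endo label under the assumed cutoff.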
232	Mechanistic and Genetic Biases in Human Immunoglobulin Heavy Chain Development Volpe, Joseph M 23 April 2008 (has links) Broadly neutralizing antibodies against HIV are rare; most patients never develop them at detectable levels. The discovery of four such antibodies therefore warrants research into their origins and their presumed unique characteristics. Such studies, however, require baseline knowledge about commonalities and biases affecting human immunoglobulin development. Obtaining that knowledge requires large sets of gene sequence data and the appropriate statistical techniques and tools. The Genbank repository provides a free and easily accessible source for such data. Several large datasets cumulatively comprising over 10,000 human Ig heavy chain genes were identified, downloaded, and carefully filtered. We then developed a special software tool called SoDA, which employs a unique dynamic programming algorithm to provide a statistical reconstruction of the events that led to a given antigen receptor gene. Once developed, tested, and peer-reviewed, we used SoDA to provide initial data about each downloaded gene with respect to gene segment usage, n-nucleotide addition, CDR3 length, and mutation frequency, thereby establishing the most precise estimates currently available for human Ig heavy chain gene segment usage frequencies. We compared data from productive non-autoreactive Ig to non-productive Ig and found evidence for gene segment usage biases, D/J segment pairing preferences resulting from multiple sequential D-to-J recombination events, and biases in TdT action between the V-D and D-J. Further analysis of autoreactive Ig genes yielded evidence that n-nucleotide addition comes at a cost: the higher the ratio of n-nucleotides to germline-encoded nucleotides for a given CDR3 length, the greater the probability of autoreactivity. These results suggest that the germline gene segments have been selected for lack of autoreactivity. 
It has previously been shown that human Ig gene segments have evolved efficient evolvability under somatic hypermutation. We have now extended these results, showing that Ig gene sequences are "tuned" to preferentially produce consequential mutations in the antigen-binding domains, and synonymous mutations in the framework regions. Together, these analyses provide new insights into the genetic and mechanistic biases shaping the human Ig repertoire. / Dissertation Biology, Bioinformatics Health Sciences, Immunology immunoglobulin heavy chain V(D)J recombination autoreactivity antibodies
233	Computer Aided Detection of Masses in Breast Tomosynthesis Imaging Using Information Theory Principles Singh, Swatee 18 September 2008 (has links) <p>Breast cancer screening is currently performed by mammography, which is limited by overlying anatomy and dense breast tissue. Computer aided detection (CADe) systems can serve as a double reader to improve radiologist performance. Tomosynthesis is a limited-angle cone-beam x-ray imaging modality that is currently being investigated to overcome mammography's limitations. CADe systems will play a crucial role in enhancing workflow and performance for breast tomosynthesis.</p><p>The purpose of this work was to develop unique CADe algorithms for breast tomosynthesis reconstructed volumes. Unlike traditional CADe algorithms, which rely on segmentation followed by feature extraction, selection and merging, this dissertation instead adopts information theory principles, which are more robust. Information theory relies entirely on the statistical properties of an image and makes no assumptions about underlying distributions, and is thus advantageous for smaller datasets such as those currently used for all tomosynthesis CADe studies.</p><p>The proposed algorithm has two stages: (1) initial candidate generation of suspicious locations and (2) false positive reduction. Images were accrued from 250 human subjects. In the first stage, suspicious locations were isolated in the 25 projection images per subject acquired by the tomosynthesis system. Only these suspicious locations were reconstructed to yield 3D Volumes of Interest (VOI). In the second stage, false positive reduction was done in three ways: (1) using only the central slice of the VOI containing the largest cross-section of the mass, (2) using the entire volume, and (3) making decisions on a per-slice basis and then combining those decisions using either a linear discriminant or decision fusion. 
All three approaches achieved 92% sensitivity, with 4.4 false positives (FPs) per volume for the first approach, 3.9 for the second, and 2.5 for the slice-by-slice algorithm using decision fusion.</p><p>We have therefore developed a novel CADe algorithm for breast tomosynthesis. The technique uses an information theory approach to achieve very high sensitivity for cancer detection while effectively minimizing false positives.</p> / Dissertation Engineering, Biomedical Biology, Bioinformatics Computer Aided Detection Computer Aided Diagnosis Tomosynthesis Mammography Information Theory Mutual Information
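The record above lists mutual information among its keywords but does not give the formulation used. As a rough illustration only, a generic histogram-based estimate of the mutual information between two image patches might look like the sketch below. The bin count, the 8-bit intensity range, and the function name are assumptions, and this is not the dissertation's algorithm.

```python
import math
from collections import Counter


def mutual_information(patch_a, patch_b, bins=8, max_value=255):
    """Mutual information (in bits) between two equally sized patches.

    Patches are flat lists of intensities in [0, max_value]. The estimate
    uses plain histogram counts, with no distributional assumptions.
    """
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and of equal length")

    def bin_of(value):
        # Map an intensity to one of `bins` equal-width bins.
        return min(int(value * bins / (max_value + 1)), bins - 1)

    a_bins = [bin_of(v) for v in patch_a]
    b_bins = [bin_of(v) for v in patch_b]
    n = len(a_bins)
    joint = Counter(zip(a_bins, b_bins))
    count_a = Counter(a_bins)
    count_b = Counter(b_bins)

    mi = 0.0
    for (i, j), c in joint.items():
        p_ij = c / n
        # p_ij / (p_i * p_j) rewritten in terms of raw counts.
        mi += p_ij * math.log2(c * n / (count_a[i] * count_b[j]))
    return mi
```

A patch compared with itself gives its own entropy, while two statistically independent patches give a value near zero, so a candidate region could in principle be scored against known examples by this kind of similarity.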
234	Regulation of Global Transcription Dynamics During Cell Division and Root Development Orlando, David Anthony January 2009 (has links) <p>The successful completion of many critical biological processes depends on the proper execution of complex spatial and temporal gene expression programs. With the advent of high-throughput microarray technology, it is now possible to measure the dynamics of these expression programs on a genome-wide level. In this thesis we present work focused on utilizing this technology, in combination with novel computational techniques, to examine the role of transcriptional regulatory mechanisms in controlling the complex gene expression programs underlying two fundamental biological processes: the cell cycle and the development and differentiation of an organ.</p><p>We generate a dataset describing the genomic expression program which occurs during the cell division cycle of <italic>Saccharomyces cerevisiae</italic>. By concurrently measuring the dynamics in both wild-type and mutant cells that do not express either S-phase or mitotic cyclins, we quantify the relative contributions of cyclin-CDK complexes and transcriptional regulatory networks in the regulation of the cell-cycle expression program. We show that, contrary to previously accepted models, CDKs are not the sole regulators of periodic transcription, and we hypothesize an oscillating transcriptional regulatory network which could work independently of, or in tandem with, the CDK oscillator to control the cell-cycle expression program.</p><p>To understand the acquisition of cellular identity, we generate a nearly complete gene expression map of the <italic>Arabidopsis thaliana</italic> root at the resolution of individual cell types and developmental stages. 
An analysis of these data reveals a representative set of dominant expression patterns which are used to begin defining the spatiotemporal transcriptional programs that control development within the root.</p><p>Additionally, we develop computational tools that improve the interpretability and power of these data. We present CLOCCS, a model for the dynamics of population synchrony loss in time-series experiments. We demonstrate the utility of CLOCCS in integrating disparate datasets and present a CLOCCS-based deconvolution of the cell-cycle expression data. A deconvolution method is also developed for the <italic>Arabidopsis</italic> dataset, increasing its resolution to cell-type/section subregion specificity. Finally, a method for identifying biological processes occurring on multiple timescales is presented and applied to both datasets.</p><p>It is through the combination of these new genome-wide expression studies and computational tools that we begin to elucidate the transcriptional regulatory mechanisms controlling fundamental biological processes.</p> / Dissertation Biology, Bioinformatics Biology, Genetics arabidopsis cell cycle computational biology gene expression regulatory networks yeast
235	Automated Microscopy and High Throughput Image Analysis in Arabidopsis and Drosophila Mace, Daniel L. January 2009 (has links) <p>Development of a single cell into an adult organism is accomplished through an elaborate and complex cascade of spatiotemporal gene expression. While methods exist for capturing spatiotemporal expression patterns (in situ hybridization, reporter constructs, fluorescent tags), these methods have been highly laborious, and results are frequently assessed by subjective qualitative comparisons. To address these issues, methods must be developed for automating the capture of images, as well as for the normalization and quantification of the resulting data. In this thesis, I design computational approaches for high throughput image analysis which can be grouped into three main areas. First, I develop methods for the capture of high resolution images from high throughput platforms. In addition to the informatics aspect of this problem, I also devise a novel multiscale probabilistic model that allows us to identify and segment objects in an automated fashion. Second, high resolution images must be registered and normalized to a common frame of reference for cross-image comparisons. To address these issues, I implement approaches for image registration using statistical shape models and non-rigid registration. Lastly, I validate the spatial expression data obtained from microscopy images against other known spatial expression methods, and develop methods for comparing and calculating the significance between spatial expression patterns. I demonstrate these methods on two model developmental organisms: Arabidopsis and Drosophila.</p> / Dissertation Biology, Bioinformatics Computer Science Statistics arabidopsis developmental drosophila imaging microscopy probabilistic
236	Analysis and Error Correction in Structures of Macromolecular Interiors and Interfaces Headd, Jeffrey John January 2009 (has links) <p>As of late 2009, the Protein Data Bank (PDB) has grown to contain over 70,000 models. This recent increase in the amount of structural data allows for more extensive explication of the governing principles of macromolecular folding and association to complement traditional studies focused on a single molecule or complex. PDB-wide characterization of structural features yields insights that are useful in prediction and validation of the 3D structure of macromolecules and their complexes. Here, these insights lead to a deeper understanding of protein-protein interfaces, full-atom critical assessment of increasingly accurate structure predictions, a better-defined library of RNA backbone conformers for validation and building 3D models, and knowledge-based automatic correction of errors in protein sidechain rotamers. </p><p>My study of protein-protein interfaces identifies amino acid pairing preferences in a set of 146 transient interfaces. Using a geometric interface surface definition devoid of arbitrary cutoffs common to previous studies of interface composition, I calculate inter- and intrachain amino acid pairing preferences. As expected, salt-bridges and hydrophobic patches are prevalent, but likelihood correction of observed pairing frequencies reveals some surprising pairing preferences, such as Cys-His interchain pairs and Met-Met intrachain pairs. To complement my statistical observations, I introduce a 2D visualization of the 3D interface surface that can display a variety of interface characteristics, including residue type, atomic distance and backbone/sidechain composition. </p><p>My study of protein interiors finds that 3D structure prediction from sequence (as part of the CASP experiment) is very close to full-atom accuracy. 
Validation of structure prediction should therefore consider all atom positions instead of the traditional Calpha-only evaluation. I introduce six new full-model quality criteria to assess the accuracy of CASP predictions, which demonstrate that groups who use structural knowledge culled from the PDB to inform their prediction protocols produce the most accurate results. </p><p>My study of RNA backbone introduces a set of rotamer-like "suite" conformers. Initially hand-identified by the Richardson laboratory, these 7D conformers represent backbone segments that are found to be genuine and favorable. X-ray crystallographers can use backbone conformers for model building in often poor backbone density and in validation after refinement. Increasing amounts of high quality RNA data allow for improved conformer identification, but also complicate hand-curation. I demonstrate that affinity propagation successfully differentiates between two related but distinct suite conformers, and is a useful tool for automated conformer clustering. </p><p>My study of protein sidechain rotamers in X-ray structures identifies a class of systematic errors that results in sidechains misfit by approximately 180 degrees. I introduce Autofix, a method for automated detection and correction of such errors. Autofix corrects over 40% of errors for Leu, Thr, and Val residues, and a significant number of Arg residues. On average, Autofix made four corrections per PDB file in 945 X-ray structures. Autofix will be implemented into MolProbity and PHENIX for easy integration into X-ray crystallography workflows.</p> / Dissertation Biology, Bioinformatics Chemistry, Biochemistry automation prediction protein folding protein protein interactions RNA structural bioinformatics
237	Computational Methods to Study Diversification in Pathogens, and Invertebrate and Vertebrate Immune Systems Munshaw, Supriya Shaunak January 2010 (has links) <p>Pathogens and host immune systems use strikingly similar methods of diversification. Mechanisms such as point mutations and recombination help pathogens escape the host immune system, and similar mechanisms help the host immune system attack rapidly evolving pathogens. Understanding the interplay between pathogen and immune system evolution is crucial to effective drug and vaccine development. In this thesis we employ various computational methods to study diversification in a pathogen, an invertebrate immune system and a vertebrate immune system.</p> <p>First, we develop a technique for phylogenetic inference in the presence of recombination based on the principle of minimum description length, which assigns a cost (the description length) to each network topology given the observed sequence data. We show that the method performs well on simulated data and demonstrate its application on HIV <italic>env</italic> gene sequence data from 8 human subjects.</p> <p>Next, we demonstrate via phylogenetic analysis that the evolution of repeats in an immune-related gene family in <italic>Strongylocentrotus purpuratus</italic> is the result of recombination and duplication and/or deletion. These results support the evidence suggesting that invertebrate immune systems are highly complex and may employ mechanisms for diversification similar to those of higher vertebrates.</p> <p>Third, we develop a probabilistic model of the immunoglobulin (Ig) rearrangement process and a Bayesian method for estimating posterior probabilities for the comparison of multiple plausible rearrangements. 
We validate the software using various datasets and in all tests, SoDA2 performed better than other available software.</p> <p>Finally, we characterize the somatic population genetics of the nucleotide sequences of >1000 recombinant Ig pairs derived from the blood of 5 acute HIV-1 infected (AHI) subjects. We found that the Ig genes from the 20 day AHI PC showed extraordinary clonal relatedness among themselves; a single clone comprised of 52 members, with observed and inferred precursor antibodies specific for HIV-1 Env gp41. Antibodies from AHI patients show a decreased CDR3H length and an increased mutation frequency when compared to influenza vaccinated individuals. The high mutation frequency is coupled with a comparatively low synonymous to non-synonymous mutation ratio in the heavy chain. Our results may suggest presence of positive antigenic selection in previously triggered non-HIV-1 memory B cells in AHI.</p> <p>Taken together, the studies presented in this thesis provide methods to study diversification in pathogens, and invertebrate and vertebrate immune systems.</p> / Dissertation Biology, Bioinformatics Antibody response to HIV purple sea urchin Recombination in HIV
238	Studies into Location-specific cis-Regulatory Motifs Yokoyama, Ken Daigoro January 2010 (has links) <p>Gene expression and regulation are major determinants of phenotypic traits displayed across species. Although the DNA sequence elements that control gene expression play a crucial role in determining species morphology, predicting cis-regulatory elements through sequence analysis alone remains a difficult task. A few regulatory elements, such as the TATA-box and Initiator sequence, have been known to exhibit overrepresentation at specific locations within the proximal promoter. However, the extent to which this occurs among cis-regulatory elements is not well understood. Here, we take a genome-wide approach towards detecting such functional sequence elements, using location-specific overrepresentation as a criterion for regulatory function. We provide evidence that a surprisingly large number of regulatory elements exhibit locational overrepresentation with respect to the transcription start site. We then utilize this characteristic to predict novel cis-regulatory elements overrepresented at particular locations within the proximal promoter.</p><p>Transcriptional regulation is most often controlled not by single protein factors acting in isolation, but instead multiple transcription factors acting together within multi-protein complexes. As protein-protein interactions are largely determined through protein structure, we would expect to see patterns of spatial preference between motif-pairs binding interacting factors. However, in the absence of methods to predict such spatial preferences between motifs, comprehensive assessments of such inter-relationships have not been previously conducted. As our model provides a general tool for detecting positional specificities of a motif relative to a given reference point, we expanded our model to measure distance preferences between pairs of motifs on a genome-wide scale. 
We show that there often exist patterns of spatial dependencies between pairs of sequence elements that bind interacting protein factors. We find that regulatory motifs binding interacting proteins often have multiple inter-motif distances at which they preferentially occur, and we show that the intervals between preferred distances are highly consistent across motif-pairs. This distance preference 'phasing' was empirically found to occur at consistent intervals of approximately 8-10 bp, corresponding roughly to the number of nucleotides within a single turn of the DNA double-helix. This finding suggests a tendency for protein factor-pairs to interact in a specific orientation with respect to the turn of the DNA molecule, and offers a convenient method by which to determine motif-pairs binding interacting transcription factors de novo. </p><p>While little is known about the mechanisms by which individual cis-regulatory elements ultimately control gene expression, even less is known about how such elements evolve over time. A single transcription factor can potentially target hundreds of genes across the genome, and thus modifications in the binding affinities of such proteins must induce conversions at a multitude of functional sites in order to preserve the set of target genes that the trans-factor regulates. It is therefore commonly assumed that such changes occur rarely and at a slow rate over the course of evolution. Despite this widespread assumption, we find that a surprisingly large number of cis-regulatory elements have been subject to significant changes in consensus sequence in a lineage-specific manner. Here, we demonstrate that the genomic landscape is highly adaptable, rapidly adjusting to global changes in preferred regulatory consensus sequences. 
Focusing upon regulatory elements exhibiting location-specific overrepresentation, we find that a substantial fraction of regulatory elements have been subject to evolutionary modifications, even between closely related eutherians. These findings have broad implications regarding evolving phenotypes observed across species.</p> / Dissertation Biology, Bioinformatics Biology, Molecular Biology, Genetics Evolution Expression Gene regulation Regulatory element
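The inter-motif distance preferences described in the abstract above can be illustrated by tallying how often one motif occurs at each offset downstream of another. This is a minimal sketch under stated assumptions, not the dissertation's statistical model: it uses exact string matching, raw counts with no significance test, and hypothetical function names.

```python
import re
from collections import Counter


def motif_starts(sequence, motif):
    """Start positions of all (possibly overlapping) exact matches of `motif`."""
    pattern = f"(?={re.escape(motif)})"
    return [m.start() for m in re.finditer(pattern, sequence)]


def distance_histogram(sequences, motif_a, motif_b, max_distance=50):
    """Count start-to-start distances from `motif_a` to downstream `motif_b`."""
    counts = Counter()
    for seq in sequences:
        starts_a = motif_starts(seq, motif_a)
        starts_b = motif_starts(seq, motif_b)
        for a in starts_a:
            for b in starts_b:
                d = b - a
                # Keep only motif_b occurrences downstream and within range.
                if 0 < d <= max_distance:
                    counts[d] += 1
    return counts
```

With a large promoter set, phasing of the kind the abstract reports would show up as peaks in this histogram spaced roughly one helical turn apart. Deciding whether a peak is real would still need a background model, which this sketch omits.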
239	Using Genome-wide Approaches to Characterize the Relationship Between Genomic Variation and Disease: A Case Study in Oligodendroglioma and Staphylococcus aureus Johnson, Nicole January 2010 (has links) <p>Genetic variation is a natural occurrence in the genome that contributes to the phenotypic differences observed between individuals as well as the phenotypic outcomes of various diseases, including infectious disease and cancer. Single nucleotide polymorphisms (SNPs) have been identified as genetic factors influencing host susceptibility to infectious disease, while the study of copy number variation (CNV) in various cancers has led to the identification of causal genetic factors influencing tumor formation and severity. In this work, we evaluated the association between genomic variation and disease phenotypes to identify SNPs contributing to host susceptibility in Staphylococcus aureus (<italic>S. aureus</italic>) infection and to characterize a brain tumor known as oligodendroglioma (OD), using the CNV observed in tumors with varying degrees of malignancy.</p><p>Using SNP data, we utilized a computational approach, known as in silico haplotype mapping (ISHM), to identify genomic regions significantly associated with susceptibility to <italic>S. aureus</italic> infection in inbred mouse strains. We conducted ISHM on four phenotypes collected from <italic>S. aureus</italic> infected mice and identified genes contained in the significant regions, which were considered to be potential candidate genes. Gene expression studies were then conducted on inbred mice considered to be resistant or susceptible to <italic>S. aureus</italic> infection to identify genes differentially expressed between the two groups, which provided biological validation of the genes identified in significant ISHM regions. 
Genes identified by both analyses were considered our top priority genes, and known biological information about the genes was used to determine their functional roles in susceptibility to <italic>S. aureus</italic> infection.</p><p> We then evaluated CNV in subtypes of ODs to characterize the tumors by their genomic aberrations. We conducted array-based comparative genomic hybridization (CGH) on 74 ODs to generate genomic profiles that were classified by tumor grade, providing insight about the genomic changes that typically occur in patients with OD ranging from the less to more severe tumor types. Additionally, smaller genomic intervals with substantial copy number differences between normal and OD DNA samples, known as minimal critical regions (MCRs), were identified among the tumors. The genomic regions with copy number changes were interrogated for genes and assessed for their biological roles in the tumors' phenotype and formation. This information was used to assess the validity of using genomic variation in these tumors to further classify these tumors in addition to standard classification techniques. </p><p> The studies described in this project demonstrate the utility of using genetic variation to study disease phenotypes and applying computational and experimental techniques to identify the underlying genetic factors contributing to disease pathogenesis. Moreover, the continued development of similar approaches could aid in the development of new diagnostic procedures as well as novel therapeutics for the generation of more personalized treatments.</p> / Dissertation Biology, Bioinformatics Health Sciences, Immunology Bioinformatics comparative genomic hybridization in silico haplotype mapping Oligodendroglioma Staphylococcus aureus
240	Représentation et recherche de motifs cycliques et structuraux d’ARN connus dans les structures secondaires Louis-Jeune, Caroline 04 1900 (has links) L'acide désoxyribonucléique (ADN) et l'acide ribonucléique (ARN) sont des polymères de nucléotides essentiels à la cellule. À l'inverse de l'ADN qui sert principalement à stocker l'information génétique, les ARN sont impliqués dans plusieurs processus métaboliques. Par exemple, ils transmettent l’information génétique codée dans l’ADN. Ils sont essentiels pour la maturation des autres ARN, la régulation de l’expression génétique, la prévention de la dégradation des chromosomes et le ciblage des protéines dans la cellule. La polyvalence fonctionnelle de l'ARN résulte de sa plus grande diversité structurale. Notre laboratoire a développé MC-Fold, un algorithme pour prédire la structure des ARN qu'on représente avec des graphes d'interactions inter-nucléotidiques. Les sommets de ces graphes représentent les nucléotides et les arêtes leurs interactions. Notre laboratoire a aussi observé qu'un petit ensemble de cycles d'interactions à lui seul définit la structure de n'importe quel motif d'ARN. La formation de ces cycles dépend de la séquence de nucléotides et MC-Fold détermine les cycles les plus probables étant donnée cette séquence. Mon projet de maîtrise a été, dans un premier temps, de définir une base de données des motifs structuraux et fonctionnels d'ARN, bdMotifs, en terme de ces cycles. Par la suite, j’ai implanté un algorithme, MC-Motifs, qui recherche ces motifs dans des graphes d'interactions et, entre autres, ceux générés par MC-Fold. Finalement, j’ai validé mon algorithme sur des ARN dont la structure est connue, tels que les ARN ribosomaux (ARNr) 5S, 16S et 23S, et l'ARN utilisé pour prédire la structure des riborégulateurs. Le mémoire est divisé en cinq chapitres. Le premier chapitre présente la structure chimique, les fonctions cellulaires de l'ARN et le repliement structural du polymère. 
Dans le deuxième chapitre, je décris la base de données bdMotifs. Dans le troisième chapitre, l’algorithme de recherche MC-Motifs est introduit. Le quatrième chapitre présente les résultats de la validation et des prédictions. Finalement, le dernier chapitre porte sur la discussion des résultats suivis d’une conclusion sur le travail. / Deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) are polymers of nucleotides essential for the survival of the cell. Contrary to DNA, whose main role is to store genetic information, RNA is involved in multiple metabolic processes. For example, RNA is involved in the transfer of information from DNA to protein, the processing and modification of other RNAs, the regulation of gene expression, the end-maintenance of chromosomes, and the sorting of proteins within the cell. This functional versatility of RNA comes from its structural diversity. Our laboratory developed MC-Fold, an algorithm that predicts RNA structures by representing them with nucleotide interaction graphs. The nodes in these graphs represent the nucleotides, and the edges the interactions between them. Our laboratory also observed that a limited number of interaction cycles can define the structure of any RNA motif. The formation of these cycles is determined by the nucleotide sequence, and MC-Fold determines the most likely cycles based on that sequence. In this master's project, I first built a database of structural and functional RNA motifs, bdMotifs, based on their constituent cycles. Then, I implemented an algorithm, MC-Motifs, which detects motifs within interaction graphs generated either by MC-Fold or by any other method. Finally, I validated my algorithm on known RNA structures, such as the 5S, 16S and 23S ribosomal RNA (rRNA), and on predicted structures of riboswitches. The master's thesis is divided into five chapters. The first chapter presents the chemical structure of RNA, its cellular functions and the structural folding of the polymer. 
In the second chapter, the database bdMotifs is described. In the third chapter, the MC-Motifs algorithm is introduced. In the fourth chapter, I present the results of MC-Motifs. Finally, in the last chapter, I discuss these results and give a conclusion on the project. ARN Structure secondaire Motif Cycle RNA Secondary structure
