231 |
On the evolution of effector gene families in potato cyst nematodes
Laetsch, Dominik Robert January 2018
Potato cyst nematodes (PCN) are economically relevant plant parasites that infect potato crops. The genomes of three PCN species are available and genome data have been generated for several populations of PCN, to address questions related to the molecular basis of plant parasitism. In this thesis, I employ approaches of comparative genomics to highlight differences and similarities between PCNs and other nematode species. I present two new software solutions to address challenges associated with the field of comparative genomics: BlobTools, a taxonomic interrogation toolkit for quality control of genome assemblies, and KinFin, a solution for the analysis of protein orthology data. I apply both software solutions to genomic datasets of nematodes, platyhelminths, and tardigrades. Based on KinFin analysis of plant parasitic nematodes, I identify protein families in PCNs likely to be involved in host-parasitic interaction, termed effectors, and discuss their functions. I highlight examples of horizontal gene transfer from bacteria to plant parasitic nematodes. Through genomic data of European and South American populations of PCNs, I address variation in populations, infer phylogenetic relationships, and try to estimate the effect of selection on effector genes identified through KinFin. Furthermore, I estimate the rate of variation across the reference genomes of two PCNs.
|
232 |
RNA Backbone Rotamers and Chiropraxis
Murray, Laura Weston 25 July 2007
RNA backbone is biologically important with many roles in reactions and interactions, but has historically been a challenge in structural determination. It has many atoms and torsions to place, and often there is less data on it than one might wish. This problem leads to both random and systematic error, producing noise in an already high-dimensional and complex distribution to further complicate data-driven analysis. With the advent of the ribosomal subunit structures published in 2000, large RNA structures at good resolution, it became possible to apply the Richardson laboratory's quality-filtering, visualization, and analysis techniques to RNA and develop new tools for RNA as well. A first set of 42 RNA backbone rotamers was identified, developed, and published in 2003; it has since been thoroughly overhauled in conjunction with the backbone group of the RNA Ontology Consortium to combine the strengths of different approaches, incorporate new data, and produce a consensus set of 46 conformers. Meanwhile, extensive work has taken place on developing validation and remodeling tools to correct and improve existing structures as well as to assist in initial fitting. The use of base-phosphate perpendicular distances to identify sugar pucker has proven very useful in both hand-refitting and the semi-automated process of using RNABC (RNA Backbone Correction), a program developed in conjunction with Dr. Jack Snoeyink's laboratory. The guanine riboswitch structure ur0039/1U8D, by Dr. Rob Batey's laboratory, has been collaboratively refit and rerefined as a successful test case of the utility of these tools and techniques. Their testing and development will continue, and they are expected to help to improve RNA structure determination in both ease and quality. / Dissertation
|
233 |
Mechanistic and Genetic Biases in Human Immunoglobulin Heavy Chain Development
Volpe, Joseph M 23 April 2008
Broadly neutralizing antibodies against HIV are rare; most patients never develop them at detectable levels. The discovery of four such antibodies therefore warrants research into their origins and their presumed unique characteristics. Such studies, however, require baseline knowledge about commonalities and biases affecting human immunoglobulin development. Obtaining that knowledge requires large sets of gene sequence data and the appropriate statistical techniques and tools.
The GenBank repository provides a free and easily accessible source for such data. Several large datasets cumulatively comprising over 10,000 human Ig heavy chain genes were identified, downloaded, and carefully filtered. We then developed a software tool called SoDA, which employs a unique dynamic programming algorithm to provide a statistical reconstruction of the events that led to a given antigen receptor gene. Once SoDA had been developed, tested, and peer-reviewed, we used it to provide initial data about each downloaded gene with respect to gene segment usage, n-nucleotide addition, CDR3 length, and mutation frequency, thereby establishing the most precise estimates currently available for human Ig heavy chain gene segment usage frequencies.
We compared data from productive non-autoreactive Ig to non-productive Ig and found evidence for gene segment usage biases, D/J segment pairing preferences resulting from multiple sequential D-to-J recombination events, and biases in TdT action between the V-D and D-J. Further analysis of autoreactive Ig genes yielded evidence that n-nucleotide addition comes at a cost: the higher the ratio of n-nucleotides to germline-encoded nucleotides for a given CDR3 length, the greater the probability of autoreactivity. These results suggest that the germline gene segments have been selected for lack of autoreactivity.
It has previously been shown that human Ig gene segments have evolved efficient evolvability under somatic hypermutation. We have now extended these results, showing that Ig gene sequences are "tuned" to preferentially produce consequential mutations in the antigen-binding domains, and synonymous mutations in the framework regions.
Together, these analyses provide new insights into the genetic and mechanistic biases shaping the human Ig repertoire. / Dissertation
|
234 |
Computer Aided Detection of Masses in Breast Tomosynthesis Imaging Using Information Theory Principles
Singh, Swatee 18 September 2008
<p>Breast cancer screening is currently performed by mammography, which is limited by overlying anatomy and dense breast tissue. Computer aided detection (CADe) systems can serve as a double reader to improve radiologist performance. Tomosynthesis is a limited-angle cone-beam x-ray imaging modality that is currently being investigated to overcome mammography's limitations. CADe systems will play a crucial role in enhancing workflow and performance for breast tomosynthesis.</p><p>The purpose of this work was to develop unique CADe algorithms for breast tomosynthesis reconstructed volumes. Unlike traditional CADe algorithms, which rely on segmentation followed by feature extraction, selection and merging, this dissertation instead adopts information theory principles, which are more robust. Information theory relies entirely on the statistical properties of an image and makes no assumptions about underlying distributions, and is thus advantageous for smaller datasets such as those currently used for all tomosynthesis CADe studies.</p><p>The proposed algorithm has two stages: (1) initial candidate generation of suspicious locations and (2) false positive reduction. Images were accrued from 250 human subjects. In the first stage, initial suspicious locations were isolated in the 25 projection images per subject acquired by the tomosynthesis system. Only these suspicious locations were reconstructed to yield 3D Volumes of Interest (VOI). In the second stage, false positive reduction was done in three ways: (1) using only the central slice of the VOI containing the largest cross-section of the mass, (2) using the entire volume, and (3) making decisions on a per-slice basis and then combining those decisions using either a linear discriminant or decision fusion. 
A 92% sensitivity was achieved by all three approaches, with 4.4 false positives (FPs) per volume for the first approach, 3.9 for the second approach, and 2.5 for the slice-by-slice algorithm using decision fusion.</p><p>We have therefore developed a novel CADe algorithm for breast tomosynthesis. The technique uses an information theory approach to achieve very high sensitivity for cancer detection while effectively minimizing false positives.</p> / Dissertation
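The two figures of merit quoted in the abstract above, sensitivity and false positives per volume, can be illustrated with a minimal Python sketch. The `VolumeResult` record and the `summarize` function are hypothetical names introduced here for illustration; they are not taken from the dissertation, and the numbers in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass
class VolumeResult:
    """Per-volume detection outcome (hypothetical record, for illustration)."""
    true_masses: int      # masses present in the volume according to ground truth
    detected_masses: int  # how many of those masses the detector marked
    false_positives: int  # detector marks that match no true mass


def summarize(results):
    """Return (sensitivity, false positives per volume) over all volumes."""
    total_masses = sum(r.true_masses for r in results)
    detected = sum(r.detected_masses for r in results)
    false_positives = sum(r.false_positives for r in results)
    # Sensitivity: fraction of true masses that were detected.
    sensitivity = detected / total_masses if total_masses else float("nan")
    # FP rate: averaged over every volume, including volumes without masses.
    fps_per_volume = false_positives / len(results) if results else float("nan")
    return sensitivity, fps_per_volume
```

For example, `summarize([VolumeResult(1, 1, 3), VolumeResult(2, 1, 1)])` gives a sensitivity of 2/3 and 2.0 false positives per volume on this toy input.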
|
235 |
Regulation of Global Transcription Dynamics During Cell Division and Root Development
Orlando, David Anthony January 2009
<p>The successful completion of many critical biological processes depends on the proper execution of complex spatial and temporal gene expression programs. With the advent of high-throughput microarray technology, it is now possible to measure the dynamics of these expression programs on a genome-wide level. In this thesis we present work focused on utilizing this technology, in combination with novel computational techniques, to examine the role of transcriptional regulatory mechanisms in controlling the complex gene expression programs underlying two fundamental biological processes: the cell cycle and the development and differentiation of an organ.</p><p>We generate a dataset describing the genomic expression program which occurs during the cell division cycle of <italic>Saccharomyces cerevisiae</italic>. By concurrently measuring the dynamics in both wild-type cells and mutant cells that do not express either S-phase or mitotic cyclins, we quantify the relative contributions of cyclin-CDK complexes and transcriptional regulatory networks in the regulation of the cell cycle expression program. We show that, contrary to previously accepted models, CDKs are not the sole regulators of periodic transcription, and we hypothesize an oscillating transcriptional regulatory network which could work independently of, or in tandem with, the CDK oscillator to control the cell cycle expression program.</p><p>To understand the acquisition of cellular identity, we generate a nearly complete gene expression map of the <italic>Arabidopsis thaliana</italic> root at the resolution of individual cell types and developmental stages. An analysis of these data reveals a representative set of dominant expression patterns which are used to begin defining the spatiotemporal transcriptional programs that control development within the root.</p><p>Additionally, we develop computational tools that improve the interpretability and power of these data. 
We present CLOCCS, a model for the dynamics of population synchrony loss in time-series experiments. We demonstrate the utility of CLOCCS in integrating disparate datasets and present a CLOCCS based deconvolution of the cell-cycle expression data. A deconvolution method is also developed for the <italic>Arabidopsis</italic> dataset, increasing its resolution to cell-type/section subregion specificity. Finally, a method for identifying biological processes occurring on multiple timescales is presented and applied to both datasets.</p><p>It is through the combination of these new genome-wide expression studies and computational tools that we begin to elucidate the transcriptional regulatory mechanisms controlling fundamental biological processes.</p> / Dissertation
|
236 |
Automated Microscopy and High Throughput Image Analysis in Arabidopsis and Drosophila
Mace, Daniel L. January 2009
<p>Development of a single cell into an adult organism is accomplished through an elaborate and complex cascade of spatiotemporal gene expression. While methods exist for capturing spatiotemporal expression patterns (in situ hybridization, reporter constructs, fluorescent tags), these methods have been highly laborious, and results are frequently assessed by subjective qualitative comparisons. To address these issues, methods must be developed for automating the capture of images, as well as for the normalization and quantification of the resulting data. In this thesis, I design computational approaches for high throughput image analysis, which can be grouped into three main areas. First, I develop methods for the capture of high resolution images from high throughput platforms. In addition to the informatics aspect of this problem, I also devise a novel multiscale probabilistic model that allows us to identify and segment objects in an automated fashion. Second, high resolution images must be registered and normalized to a common frame of reference for cross-image comparisons. To address these issues, I implement approaches for image registration using statistical shape models and non-rigid registration. Lastly, I validate the spatial expression data obtained from microscopy images against other known spatial expression methods, and develop methods for comparing spatial expression patterns and calculating the significance of their differences. I demonstrate these methods on two model developmental organisms: Arabidopsis and Drosophila.</p> / Dissertation
|
237 |
Analysis and Error Correction in Structures of Macromolecular Interiors and Interfaces
Headd, Jeffrey John January 2009
<p>As of late 2009, the Protein Data Bank (PDB) has grown to contain over 70,000 models. This recent increase in the amount of structural data allows for more extensive explication of the governing principles of macromolecular folding and association to complement traditional studies focused on a single molecule or complex. PDB-wide characterization of structural features yields insights that are useful in prediction and validation of the 3D structure of macromolecules and their complexes. Here, these insights lead to a deeper understanding of protein-protein interfaces, full-atom critical assessment of increasingly accurate structure predictions, a better defined library of RNA backbone conformers for validation and building 3D models, and knowledge-based automatic correction of errors in protein sidechain rotamers.</p><p>My study of protein-protein interfaces identifies amino acid pairing preferences in a set of 146 transient interfaces. Using a geometric interface surface definition devoid of the arbitrary cutoffs common to previous studies of interface composition, I calculate inter- and intrachain amino acid pairing preferences. As expected, salt-bridges and hydrophobic patches are prevalent, but likelihood correction of observed pairing frequencies reveals some surprising pairing preferences, such as Cys-His interchain pairs and Met-Met intrachain pairs. To complement my statistical observations, I introduce a 2D visualization of the 3D interface surface that can display a variety of interface characteristics, including residue type, atomic distance and backbone/sidechain composition.</p><p>My study of protein interiors finds that 3D structure prediction from sequence (as part of the CASP experiment) is very close to full-atom accuracy. Validation of structure prediction should therefore consider all atom positions instead of the traditional Calpha-only evaluation. 
I introduce six new full-model quality criteria to assess the accuracy of CASP predictions, which demonstrate that groups who use structural knowledge culled from the PDB to inform their prediction protocols produce the most accurate results. </p><p>My study of RNA backbone introduces a set of rotamer-like "suite" conformers. Initially hand-identified by the Richardson laboratory, these 7D conformers represent backbone segments that are found to be genuine and favorable. X-ray crystallographers can use backbone conformers for model building in often poor backbone density and in validation after refinement. Increasing amounts of high quality RNA data allow for improved conformer identification, but also complicate hand-curation. I demonstrate that affinity propagation successfully differentiates between two related but distinct suite conformers, and is a useful tool for automated conformer clustering. </p><p>My study of protein sidechain rotamers in X-ray structures identifies a class of systematic errors that results in sidechains misfit by approximately 180 degrees. I introduce Autofix, a method for automated detection and correction of such errors. Autofix corrects over 40% of errors for Leu, Thr, and Val residues, and a significant number of Arg residues. On average, Autofix made four corrections per PDB file in 945 X-ray structures. Autofix will be implemented into MolProbity and PHENIX for easy integration into X-ray crystallography workflows.</p> / Dissertation
|
238 |
Computational Methods to Study Diversification in Pathogens, and Invertebrate and Vertebrate Immune Systems
Munshaw, Supriya Shaunak January 2010
<p>Pathogens and host immune systems use strikingly similar methods of diversification. Mechanisms such as point mutations and recombination help pathogens escape the host immune system and similar mechanisms help the host immune system attack rapidly evolving pathogens. Understanding the interplay between pathogen and immune system evolution is crucial to effective drug and vaccine development. In this thesis we employ various computational methods to study diversification in a pathogen, an invertebrate and a vertebrate immune system.</p>
<p>First, we develop a technique for phylogenetic inference in the presence of recombination based on the principle of minimum description length, which assigns a cost (the description length) to each network topology given the observed sequence data. We show that the method performs well on simulated data and demonstrate its application on HIV <italic>env</italic> gene sequence data from 8 human subjects.</p>
<p>Next, we demonstrate via phylogenetic analysis that the evolution of repeats in an immune-related gene family in <italic>Strongylocentrotus purpuratus</italic> is the result of recombination and duplication and/or deletion. These results support the evidence suggesting that invertebrate immune systems are highly complex and may employ similar mechanisms for diversification as higher vertebrates.</p>
<p>Third, we develop a probabilistic model of the immunoglobulin (Ig) rearrangement process and a Bayesian method for estimating posterior probabilities for the comparison of multiple plausible rearrangements. We validate the resulting software, SoDA2, using various datasets; in all tests, SoDA2 performed better than other available software.</p>
<p>Finally, we characterize the somatic population genetics of the nucleotide sequences of >1000 recombinant Ig pairs derived from the blood of 5 acute HIV-1 infected (AHI) subjects. We found that the Ig genes from the 20-day AHI PC showed extraordinary clonal relatedness among themselves; a single clone comprised 52 members, with observed and inferred precursor antibodies specific for HIV-1 Env gp41. Antibodies from AHI patients show a decreased CDR3H length and an increased mutation frequency when compared to influenza-vaccinated individuals. The high mutation frequency is coupled with a comparatively low synonymous to non-synonymous mutation ratio in the heavy chain. Our results may suggest the presence of positive antigenic selection in previously triggered non-HIV-1 memory B cells in AHI.</p>
<p>Taken together, the studies presented in this thesis provide methods to study diversification in pathogens, and invertebrate and vertebrate immune systems.</p> / Dissertation
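The synonymous to non-synonymous comparison mentioned in the abstract above can be illustrated with a small Python sketch. It is a simplification and not the thesis's method: it assumes the observed sequence is already aligned in frame to its germline counterpart, it counts changed codons rather than individual substitutions, and it skips codons containing gaps or ambiguous bases. The function name `count_codon_changes` is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(germline, observed):
    """Count codons whose change is synonymous vs. non-synonymous.

    Both sequences must be in-frame DNA strings of equal length.
    Returns a (synonymous, non_synonymous) tuple.
    """
    if len(germline) != len(observed) or len(germline) % 3:
        raise ValueError("sequences must be aligned, in frame, and equal length")
    synonymous = non_synonymous = 0
    for i in range(0, len(germline), 3):
        ref = germline[i:i + 3].upper()
        obs = observed[i:i + 3].upper()
        if ref == obs or ref not in CODON_TABLE or obs not in CODON_TABLE:
            continue  # unchanged codon, or gap/ambiguous base: skip
        if CODON_TABLE[ref] == CODON_TABLE[obs]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous
```

With the toy inputs `"ATGGCTAAA"` and `"ATGGCCAGA"`, the second codon changes from GCT to GCC (both alanine) and the third from AAA (lysine) to AGA (arginine), so the function reports one synonymous and one non-synonymous codon change.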
|
239 |
Studies into Location-specific cis-Regulatory Motifs
Yokoyama, Ken Daigoro January 2010
<p>Gene expression and regulation are major determinants of phenotypic traits displayed across species. Although the DNA sequence elements that control gene expression play a crucial role in determining species morphology, predicting cis-regulatory elements through sequence analysis alone remains a difficult task. A few regulatory elements, such as the TATA-box and Initiator sequence, have been known to exhibit overrepresentation at specific locations within the proximal promoter. However, the extent to which this occurs among cis-regulatory elements is not well understood. Here, we take a genome-wide approach towards detecting such functional sequence elements, using location-specific overrepresentation as a criterion for regulatory function. We provide evidence that a surprisingly large number of regulatory elements exhibit locational overrepresentation with respect to the transcription start site. We then utilize this characteristic to predict novel cis-regulatory elements overrepresented at particular locations within the proximal promoter.</p><p>Transcriptional regulation is most often controlled not by single protein factors acting in isolation, but by multiple transcription factors acting together within multi-protein complexes. As protein-protein interactions are largely determined through protein structure, we would expect to see patterns of spatial preference between motif-pairs binding interacting factors. However, in the absence of methods to predict such spatial preferences between motifs, comprehensive assessments of such inter-relationships have not been previously conducted. As our model provides a general tool for detecting positional specificities of a motif relative to a given reference point, we expanded our model to measure distance preferences between pairs of motifs on a genome-wide scale. We show that there often exist patterns of spatial dependencies between pairs of sequence elements that bind interacting protein factors. 
We find that regulatory motifs binding interacting proteins often have multiple inter-motif distances at which they preferentially occur, and we show that the intervals between preferred distances are highly consistent across motif-pairs. This distance preference 'phasing' was empirically found to occur at consistent intervals of approximately 8-10 bp, corresponding to approximately the number of nucleotides within a single turn of the DNA double-helix. This finding suggests a tendency for protein factor-pairs to interact in a specific orientation with respect to the turn of the DNA molecule, and offers a convenient method by which to determine motif-pairs binding interacting transcription factors de novo. </p><p>While little is known about the mechanisms by which individual cis-regulatory elements ultimately control gene expression, even less is known about how such elements evolve over time. A single transcription factor can potentially target hundreds of genes across the genome, and thus modifications in the binding affinities of such proteins must induce conversions at a multitude of functional sites in order to preserve the set of target genes that the trans-factor regulates. It is therefore commonly assumed that such changes occur rarely and at a slow rate over the course of evolution. Despite this widespread assumption, we find that a surprisingly large number of cis-regulatory elements have been subject to significant changes in consensus sequence in a lineage-specific manner. Here, we demonstrate that the genomic landscape is highly adaptable, rapidly adjusting to global changes in preferred regulatory consensus sequences. Focusing upon regulatory elements exhibiting location-specific overrepresentation, we find that a substantial fraction of regulatory elements have been subject to evolutionary modifications, even between closely related eutherians. These findings have broad implications regarding evolving phenotypes observed across species.</p> / Dissertation
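The idea of measuring distance preferences between motif pairs, described in the abstract above, can be sketched in Python as a raw histogram of inter-motif distances. This is only an illustrative sketch under strong assumptions: motifs are treated as exact strings rather than the statistical model used in the dissertation, no background correction is applied, and the names `motif_starts` and `distance_histogram` are hypothetical.

```python
import re
from collections import Counter


def motif_starts(sequence, motif):
    """Return start positions of every (possibly overlapping) exact match."""
    # A zero-width lookahead lets the regex report overlapping occurrences.
    pattern = f"(?={re.escape(motif)})"
    return [m.start() for m in re.finditer(pattern, sequence)]


def distance_histogram(sequences, motif_a, motif_b, max_distance=50):
    """Count how often motif_b starts d bases downstream of motif_a.

    Only distances 0 < d <= max_distance are recorded.
    """
    counts = Counter()
    for seq in sequences:
        starts_b = motif_starts(seq, motif_b)
        for a in motif_starts(seq, motif_a):
            for b in starts_b:
                d = b - a
                if 0 < d <= max_distance:
                    counts[d] += 1
    return counts
```

Peaks in such a histogram that recur at roughly regular spacing would be the kind of signal the abstract calls phasing; a real analysis would compare the counts against an expected background before drawing any conclusion.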
|
240 |
Using Genome-wide Approaches to Characterize the Relationship Between Genomic Variation and Disease: A Case Study in Oligodendroglioma and Staphylococcus aureus
Johnson, Nicole January 2010
<p>Genetic variation is a natural occurrence in the genome that contributes to the phenotypic differences observed between individuals as well as the phenotypic outcomes of various diseases, including infectious disease and cancer. Single nucleotide polymorphisms (SNPs) have been identified as genetic factors influencing host susceptibility to infectious disease, while the study of copy number variation (CNV) in various cancers has led to the identification of causal genetic factors influencing tumor formation and severity. In this work, we evaluated the association between genomic variation and disease phenotypes to identify SNPs contributing to host susceptibility to Staphylococcus aureus (<italic>S. aureus</italic>) infection and to characterize a brain tumor known as oligodendroglioma (OD), using the CNV observed in tumors with varying degrees of malignancy.</p><p>Using SNP data, we utilized a computational approach, known as in silico haplotype mapping (ISHM), to identify genomic regions significantly associated with susceptibility to <italic>S. aureus</italic> infection in inbred mouse strains. We conducted ISHM on four phenotypes collected from <italic>S. aureus</italic> infected mice and identified genes contained in the significant regions, which were considered to be potential candidate genes. Gene expression studies were then conducted on inbred mice considered to be resistant or susceptible to <italic>S. aureus</italic> infection to identify genes differentially expressed between the two groups, which provided biological validation of the genes identified in significant ISHM regions. Genes identified by both analyses were considered our top priority genes, and known biological information about the genes was used to determine their functional roles in susceptibility to <italic>S. aureus</italic> infection.</p><p> We then evaluated CNV in subtypes of ODs to characterize the tumors by their genomic aberrations. 
We conducted array-based comparative genomic hybridization (CGH) on 74 ODs to generate genomic profiles that were classified by tumor grade, providing insight about the genomic changes that typically occur in patients with OD ranging from the less to more severe tumor types. Additionally, smaller genomic intervals with substantial copy number differences between normal and OD DNA samples, known as minimal critical regions (MCRs), were identified among the tumors. The genomic regions with copy number changes were interrogated for genes and assessed for their biological roles in the tumors' phenotype and formation. This information was used to assess the validity of using genomic variation in these tumors to further classify these tumors in addition to standard classification techniques. </p><p> The studies described in this project demonstrate the utility of using genetic variation to study disease phenotypes and applying computational and experimental techniques to identify the underlying genetic factors contributing to disease pathogenesis. Moreover, the continued development of similar approaches could aid in the development of new diagnostic procedures as well as novel therapeutics for the generation of more personalized treatments.</p> / Dissertation
|