461 |
Influência das abordagens metodológicas na reconstrução filogenética de Ceriantharia (Cnidaria, Anthozoa) [Influence of methodological approaches on the phylogenetic reconstruction of Ceriantharia (Cnidaria, Anthozoa)] / Costa, Lucas Bassi. January 2018
Advisor: Sérgio Nascimento Stampar / Committee: Carlos Camargo Alberts / Committee: Julia Silva Beneti / Abstract: The phylum Cnidaria can be considered one of the most distinctive in the animal kingdom. Among its subclasses is Ceriantharia, which comprises the tube anemones. These animals remain a challenge for systematics: several classifications have been proposed, but none has been unanimously accepted. Much of the disagreement over how to classify the group stems from the methods and materials used for analysis. The present work therefore used molecular data to study the relationships of Ceriantharia within Anthozoa and among the other Cnidaria, and to compare the results obtained with different methods of phylogenetic reconstruction. Complete ribosomal markers (28S and 18S) were used. The sequences were analyzed with specialized software, and the reconstructions were performed with two of the most commonly used methods (maximum likelihood and Bayesian inference). With numerous tools at hand, the present work was an opportunity to generate knowledge and to seek more precise concepts within the systematics of the group / Master's degree
|
462 |
Genetic and Genomic Bases of Evolved Increases in Stickleback Dentition / Hart, James Clinton. 11 September 2018
Evolution, the great tinkerer, has produced the astounding diversity of form within and between existing species. It is a fundamental goal of evolutionary biology to understand the origin of such diversity. What types of genes underlie evolved changes in morphology? Are certain types of mutations (notably changes within regulatory regions) more likely to be used to produce adaptive changes in form? When distinct populations evolve similar morphological changes, are the underlying genetic bases changes to the same genes, the same genetic pathways, or largely independent? Are changes in form modular, or are there concerted changes to multiple developmentally similar organs? The ever-cheapening cost of sequencing, coupled with the availability of high-quality reference genomes, allows high-throughput approaches to identifying the loci of evolution. The emergence of a robust genome engineering system, CRISPR/Cas9, allows for efficient and direct testing of a gene's phenotype. Combining both of these techniques with a model system with naturally evolved phenotypic variation, the threespine stickleback, allows for systems-level answers to many of these evolutionary questions.
Chapter one outlines the field of evolutionary developmental biology. It proposes two alternative viewpoints for thinking about the evolution of form. The first is the view of the "Modern Synthesis", linking Mendelian inheritance with Darwinian natural selection, which explains evolution as the change in allele frequencies over time. The second views evolution through the lens of deep homology, focusing on changes to developmental programs over time, even across related organs within the same animal. The chapter then introduces key concepts within evolutionary and developmental biology, including cis-regulation of gene expression and gene regulatory networks. It then provides examples of evolution reusing similar gene regulatory networks, including Hox genes, Pax6-dependent eye initiation, and ectodermal placode development. Teeth use highly conserved signaling pathways during both their initiation and replacement. Threespine sticklebacks (Gasterosteus aculeatus) have repeatedly adapted following a shift from marine to freshwater environments, with many independently derived populations sharing common morphological traits, including a gain in tooth number. The following chapters investigate this gain in tooth number in multiple distinct populations of sticklebacks.
Chapter two describes the discovery and mapping of a spontaneous stickleback albino mutation, named casper. casper is a sex-linked recessive mutation that results in oculocutaneous albinism, defective swim bladders, and blood clotting defects. Bulked segregant mapping of casper mutants revealed a strong genetic signal on chromosome 19, the stickleback X chromosome, proximal to the gene Hps5. casper mutants had a unique insertion of a G in the sixth exon of Hps5. As mutations in the human orthologue of Hps5 result in similar albino and blood clotting phenotypes, Hps5 is a strong candidate underlying the casper phenotype. Further supporting this model, genome editing of Hps5 phenocopied casper. Lastly, we show that casper is an excellent tool for visualizing the activity of fluorescent transgenes at late developmental stages due to the near-translucent nature of the mutant animals.
Chapter three details the fine mapping of a quantitative trait locus (QTL) on chromosome 21 controlling increases in tooth number in a Canadian freshwater stickleback population. Recombinant mapping reduced the QTL-containing region to an 884 kb window. Repeated QTL mapping experiments showed the presence of this QTL on multiple, but not all, wild-derived chromosomes from the Canadian population. Comparative genome sequencing revealed ten variants, spanning 4.4 kb, that correlated perfectly with the genetic data, all within the fourth intron of the gene Bmp6. Transgenic analysis of this intronic region uncovered its role as a robust tooth enhancer. TALEN-induced mutations in Bmp6 revealed required roles for the gene in stickleback tooth development. Finally, comparative RNA-seq between Bmp6 wild-type and mutant dental tissue showed a loss of mouse hair stem cell genes in Bmp6 mutant fish teeth, suggesting deep homology of the regeneration of these two organs.
Chapter four investigates the evolved changes in gene expression that accompany evolved increases in tooth number in two distinct freshwater populations. Independently derived stickleback populations from California and Canada have both evolved increases in tooth number, and previous work suggested that these populations used distinct genetic changes during their shared morphological changes. RNA-seq analysis of dental tissue from both freshwater populations compared to marine revealed a gain in critical regulators of tooth development in both freshwater populations. These evolved changes in gene expression can be partitioned into cis changes (mutations within regulatory elements of a gene) and trans changes (changes to the overall regulatory environment) using phased RNA-seq data from marine-freshwater F1 hybrids. Many genes show evidence for stabilizing selection of expression levels, with cis and trans changes in opposing directions (Abstract shortened by ProQuest.).
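The cis/trans partition mentioned for chapter four can be illustrated with a minimal sketch. It assumes a convention common in allele-specific expression studies and is not taken from the thesis itself: the allelic ratio inside an F1 hybrid estimates the cis component, and the remainder of the between-population difference is assigned to trans. The function names and expression values are hypothetical.

```python
import math

def partition_cis_trans(fw_parent, marine_parent, fw_allele_f1, marine_allele_f1):
    """Split a freshwater-vs-marine expression difference into cis and trans parts.

    All inputs are hypothetical normalized expression values (must be > 0).
    Returns (total, cis, trans) as log2 fold changes.
    """
    # Total divergence: the two populations measured separately.
    total = math.log2(fw_parent / marine_parent)
    # cis: both alleles share one trans environment inside the F1 hybrid,
    # so any allelic imbalance is attributed to linked regulatory changes.
    cis = math.log2(fw_allele_f1 / marine_allele_f1)
    # trans: the part of the total divergence that cis does not explain.
    trans = total - cis
    return total, cis, trans

def is_opposing(cis, trans):
    """True when cis and trans effects push expression in opposite directions."""
    return cis * trans < 0

# Hypothetical gene: equal expression in the parents, 2:1 allelic ratio in the F1.
total, cis, trans = partition_cis_trans(100.0, 100.0, 200.0, 100.0)
print(total, cis, trans, is_opposing(cis, trans))
```

In this made-up case the parents do not differ even though the alleles do, which is the pattern of compensating cis and trans changes that the abstract associates with stabilizing selection.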
|
463 |
Quantifying Nucleotide Variation in RNA Virus Populations by Next-generation Sequencing / Fedewa, Gregory. 24 October 2018
RNA viruses include several notable human pathogens, including HIV, hepatitis C virus, West Nile virus, influenza, and Ebola virus. This group includes viruses with incredibly diverse genome structures, such as single-stranded genomes, double-stranded genomes, multipart genomes, negative-stranded genomes, and positive-stranded genomes. They also exist as heterogeneous populations that can mutate and rapidly evolve due to their error-prone polymerases. These errors accumulate as they are passed down through generations and can therefore be used as a historical marker for genetic relationships. If these errors result in a change of fitness for the virus, they can also be used to locate areas in the genome that are undergoing selection pressures.
In this work, I use these principles to examine what changes are necessary for Ebola virus to infect boa constrictor cells and how high-priority RNA viruses mutate as a function of routine viral passaging and propagation. In Chapter 2, I show that Ebola virus requires no additional mutations in order to replicate efficiently in boa constrictor cells. In Chapter 3, I show that SNV analysis can be used to track the identity and passage history of different RNA viruses.
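As a rough illustration of what quantifying single-nucleotide variants (SNVs) in a viral population involves, the sketch below tallies non-reference base frequencies per position. It assumes gap-free reads that are already aligned to the reference, and it is not the pipeline used in the dissertation; the function name and toy data are hypothetical.

```python
from collections import Counter

def snv_frequencies(reference, reads, min_depth=1):
    """Return {position: non-reference allele frequency} for variable sites.

    reference: reference sequence as a string.
    reads: list of (start, sequence) tuples, aligned without gaps (assumption).
    min_depth: positions covered by fewer reads are skipped.
    """
    counts = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                counts[pos][base] += 1

    frequencies = {}
    for pos, base_counts in enumerate(counts):
        depth = sum(base_counts.values())
        if depth < min_depth:
            continue
        alt = depth - base_counts[reference[pos]]
        if alt:
            frequencies[pos] = alt / depth
    return frequencies

# Toy example: half of the reads covering position 2 carry T instead of G.
toy_reads = [(0, "ACGT"), (0, "ACTT"), (1, "CGT"), (2, "TT")]
toy_snvs = snv_frequencies("ACGT", toy_reads)
print(toy_snvs)
```

Comparing such frequency profiles between samples is one simple way that shared variants could point to a shared passage history.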
|
464 |
The Emergence of Cardioprotection from the Brain-Gut-Heart Network / Gorky, Jonathan. 24 October 2018
Cardiovascular disease is the leading cause of mortality, with more than 2,200 individuals dying daily in the United States alone. Many of these patients suffer from heart failure, the treatments for which have not been able to improve the 50% five-year mortality. Nonpharmacological interventions like vagal stimulation and ischemic preconditioning have demonstrated great potential, but poor consistency, in treating patients, highlighting the need to better understand the mechanisms by which these treatments work in order to improve consistency and efficacy. The dorsal motor nucleus of the vagus (DMV), which gives rise to vagal efferent projections, has been shown to be essential in mediating these effects. However, there is currently limited understanding of the functional heterogeneity of neurons in the DMV, which limits the ability to design better vagal stimulation treatments or to capitalize on recapitulating other vagally mediated cardioprotective effects. In order to develop a better understanding of DMV heterogeneity and elucidate how this heterogeneity might shift to drive cardioprotection, we have taken a molecular profiling approach at the single-cell level. Such an approach takes advantage of gene regulatory networks and transcriptional patterns to discern biological function. We start with a first-of-its-kind manipulation of gene regulatory networks in the dorsal vagal complex using antisense locked nucleic acids targeted against two specific microRNAs, which renormalizes blood pressure in the spontaneously hypertensive rat. These effects are specific to the hypertensive strain, with little effect on the normotensive strains, due in part to different underlying regulatory network structure. Such networks can be nudged by altering microRNA expression enough to drive physiological effects in the whole-body system of hypertensive animals without appreciable perturbation of the already healthy networks of the normotensive animals. From here we develop a framework for understanding the transcriptional heterogeneity in DMV neurons specifically, generating phenotypic classifications. The results suggest that the traditional means of classifying neurons, by neurotransmitters or connectivity, does not have a strong underlying rationale based upon the transcriptional patterns we have observed. The rate-limiting enzymes that generate neurotransmitters are often coexpressed with several others in the same neuron. This foundation permits even subtle shifts in neuronal populations to be observed, as is the case under remote ischemic preconditioning. We observe an increase in the number of neurons expressing excitatory H1 histamine receptors, which also have increased expression of tachykinin precursors and atrial natriuretic peptide. This suggests a novel role for tuberomammillary projections to the DMV in the mediation of cardioprotection. Over several weeks in the development of heart failure after myocardial infarction, we observe a phenotypic shift in DMV neurons toward a neurosecretory phenotype, driven in part by transcription factors primarily active during embryonic development. This suggests not only that the DMV is responsive to heart failure, but also that the neurons are able to change phenotype to do so. Such phenotypic plasticity leads to consideration of the DMV, and the autonomic nervous system, as capable of adaptive responses rather than mere reflex mediation. Given the large number of DMV projections to the gut, there is further evidence of a brain-gut-heart network that mediates vagal cardioprotection and cardiovascular health as a whole. If we are to find more successful treatments of cardiovascular disease, it is important to consider this and not just treat the heart, but treat the whole network supporting it.
|
465 |
A Robust scRNA-seq Data Analysis Pipeline for Measuring Gene Expression Noise / January 2017
Abstract: The past decade has seen a drastic increase in collaboration between Computer Science (CS) and Molecular Biology (MB). Current foci in CS such as deep learning require very large amounts of data, and MB research can often be rapidly advanced by analysis and models from CS. Places where CS could aid MB include the analysis of sequences to find binding sites and the prediction of protein folding patterns. Stem-like cells can be maintained and replicated over long periods, and can also differentiate into various tissue types. These behaviors are made possible by controlling the expression of specific genes. These genes then cascade into a network effect by either promoting or repressing downstream gene expression. The expression level of all gene transcripts within a single cell can be analyzed using single-cell RNA sequencing (scRNA-seq). A significant portion of the noise in scRNA-seq data is the result of extrinsic factors and can only be removed by a customized scRNA-seq analysis pipeline. scRNA-seq experiments use next-generation sequencing to measure genome-scale gene expression levels with single-cell resolution.
Almost every step during analysis and quantification requires the use of an often empirically determined threshold, which makes quantification of noise less accurate. In addition, each research group often develops its own data analysis pipeline, making it impossible to compare data from different groups. To remedy this problem, a streamlined and standardized scRNA-seq data analysis and normalization protocol was designed and developed. After analyzing multiple experiments, we identified the possible pipeline stages and the tools needed. Our pipeline is capable of handling data with adapters and barcodes, which was not the case with pipelines from some experiments. Our pipeline can be used to analyze scRNA-seq data from a single experiment and also to compare scRNA-seq data across experiments. Various processes like data gathering, file conversion, and data merging were automated in the pipeline. The main focus was to standardize and normalize single-cell RNA-seq data to minimize technical noise introduced by disparate platforms. / Dissertation/Thesis / Masters Thesis Bioengineering 2017
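To make the idea of normalizing counts before measuring expression noise concrete, here is a minimal sketch. It assumes library-size scaling to counts per million (CPM) and uses the squared coefficient of variation (CV²) across cells as the noise measure. Both choices are illustrative assumptions rather than the protocol developed in the thesis, and all names and counts are hypothetical.

```python
from statistics import mean, pvariance

def normalize_cpm(counts):
    """Scale raw counts to counts per million within each cell.

    counts: {cell: {gene: raw_count}}. Cells with zero total counts are dropped.
    """
    normalized = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        if total == 0:
            continue
        normalized[cell] = {gene: c * 1e6 / total for gene, c in genes.items()}
    return normalized

def squared_cv(values):
    """Squared coefficient of variation (population variance / mean^2)."""
    m = mean(values)
    return pvariance(values) / (m * m) if m else float("nan")

def gene_noise(normalized, gene):
    """CV^2 of one gene across all cells; missing values count as zero."""
    return squared_cv([cell.get(gene, 0.0) for cell in normalized.values()])

# Hypothetical raw counts for two cells and two genes.
raw = {"cell1": {"A": 10, "B": 90}, "cell2": {"A": 30, "B": 70}}
cpm = normalize_cpm(raw)
noise_a = gene_noise(cpm, "A")
print(noise_a)
```

Because CV² is unitless, values computed this way can be compared across genes with very different mean expression, which is one reason it is a common choice for noise measurements.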
|
466 |
Computational Pan-Genomics: Algorithms and Applications / Cleary, Alan Michael. 02 June 2018
As the cost of sequencing DNA continues to drop, the number of sequenced genomes rapidly grows. In the recent past, the cost dropped so low that it is no longer prohibitively expensive to sequence multiple genomes for the same species. This has led to a shift from the single reference genome per species paradigm to the more comprehensive pan-genomics approach, where populations of genomes from one or more species are analyzed together.
The total genomic content of a population is vast, requiring algorithms for analysis that are more sophisticated and scalable than existing methods. In this dissertation, we explore new algorithms and their applications to pan-genome analysis, at both the nucleotide and genic resolutions. Specifically, we present the Approximate Frequent Subpaths and Frequented Regions problems as a means of mining syntenic blocks from pan-genomic de Bruijn graphs and provide efficient algorithms for mining these structures. We then explore a variety of analyses that mining synteny blocks from pan-genomic data enables, including meaningful visualization, genome classification, and multidimensional scaling. We also present a novel interactive data mining tool for pan-genome analysis, the Genome Context Viewer, which allows users to explore pan-genomic data distributed across a heterogeneous set of data providers by using gene family annotations as a unit of search and comparison. Using this approach, the tool is able to perform traditionally cumbersome analyses on demand in a federated manner.
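For readers unfamiliar with pan-genomic de Bruijn graphs, the following sketch builds one from a few toy sequences and records which genomes pass through each node and edge. It is a simplified illustration only. It does not implement the Approximate Frequent Subpaths or Frequented Regions algorithms, and the function names and sequences are hypothetical.

```python
from collections import defaultdict

def build_pan_debruijn(genomes, k):
    """Build a de Bruijn graph over several genomes.

    genomes: {name: sequence}. Nodes are k-mers; an edge joins two k-mers
    that are adjacent in at least one genome. Both are annotated with the
    set of genomes that contain them.
    """
    nodes = defaultdict(set)  # k-mer -> genomes containing it
    edges = defaultdict(set)  # (k-mer, next k-mer) -> genomes traversing it
    for name, seq in genomes.items():
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        for kmer in kmers:
            nodes[kmer].add(name)
        for left, right in zip(kmers, kmers[1:]):
            edges[(left, right)].add(name)
    return nodes, edges

def shared_nodes(nodes, min_genomes):
    """Sorted k-mers visited by at least min_genomes genomes."""
    return sorted(kmer for kmer, names in nodes.items() if len(names) >= min_genomes)

toy_genomes = {"g1": "ACGTA", "g2": "ACGTT", "g3": "CCGTA"}
toy_nodes, toy_edges = build_pan_debruijn(toy_genomes, k=3)
print(shared_nodes(toy_nodes, min_genomes=3))
print(shared_nodes(toy_nodes, min_genomes=2))
```

Regions of the graph that many genomes traverse together are the intuition behind mining conserved blocks, although the dissertation's formulations handle approximate matching and scale that this toy version ignores.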
|
467 |
Predicting the Effects of Protein Variants using Structural Modeling, Large-Scale Data Integration, and Machine Learning / Baugh, Evan H. 22 March 2017
High-throughput sequencing technologies and new computational techniques for analyzing population genetics data are rapidly improving our understanding of disease susceptibility in humans and adaptation in a wide variety of organisms. These studies often discover nonsynonymous variation with large effects, as even a single amino acid change can disrupt the folding, catalytic activity, and physical interactions of proteins. Current estimates predict that every human genome contains 10,000-11,000 nonsynonymous variations and, while we cannot currently characterize all this diversity experimentally, many variants that alter protein function can be identified computationally from destabilization of structural models or amino acid conservation. Methods for annotating variant effects in genome-wide association studies and exome sequencing studies use conservation and other sequence-based features to identify damaging variants but cannot predict the effect these variants have on protein function. Recent studies of de novo variants have demonstrated the power of these methods but also the need for additional information, such as physical models from the Protein Data Bank, to identify causal variants in disease association studies.
I present VIPUR, a computational framework that integrates sequence analysis and structural modeling using the Rosetta protein modeling suite to identify and interpret deleterious protein variants. To train VIPUR, I collected 9,477 protein variants with known effects on protein function from multiple organisms and curated structural models for each variant from crystal structures and homology models. VIPUR can be applied to variants in any organism's proteome with improved generalized accuracy (AUROC 0.83) and interpretability (AUPR 0.87) compared to other methods. I show that VIPUR's predictions of deleteriousness match the biological phenotypes for pathogenicity in ClinVar despite being trained on a different label. I use VIPUR to interpret mutations associated with inflammation and diabetes, demonstrating the structural diversity of disrupted functional sites and improved interpretation of functional effects.
Generalizable tools for interpreting genetic variants are especially needed with individualized exome sequencing, where clear indications of confident predictions are necessary to identify causal variation. I demonstrate VIPUR's ability to select candidate variants associated with human diseases by predicting the effects of de novo variants associated with Autism Spectrum Disorders (ASD) in the Simons Simplex Collection. Compared to existing methods, VIPUR's deleterious predictions have the greatest enrichment for mutations found in children with ASD. VIPUR's predictions of deleterious effects are easily combined with other protein functional data to produce a small set of candidate genes and variants with specific mechanistic predictions.
Although designed to aid in the discovery of causal variants, VIPUR can also simulate mutations to better understand specific protein functions. The distribution of VIPUR scores across all positions in a protein can be used to highlight conserved residues and provides an overall measure of protein conservation. When applied to levoglucosan kinase, a bacterial enzyme of interest for biofuel processing, VIPUR's neutral predictions have a fivefold enrichment for beneficial growth mutations. While VIPUR is not designed to detect gain-of-function mutations, this enrichment suggests VIPUR scores can identify potentially beneficial mutations by removing clearly deleterious ones. When applied to TP53, a human protein that is mutated in nearly half of all cancers, VIPUR score trends highlight the most common mutations in the COSMIC database, suggesting other variants that may have similar effects on tumor growth. VIPUR and the large-scale data analysis empowering it will aid in the interpretation of protein variation by providing a detailed feature space to characterize protein functional effects and confident predictions of deleterious variation in genome-wide association studies, exome sequencing initiatives, and protein engineering.
|
468 |
Probabilistic modelling of genomic trajectories / Campbell, Kieran. January 2017
The recent advancement of whole-transcriptome gene expression quantification technology, particularly at the single-cell level, has created a wealth of biological data. An increasingly popular unsupervised analysis is to find one-dimensional manifolds or trajectories through such data that track the development of some biological process. Such methods may be necessary due to the lack of explicit time series measurements or due to asynchronicity of the biological process at a given time. This thesis aims to recast trajectory inference from high-dimensional "omics" data as a statistical latent variable problem. We begin by examining sources of uncertainty in current approaches and the consequences of propagating such uncertainty to downstream analyses. We also introduce a model of switch-like differentiation along trajectories. Next, we consider inferring such trajectories through parametric nonlinear factor analysis models and demonstrate that incorporating information about gene behaviour as informative Bayesian priors improves inference. We then consider the case of bifurcations in data and demonstrate the extent to which they may be modelled using a hierarchical mixture of factor analysers. Finally, we propose a novel type of latent variable model that performs inference of such trajectories in the presence of heterogeneous genetic and environmental backgrounds. We apply this to both single-cell and population-level cancer datasets and propose a nonparametric extension similar to Gaussian Process Latent Variable Models.
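One way to picture switch-like behaviour along a trajectory is a sigmoidal mean expression curve over pseudotime. The sketch below uses one such parametrization as an assumption. The abstract does not give the thesis's exact model, and the parameter names (`mu0`, `k`, `t0`) are hypothetical.

```python
import math

def switch_mean(t, mu0, k, t0):
    """Sigmoidal mean expression at pseudotime t.

    mu0: half of the saturating expression level (assumed parametrization).
    k:   switch strength; positive turns the gene on, negative turns it off.
    t0:  pseudotime at which expression is halfway through the switch.
    """
    return 2.0 * mu0 / (1.0 + math.exp(-k * (t - t0)))

# A hypothetical gene that switches on around the middle of the trajectory.
early = switch_mean(0.0, mu0=3.0, k=10.0, t0=0.5)
middle = switch_mean(0.5, mu0=3.0, k=10.0, t0=0.5)
late = switch_mean(1.0, mu0=3.0, k=10.0, t0=0.5)
print(early, middle, late)
```

In a latent variable setting the pseudotime `t` of each cell is itself unknown and would be inferred jointly with the per-gene parameters, which this deterministic sketch does not attempt.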
|
469 |
Multi-Class Computational Evolution: Development, Benchmark Comparison, and Application to RNA-Seq Biomarker Discovery / Crabtree, Nathaniel Mark. 24 October 2017
A computational evolution system (CES) is a knowledge-discovery engine that constructs and evolves classifiers with a small number of features to identify subtle, synergistic relationships among features and to discriminate groups in high-dimensional data analysis. CESs have previously been designed to analyze only binary datasets. In this work, the CES method has been expanded to accommodate multi-class data.
The multi-class CES was compared to three common classification and feature selection methods: random forest, random k-nearest neighbor, and support vector machines. The four classifiers were evaluated on three real RNA sequencing datasets. Performance was evaluated via cross-validation to assess classification accuracy, number of features selected, stability of the selected feature sets, and run time.
The three common classification and feature selection methods were originally designed for microarray data, which is fundamentally different from RNA-Seq data. In order to preprocess RNA-Seq count data for classification, the data was normalized and transformed via a variance stabilizing transformation to remove the variance-mean relationship that is commonly observed in RNA-Seq count data.
Compared to the three competing methods, the multi-class CES selected far fewer features. The identified features are potential biomarkers that may be more relevant than the longer lists of features identified by the competing methods. The CES performed best on the dataset with the smallest sample size, indicating that it has a unique advantage in these situations, since most classification algorithms suffer in terms of accuracy when the sample size is small.
The CES identified numerous potentially important biomarkers in each of the three real datasets that are validated by previous research and worthy of additional investigation. CES was especially helpful at identifying important features in the rat blood RNA-Seq dataset. Subsequent ontological analysis of these selected features revealed protein folding as an important process in that dataset. The other contribution of this research was to extend the applicability of CES to biomarker discovery in multi-class settings. New software algorithms based on CES have already been developed, and the multi-class modifications presented here are directly applicable and would also benefit the newer software.
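The variance-mean relationship mentioned above, and what a variance stabilizing transformation does to it, can be shown with a small numerical sketch. It assumes Poisson-distributed counts and uses the Anscombe transform as a stand-in. The abstract does not say which transformation the dissertation used, and RNA-Seq tools typically model overdispersed counts instead, so this only illustrates the principle.

```python
import math

def anscombe(x):
    """Anscombe transform, which roughly stabilizes Poisson variance at 1."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)

def poisson_pmf(k, lam):
    """Poisson probability of observing k, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def transformed_variance(lam, transform, cutoff=2000):
    """Variance of transform(X) for X ~ Poisson(lam), summed up to cutoff."""
    probs = [poisson_pmf(k, lam) for k in range(cutoff)]
    values = [transform(k) for k in range(cutoff)]
    mean = sum(p * v for p, v in zip(probs, values))
    return sum(p * (v - mean) ** 2 for p, v in zip(probs, values))

for lam in (10.0, 100.0):
    raw = transformed_variance(lam, float)          # grows with the mean
    stabilized = transformed_variance(lam, anscombe)  # stays close to 1
    print(lam, round(raw, 3), round(stabilized, 3))
```

On the raw scale the variance tracks the mean, so highly expressed genes would dominate distance-based or margin-based classifiers. After the transform the variance is roughly constant across expression levels.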
|
470 |
The NuA4 Histone Acetyltransferase Complex Affects Epigenetic Regulation of Regeneration in Schmidtea mediterranea / Ayala, Ivan A. 31 October 2017
Nuclear functions in eukaryotic cells are regulated by the NuA4 histone acetyltransferase complex. This is a highly conserved protein complex that regulates multiple vital nuclear functions such as the cell cycle, DNA repair, and transcription. The complex regulates gene expression epigenetically by adding acetyl groups to lysine residues on histone H4. This affects the expression of genes in the regions of the chromosome where the addition occurred. The planarian flatworm Schmidtea mediterranea is thought to be effectively immortal due to its remarkable ability to regenerate and to maintain pluripotent stem cells throughout its lifetime. Better understanding of the genes that control differentiation and pluripotency is needed. Humans have homologs of the planarian genes; therefore, it could be possible to gain knowledge about our own stem cells from these worms. I have identified planarian homologs of 15 proteins in the human NuA4 complex (Ruvbl2, Morf4l2, Mrg15, Epc1, Tip60, Trrap, Gas41, Ruvbl1, Brd8, Yl-1, Baf53a, Dmap1, Ing3, hEaf6-1 and hEaf6-2) and silenced them by RNA interference (RNAi) to examine the role of the complex in stem cell maintenance and regeneration. The RNAi method involves feeding the worms double-stranded RNA with a sequence matching the gene of interest to target the destruction of the mRNA expressed from that gene, thus knocking down its expression. I will observe two groups of RNAi worms: a regenerating group and a homeostasis group. The regenerating group will be cut following the knockdown to observe how well the worms restore their lost tissue. The homeostasis group will be fixed and stained to mark mitotic cells and find out whether the stem cells are dividing normally. I will also use in situ staining to determine where each of these genes is expressed. I hypothesize that knockdown of these important regulatory complex genes will result in reduced regenerative ability and that the worms' stem cell population will not be properly maintained.
|