Global ETD Search

11	Co-transcriptional splicing in two yeasts
Herzel, Lydia
10 September 2015
Cellular function and physiology are largely established through regulated gene expression. The first step in gene expression, transcription of the genomic DNA into RNA, is a process that is highly aligned at the levels of initiation, elongation and termination. In eukaryotes, protein-coding genes are exclusively transcribed by RNA polymerase II (Pol II). Upon transcription of the first 15-20 nucleotides (nt), the emerging nascent RNA 5’ end is modified with a 7-methylguanosyl cap. This is one of several RNA modifications and processing steps that take place during transcription, i.e. co-transcriptionally. For example, protein-coding sequences (exons) are often interrupted by non-coding sequences (introns) that are removed by RNA splicing. The two transesterification reactions required for RNA splicing are catalyzed by a large macromolecular machine, the spliceosome. Several non-coding small nuclear RNAs (snRNAs) and proteins form functional spliceosomal subcomplexes, termed snRNPs. As the intron is synthesized, different snRNPs sequentially recognize sequence elements within it: first the 5’ splice site (5’ SS) at the intron start, then the branchpoint, and finally the 3’ splice site (3’ SS) at the intron end. Multiple conformational changes and concerted assembly steps lead to formation of the active spliceosome, cleavage of the exon-intron junction, intron lariat formation and finally exon-exon ligation with cleavage of the 3’ intron-exon junction. Estimates of pre-mRNA splicing duration range from 15 s to several minutes; in terms of distance, the earliest detected splicing events were 500 nt downstream of the 3’ SS. However, the use of indirect assays, model genes and transcription induction/blocking leaves the question of when pre-mRNA splicing of endogenous transcripts occurs unanswered.
In recent years, global studies concluded that the majority of introns are removed during the course of transcription. In principle, co-transcriptional splicing reduces the need for post-transcriptional processing of the pre-mRNA. This could allow for quicker transcriptional responses to stimuli and optimal coordination between the different steps. In order to gain insight into how pre-mRNA splicing might be functionally linked to transcription, I wanted to determine when co-transcriptional splicing occurs, how transcripts with multiple introns are spliced, and whether and how the transcription termination process is influenced by pre-mRNA splicing. I chose two yeast species, S. cerevisiae and S. pombe, to study co-transcriptional splicing. Both have small genomes and short genes and introns, but S. pombe has far more intron-containing and multi-intron genes. This made the combination of the two model organisms a promising system to study by next-generation sequencing and to learn about co-transcriptional splicing in a broad context with applicability to other species. I used nascent RNA-Seq to characterize co-transcriptional splicing in S. pombe and developed two strategies to obtain single-molecule information on co-transcriptional splicing of endogenous genes. (1) With paired-end short-read sequencing, I obtained the 3’ ends of nascent transcripts, which reflect the position of Pol II molecules during transcription, together with the splicing status of the nascent RNAs. The splicing status is detected by sequencing the exon-intron or exon-exon junctions of the transcripts. Thus, this strategy links Pol II position with intron splicing of nascent RNA. The increase in the fraction of spliced transcripts with increasing distance from the intron end provides valuable information on when co-transcriptional splicing occurs. (2) With Pacific Biosciences (PacBio) sequencing of full-length nascent RNA, it is possible to determine the splicing pattern of transcripts with multiple introns, e.g. whether introns are removed sequentially with transcription or non-sequentially. Part of transcription termination is cleavage of the nascent transcript at the polyA site. The splicing status of cleaved and non-cleaved transcripts can provide insights into links between splicing and transcription termination and can be obtained from PacBio data.
I found that co-transcriptional splicing is as prevalent in S. pombe as in other species and that most introns are removed co-transcriptionally. Co-transcriptional splicing levels depend on intron position, adjacent exon length, and GC content, but not on splice site sequence. A high level of co-transcriptional splicing is correlated with high gene expression. In addition, I identified low-abundance circular RNAs in intron-containing as well as intronless genes, which could be side products of RNA transcription and splicing. The analysis of co-transcriptional splicing patterns of 88 endogenous S. cerevisiae genes showed that the majority of intron splicing occurs within 100 nt downstream of the 3’ SS. Saturation levels vary, and confirm results of a previous study. The onset of splicing is very close to the transcribing polymerase (within 27 nt), which implies that spliceosome assembly and conformational rearrangements must be completed immediately upon synthesis of the 3’ SS. For S. pombe genes with multiple introns, most detected transcripts were completely spliced or completely unspliced. A smaller fraction showed partial splicing, with the first intron most often being the unspliced one. Close to the polyA site, most transcripts were spliced; however, uncleaved transcripts were often completely unspliced. This suggests that pre-mRNA splicing promotes efficient transcription termination. Overall, sequencing of nascent RNA with the two strategies developed in this work offers significant potential for the analysis of co-transcriptional splicing, transcription termination and also RNA polymerase pausing by profiling nascent 3’ ends.
I was able to define the position of pre-mRNA splicing during the process of transcription and provide evidence for fast and efficient co-transcriptional splicing in S. cerevisiae and S. pombe, which is associated with highly expressed genes in both organisms. Differences in S. pombe co-transcriptional splicing could be linked to gene architecture features, such as intron position, GC content and exon length.
info:eu-repo/classification/ddc/570 ddc:570
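The first strategy in the abstract above pairs each nascent transcript's 3’ end (the Pol II position) with its splicing status and then asks how the spliced fraction rises with distance from the 3’ SS. A minimal sketch of that summary step, assuming the reads have already been reduced to (distance, spliced) pairs, might look as follows. The function name, the input layout and the 50 nt bin size are illustrative assumptions and are not taken from the thesis.

```python
from collections import defaultdict


def spliced_fraction_by_distance(reads, bin_size=50):
    """Summarize splicing status as a function of Pol II position.

    reads: iterable of (distance_nt, is_spliced) pairs, one per nascent
        transcript, where distance_nt is the position of the transcript's
        3' end relative to the 3' splice site (hypothetical input layout).
    bin_size: width of the distance bins in nt (assumed value).
    Returns a dict mapping each bin start to the fraction of spliced reads.
    """
    counts = defaultdict(lambda: [0, 0])  # bin start -> [spliced, total]
    for distance_nt, is_spliced in reads:
        if distance_nt < 0:
            # Pol II has not yet transcribed past the 3' SS, so the
            # intron cannot have been spliced; skip these reads.
            continue
        bin_start = (distance_nt // bin_size) * bin_size
        counts[bin_start][0] += int(is_spliced)
        counts[bin_start][1] += 1
    return {b: spliced / total for b, (spliced, total) in sorted(counts.items())}


# Toy input: five reads, one of which ends upstream of the 3' SS.
example_reads = [(10, False), (30, True), (60, True), (80, True), (-5, False)]
fractions = spliced_fraction_by_distance(example_reads)
```

With real data, the distance at which this curve saturates would indicate how soon after 3’ SS synthesis splicing is completed.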
12	Integrative analysis of data from multiple experiments
Ronen, Jonathan
22 July 2020
The development of high-throughput sequencing (HTS) was followed by a series of specialized extensions that make it possible to measure different aspects of cell biology, such as gene expression and DNA methylation. Analyzing these data requires algorithms that consider individual experiments or take several data sources into account simultaneously. The latter approach is particularly advantageous for the analysis of single-cell RNA sequencing (scRNA-seq) experiments, which are characterized by especially high technical noise, for instance from the loss of molecules when handling small amounts of starting material. To compensate for these experimental deficits, I developed a method called netSmooth, which de-noises scRNA-seq data and imputes missing values by network diffusion over a gene network. The gene network reflects expected co-expression patterns of genes. Using a gene network built from protein-protein interactions, I show that netSmooth outperforms other state-of-the-art scRNA-seq imputation methods at identifying blood cell types in hematopoiesis, at elucidating time-series data using an embryonic development dataset, and at identifying the tumor of origin for scRNA-seq of glioblastomas. netSmooth has one free parameter, the diffusion distance, which can be optimized using data-driven metrics. netSmooth can therefore be applied even when the optimal diffusion distance cannot be optimized explicitly with the help of external reference data. An integrated analysis is also relevant when multi-omics data from several omics protocols have been collected on the same biological samples.
Here, each individual dataset explains only part of the cellular system, whereas the joint analysis gives a more complete picture. I developed a method called maui to find a latent factor representation of multi-omics data. / The development of high-throughput sequencing (HTS) was followed by a swarm of protocols utilizing HTS to measure different molecular aspects such as gene expression (transcriptome), DNA methylation (methylome) and more. This opened opportunities for the development of data analysis algorithms and procedures that consider data produced by different experiments. Considering data from seemingly unrelated experiments is particularly beneficial for single-cell RNA sequencing (scRNA-seq). scRNA-seq produces particularly noisy data, due to loss of nucleic acids when handling the small amounts in single cells, and various technical biases. To address these challenges, I developed a method called netSmooth, which de-noises and imputes scRNA-seq data by applying network diffusion over a gene network that encodes expectations of co-expression patterns. The gene network is constructed from other experimental data. Using a gene network constructed from protein-protein interactions, I show that netSmooth outperforms other state-of-the-art scRNA-seq imputation methods at the identification of blood cell types in hematopoiesis, as well as elucidation of time series data in an embryonic development dataset, and identification of tumor of origin for scRNA-seq of glioblastomas. netSmooth has a free parameter, the diffusion distance, which I show can be selected using data-driven metrics. Thus, netSmooth may be used even in cases when the diffusion distance cannot be optimized explicitly using ground-truth labels. Another task that requires joint analysis of data from different experiments arises when different omics protocols are applied to the same biological samples.
Analyzing such multi-omics data in an integrated fashion, rather than each data type (RNA-seq, DNA-seq, etc.) on its own, is beneficial, as each omics experiment only elucidates part of an integrated cellular system. The simultaneous analysis may reveal a comprehensive view.
integrated analysis; colorectal cancer; cell lines; single-cell RNA sequencing; integrative data analysis; multi-omics; deep learning; network smoothing; cancer cell lines; single cell sequencing; 570 Biology; WC 7700; ddc:570
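The network diffusion idea behind netSmooth, as described in the abstract above, can be illustrated with a small random-walk-with-restart sketch. This is not the netSmooth implementation or its API: the function name, the parameter `alpha` (standing in here for the diffusion distance), the row-normalized update rule and the fixed iteration count are all assumptions chosen for illustration.

```python
def smooth_expression(expr, adjacency, alpha=0.5, n_iter=200):
    """Diffuse one cell's expression values over a gene network.

    expr: list of expression values, one per gene.
    adjacency: symmetric 0/1 matrix (list of lists), genes x genes,
        e.g. derived from protein-protein interactions.
    alpha: share of each smoothed value taken from network neighbours
        (hypothetical stand-in for the diffusion distance, 0 <= alpha < 1).
    n_iter: number of propagation steps (assumed to be enough to converge).
    """
    n = len(expr)
    degree = [sum(row) for row in adjacency]
    smoothed = list(expr)
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            if degree[i] == 0:
                # Genes without network neighbours keep their measured value.
                updated.append(expr[i])
                continue
            # Mean of the neighbours' current smoothed values.
            neighbour_mean = sum(
                adjacency[i][j] * smoothed[j] for j in range(n)
            ) / degree[i]
            # Mix neighbour information with the original measurement.
            updated.append(alpha * neighbour_mean + (1 - alpha) * expr[i])
        smoothed = updated
    return smoothed


# Toy example: genes 0 and 1 interact, gene 2 is isolated.
# Gene 1 has a dropout (0.0) that is partly filled in from gene 0.
network = [[0, 1, 0],
           [1, 0, 0],
           [0, 0, 0]]
result = smooth_expression([4.0, 0.0, 2.0], network, alpha=0.5)
```

Larger values of `alpha` let information travel further through the network, which is why the abstract treats the diffusion distance as the one parameter that must be tuned.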
13	Transcriptomes of testis and pituitary from male Nile tilapia (O. niloticus L.) in the context of social status
Thönnes, Michelle; Prause, Rebecca; Levavi-Sivan, Berta; Pfennig, Frank
18 April 2024
African cichlids are well-established models for studying social hierarchies in teleosts and elucidating the effects social dominance has on gene expression. Ascension in the social hierarchy has been found to increase plasma levels of steroid hormones, follicle stimulating hormone (Fsh) and luteinizing hormone (Lh) as well as gonadosomatic index (GSI). Furthermore, the expression of genes related to gonadotropins, steroidogenesis and signaling along the brain-pituitary-gonad axis (BPG axis) is affected by changes in an animal’s social status. In this study, we use RNA sequencing to obtain an in-depth look at the transcriptomes of testes and pituitaries from dominant and subordinate male Nile tilapia living in long-term stable social hierarchies. This allows us to draw conclusions about factors along the BPG axis that are involved in maintaining dominance over weeks or even months. We identify a number of genes that are differentially regulated between dominant and subordinate males and show that in high-ranking fish this subset of genes is generally upregulated. Genes differentially expressed between the two social groups comprise growth factors, related binding proteins and receptors, components of the Wnt, Tgfβ and retinoic acid signaling pathways, and components of the gonadotropin signaling and steroidogenesis pathways. The latter is backed up by elevated levels of 11-ketotestosterone, testosterone and estradiol in dominant males. Lh is found in higher concentration in the plasma of long-term dominant males than in subordinate animals. Our results both strengthen the existing models and propose new candidates for functional studies to expand our understanding of social phenomena in teleost fish.
info:eu-repo/classification/ddc/610 ddc:610 info:eu-repo/classification/ddc/500 ddc:500
14	Development and application of new statistical methods for the analysis of multiple phenotypes to investigate genetic associations with cardiometabolic traits
Konigorski, Stefan
27 April 2018
Biotechnological developments in recent years have enabled ever more detailed investigation of the associations of genetic and molecular markers with multiple complex traits. However, existing statistical methods often fail to provide valid inference for these complex analyses. The first aim of this thesis is to develop two new statistical methods for association studies of genetic markers with multiple phenotypes, to implement them efficiently and robustly, and to evaluate them in comparison with existing statistical methods. The first approach, C-JAMP (Copula-based Joint Analysis of Multiple Phenotypes), makes it possible to investigate the association of genetic variants with multiple traits in a joint copula model. The second approach, CIEE (Causal Inference using Estimating Equations), makes it possible to estimate and test direct genetic effects. In this thesis, C-JAMP is evaluated for association studies of rare genetic variants with quantitative traits, and CIEE for association studies of common genetic variants with quantitative traits and event times. The results of extensive simulation studies show that both methods yield unbiased and efficient parameter estimators and can increase the statistical power of association tests compared with existing methods, which themselves often do not provide valid inference. For the second aim of this thesis, identifying new genetic and transcriptomic markers for cardiometabolic traits, two studies with genome-wide and transcriptome-wide data are analyzed with C-JAMP and CIEE. The analyses identify several new candidate markers and genes for blood pressure and obesity.
This underscores the value of developing, evaluating, and implementing new statistical methods. R packages are available for both methods, enabling their use in future studies. / In recent years, biotechnological advancements have made it possible to investigate associations of genetic and molecular markers with multiple complex phenotypes in much greater depth. However, for the analysis of such complex datasets, available statistical methods often do not yield valid inference. The first aim of this thesis is to develop two novel statistical methods for association analyses of genetic markers with multiple phenotypes, to implement them in a computationally efficient and robust manner so that they can be used for large-scale analyses, and to evaluate them in comparison to existing statistical approaches under realistic scenarios. The first approach, called the copula-based joint analysis of multiple phenotypes (C-JAMP) method, allows investigating genetic associations with multiple traits in a joint copula model and is evaluated for genetic association analyses of rare genetic variants with quantitative traits. The second approach, called the causal inference using estimating equations (CIEE) method, allows estimating and testing direct genetic effects in directed acyclic graphs, and is evaluated for association analyses of common genetic variants with quantitative and time-to-event traits. The results of extensive simulation studies show that both approaches yield unbiased and efficient parameter estimators and can improve the power of association tests in comparison to existing approaches, which yield invalid inference in many scenarios. For the second goal of this thesis, to identify novel genetic and transcriptomic markers associated with cardiometabolic traits, C-JAMP and CIEE are applied in two large-scale studies including genome- and transcriptome-wide data.
In the analyses, several novel candidate markers and genes are identified, which highlights the merit of developing, evaluating, and implementing novel statistical approaches. R packages are available for both methods and enable their application in future studies.
Genome-wide association studies; Multiple phenotypes; Copula models; Causal inference; Cardiometabolic traits; Rare genetic variants; R packages; RNA sequencing; 004 Data processing, computer science; 576 Genetics and evolution; 610 Medicine and health; WC 7700; ddc:519; ddc:004; ddc:576; ddc:610
