Global ETD Search

1	Human Promoter Recognition Based on Principal Component Analysis Li, Xiaomeng January 2008 Master of Engineering / This thesis presents an innovative human promoter recognition model, HPR-PCA. Principal component analysis (PCA) is applied to select context features from DNA sequences, and the prediction network is built with an artificial neural network (ANN). A thorough review of the literature relevant to promoter prediction is also provided. As the main technique of HPR-PCA, the application of PCA to feature selection is developed first. To find informative and discriminative features for effective classification, PCA is applied to different combined n-mer promoter and exon frequency matrices, and the principal components (PCs) of each matrix are used to construct a new feature space. ANN-based classifiers are used to test the discriminability of each feature space. Finally, the combined 3- and 5-mer feature matrix is selected as the context feature in this model. Two proposed schemes of the HPR-PCA model are discussed, and the implementation of the sub-modules in each scheme is described. The context features selected by PCA are used to build three promoter/non-promoter classifiers. CpG-island modules are embedded into the models in different ways. In the comparison, Scheme I obtains better prediction results on two test sets, so it is adopted as the HPR-PCA model for further evaluation. Three existing promoter prediction systems are compared with HPR-PCA on three test sets, including the chromosome 22 sequence. The performance of HPR-PCA stands out compared with the other four systems. Keywords: Promoter Recognition; Sequence Feature; CpG Islands; Transcription Start Sites; Principal Component Analysis
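The HPR-PCA entry describes turning sequences into combined n-mer frequency vectors and reducing them with PCA before classification. A minimal pure-Python sketch of those two steps follows, assuming a simple per-sequence layout of 3-mer then 5-mer relative frequencies. The function names are hypothetical, only the leading component is extracted (by power iteration), and the thesis's actual matrix construction, number of retained PCs and ANN classifier are not reproduced.

```python
import math
from itertools import product


def kmer_frequencies(seq, k):
    """Relative frequency of every possible DNA k-mer in seq (windows with N etc. are skipped)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def combined_features(seq, ks=(3, 5)):
    """Concatenate k-mer frequency vectors (4**3 + 4**5 = 1088 values for ks=(3, 5))."""
    features = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    return features


def first_principal_component(rows, iterations=200):
    """Return (column means, leading unit eigenvector of the sample covariance) via power iteration."""
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two feature vectors")
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Non-uniform start vector: k-mer frequencies sum to 1, so centered rows are
    # orthogonal to the all-ones vector and a uniform start would collapse to zero.
    v = [math.sin(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        # Covariance times v, computed as X^T (X v) / (n - 1) without forming the d x d matrix.
        xv = [sum(row[j] * v[j] for j in range(d)) for row in centered]
        w = [sum(centered[i][j] * xv[i] for i in range(n)) / (n - 1) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left along v
        v = [x / norm for x in w]
    return means, v


def project(row, means, component):
    """Score of one feature vector on a principal component."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))
```

In use, `combined_features` would be computed for each training sequence, the resulting vectors passed to `first_principal_component`, and the `project` scores fed to a classifier; further components would need deflation, which is omitted here.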
3	Genome-wide analysis of transcription initiation and promoter architecture in eukaryotes Raborn, R. Taylor 01 January 2012 The transcriptome represents the entirety of RNA molecules within a cell or tissue at a given time. Recent advances have enabled large-scale, global interrogations of transcriptomes, revealing that genomes are extensively transcribed and contain diverse classes of RNAs (Dinger et al., 2009). Information generated by high-throughput analyses of mRNA transcription start sites (TSSs), such as CAGE (Cap Analysis of Gene Expression), indicates that eukaryotic genomes have complex landscapes of transcription initiation. The TSS is important for the annotation of cis-regulatory sequences because it provides a link between the mRNA transcript and the promoter. However, the patterns of TSS distributions observed in mRNA 5'-end profiling studies prevent straightforward annotation of putative promoters. To address this challenge, we developed a method to identify, on a genome-wide basis, the putative promoter, which we define by TSS distributions and designate the transcription start region (TSR). We applied a clustering method to identify and annotate TSRs in the budding yeast Saccharomyces cerevisiae using a full-length cDNA dataset (Miura et al., 2006). To validate these TSR annotations, we performed an integrative genomic analysis using multiple datasets. Our method identified TSRs at positions consistent with bona fide promoters in S. cerevisiae. In addition, using 5' RACE, we found overall agreement between computationally defined TSRs and experimentally identified TSSs. From this analysis, we find that a significant proportion of genes exhibiting alternative promoter usage during sporulation are associated with respiration, suggesting that alternative promoter usage for these genes is regulated in a condition-specific manner in budding yeast.
We further developed our TSS clustering method into a bioinformatics tool called TSRchitect, which identifies and annotates TSRs from large-scale TSS profiling data. TSRchitect handles both tag-based and sequence-based TSS information and efficiently computes TSRs from global TSS datasets on a desktop computer. We find support for TSRchitect's annotations in human from a CAGE experiment from the ENCODE (Encyclopedia of DNA Elements) project. Finally, we use TSRchitect to identify TSRs from the transcriptomes of diverse eukaryotes and investigate the conservation of TSRs among orthologous genes. We frequently identify multiple TSRs for a given gene, suggesting that alternative promoter usage is widespread. Using TSS profiling data derived from separate tissues in mouse and human, we find that TSR positions are relatively stable across the tissues surveyed; however, a small fraction of genes exhibit tissue-specific differences in TSR use. As transcriptome profiling data continue to be generated at a rapid pace, computational approaches are increasingly important. We anticipate that the method and approach described in this dissertation will contribute to an improved understanding of gene regulation and promoter architecture in eukaryotes. Keywords: Bioinformatics; Comparative Genomics; Promoter annotation; Transcription Initiation; Transcription Start Site; Transcriptome; Biology
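TSRchitect's central step, grouping nearby TSSs into transcription start regions, can be illustrated with a simple distance-based clustering. This is a sketch under stated assumptions, not TSRchitect's implementation or API: TSS tag counts are keyed by (chromosome, strand, position), and the hypothetical parameters `max_gap` and `min_tags` stand in for whatever clustering distance and tag threshold a real analysis would use.

```python
from dataclasses import dataclass


@dataclass
class TSR:
    chrom: str
    strand: str
    start: int    # first TSS position in the cluster
    end: int      # last TSS position in the cluster
    n_tags: int   # total 5'-end tags supporting the cluster
    n_tss: int    # number of distinct TSS positions


def _make_tsr(chrom, strand, cluster):
    return TSR(chrom, strand, cluster[0][0], cluster[-1][0],
               sum(n for _, n in cluster), len(cluster))


def cluster_tss(tss_counts, max_gap=20, min_tags=3):
    """Group strand-specific TSS positions into TSRs.

    tss_counts maps (chrom, strand, position) -> tag count. Consecutive TSSs on
    the same chrom/strand no more than max_gap bp apart join one cluster, and
    clusters with fewer than min_tags tags in total are dropped.
    """
    by_locus = {}
    for (chrom, strand, pos), n in tss_counts.items():
        by_locus.setdefault((chrom, strand), []).append((pos, n))
    tsrs = []
    for (chrom, strand), sites in sorted(by_locus.items()):
        sites.sort()
        cluster = [sites[0]]
        for pos, n in sites[1:]:
            if pos - cluster[-1][0] <= max_gap:
                cluster.append((pos, n))
            else:
                tsrs.append(_make_tsr(chrom, strand, cluster))
                cluster = [(pos, n)]
        tsrs.append(_make_tsr(chrom, strand, cluster))
    return [t for t in tsrs if t.n_tags >= min_tags]
```

Single-linkage on gap size keeps the rule easy to reason about; a broad cluster can grow arbitrarily long if TSSs keep appearing within `max_gap` of each other.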
4	Comparative promoter region analysis powered by CORG Dieterich, Christoph, Grossmann, Steffen, Tanzer, Andrea, Röpcke, Stefan, Arndt, Peter F., Stadler, Peter F., Vingron, Martin 11 December 2018 Background: Promoters are key players in gene regulation. They receive signals from various sources (e.g. cell surface receptors) and control the level of transcription initiation, which largely determines gene expression. In vertebrates, transcription start sites and surrounding regulatory elements are often poorly defined. To support promoter analysis, we present CORG http://corg.molgen.mpg.de, a framework for studying upstream regions, including untranslated exons (5' UTR). Description: The automated annotation of promoter regions integrates two kinds of information. First, statistically significant cross-species conservation within the upstream regions of orthologous genes is detected; both pairwise and multiple sequence comparisons are computed. Second, binding site descriptions (position weight matrices) are used with a novel approach to predict conserved regulatory elements. Assembled EST sequences and verified transcription start sites are incorporated to distinguish exonic from other sequences. So far, we have included five species in our analysis pipeline (human, mouse, rat, fugu and zebrafish) and characterized the promoter regions of 16,127 groups of orthologous genes. All data are presented in an intuitive way via our website. Users can export data for single genes or access larger data sets via our DAS server http://tomcat.molgen.mpg.de:8080/das. The benefits of our framework are illustrated with two examples: phylogenetic profiling of transcription factor binding sites and detection of microRNAs close to the transcription start sites of our gene set. Conclusion: The CORG platform is a versatile tool to support analyses of gene regulation in vertebrate promoter regions. Applications for CORG range from studying the evolution of DNA binding sites and promoter constitution to the discovery of new regulatory sequence elements (e.g. microRNAs and binding sites).
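CORG predicts binding sites from position weight matrices. The generic PWM scoring step can be sketched as follows; this is the textbook log-odds scan, not CORG's novel conserved-element method, and the pseudocount, uniform background and threshold are illustrative assumptions.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, background=None, pseudocount=0.5):
    """Turn a position frequency matrix (list of {base: count}) into log2-odds scores."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def scan(seq, pwm, threshold):
    """Yield (offset, score) for each forward-strand window scoring >= threshold."""
    width = len(pwm)
    seq = seq.upper()
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[b] for col, b in zip(pwm, window))
        if score >= threshold:
            yield i, score
```

Scanning the reverse strand would additionally score the reverse complement of each window; it is left out to keep the sketch short.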
5	Bioinformatic prediction of conserved promoters across multiple whole genomes of Chlamydia Grech, Brian James January 2007 Genome sequencing projects have generated a wealth of genomic data, and analysis of these data has produced many interesting findings. However, genome-wide analysis of bacterial promoters has lagged behind, because the large amount of background noise in bacterial genomes makes promoters difficult to predict accurately. One approach to this problem is to predict promoters that are phylogenetically conserved across multiple genomes of different bacteria, thereby filtering out many of the false positives produced by current methods. However, no existing programme could do this. Therefore, the work presented in this thesis developed a position weight matrix (PWM) based programme called Multiscan that predicts conserved promoters across multiple bacterial genomes. Because Chlamydia is one of the most extensively sequenced bacterial genera, with a high level of gene conservation and large-scale conservation of gene order between species, Multiscan was developed and tested on Chlamydia. When Multiscan analysed a genome-wide dataset of equivalent non-coding regions (NCRs) upstream of genes from Chlamydia trachomatis, Chlamydia pneumoniae and Chlamydia caviae for phylogenetically conserved σ66 promoters, it predicted 42 promoters. Since only one of these 42 predictions had previously available biological data to confirm it, a further subset of 10 of the remaining 41 σ66 promoters was analysed in C. trachomatis by mapping the 5' ends of the transcripts. The primer extension assay synthesised cDNA products of the correct length for seven of the 10 genes chosen.
When the performance of Multiscan was compared with one of the accepted methods for genome-wide promoter prediction in bacteria, the "standard PWM method", Multiscan predicted 32 more promoters than the standard PWM method in Chlamydia. Furthermore, the promoters predicted by Multiscan differed from the Escherichia coli σ70 consensus sequence by up to three more mismatches than those predicted by the standard PWM method. Although Multiscan predicted 42 promoters that were well conserved across the three chlamydial species, it failed to identify the 14 known σ66 promoters in C. trachomatis. These promoters were missed (1) because they were dissimilar to the E. coli σ70 consensus sequence and/or (2) because they were poorly conserved across the three chlamydial species. To address the second possibility, the 14 false negatives were analysed by another phylogenetic footprinting method. Fourteen sets of equivalent NCRs located upstream of the homologous genes from the three chlamydiae were aligned with the computer programme Clustal W, and the alignments were inspected "by eye" for phylogenetic footprints containing the 14 false negatives. This analysis showed that seven of the 14 false negatives were poorly conserved across the chlamydial species. Mapping the transcriptional start sites in C. caviae for two of the seven promoters that could not be footprinted, those of ltuA and ltuB, confirmed their poor conservation between C. trachomatis and C. caviae. This showed that substantial differences exist between chlamydial σ66 promoters in equivalent NCRs upstream of genes. This study developed a new computer programme for genome-wide prediction of phylogenetically conserved promoters and demonstrated its value by identifying seven new well-conserved promoters and seven candidate poorly conserved promoters in Chlamydia.
Keywords: algorithm; bioinformatic; Chlamydia; comparative genomics; gene expression regulation; phylogenetic footprinting; phylogeny; promoter; sigma factor; transcription; transcription factor; transcription start site
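Multiscan's key idea is to accept a promoter prediction only when a promoter-like signal is found in every orthologous NCR. The sketch below illustrates that filter with a deliberately simpler scorer than Multiscan's PWM: it counts mismatches to the E. coli σ70 consensus (-35 TTGACA, -10 TATAAT, spacer of 15 to 19 bp). The function names and the `max_mismatches` cutoff are hypothetical.

```python
CONSENSUS_35 = "TTGACA"  # E. coli sigma70 -35 consensus
CONSENSUS_10 = "TATAAT"  # E. coli sigma70 -10 consensus


def mismatches(a, b):
    return sum(x != y for x, y in zip(a, b))


def best_promoter(ncr, spacers=range(15, 20)):
    """Best (total_mismatches, pos_35, spacer) sigma70-like hit in one NCR, or None."""
    ncr = ncr.upper()
    best = None
    for i in range(len(ncr) - 5):
        m35 = mismatches(ncr[i:i + 6], CONSENSUS_35)
        for s in spacers:
            j = i + 6 + s  # start of the -10 hexamer
            if j + 6 > len(ncr):
                break
            total = m35 + mismatches(ncr[j:j + 6], CONSENSUS_10)
            if best is None or total < best[0]:
                best = (total, i, s)
    return best


def conserved_prediction(orthologous_ncrs, max_mismatches=4):
    """Keep a prediction only if every orthologous NCR has a hit within max_mismatches."""
    hits = [best_promoter(ncr) for ncr in orthologous_ncrs]
    if all(h is not None and h[0] <= max_mismatches for h in hits):
        return hits
    return None
```

The all-or-nothing filter is what removes species-specific false positives; it is also why promoters that are poorly conserved across the species, as the entry reports for seven of the false negatives, are missed.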
6	In silico investigation of Glossina morsitans promoters Mwangi, Sarah Wambui January 2013 Philosophiae Doctor - PhD / Tsetse flies (Glossina spp.) are the biological vectors of trypanosomes, the causative agents of Human African Trypanosomiasis (HAT). HAT is a debilitating disease that continues to be a major public health problem and a key factor limiting rural development in vast regions of tropical Africa. To augment vector control efforts, the International Glossina Genome Initiative (IGGI) was established in 2004 with the ultimate goal of generating a fully annotated whole genome sequence for Glossina morsitans. A working draft genome of Glossina morsitans was made available in 2011. In this thesis, transcriptional regulatory features in Glossina morsitans were analysed using the draft genome. A method for TSS identification in the newly sequenced genome was developed using TSS-seq tags sampled from two developmental stages: high-throughput next-generation sequencing reads obtained from Glossina morsitans larvae and pupae were used to locate transcription start sites (TSSs). TSS-seq tag clusters, defined as a minimum number of reads at the predicted 5' UTR or first coding exon, were used to define TSSs. A total of 3134 tag clusters were identified in the Glossina genome. Approximately 45.4% (1424) of the tag clusters mapped to first coding exons or their proximal predicted 5' UTR regions, including 31 tag clusters that mapped to transposons. A total of 1101 (35.1%) tag clusters mapped outside genic regions and/or to scaffolds without gene predictions, and may correspond to previously unannotated transcripts or non-coding RNA TSSs. The core promoter regions were classified as narrow or broad based on the number of TSS positions within a TSS-seq cluster.
The majority (95%) of the core promoters analysed in this study were of the broad type, while only 5% were of the narrow type. Comparison of canonical core promoter motif occurrences between random and bona fide core promoters showed that, in general, motifs were more frequent in the biologically functional genomic windows of the true dataset than in the random dataset (p ≤ 0.00164, 0.00135 and 0.00185 for the narrow, broad-with-peak and broad-without-peak categories, respectively). The frequency of motif co-occurrence in core promoters differed fundamentally across initiation patterns. Narrow core promoters had a higher frequency of the TATA-box and INR motifs, and two-way motif co-occurrence analysis showed that the TATA-box/INR pair is over-represented in the narrow category. Broad core promoters had a higher frequency of the BREd and MTE motifs, and two-way co-occurrence analysis showed that the MTE/DPE pair is over-represented in broad core promoters. TATA-less promoters account for 77% of the core promoters in this analysis. TATA-less core promoters showed a higher frequency of the MTE and INR motifs, in contrast to observations in Drosophila, where the DPE motif has been reported to occur frequently in TATA-less promoters. These motif combinations suggest that they are of comparable importance for transcription within their respective promoter classes in Glossina morsitans. Keywords: Glossina morsitans; Human African trypanosomiasis; Genome; TSS-seq; Transcription; Transcription start site; Promoter; Transcription regulation; Transcription factor binding site; Database
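The narrow, broad-with-peak and broad-without-peak categories above can be expressed as a small decision rule over the tag counts at each TSS position of a cluster. The thresholds here (`max_narrow_positions`, `peak_fraction`) are hypothetical illustration values, since the entry does not state the cut-offs used in the thesis.

```python
def classify_core_promoter(tss_counts, max_narrow_positions=3, peak_fraction=0.5):
    """Classify one TSS-seq tag cluster by its initiation pattern.

    tss_counts: tag counts, one per TSS position in the cluster.
    Thresholds are illustrative assumptions, not the thesis's values.
    """
    positions = [c for c in tss_counts if c > 0]
    if not positions:
        raise ValueError("cluster has no tags")
    if len(positions) <= max_narrow_positions:
        return "narrow"  # initiation confined to a few positions
    if max(positions) / sum(positions) >= peak_fraction:
        return "broad with peak"  # one dominant TSS within a wide cluster
    return "broad without peak"
```

Separating the two broad classes by the share of tags at the dominant position is one simple way to encode "peak"; other shape measures (for example an interquantile width) would work equally well.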
7	A transcrição pervasiva na archaea Halobacterium salinarum NRC-1 e a identificação de novos transcritos / Pervasive transcription in the archaeon Halobacterium salinarum NRC-1 and the identification of new transcripts Caten, Felipe ten 15 February 2017 Large-scale characterization of the transcriptomes of different organisms has revealed a highly complex picture of gene expression, leading to the identification of numerous transcripts throughout the genomes of eukaryotes and prokaryotes. This phenomenon, named pervasive transcription, has been an important source in the search for new RNAs with regulatory functions or involved in the translation of uncharacterized proteins. The abundance of transcriptomic and proteomic data, together with complete information about its genome, makes the halophilic extremophile Halobacterium salinarum an ideal model organism for studying pervasive transcription. This microorganism belongs to Archaea, the last of the three domains of life to be described, which shares characteristics with both bacteria and eukaryotes. Using differential RNA-seq (dRNA-seq), which distinguishes primary from processed transcripts, we identified 179 TSSaRNAs in H. salinarum: small RNAs associated with transcription initiation that had not previously been described in archaea. Applying dRNA-seq to RNA samples collected along the growth curve allowed the identification of 4540 transcription start sites (TSSs) in the H. salinarum NRC-1 genome. Some of these TSSs are located upstream of known genes, enabling the identification of TSSs for 1545 genes. Of these, 59.2% are located within 10 bp of the translation initiation codon, confirming that most genes lack a 5' UTR (are leaderless). Expression analysis under different conditions of regions associated with antisense TSSs revealed that most of these regions have expression profiles correlated with those of the genes on the opposite strand, indicating a possible regulatory role for these transcripts.
Similarly, analysis of the expression of intergenic TSSs identified 132 differentially expressed regions that are not associated with any other element in the H. salinarum NRC-1 genome. Integration with proteomic data reveals that some of these regions may be involved in the production of small proteins. In addition, the identification of 1365 TSSs located within genes suggests that the production of intragenic RNAs (intraRNAs) is widespread in the genome of this halophile. Northern blot experiments confirmed the production of a transcript corresponding to the final portion of the VNG_RS05220 gene, and Western blot experiments revealed that translation of these intraRNAs produces small proteins corresponding to individual protein domains, with important functional roles under specific growth conditions. Analysis of TSSs upstream of the coding regions of similar protein domains in bacteria and other archaea suggests that the production of coding intraRNAs is widespread in prokaryotes and may increase proteome diversity by generating protein isoforms from a single gene. Finally, analysis of RNA-seq data, combined with a search for known transcription termination signatures in archaea, allowed the identification of the transcript end positions of 58 genes. This work helps to give a more complete picture of the H. salinarum NRC-1 transcriptional landscape and reveals new transcripts that may be widespread in prokaryotes, with important functional roles. Keywords: Archaea; Bioinformatics; Transcription start sites; Pervasive transcription; RNA-seq; Gene transcription; Transcription; Transcriptome
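The H. salinarum NRC-1 entry classifies dRNA-seq TSSs by their position relative to annotated genes (upstream, internal, antisense, intergenic) and treats a TSS within 10 bp of the start codon as leaderless. A minimal sketch of such a classifier follows; the `Gene` record, the 300 bp upstream window and the category labels are assumptions for illustration, and only the 10 bp leaderless cut-off comes from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # leftmost coordinate (start <= end on both strands)
    end: int
    strand: str  # "+" or "-"


def classify_tss(pos, strand, genes, upstream_window=300, leaderless_max=10):
    """Label one TSS relative to annotated genes.

    Returns (category, gene_name, utr_length) tuples; utr_length is the distance
    to the start codon and is only reported for primary TSSs.
    """
    labels = []
    for g in genes:
        start_codon = g.start if g.strand == "+" else g.end
        if strand == g.strand:
            # Distance measured in the direction of transcription.
            dist = start_codon - pos if strand == "+" else pos - start_codon
            if 0 <= dist <= upstream_window:
                kind = "primary (leaderless)" if dist <= leaderless_max else "primary"
                labels.append((kind, g.name, dist))
            elif g.start <= pos <= g.end:
                labels.append(("internal", g.name, None))
        elif g.start <= pos <= g.end:
            labels.append(("antisense", g.name, None))
    return labels or [("orphan", None, None)]
```

A TSS may receive several labels (for example primary for one gene and antisense to a neighbour), which is why the function returns a list rather than a single category.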
8	Évolution des îlots CpG chez les primates / Evolution of CpG islands in Primates Guillet-Renard, Claire 07 October 2009 This thesis analyses the selective pressures acting on CpG islands, short genomic sequences that escape methylation in mammalian genomes. We first studied the genomic characteristics of CpG islands, in particular their relationships with gene transcription initiation and with DNA replication origins, using recently published data. We then determined whether the peculiar base composition of CpG islands (a high number of CpG dinucleotides and high GC content) is under (negative or positive) selective pressure, and thus plays a role in their function. We showed that the relative CpG richness of CpG islands is solely a consequence of the low methylation of these genomic regions. Moreover, we showed that the high GC content of CpG islands is not under selective pressure and seems to result from a neutral mechanism, GC-biased gene conversion. We also discuss the fate of CpG islands in primates, showing that while the GC content of these sequences is decreasing, their relative CpG content remains stable. Keywords: CpG islands; DNA methylation; Selective pressures; Biased gene conversion; Transcription start; DNA replication; Primates; 576.5
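The two sequence properties analysed in this thesis, GC content and relative CpG richness, are conventionally measured as the G+C fraction and the CpG observed/expected ratio, (#CpG × length) / (#C × #G). The sketch below computes both and uses the widely applied Gardiner-Garden and Frommer island criteria (length ≥ 200 bp, GC > 50%, CpG o/e > 0.6) as default parameters; the thesis's own island definition is not stated here.

```python
def gc_content(seq):
    """Fraction of G and C bases."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq):
    """CpG o/e = (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # str.count is non-overlapping, which is safe here: two "CG" matches cannot overlap.
    cpg = seq.count("CG")
    return cpg * len(seq) / (c * g)


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_oe=0.6):
    return (len(seq) >= min_len and gc_content(seq) > min_gc
            and cpg_observed_expected(seq) > min_oe)
```

Because the o/e ratio normalises by C and G counts, it separates the two effects the thesis distinguishes: a region can lose GC content while keeping a stable CpG o/e, as reported for primate CpG islands.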
9	RNA-Seq and proteomics based analysis of regulatory RNA features and gene expression in Bacillus licheniformis Wiegand, Sandra 25 September 2013 No description available. Keywords: dRNA-Seq; RNA-based regulation; proteomics; industrial production; stress response; sporulation; lichenicidin; RNA-Seq; Subtilisin Carlsberg; transcriptomics; reannotation; operon prediction; differential gene expression; antisense RNA; transcription start site; ncRNA; UTR; sRNA; Geobacillus sp. GHH01; Biology (PPN619462639); 570
10	O transcritoma antisense primário de Halobacterium salinarum NRC-1 / The antisense primary transcriptome of Halobacterium salinarum NRC-1 João Paulo Pereira de Almeida 04 September 2018 Antisense RNAs (asRNAs) constitute the most numerous class of non-coding RNAs (ncRNAs) detected by high-throughput transcriptome methods in prokaryotes. Despite this abundance, little is known about the regulatory mechanisms and evolutionary aspects of these molecules, particularly in archaea, where the mechanism of double-stranded RNA (dsRNA) degradation remains poorly understood. In this study, using dRNA-seq data, we identified 1626 primary antisense transcription start sites (aTSSs) in the genome of Halobacterium salinarum NRC-1, an important model organism for studies of gene expression regulation in Archaea. By integrating gene expression data from 18 paired-end RNA-seq libraries, we annotated 846 asRNAs from the mapped aTSSs. We found asRNAs in ~21% of annotated genes, including genes related to important characteristics of this organism, such as those encoding gas vesicle proteins and bacteriorhodopsin, as well as several genes of the translation machinery and transposases. We also found asRNAs in type II toxin-antitoxin systems and, using public dRNA-seq data, show evidence that this phenomenon occurs in both archaea and bacteria.
The interaction of an ncRNA with its target may depend on intermediary proteins. In archaea, the LSm protein is an RNA chaperone homologous to bacterial Hfq and is involved in post-transcriptional regulation. Using RIP-seq data from RNAs immunoprecipitated with LSm, we identified 91 asRNAs interacting with this protein; for 81 of these, the mRNA of the sense gene was also found to interact. By searching for aTSSs in the same regions of orthologous genes, we identified 160 aTSSs giving rise to asRNAs in H. salinarum NRC-1 that may be conserved in Haloferax volcanii. The expression of the annotated asRNAs was analysed over a growth curve and in a knockout strain for a gene encoding RNase R, a possible dsRNA-degrading enzyme in archaea. We found 144 asRNAs differentially expressed over the growth curve; for 56 of these the sense gene was also differentially expressed, suggesting possible cis-regulatory mechanisms mediated by these asRNAs. In the knockout strain we found five differentially expressed asRNAs, and for only one of them was the sense gene also differentially expressed; this result does not allow us to infer in vivo dsRNA degradation activity for RNase R in H. salinarum NRC-1. This work presents a complete map of the primary antisense transcriptome of H. salinarum NRC-1, an important step toward understanding the involvement of antisense transcription in post-transcriptional gene regulation in this model organism of the third domain of life.
Keywords: Archaea; Antisense RNAs (asRNAs); Differential RNA-seq (dRNA-seq); Double-stranded RNAs (dsRNAs); Halobacterium salinarum; LSm; Non-coding RNAs (ncRNAs); RNase R; Toxin-antitoxin systems (TAs); Transcription start sites (TSSs)
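The asRNA annotation and the cis-regulation argument in the entry above reduce to two set operations: finding genes overlapped on the opposite strand by an asRNA, and keeping asRNA/sense-gene pairs in which both members are differentially expressed. A minimal sketch, assuming intervals given as hypothetical (name, start, end, strand) tuples and differential-expression calls already computed elsewhere:

```python
def genes_with_antisense(genes, asrnas):
    """Names of genes overlapped on the opposite strand by at least one asRNA.

    genes and asrnas are lists of (name, start, end, strand) with start <= end.
    """
    hits = set()
    for g_name, g_start, g_end, g_strand in genes:
        for _, a_start, a_end, a_strand in asrnas:
            # Closed intervals overlap when each starts before the other ends.
            if a_strand != g_strand and a_start <= g_end and g_start <= a_end:
                hits.add(g_name)
                break
    return hits


def cis_regulation_candidates(pairs, de_asrnas, de_genes):
    """Keep (asRNA, sense gene) pairs in which both members are differentially expressed."""
    return [(a, g) for a, g in pairs if a in de_asrnas and g in de_genes]
```

The share of annotated genes carrying an asRNA (reported as ~21% in the entry) is then `len(genes_with_antisense(genes, asrnas)) / len(genes)`.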
