141 |
The evolution, modifications and interactions of proteins and RNAs. Surappa-Narayanappa, Ananth Prakash (January 2017)
Proteins and RNAs are two of the most versatile macromolecules, carrying out almost all functions within living organisms. In this thesis I have explored evolutionary and regulatory aspects of proteins and RNAs by studying their structures, modifications and interactions. In the first chapter of my thesis I investigate domain atrophy, a term I coined to describe large-scale deletions of core structural elements within protein domains. By examining truncated domain boundaries across several Pfam domain families, I was able to identify rare cases of domains that show atrophy. Given that even point mutations can be deleterious, it is surprising that proteins can tolerate such large-scale deletions. Some structures of atrophied domains show novel protein-protein interaction interfaces that appear to compensate for the deleted elements and stabilise their folds. Protein-protein interactions are largely governed by surface and charge complementarity, whereas RNA-RNA interactions are governed by base-pair complementarity; the two interaction types are therefore inherently different, and these differences might be reflected in their interaction networks. Based on this hypothesis, in the second chapter I explored the protein-protein, RNA-protein and RNA-RNA interaction networks of yeast. Analysing the three networks, I found no major differences in their network properties, which indicates an underlying uniformity in their interactomes despite their individual differences. In the third chapter I focus on RNA-protein interactions by investigating post-translational modifications (PTMs) in RNA-binding proteins (RBPs). By comparing occurrences of PTMs, I observe that RBPs undergo significantly more PTMs than non-RBPs. I also found that, within RBPs, PTMs are more frequently targeted at regions that directly interact with RNA than at regions that do not.
Moreover, intrinsic disorder and amino acid composition were not observed to significantly influence the differential PTMs between RBPs and non-RBPs. The results point to a direct regulatory role of PTMs in the RNA-protein interactions of RBPs. In the last chapter, I explore regulatory RNA-RNA interactions. Using differential expression data of mRNAs and lncRNAs from mouse models of hereditary hemochromatosis, I investigated competing regulatory interactions between mRNAs, lncRNAs and miRNAs. A mutual interaction network was created from the predicted miRNA interaction sites on mRNAs and lncRNAs to identify regulatory RNAs in the disease. I also observed interesting relations within sense-antisense mRNA-lncRNA pairs that indicate mutual regulation of expression levels through an as yet unknown mechanism.
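The abstract does not list which network properties were compared across the three yeast interactomes. As a minimal sketch, assuming typical summary measures (mean degree, density and average clustering coefficient) and a toy edge list with placeholder node names, such a comparison could start from something like the following. The function names are illustrative and not taken from the thesis.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops are ignored, duplicates collapse."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def network_summary(edges):
    """Basic summary statistics for an undirected interaction network."""
    adj = build_adjacency(edges)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    local_clustering = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            local_clustering.append(0.0)  # convention: nodes with degree < 2 contribute 0
            continue
        # count edges among this node's neighbours (closed triangles through the node)
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        local_clustering.append(2 * links / (k * (k - 1)))
    return {
        "nodes": n,
        "edges": m,
        "mean_degree": 2 * m / n if n else 0.0,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "avg_clustering": sum(local_clustering) / n if n else 0.0,
    }


# Toy network with placeholder identifiers: a triangle plus one pendant node.
toy_edges = [("YFG1", "YFG2"), ("YFG2", "YFG3"), ("YFG1", "YFG3"), ("YFG3", "YFG4")]
print(network_summary(toy_edges))
```

Running `network_summary` separately on protein-protein, RNA-protein and RNA-RNA edge lists gives directly comparable numbers. Note that a strictly bipartite RNA-protein network contains no triangles, so its clustering coefficient is zero by construction and other measures (for example degree distributions) are more informative there.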
|
142 |
Transcriptional regulation study of two sRNAs in Staphylococcus aureus: involvement of a transcription factor from the SarA family. Mauro, Tony (09 March 2017)
Staphylococcus aureus is a bacterial pathogen responsible for about one fifth of healthcare-associated (nosocomial) infections, yet 30% of humans are healthy carriers of this bacterium. The switch from commensal carriage to infection requires that the virulence factors involved in S. aureus pathogenicity (toxins, adhesins) be regulated by transcription factors (TFs) and small non-coding RNAs (sRNAs). One of these sRNAs, Srn_3610_SprC, plays a key role in preventing phagocytosis and in attenuating S. aureus virulence. Although srn_3610_sprC is usually poorly expressed, its expression is up-regulated during the first minutes of phagocytosis. The aim of this thesis was to identify the TFs regulating srn_3610_sprC expression. We characterized SarA, one of the major TFs of S. aureus, as a strong repressor of srn_3610_sprC transcription. After determining the SarA binding site on this promoter, we identified a second sRNA, Srn_9340, whose transcription is also repressed by SarA. For both sRNAs, SarA prevents RNA polymerase from binding to their promoters, resulting in a low level of transcription. The next challenge will be to determine the signal that relieves SarA repression and allows high-level transcription of these sRNAs. Meanwhile, analysis of SarA binding sequences allowed us to identify new SarA targets. To better understand SarA functions in S. aureus (antibiotic resistance, biofilm formation), we are now initiating a global, RNA-Seq-based study to determine SarA targets.
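The abstract says that, once the SarA binding site on the srn_3610_sprC promoter was determined, searching for similar sequences revealed Srn_9340 and other targets, but it does not describe the search method. A minimal sketch, assuming a simple mismatch-tolerant scan of both DNA strands, is shown below. The 8-nt motif is a hypothetical placeholder, not the published SarA binding sequence, and a real search would more likely use a position weight matrix.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_motif(sequence, motif, max_mismatches=1):
    """List (position, strand, mismatches) for windows within max_mismatches of motif.

    Both strands are scanned; positions are 0-based window starts on `sequence`.
    """
    sequence, motif = sequence.upper(), motif.upper()
    k = len(motif)
    targets = (("+", motif), ("-", revcomp(motif)))
    hits = []
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        for strand, target in targets:
            d = hamming(window, target)
            if d <= max_mismatches:
                hits.append((i, strand, d))
    return hits


# Hypothetical motif and promoter fragment, for illustration only.
print(scan_motif("GGGTTTGTATTCCC", "TTTGTATT"))
```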
|
143 |
tRNAs, a new source of small non-coding RNAs in Arabidopsis thaliana. Morelle, Geoffrey (17 March 2015)
During the last decade, a new class of small non-coding RNAs called tRNA-derived fragments (tRFs) has emerged. Whilst the canonical role of tRNAs is well known, the reason(s) why stable tRFs accumulate in the cell remain unknown. Indeed, the number of tRFs reported in various evolutionarily divergent organisms has increased rapidly. To date, little is known about their biogenesis and biological roles, but evidence for their importance in the regulation of gene expression and in cell life is growing. In plants, the existence of tRFs has also been reported, but few data are available. Using deep sequencing of various small RNA libraries from Arabidopsis thaliana and Northern blot experiments, we confirmed the existence of a large but specific population of tRFs of varied origin. Following these observations, three questions are addressed: first, which enzymes are responsible for tRF biogenesis; second, where are tRFs generated; and third, are tRFs merely degradation by-products or do they have biological functions?
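The abstract asks where tRFs come from but does not give a classification scheme. A minimal sketch, assuming exact matching of reads to a single mature tRNA sequence (including its 3' CCA) and the common 5'-tRF / 3'-tRF / internal distinction, could look like this. The toy tRNA string is invented, and real pipelines must handle mismatches, multi-mapping reads and modified nucleotides.

```python
def classify_trf(read, mature_trna):
    """Classify a small-RNA read by its position within a mature tRNA sequence.

    Returns '5-tRF', '3-tRF', 'internal', 'full-length', or None when the read
    is not an exact substring. Only the first exact occurrence is considered.
    """
    start = mature_trna.find(read)
    if start < 0:
        return None
    end = start + len(read)
    if start == 0 and end == len(mature_trna):
        return "full-length"
    if start == 0:
        return "5-tRF"  # shares the tRNA 5' end
    if end == len(mature_trna):
        return "3-tRF"  # shares the 3' end, including the CCA tail
    return "internal"


toy_trna = "GCGGAUUUAGCUCAGUUGGGAGAGCACCA"  # invented sequence ending in CCA
for read in ["GCGGAUUUAG", "GAGAGCACCA", "CUCAGUUG", "AAAAAA"]:
    print(read, classify_trf(read, toy_trna))
```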
|
144 |
Identification and characterization of microRNAs and their putative target genes in Anopheles funestus s.s. Ali, Mushal Allam Mohamed Alhaj (January 2013)
Philosophiae Doctor - PhD / The discovery of microRNAs (miRNAs) is one of the most exciting scientific breakthroughs of the last decade. miRNAs are short RNA molecules that do not encode proteins but instead regulate gene expression. Over the past several years, thousands of miRNAs have been identified in various insect genomes through cloning and sequencing, and also by computational prediction. However, information on the possible roles of miRNAs in mosquitoes is limited. Within this context, we report here the first systematic analysis of these tiny RNAs and their target mRNAs in one of the principal African malaria vectors, Anopheles funestus s.s. First, to extend the known repertoire of miRNAs expressed in this insect, small RNAs from four developmental stages (eggs, larvae, pupae and adult females) were sequenced using next-generation sequencing technology. A total of 98 miRNAs were identified: 65 known Anopheles miRNAs, 25 miRNAs conserved in other insects and 8 novel miRNAs not previously reported in any species. We further characterized new variants of miR-2 and miR-927 and stem-loop precursors for miR-286 and miR-2944. The analysis showed that many miRNAs have stage-specific expression and are co-transcribed and co-regulated during development. Second, to better understand the molecular details of miRNA function, we identified the target genes of the Anopheles miRNAs using a novel approach that retains genes in the overlap of three target prediction tools and then filters them based on functional enrichment of GO terms and KEGG pathways. We found that most of the miRNAs are metabolic regulators. Moreover, the results suggest that some miRNAs are involved not only in development but also in insect-parasite interactions.
Finally, we developed the InsecTar database (http://insectar.sanbi.ac.za) of miRNA targets in three mosquito species, Anopheles gambiae, Aedes aegypti and Culex quinquefasciatus, which incorporates the prediction and functional analysis of these target genes. The proposed database will assist in exploring the roles of these regulatory molecules in insects. This type of analysis is a key step towards improving our understanding of the complexity and mode of regulation of miRNAs in mosquitoes. Moreover, this study opens the door to exploring the role of miRNAs in regulating critical physiological functions specific to arthropod vectors, which may lead to novel approaches to combat mosquito-borne infectious diseases.
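The target-identification step keeps genes that fall in the overlap of three target prediction tools before GO/KEGG enrichment filtering. The abstract does not name the tools here, so the sketch below uses hypothetical tool names and gene identifiers and shows only the overlap step, not the enrichment filter.

```python
from collections import Counter


def consensus_targets(predictions, min_tools=None):
    """Genes predicted for one miRNA by at least `min_tools` tools (default: all of them).

    `predictions` maps a tool name to the collection of target genes it reports.
    """
    if min_tools is None:
        min_tools = len(predictions)
    counts = Counter(gene for genes in predictions.values() for gene in set(genes))
    return sorted(gene for gene, c in counts.items() if c >= min_tools)


# Hypothetical tool names and gene identifiers.
predictions = {
    "tool_A": {"g1", "g2", "g3"},
    "tool_B": {"g2", "g3", "g4"},
    "tool_C": {"g3", "g4", "g5"},
}
print(consensus_targets(predictions))     # ['g3']: reported by all three tools
print(consensus_targets(predictions, 2))  # ['g2', 'g3', 'g4']: reported by at least two
```

Requiring agreement between tools trades sensitivity for specificity; relaxing `min_tools` is one way to see how strongly the final target list depends on that choice.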
|
145 |
Long non-coding RNAs, a new class of tissue-specific genomic regulators: molecular signatures specific to dopaminergic and serotoninergic neurons. Gendron, Judith (30 October 2017)
Only 1.2% of the human genome codes for proteins; the remaining 98.8% is non-coding, yet 93% of the genome is transcribed, mostly into long non-coding RNAs (lncRNAs). These lncRNAs constitute a new class of genomic regulators capable of acting at all levels of gene expression, and their expression is highly tissue-specific and modulated over time and under normal and pathological conditions. We therefore propose that each specified cell type expresses a specific repertoire of lncRNAs, correlated with open/active chromatin regions, that specifies its cellular identity. In this context, we used FACS to isolate two neural types involved in many pathologies: i) human dopaminergic neurons (nDA) differentiated from hiPS cells, and ii) mouse DA and serotoninergic (n5-HT) neurons. From these two neural types, we identified 1,363 lncRNAs in nDA (989 of them new, i.e. 73%), constituting the nDA repertoire, and 1,257 lncRNAs in n5-HT (719 of them new), constituting the n5-HT repertoire. Their comparison showed that only 194 lncRNAs are common to both neural types: the majority of lncRNAs is expressed either in nDA or in n5-HT, indicating a high degree of cell specificity. In addition, 39% of the open, potentially regulatory chromatin regions of nDA were not detected in n5-HT. We have thus generated DA- and 5-HT-specific catalogues of non-coding genomic elements that constitute molecular signatures of these neurons and could deepen our knowledge of nDA and n5-HT development and dysfunction. With this in mind, these DA-specific elements were compared with SNPs described as Parkinson's disease risk variants, and candidate lncRNAs were selected for functional studies.
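The cell-specificity claim can be checked with simple set arithmetic on the figures stated in the abstract (1,363 nDA lncRNAs, 1,257 n5-HT lncRNAs, 194 shared, 989 new in nDA). The sketch assumes both repertoires are counted against a common set of lncRNA identifiers; the Jaccard index is an added summary, not one reported in the thesis.

```python
def repertoire_overlap(n_a, n_b, n_shared):
    """Split two overlapping sets, given only their sizes and the size of their intersection."""
    only_a = n_a - n_shared
    only_b = n_b - n_shared
    union = only_a + only_b + n_shared
    return {"only_a": only_a, "only_b": only_b, "union": union,
            "jaccard": n_shared / union}


summary = repertoire_overlap(1363, 1257, 194)  # nDA vs n5-HT lncRNAs
print(summary)
print(f"novel nDA lncRNAs: {989 / 1363:.0%}")  # 989 of 1,363
```

On these numbers, 1,169 lncRNAs are seen only in nDA and 1,063 only in n5-HT, and the shared set is about 8% of the 2,426-lncRNA union, which is consistent with the abstract's reading of strong cell specificity.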
|
146 |
Study of regulatory elements of gene expression in humans. Bessiere, Chloé (27 November 2018)
Gene expression is tightly controlled by different regulatory regions, giving rise to a wide variety of cell types and functions. Identifying these active regulatory regions and their characteristics, and understanding how they interact with each other in each cell type, is of prime importance. This knowledge should help us better understand the impact of genomic variants, which are very often located in non-coding regions. Moreover, the development of cancer and other diseases is linked to deregulated control of gene expression. To pave the way for targeted treatments and precision medicine, it is important to understand how all this machinery is orchestrated.
Several approaches have been developed to answer this question, most of them based on experimental data on histone modifications, methylation and transcription factors (TFs). However, these data are limited to specific samples and cannot be generated for all regulators and all patients. In the first part of my thesis, I modeled gene expression using only the information contained in the DNA sequence. We used a linear model with variable selection that performs comparably to non-parametric methods and is easy to interpret. This model allowed me to compare several types of DNA sequence-based variables, such as TF binding motifs and nucleotide composition. These variables were computed for various gene regions to estimate their regulatory power and contribution. Strikingly, introns alone, whose nucleotide composition reflects the gene's genomic environment, explain a large part of the variation in gene expression. Furthermore, we showed that topological domains (TADs), within which interactions are favored, share similar genomic compositions. Our prediction model thus presumably captures, for each individual, the composition of active TADs.
In the second part of my work, I studied regulation occurring in introns. The international FANTOM consortium has provided one of the largest transcription start site (TSS) atlases to date, and we noticed that the majority of these TSSs are detected in non-coding regions, in particular introns. We therefore investigated these intronic TSSs. To determine whether they are functional, we searched for potential regulatory motifs in the vicinity of these transcription signals. We found that a fraction of them is located 2 bases downstream of a repeat of thymidines (Ts). Biochemical and genetic evidence suggests that at least some of these signals correspond to sense-intronic long non-coding RNAs expressed in a tissue-specific manner. The length of the T repeat also appears to govern the presence of a transcription signal at these loci and, indirectly, to affect host gene expression. These findings provide one possible molecular explanation for the effect of these short tandem repeats of Ts.
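The observation that a fraction of intronic TSSs lie 2 bases downstream of a T repeat can be made concrete with a small scan. This sketch assumes 0-based coordinates on the sense strand, reads "2 bases downstream" as TSS index = index of the last T + 2, and uses a hypothetical minimum run length of 8; none of these choices come from the thesis, and the minus strand is not handled.

```python
import re


def tss_after_t_run(sequence, tss_positions, min_run=8, gap=2):
    """Return TSS positions lying `gap` bases downstream of the last T of a T run.

    Positions are 0-based on `sequence`; only runs of at least `min_run` Ts count.
    """
    run_last_t = set()
    for match in re.finditer(r"T+", sequence.upper()):
        if match.end() - match.start() >= min_run:
            run_last_t.add(match.end() - 1)  # index of the last T in the run
    return [pos for pos in tss_positions if pos - gap in run_last_t]


seq = "GCA" + "T" * 10 + "AG" + "CCGG"    # toy sequence: T run at indices 3..12
print(tss_after_t_run(seq, [5, 13, 14]))  # [14]: only 14 is 2 bases after the last T
```

Varying `min_run` is one simple way to probe the reported dependence of the transcription signal on T-repeat length.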
|
147 |
Regulation of telomerase in a model of acute promyelocytic leukemia: role of the long non-coding RNA H19. El hajj, Joelle (17 May 2018)
The telomere/telomerase pair appears to be a promising target for potential anticancer agents that would be active against a wide range of tumors. In a model of acute promyelocytic leukemia (APL), the host laboratory has shown that a clinically used agent, all-trans retinoic acid (ATRA), exerts anti-tumor activity by repressing transcription of the catalytic subunit hTERT independently of differentiation. This model (NB4), with its variant cell lines that are either resistant (NB4-LR1SFD) or sensitive (NB4-LR1) to ATRA-induced hTERT repression, is a tool of choice for identifying hTERT regulatory factors and investigating the molecular basis of its reactivation.
A transcriptomic (microarray) approach was used to identify new ATRA-induced genes and/or signaling networks that could regulate hTERT. Bioinformatic analysis allowed us to build differential expression profiles between the two cell lines and interaction networks. Among the candidates was H19, a 2.5 kb polyadenylated non-coding RNA. H19 is classified as a tumor suppressor gene: in its absence, cancer develops (e.g. Wilms tumor, embryonal rhabdomyosarcoma, Beckwith-Wiedemann syndrome), and its reintroduction by transfection leads to a loss of tumorigenicity. However, H19 is increasingly recognized as an oncogene, as its expression is elevated in several types of solid cancer. Few studies have addressed the role of H19 in leukemias, hence our interest in studying it in the APL model that we have developed.
We set up the measurement of H19 expression by quantitative RT-PCR, validated the microarray data and showed that ATRA treatment induces H19 expression in NB4-LR1 cells, whereas its expression is rather decreased in NB4-LR1SFD cells. The induction observed in NB4-LR1 cells occurs independently of differentiation. In the NB4-LR1SFD cell line, by contrast, H19 induction can be observed in association with differentiation or apoptosis, in parallel with a marked decrease in hTERT expression. This important result shows that the NB4-LR1SFD line does not have a general defect in H19 induction. These data suggest an inverse correlation between hTERT and H19 expression levels in this cellular model. Importantly, analysis of publicly accessible APL patient datasets also shows this inverse correlation.
We observed a decrease in telomerase activity in cell extracts incubated with in vitro-transcribed H19 RNA. This decrease was also observed after overexpression of H19 in cellulo. RNA immunoprecipitation (RIP) experiments showed a decrease in the amount of hTR bound to hTERT following increased H19 expression, either after ATRA treatment in vitro or after H19 overexpression in cellulo. We hypothesize that H19 displaces hTR from the hTR-hTERT complex. However, pull-down experiments failed to confirm a possible interaction between H19 RNA and the TERT protein.
My thesis work identifies, for the first time, the long non-coding RNA H19 as a potential regulator of hTERT that can modify its activity. This work proposes not only a new mechanism regulating telomerase activity but also a new function for H19 in this type of cancer.
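The inverse relationship between hTERT and H19 expression is a rank-correlation question. A minimal sketch of Spearman's coefficient (the Pearson correlation of average ranks, so ties are handled) is given below; the expression values are invented, and the thesis does not state which correlation measure was used.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented expression values for five samples (arbitrary units).
htert = [8.1, 6.4, 5.0, 3.2, 2.5]
h19 = [0.5, 1.1, 1.0, 2.7, 3.9]
print(f"Spearman rho = {spearman(htert, h19):.2f}")  # -0.90: inverse relationship
```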
|
148 |
Improved Workflows for RNA Homology Search. Yazbeck, Ali (24 July 2019)
Non-coding RNAs are the most abundant class of RNAs found throughout genomes. These RNAs are key players in gene regulation and thus in the function of whole organisms. Numerous methods have been developed for detecting novel classes of ncRNAs or finding homologs of known ones. Because of their abundance, the number of available sequences for some of these RNAs is increasing rapidly, as is the case, for example, for microRNAs. For other classes, however, only incomplete information is available, for instance invertebrate 7SK snRNA. Consequently, many false-positive outputs are produced in the former case, and more accurate annotation methods are needed in the latter to improve the knowledge that can be derived. This makes gathering correct homologs a challenging task and leads directly to an equally important problem: the curation of these data.
Finding solutions to these problems is more complex than one would expect, as these RNAs are characterized not only by sequence information but also by structure information, in addition to distinct biological features. In this work, data curation methods and sensitive homology search are presented as complementary approaches to solve these problems. A careful curation and annotation method revealed new structural information in invertebrate 7SK snRNA, which advances research in this area. This is reflected in the detection of new high-potential 7SK RNA genes in different invertebrate groups. Moreover, the gaps between homology search and well-curated data on the one hand, and between experimental and computational outputs on the other, are closed. These gaps were bridged by a curation method applied to microRNA data, which was then turned into a comprehensive workflow implemented as an automated pipeline. MIRfix is a microRNA curation pipeline that considers the detailed sequence and structure information of metazoan microRNAs, together with biological features related to microRNA biogenesis. Moreover, this pipeline can be integrated into existing methods and tools for microRNA homology search and data curation. Applying this pipeline to the largest open-source microRNA database revealed its high capacity for detecting wrongly annotated pre-miRNAs, ultimately improving the alignment quality of the majority of the available data. Additionally, it was tested with artificial datasets, highlighting its high accuracy in predicting the pre-miRNA components, miRNA and miRNA*.
Chapter 1: Introduction
Chapter 2: Biological and Computational background
2.1 Biology
2.1.1 Non-coding RNAs
2.1.2 RNA secondary structure
2.1.3 Homology versus similarity
2.1.4 Evolution
2.2 The role of computational biology
2.2.1 Alignment
2.2.1.1 Pairwise alignment
2.2.1.2 Multiple sequence alignment (MSA)
2.2.2 Homology search
2.2.2.1 Sequence-based
2.2.2.2 Structure-based
2.2.3 RNA secondary structure prediction
Chapter 3: Careful curation for snRNA
3.1 Biological background
3.2 Introduction to the problem
3.3 Methods
3.3.1 Initial seeds and models construction
3.3.2 Models anatomy then merging
3.4 Results
3.4.1 Refined model of arthropod 7SK RNA
3.4.1.1 5’ Stem
3.4.1.2 Extension of Stem A
3.4.1.3 Novel stem B in invertebrates
3.4.1.4 3’ Stem
3.4.2 Invertebrates model conserves the HEXIM1 binding site
3.4.3 Computationally high-potential 7SK RNA candidate
3.4.4 Sensitivity of the final proposed model
3.5 Conclusion
Chapter 4: Behind the scenes of microRNA driven regulation
4.1 Biological background
4.2 Databases and problems
4.3 MicroRNA detection and curation approaches
Chapter 5: Initial microRNA curation
5.1 Introduction
5.2 Methods
5.2.1 Data pre-processing
5.2.2 Initial seeds creation
5.2.3 Main course
5.3 Results and discussion
5.4 Conclusion
Chapter 6: MIRfix pipeline
6.1 Introduction
6.2 Methods
6.2.1 Inputs and Outputs
6.2.2 Prediction of the mature sequences
6.2.3 The original precursor and its alternative
6.2.4 The validation of the precursor
6.2.5 Alignment processing
6.3 Results and statistics
6.4 Applications
6.4.1 Real life examples and artificial data tests
6.4.2 miRNA and miRNA* prediction
6.4.3 Covariance models
6.5 Conclusion
Chapter 7: Discussion
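MIRfix predicts the miRNA and miRNA* components of a precursor, but its actual procedure is not described in the abstract. As a sketch of the underlying idea only, assuming a valid nested dot-bracket structure and the canonical 2-nt 3' overhangs left by RNase III processing (Drosha/Dicer), the miRNA* span for a 5'-arm mature miRNA can be read off the base-pair table. This is not MIRfix code, and the hairpin is a toy.

```python
def pair_table(dot_bracket):
    """Map each paired position to its partner (0-based) for a balanced dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()  # raises IndexError on an unbalanced string
            pairs[i] = j
            pairs[j] = i
    return pairs


def predict_star_5p(dot_bracket, start, end, overhang=2):
    """Predict the miRNA* span for a 5'-arm mature miRNA at [start, end] (0-based, inclusive).

    Assumes `overhang`-nt 3' overhangs on both duplex strands and that the two anchoring
    positions are base-paired; returns None when that assumption fails.
    """
    pairs = pair_table(dot_bracket)
    anchor = end - overhang  # last mature base expected to pair with the miRNA* 5' end
    if start not in pairs or anchor not in pairs:
        return None
    star_start = pairs[anchor]
    star_end = pairs[start] + overhang  # miRNA* 3' end overhangs the mature 5' end
    if star_start <= end or star_end >= len(dot_bracket):
        return None
    return star_start, star_end


hairpin = "(" * 24 + "." * 6 + ")" * 24  # toy perfect 24-bp stem with a 6-nt loop
print(predict_star_5p(hairpin, 2, 23))    # (32, 53): both strands are 22 nt long
```

Real precursors contain bulges and internal loops at the anchor positions, which is exactly where a curation tool has to do more than this sketch does.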
|
149 |
The Effect of RNA Secondary Structures on RNA-Ligand Binding and the Modifier RNA Mechanism: A Quantitative Model. Hackermüller, Jörg; Meisner, Nicole-Claudia; Auer, Manfred; Jaritz, Markus; Stadler, Peter F. (31 January 2019)
RNA-ligand binding often depends crucially on the local RNA secondary structure at the binding site. We develop here a model that quantitatively predicts the effect of RNA secondary structure on effective RNA-ligand binding activities, based on equilibrium thermodynamics and the explicit computation of partition functions for the RNA structures. A statistical test for the impact of a particular structural feature on the binding affinities follows directly from this approach. The formalism is extended to describe the effects of hybridizing small 'modifier RNAs' to a target RNA molecule outside its ligand binding site. We illustrate the applicability of our approach by quantitatively describing the interaction of the mRNA-stabilizing protein HuR with AU-rich elements [Meisner et al. (2004), Chem. Biochem. in press]. We discuss our model and recent experimental findings demonstrating the effectiveness of modifier RNAs in vitro in the context of current research on non-coding RNAs. We speculate that modifier RNAs might also exist in nature; if so, they would present an additional regulatory layer for fine-tuning gene expression that could evolve rapidly, leaving no obvious traces in genomic DNA sequences.
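The core thermodynamic idea can be illustrated in a few lines. In the simplest reading, assuming the ligand binds only conformers in which every nucleotide of its site is unpaired, and that the conformational equilibrium of free RNA is fast, the apparent dissociation constant is the intrinsic one divided by the Boltzmann probability that the site is open. The sketch enumerates a tiny explicit ensemble instead of computing partition functions, and the energies, site positions and the 10 nM intrinsic Kd are invented; it is not the authors' implementation.

```python
import math

R_KCAL = 0.0019872  # gas constant, kcal/(mol*K)


def open_probability(ensemble, site, temperature_k=310.15):
    """Boltzmann probability that every position in `site` is unpaired.

    `ensemble` is an explicit list of (dot_bracket, free_energy_kcal_per_mol) pairs;
    real tools obtain this from partition functions instead of enumerating structures.
    """
    rt = R_KCAL * temperature_k
    z_total = 0.0
    z_open = 0.0
    for structure, energy in ensemble:
        weight = math.exp(-energy / rt)
        z_total += weight
        if all(structure[i] == "." for i in site):
            z_open += weight
    return z_open / z_total


def apparent_kd(intrinsic_kd, p_open):
    """Apparent dissociation constant when only site-open conformers can bind."""
    return intrinsic_kd / p_open


# Toy two-structure ensemble; the hypothetical binding site is positions 0-1.
ensemble = [("((....))", -2.0), ("........", 0.0)]
p = open_probability(ensemble, range(2))
print(f"P(site open) = {p:.3f}, apparent Kd = {apparent_kd(10.0, p):.1f} nM")
```

In this picture, a modifier RNA that hybridizes outside the binding site could act by reshaping the ensemble, for example by disfavoring structures in which the site is paired, which raises P(site open) and lowers the apparent Kd.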
|
150 |
Non-coding RNA annotation of the genome of Trichoplax adhaerens. Hertel, Jana; de Jong, Danielle; Marz, Manja; Rose, Dominic; Tafer, Hakim; Tanzer, Andrea; Schierwater, Bernd; Stadler, Peter F. (04 February 2019)
A detailed annotation of non-protein-coding RNAs is typically missing in initial releases of newly sequenced genomes. Here we report on a comprehensive ncRNA annotation of the genome of Trichoplax adhaerens, presumably the most basal metazoan whose genome has been published to date. Since BLAST identified only a small fraction of the best-conserved ncRNAs (in particular rRNAs, tRNAs and some snRNAs), we developed a semi-global dynamic programming tool, GotohScan, to increase the sensitivity of the homology search. It successfully identified the full complement of major and minor spliceosomal snRNAs, the genes for RNase P and MRP RNAs, the SRP RNA, as well as several small nucleolar RNAs. We did not find any microRNA candidates homologous to known eumetazoan sequences. Interestingly, most ncRNAs, including the pol-III transcripts, appear as single-copy genes or in very small copy numbers in the Trichoplax genome.
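GotohScan is described as a semi-global dynamic programming tool, but its implementation is not given here. The sketch below shows the textbook idea it builds on: Gotoh's three-matrix affine-gap recursion in semi-global mode, where the whole query must align while flanking target sequence is free. The scoring parameters are arbitrary placeholders, and only the score is returned (no traceback or significance statistics).

```python
NEG_INF = float("-inf")


def semiglobal_affine(query, target, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best score aligning all of `query` to any substring of `target` with affine gaps.

    A gap of length k costs gap_open + (k - 1) * gap_extend. Target characters
    before and after the aligned region are free (semi-global alignment).
    """
    n, m = len(query), len(target)
    # M: query[i-1] aligned to target[j-1]; X: query[i-1] against a gap; Y: target[j-1] against a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        M[0][j] = 0.0  # free start anywhere in the target
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)  # query prefix before the target begins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend, Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend, X[i][j - 1] - gap_open)
    # free end: the full query may finish at any target position
    return max(max(M[n][j], X[n][j]) for j in range(m + 1))


print(semiglobal_affine("ACGT", "TTTACGTTTT"))  # 8.0: four matches, free flanks
print(semiglobal_affine("ACGT", "ACT"))         # 3.0: A, C, T matched, G against a gap
```

Because the query must be aligned end to end, a short ncRNA query cannot score highly by matching only a fragment; this is one intuition for preferring semi-global over local alignment in this setting.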
|