1

Computational Characterization of Long Non-Coding RNAs

Sen, Rituparno 23 June 2021
In a cell, DNA is transcribed into mature transcripts, some of which are in turn translated into proteins. Although over 85% of the human genome is transcribed, protein-coding genes make up only about 2% of it; the rest is non-coding. One class of non-coding gene elements, the long non-coding RNAs (lncRNAs), is emerging as a key player in various regulatory roles in the human genome. By the generally accepted definition, lncRNAs are over 200 nucleotides long and can exceed 10 kilobases, which makes them similar to mRNAs. The majority of lncRNAs undergo alternative splicing, are weakly polyadenylated, and form complex secondary structures. Functional roles have so far been detected for only a small portion of the annotated lncRNAs, while the functions of the vast majority remain to be discovered. The roles observed thus far include regulation of gene expression through various mechanisms at the transcriptional and post-transcriptional levels. With the advent of next-generation sequencing (NGS) and advances in RNA sequencing (RNA-Seq), it has become easier to reconstruct the transcriptome by extracting information about the splicing machinery. RNA-Seq has helped consortia such as GENCODE, ENCODE, and others to curate their annotation catalogues. This PhD thesis explores certain aspects of the human lncRNA transcriptome, such as the challenges in lncRNA annotation. Those challenges stem from the lack of signals that are common in mRNAs and make them easier to detect, for instance ORFs and transcription start sites. In addition, because the connection between sequence and function is poorly understood, lncRNAs have typically been annotated by their location relative to mRNAs, and their functions have been predicted through a guilt-by-association approach.
In the first part of the PhD research work, the splice junctions in the lncRNA transcriptome were mapped using sequencing data from B-cell lymphoma, in an attempt to explore the isoform diversity of lncRNAs. In this phase, splice junctions were identified from multiple junction-spanning reads in sequencing data with very high read depth. Using GENCODE v19 as a reference, it was found that the human transcriptome harbours a large number of rare exons and introns that have remained unannotated. At the same time, it can be inferred that the current human transcriptome annotation is confined to a very well-defined set of splice variants. However, although the isoforms are well defined, the same cannot be said of their biological functions, and it remains to be explored why the processing machinery of lncRNAs is restricted to a set of very few splice sites.
In the human genome, small regulatory RNAs such as miRNAs and small nucleolar RNAs (snoRNAs) overlap with lncRNAs in their genomic loci. To further understand the human transcriptome, the second part of the PhD research work attempted to distinguish lncRNAs that host miRNAs or snoRNAs from lncRNAs that do not overlap with these smaller RNAs. To this end, machine learning techniques were applied to curated datasets, using features inspired by those prevalent in published lncRNA detection tools and encompassing not just sequence information but also secondary structure and conservation information. Classification was attempted through both supervised and unsupervised learning: random forests for the former, PCA and k-means for the latter. In the end, the three RNA classes could not be separated with certainty, especially when the hosted RNA was not supplied to the classifier. However, this lack of detectable association is itself of biological interest: it suggests that the function of host genes is not closely tied to the function of the hosted genes, at least in this case. Nevertheless, understanding the dynamics of snoRNA and miRNA host genes can improve knowledge of the functional evolution of lncRNAs, because the conservation of the smaller RNA genes makes it comparatively easier to trace the host lncRNAs over much larger evolutionary timescales than most other lncRNAs. As sequencing techniques become available at an accelerating pace, expanded investigation into conservation patterns and host gene functions can be expected to become possible in the near future.
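The comparison of observed splice junctions against a reference annotation, as described in the first part of this abstract, might be sketched roughly as below. This is only an illustration and not the thesis's pipeline: the function name `find_unannotated_junctions`, the tuple layout for a junction, the `min_reads` threshold, and the toy data are all assumptions introduced here.

```python
from collections import Counter

# Hypothetical junction record: (chromosome, intron_start, intron_end, strand).
# Each junction-spanning read contributes one such record.


def find_unannotated_junctions(junction_reads, annotated_introns, min_reads=2):
    """Return junctions absent from the annotation that are supported
    by at least `min_reads` junction-spanning reads (assumed threshold)."""
    support = Counter(junction_reads)
    return {
        junction: count
        for junction, count in support.items()
        if count >= min_reads and junction not in annotated_introns
    }


# Toy reference annotation (stand-in for introns taken from a catalogue
# such as GENCODE; the coordinates are invented).
annotated = {("chr1", 1000, 2000, "+"), ("chr1", 2500, 3000, "+")}

# Toy junction-spanning reads, one tuple per read.
reads = [
    ("chr1", 1000, 2000, "+"),
    ("chr1", 1000, 2000, "+"),
    ("chr1", 1000, 2700, "+"),
    ("chr1", 1000, 2700, "+"),
    ("chr1", 1000, 2700, "+"),
    ("chr1", 2100, 3000, "+"),
]

novel = find_unannotated_junctions(reads, annotated, min_reads=2)
print(novel)
```

Requiring more than one supporting read is one simple way to keep a single misaligned read from being counted as a rare junction; a real analysis would likely also check splice-site motifs and alignment quality.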
2

Discovery of the role of protein-RNA interactions in protein multifunctionality and cellular complexity / Découverte du rôle des interactions protéine-ARN dans la multifonctionnalité des protéines et la complexité cellulaire

Ribeiro, Diogo 05 December 2018
Over time, life has evolved to produce remarkably complex organisms. To cope with this complexity, organisms have evolved a plethora of regulatory mechanisms. For instance, mammalian genomes transcribe thousands of long non-coding RNAs (lncRNAs), presumably expanding their regulatory capacity. An emerging concept is that lncRNAs can serve as protein scaffolds, bringing proteins into proximity, but the prevalence of this mechanism is yet to be demonstrated. In addition, for every messenger RNA encoding a protein, regulatory 3’ untranslated regions (3’UTRs) are also present. Recently, 3’UTRs were shown to form protein complexes during translation, affecting the function of the protein under synthesis. However, the extent and importance of these 3’UTR-protein complexes in cells remain to be assessed.
This thesis aims to systematically discover and provide insights into two poorly understood regulatory mechanisms involving the non-coding portion of the human transcriptome. Concretely, the assembly of protein complexes promoted by lncRNAs and 3’UTRs is investigated using large-scale datasets of protein-protein and protein-RNA interactions. This made it possible to (i) predict hundreds of lncRNAs as possible scaffolding molecules for more than half of the known protein complexes, and (ii) infer more than a thousand distinct 3’UTR-protein complexes, including cases likely to post-translationally regulate moonlighting proteins, that is, proteins that perform multiple unrelated functions. These results indicate that a high proportion of lncRNAs and 3’UTRs may be employed in regulating protein function, potentially playing a role both as regulators and as components of complexity.
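One simple way to combine protein-RNA interactions with known protein complexes when looking for candidate scaffolds is sketched below. The abstract does not state the actual criterion, so this is a hypothetical one: an lncRNA is flagged for a complex when it interacts with at least `min_members` of that complex's proteins. The function name `candidate_scaffolds`, the threshold, and the identifiers in the toy data are all assumptions made for illustration.

```python
def candidate_scaffolds(rna_to_proteins, complexes, min_members=2):
    """Flag (lncRNA, complex) pairs where the lncRNA binds at least
    `min_members` proteins of the complex (assumed criterion)."""
    candidates = []
    for rna, bound in sorted(rna_to_proteins.items()):
        for name, members in sorted(complexes.items()):
            shared = bound & members  # proteins both bound by the RNA and in the complex
            if len(shared) >= min_members:
                candidates.append((rna, name, sorted(shared)))
    return candidates


# Toy protein-RNA interactions: lncRNA -> set of interacting proteins (invented).
rna_to_proteins = {
    "lncRNA_A": {"P1", "P2", "P5"},
    "lncRNA_B": {"P3"},
}

# Toy protein complexes: complex name -> set of member proteins (invented).
complexes = {
    "complex_X": {"P1", "P2", "P3"},
    "complex_Y": {"P4", "P5"},
}

hits = candidate_scaffolds(rna_to_proteins, complexes, min_members=2)
print(hits)
```

A plain overlap count favours large complexes and promiscuous RNAs, so a real analysis would probably add a statistical enrichment test on top of it.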
