
Workflows for the Large-Scale Assessment of miRNA Evolution: Birth and Death of miRNA Genes in Tunicates

Since the discovery of RNA interference (RNAi) over 20 years ago, double-stranded RNAs have been known to occupy key roles in regulation and as a line of defense in animal cells. This thesis focuses on metazoan microRNAs (miRNAs). These small non-coding RNAs are distinguished from their small-interfering RNA (siRNA) relatives by their tightly controlled, efficient and flexible biogenesis, together with a broader capacity to target multiple mRNAs through imperfect, seed-based base-pairing. As potent regulators, miRNAs are involved in mRNA stability and post-transcriptional regulation, a conserved mechanism that evolution has used repeatedly, not only in metazoans but also in plants and unicellular organisms.

A comprehensive review of the current animal miRNA model shows that the canonical pathway dominates the extensive miRNA literature and has served as a scaffold for understanding the regulatory landscape of the cell. The characterization of a diverse set of non-canonical pathways has expanded this view, suggesting a diverse, rich and flexible regulatory landscape for generating mature miRNAs. The production of miRNAs, derived from isolated or clustered transcripts, is an efficient and highly conserved mechanism that can be traced throughout animals with high fidelity at the family level. In evolutionary terms, expansions of miRNA families have been associated with increasing morphological and developmental complexity. In particular, the Chordata clade (the ancient cephalochordates, the highly derived and secondarily simplified tunicates, and the well-known vertebrates) represents an interesting scenario for studying miRNA evolution. Despite clearly conserved miRNAs across these clades, tunicates display massive restructuring events, including the emergence of highly derived miRNAs.

As shown in this thesis, current animal miRNA annotations are biased toward model organisms and vertebrates, misrepresenting more diverse groups such as marine invertebrates. Current miRNA databases, such as miRBase and Rfam, classify miRNAs under different definitions, and their annotations are not easy to link. As an alternative, this thesis proposes a method to curate and merge those annotations, making use of miRBase precursor/mature annotations and genomes together with Rfam predicted sequences. This approach generated structural models for shared miRNA families, based on the alignment of their correctly positioned mature sequences as anchors. In this process, the developed structural curation steps flagged 33 miRNA families from Rfam as questionable.

Curated Rfam and miRBase anchored structural alignments provided a rich resource for constructing predictive miRNA profiles, using the corresponding hidden Markov models (HMMs) and covariance models (CMs). Applying those models directly is time-consuming, and the user has to deal with multiple iterations to achieve a genome-wide, non-overlapping annotation. To resolve this, the proposed miRNAture pipeline provides an automatic and flexible solution for annotating miRNAs. It combines multiple homology approaches to generate the best candidates, validated at the sequence and structural levels. This increases the achievable sensitivity for annotating canonical miRNAs, and the evaluation against the human annotation shows that clear false positive calls are rare and that additional counterparts lie in retained introns, transcribed lncRNAs or repeat families. Further development of miRNAture could include multiple rules to distinguish non-canonical miRNA families.
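
As an illustration of what a genome-wide non-overlapping annotation involves, the following minimal sketch reduces candidate hits from several homology searches to a non-overlapping set by greedily keeping the best-scoring hit at each locus. It is not the miRNAture implementation: the `Hit` record, its fields, the greedy rule and the example coordinates are all assumptions made for illustration only.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical candidate record; field names are illustrative only."""
    contig: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    strand: str   # "+" or "-"
    family: str
    score: float  # e.g. a bit score from an HMM or CM search


def overlaps(a: Hit, b: Hit) -> bool:
    """Two hits overlap if they share contig and strand and their intervals intersect."""
    return (
        a.contig == b.contig
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def select_non_overlapping(hits: List[Hit]) -> List[Hit]:
    """Greedy selection (an assumed rule): visit hits from highest to lowest
    score and keep a hit only if it does not overlap one already kept."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    # Report the surviving hits in genomic order.
    return sorted(kept, key=lambda h: (h.contig, h.start))


# Made-up example candidates, not real annotations.
candidates = [
    Hit("chr1", 100, 180, "+", "mir-1", 55.0),
    Hit("chr1", 150, 230, "+", "mir-133", 40.0),  # overlaps mir-1, lower score
    Hit("chr1", 150, 230, "-", "mir-9", 30.0),    # opposite strand, kept
    Hit("chr2", 10, 90, "+", "let-7", 70.0),
]
selected = select_non_overlapping(candidates)
```

A greedy, score-ordered pass is easy to reason about but treats scores from different search strategies as directly comparable, which may not hold in practice; a real pipeline would likely normalise or threshold them per model first.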

This thesis describes multiple homology approaches to annotate the genomic information of a non-model chordate: the colonial tunicate Didemnum vexillum. A comprehensive analysis of genome-assembly methods and gene annotation revealed high levels of genetic variance and unexpected levels of DNA degradation. Despite those challenges, it was possible to find candidate homeobox and skeletogenesis-related genes. The ncRNA annotation, in turn, included the expected conserved families, and an extensive search traced the Rhabdomyosarcoma 2-associated transcript (RMST) lncRNA family back to the divergence of deuterostomes. In addition, a complete study of the annotation thresholds suggested variations for detecting miRNAs, which were later implemented in the miRNAture tool. This chapter showcases the workflow that a comprehensive sequencing, assembly and annotation project should follow, in light of the growing body of research that relies on DNA sequencing.

In the last 10 years, the remarkable increase in tunicate sequencing projects has opened access to an expanded miRNA annotation landscape. Accordingly, a comprehensive homology approach annotated the miRNA complement of 28 deuterostome genomes (including the 16 tunicates currently reported) using miRNAture. To obtain proper structural models as input, corrected miRBase structural alignments served as a scaffold for building the corresponding CMs, based on a purpose-built genetic algorithm. This automatic approach selected the set of sequences that composed the alignments, generating 2492 miRNA CMs. Despite the multiple sources and associated heterogeneity of the studied genomes, a clustering approach successfully gathered five groups of similar assemblies and highlighted low-quality assemblies. The overall reduction of families and loci in tunicates is notable: they show on average 374 miRNA loci, in comparison to other clades: Cephalochordata (2119), Vertebrata (3638), Hemichordata (1092) and Echinodermata (2737). The detection of 533 miRNA families at the divergence of tunicates reveals an expanded landscape relative to currently annotated miRNA families. Shared sets of ancestral, chordate, Olfactores and clade-specific miRNAs were uncovered using a phylogenetic conservation criterion. Compared to current annotations, the family repertoires were expanded in all cases. Finally, relying on the elements adjacent to annotated miRNAs, this thesis proposes additional syntenic support for clustering miRNA loci. In this way, the structural alignment of miR-1497, originally annotated in three model tunicates, was expanded with clear syntenic support in tunicates.

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:80259
Date: 01 August 2022
Creators: Velandia Huerto, Cristian Arley
Contributors: Universität Leipzig
Source Sets: Hochschulschriftenserver (HSSS) der SLUB Dresden
Language: English
Detected Language: English
Type: info:eu-repo/semantics/publishedVersion, doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text
Rights: info:eu-repo/semantics/openAccess
