1 |
Development of a bioinformatics approach for the functional analysis of alternative splicing
Fuente Lorente, Lorena de la 02 September 2019
[ES] One of the most exciting aspects of transcription is the transcriptomic and proteomic plasticity mediated by post-transcriptional regulation (PTR) processes. PTR mechanisms such as alternative splicing (AS) and alternative polyadenylation (APA) have emerged as tightly regulated processes that play a key role in generating transcriptomic complexity and are associated with the coordination of cell differentiation and tissue development. However, our knowledge of how these mechanisms regulate the properties of the resulting products to define the phenotype is still very limited. The number of existing variants and the wide range of possible functional consequences make their functional validation impracticable on a case-by-case basis. In addition, the lack of tools for isoform-oriented functional evaluation has meant that much of the computational work has either used ad hoc pipelines applied to specific biological systems or simply relied on GO enrichment analyses, which are not informative about the impact of PTR on isoform properties.
In fact, despite more than sixty thousand publications on AS, very few isoforms have been associated with specific properties, while the number of new AS/APA variants of unknown function is growing exponentially because of second-generation sequencing (NGS) techniques. Moreover, because of the technical limitations of NGS in reconstructing transcript structure, third-generation sequencing (TGS) technologies are defining a new era in which, for the first time, it is possible to determine the sequence of structural and functional elements in mRNAs.
This thesis addressed three main aims in order to advance the functional study of isoforms. First, with TGS being increasingly used, assessing the quality of de novo transcriptomes is essential to ensure the reliability of the transcriptomic diversity found. The lack of quality analyses oriented to long reads motivated the development of SQANTI, an automated pipeline for the exhaustive evaluation of TGS transcriptomes. Second, the gene-level information held by most functional databases remains the main obstacle to studying variability between isoforms, especially for novel isoforms, which static databases cannot characterize. We therefore designed IsoAnnot, which builds a database of functional annotations at isoform resolution by integrating information spread across multiple databases and prediction methods. Finally, the unavailability of methods to study the functional impact of isoform regulation motivated us to develop tappAS, a dynamic, flexible tool designed to make this type of study easier to approach.
Thus, during this thesis we developed an infrastructure that resolves the main challenges of isoform functional analysis, providing a set of new methods and tools that offer a unique opportunity to explore how the phenotype is specified post-transcriptionally through changes in the functional properties of the expressed isoforms. Applying our analysis to a dual neural differentiation system in mouse defined the effect of isoform regulation between motor neuron and oligodendrocyte differentiation for multiple functional elements. Among them, we discovered transmembrane regions that are differentially included in the isoforms expressed in the two cell types and whose regulation might be contributing to the control of / [CA] One of the most exciting aspects of transcriptome biology is the contextual adaptability of eukaryotic transcriptomes and proteomes through post-transcriptional regulation (PTR). PTR mechanisms such as alternative splicing (AS) and alternative polyadenylation (APA) have emerged as tightly regulated processes that play a key role in generating transcriptome complexity and in coordinating cell differentiation or tissue development. However, our knowledge of how these mechanisms imprint distinct functional characteristics on the resulting set of isoforms to define the observed phenotype is still scarce. The number of PTR variants and their potentially functional consequences make functional validation impractical on a case-by-case basis.
In addition, the lack of isoform-oriented functional approaches has meant that much of the computational work addressing functional questions at the transcriptome level has consisted of ad hoc computational strategies applied to specific biological systems or has been based on simple GO enrichment analysis, which provides no information about the impact of PTR on isoform properties.
Thus, despite more than 60,000 existing publications on AS, few of the existing isoforms have been associated with specific properties, while the number of novel AS/APA variants with unknown and even unexplored functions is increasing exponentially thanks to next-generation sequencing (NGS). Because of the technical limitations of NGS in reconstructing transcript structure, high-throughput sequencing of full-length transcripts using third-generation technologies (TGS) is opening a new era in transcriptomics, as it improves the definition of gene models and, for the first time, makes it possible to precisely associate functional events within the RNA molecule.
This thesis addresses three major challenges to progress in the study of isoform function. First, with the emergence and growing popularity of TGS, the accurate definition and complete characterization of de novo transcriptomes are essential to guarantee the quality of any conclusion about transcriptome diversity. The lack of quality analyses oriented to long reads motivated the development of SQANTI (https://bitbucket.org/ConesaLab/sqanti), an automated computational strategy for the structural characterization and quality assessment of full-length transcriptomes. Second, existing gene-centric functional resources are a major limitation to the extensive study of the functional variability of isoforms, especially for novel isoforms, which cannot be characterized by static databases. We therefore designed IsoAnnot, which dynamically builds a database of functional annotations at the isoform level, taking transcript sequences as input and integrating information from several databases and prediction methods. Finally, because there was no method to interrogate the functional impact of PTR, we developed new approaches and easy-to-use tools such as tappAS (http://tappas.org/), designed to help researchers carry out transcriptome-wide functional studies of isoform regulation in specific contexts.
Thus, this thesis describes the development of an analysis framework that addresses the fundamental challenges of isoform functional analysis. Applied to a murine neural differentiation system, it revealed isoform-specific transmembrane regions whose modulation by PTR might contribute to controlling cell type-specific mitochondrial dynamics during neural fate determination. / [EN] One of the most exciting aspects of transcriptome biology is the contextual adaptability of eukaryotic transcriptomes and proteomes through post-transcriptional regulation (PTR). PTR mechanisms such as alternative splicing (AS) and alternative polyadenylation (APA) have emerged as tightly regulated processes that play a key role in generating transcriptome complexity and coordinating cell differentiation or tissue development. However, how these mechanisms imprint distinct functional characteristics on the resulting set of isoforms to define the observed phenotype remains poorly understood. The number of PTR variants and the resulting range of potentially functional consequences make their functional validation impractical on a case-by-case basis. Moreover, the lack of isoform-oriented functional profiling approaches has meant that much of the computational work done to answer transcriptome-wide functional questions has either involved ad hoc computational pipelines applied to specific biological systems or relied on simple GO enrichment analyses that are not informative about the impact of PTR on isoform properties.
Thus, despite more than 60,000 publications on AS, few existing isoforms have been associated with specific properties, while the number of novel AS/APA variants with unknown and even unexplored functions is increasing exponentially thanks to the use of next-generation sequencing (NGS). Because of the technical limitations of NGS in reconstructing transcript structure, high-throughput sequencing of full-length transcripts using third-generation technologies (TGS) is opening up a new transcriptomics era that improves the definition of gene models and, for the first time, enables functional events to be precisely associated within the RNA molecule.
This thesis addresses three major challenges to progress in the study of isoform function. First, with the emergence and increasing popularity of TGS, the accurate definition and comprehensive characterisation of de novo transcriptomes are essential to ensure the quality of any conclusions on transcriptome diversity drawn from these data. The lack of long-read-oriented, quality-aware analysis motivated the development of SQANTI (https://bitbucket.org/ConesaLab/sqanti), an automated pipeline for the structural characterisation and quality assessment of full-length transcriptomes. Second, the gene-centric nature of functional resources remained the major limitation to the extended study of functional isoform variability, especially for novel isoforms, which cannot be characterised by static databases. Thus, we designed IsoAnnot, which dynamically constructs a rich, isoform-resolved database of functional annotations by taking transcript sequences as input and integrating information disseminated across several databases and prediction methods. Finally, because no methods to interrogate the functional impact of PTR were available, we developed novel approaches and user-friendly tools such as tappAS (http://tappas.org/), designed to make the transcriptome-wide functional study of context-specific isoform regulation easier for researchers.
Thus, this thesis describes the development of an analysis framework that tackles the fundamental challenges of isoform functional analysis by providing a set of novel methods and tools that offer a unique opportunity to explore how the phenotype is specified by altering the functional characteristics of expressed isoforms. Applied to a murine neural differentiation system, our pipeline profiled the effect of isoform regulation on the inclusion of several functional elements within transcripts between motor neuron and oligodendrocyte differentiation systems. Specifically, we discovered isoform-specific transmembrane regions whose modulation by PTR might contribute to controlling cell type-specific mitochondrial dynamics during neural fate determination. / This work was funded by the following grants: From 2014 to 2018. FPU: Training programme for Academic Staff. Spanish Ministry of Education, FPU2013/02348. From 2016 to 2019. NOVELSEQ: Novel methods for new challenges in the analysis of high-throughput sequencing data. MINECO, BIO2015-1658-R. From 2014 to 2017. DEANN: Developing a European American NGS Network. EU Marie Curie IRSES, GA-612583. / Fuente Lorente, LDL. (2019). Development of a bioinformatics approach for the functional analysis of alternative splicing [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/124974
|
2 |
Application of Long-Read Sequencing in Microbiome Compositional Studies related to Disease
Greenman, Noah 01 January 2024
Metagenomic sequencing has provided scientists with the ability to investigate microbial populations, termed microbiomes, in environmental and clinical settings. Current approaches to metagenomic research involve the use of next-generation sequencing (NGS) to generate short, precise reads for characterization of microbial compositions. While highly accurate, short reads possess several limitations that restrict their application in metagenomic research. Third-generation long-read sequencing technologies may offer several advantages for metagenomic research. Here, we used simulated datasets, as well as experimental data from murine fecal samples, to compare the relative performance of short and long reads for metagenomic research, and their impact on assessing microbial composition. Long-read data demonstrated increased precision for identifying microbiome constituents and assessing abundance without sacrificing sensitivity. Hierarchical clustering of microbiome similarity from paired short- and long-read datasets obtained from murine fecal samples revealed that clustering was driven by read type rather than sample type, underscoring the importance of sequence type. These findings led us to use long-read sequencing to elucidate the effects of propionic acid (PPA) on the murine gut microbiome. PPA has been shown to induce physiological changes like those observed in autism spectrum disorder (ASD). Individuals with ASD may suffer from gastrointestinal comorbidities, suggesting an association with the gut microbiome. Murine offspring fed a PPA-rich diet were assessed for microbiota perturbations. Our results demonstrated that a PPA-rich diet alters the gut microbiome of progeny mice, selecting for several bacterial species that have previously been found in greater abundance among people with ASD.
In our study, changes to microbial abundance were also associated with significant variation in bacterial metabolic pathways related to steroid hormone biosynthesis and to amino sugar and nucleotide sugar metabolism. Taken together, our findings provide empirical evidence supporting the use of long-read sequencing in metagenomic research by elucidating links between PPA exposure and gut microbiome composition.
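The comparison of precision and sensitivity described in this abstract can be illustrated with a minimal sketch. It assumes that each analysis yields a set of detected taxa and that the true composition is known, as in a simulated dataset. The function name and the taxon labels below are hypothetical and are not taken from the thesis.

```python
def precision_and_sensitivity(detected, truth):
    """Return (precision, sensitivity) for detected taxa against a known truth set."""
    detected, truth = set(detected), set(truth)
    true_positives = len(detected & truth)
    # Precision: fraction of detected taxa that are really present.
    precision = true_positives / len(detected) if detected else 0.0
    # Sensitivity (recall): fraction of truly present taxa that were detected.
    sensitivity = true_positives / len(truth) if truth else 0.0
    return precision, sensitivity


# Made-up example data, for illustration only (not results from the study).
truth = {"taxon_A", "taxon_B", "taxon_C", "taxon_D"}
calls_a = {"taxon_A", "taxon_B", "taxon_C", "taxon_D", "taxon_X", "taxon_Y"}
calls_b = {"taxon_A", "taxon_B", "taxon_C", "taxon_D", "taxon_X"}

for name, calls in (("calls_a", calls_a), ("calls_b", calls_b)):
    p, s = precision_and_sensitivity(calls, truth)
    print(f"{name}: precision={p:.2f} sensitivity={s:.2f}")
```

In this toy case both call sets recover every true taxon, so sensitivity is equal, while the set with fewer false positives has the higher precision.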
|
3 |
Understanding the relationship between neonatal dairy calves’ gut microbiota and incidence of diarrhea using full-length 16S rRNA gene amplicon sequencing and machine learning
Hawkins, Jalyn Grace 13 August 2024
A healthy gut microbiome is crucial for the development, growth, and health of dairy calves; however, diarrhea in pre-weaned calves is highly prevalent, difficult to treat, and detrimental to the dairy industry. This study characterized early gut microbiota using long-read-based 16S rRNA gene sequencing and investigated its associations with calf diarrhea and colostrum microbiota. The full-length 16S rRNA gene was amplified and sequenced on a Nanopore sequencer. We identified bacterial species shared between colostrum and calf feces, whose abundance in calf feces decreased with age. Diarrheic calves exhibited differing gut diversity before, during, and after diarrhea, and harbored more bacteria resistant to the antibiotic cefotaxime. Several bacterial species were associated with age and calf health. Additionally, a machine learning model identified bacteria that predict diarrhea. This study will be useful for the goal of reducing antibiotic use to promote gut health and prevent and treat neonatal calf diarrhea.
|
4 |
Transcriptome-Wide Methods for functional and Structural Annotation of Long Non-Coding RNAs
Daulatabad, Swapna Vidhur 05 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Non-coding RNAs across the genome have been associated with various biological processes, ranging from regulation of splicing to remodeling of chromatin. Amongst the repertoire of non-coding sequences lies a critical species of RNAs called long non-coding RNAs (lncRNAs). LncRNAs contribute significantly to a large spectrum of human phenotypes, including cancers, heart failure, diabetes, and Alzheimer’s disease. This dissertation emphasizes the need to characterize the functional role of lncRNAs to improve our understanding of human diseases. This work consolidates a resource from multiple computational genomics and natural language processing-based approaches to advance our ability to functionally annotate hundreds of lncRNAs and their interactions, providing a one-stop platform for lncRNA functional annotation, dynamic interaction networks, and multi-faceted omics data visualization.
RNA interactions are vital in various cellular processes, from transcription to RNA processing. These interactions dictate the functional scope of the RNA. However, the multifaceted functional nature of RNA stems from its ability to form secondary structures. Therefore, this work establishes a computational method to characterize RNA secondary structure by integrating SHAPE-seq and long-read sequencing, to further enhance our understanding of the role of RNA structure in modulating post-transcriptional regulatory processes and to decipher its influence at several layers of biological features, ranging from structure composition to consequent protein occupancy.
This study will potentially impact the research community by providing methods, web interfaces, and computational pipelines, improving our functional understanding of long non-coding RNAs. This work also provides novel integration methods of technologies like Oxford Nanopore-based long-read sequencing, RNA structure-probing methods, and machine learning. The approaches developed in this dissertation are scalable and adaptable to investigate further the functional and regulatory role of RNA and its structure. Overall, this study accelerates the development of RNA-based diagnostics and the identification of therapeutic targets in human disease.
|
5 |
Comprehensive analysis of full-length transcripts reveals novel splicing abnormalities and oncogenic transcripts in liver cancer / 完全長転写産物の網羅的解析による肝細胞癌における新規スプラシング異常と発がん性転写産物の解明
Kiyose, Hiroki 23 May 2023
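Kyoto University / New-system doctorate by coursework / Doctor of Medical Science / Kō No. 24783 / Ihaku No. 4975 / 新制||医||1066 (University Library) / Graduate School of Medicine, Kyoto University, Medicine program / (Chief examiner) Prof. Yasuhiro Murakawa, Prof. Etsuro Hatano, Prof. Seishi Ogawa / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Medical Science / Kyoto University / DFAM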
|
6 |
Genomic Structural Variation Across Five Continental Populations of Drosophila melanogaster
Long, Evan Michael 01 April 2018
Chromosomal structural variations (SVs), including insertions, deletions, inversions, and translocations, occur within the genome and can have a significant effect on organismal phenotype. Some of these effects are caused by structural variations that contain genes. Modern sequencing using short reads makes the detection of large structural variations (> 1 kb) very difficult. Large structural variations represent a significant amount of the genetic diversity within a population. We used a global sampling of Drosophila melanogaster (Ithaca, Zimbabwe, Beijing, Tasmania, and Netherlands) to represent diverse populations. We used long-read sequencing and optical mapping technologies to identify SVs in these genomes. Because the average read length of these approaches is much longer than that of traditional short-read sequencing, these maps facilitate the identification of larger chromosomal SVs and with more clarity. We found a wide diversity of structural variations in each of the five strains. These structural variations varied greatly in size and location, and significantly affected exonic regions of the genome. Structural variations accounted for a much larger difference in the number of base pairs between strains than single nucleotide polymorphisms (SNPs).
|
7 |
Quantitative microbial risk assessment of small water supply systems with simultaneous detection of pathogenic bacteria / 小規模水供給システムにおける病原細菌の一斉検出法を活用した定量的微生物リスク評価
Zeng, Jie 25 September 2023
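Kyoto University / New-system doctorate by coursework / Doctor of Philosophy (Engineering) / Kō No. 24898 / Kōhaku No. 5178 / 新制||工||1988 (University Library) / Graduate School of Engineering, Kyoto University, Department of Urban and Environmental Engineering / (Chief examiner) Prof. Sadahiko Itoh, Prof. Tomonari Matsuda, Prof. Shinya Echigo / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Philosophy (Engineering) / Kyoto University / DGAM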
|
8 |
Computational Tools for Improved Detection, Identification, and Classification of Plant Pathogens Using Genomics and Metagenomics
Johnson, Marcela Aguilera 13 February 2023
Plant pathogens are one of the biggest threats to plant health and food security worldwide. To effectively contain plant disease outbreaks, classification and precise identification of pathogens are crucial for determining treatment and preventive measures. Conventional methods of detection such as PCR may not be sufficient when the pathogen in question is unknown. Advances in sequencing technology have made it possible to sequence entire genomes and metagenomes in real time and at a relatively low cost, opening an opportunity for the development of alternative methods for detecting novel and unknown plant pathogens. Within this dissertation, an integrated approach is used to reclassify a high-impact group of plant pathogens. Additionally, the application of metagenomics and nanopore sequencing using the Oxford Nanopore Technologies (ONT) MinION for the detection and precise identification of fungal and bacterial plant pathogens is demonstrated.
To improve the classification of the strains belonging to the Ralstonia solanacearum species complex (RSSC), we performed a meta-analysis using a comparative genomics and a reverse ecology approach to accurately portray and refine the understanding of the diversity and evolution of the RSSC. The groups identified by these approaches were circumscribed and made publicly available through the LINbase web server so future isolates can be properly classified.
To develop a culture-free method for detecting plant pathogens, we used metagenomes of various plants and long-read nanopore sequencing to precisely identify plant pathogens at the strain level and performed phylogenetic analysis with SNP resolution. In the first paper, we used tomato plants to demonstrate the power of the method to detect bacterial plant pathogens. We compared bioinformatics tools for strain-level detection using reads and assemblies. In the second paper, we used a read-based approach to test whether the methodology can precisely detect the fungal pathogen causing boxwood blight. Lastly, with the improvement in nanopore sequencing, we used grapevine petioles to investigate whether we can go beyond detection and identification and perform a phylogenetic analysis. We assembled a metagenome-assembled genome (MAG) of almost the same quality as the genomes obtained from cultured isolates and performed a phylogenetic analysis with SNP resolution.
Finally, for cases where the database may contain no genome related to the pathogen in question, we used machine learning and metagenomics to develop a reference-free approach to the detection of plant diseases. We trained eight different machine learning models with reads from healthy and infected plant metagenomes and compared their accuracy in classifying reads as belonging to a healthy or an infected plant. In this comparison, random forest was the best model in terms of the computational resources needed while maintaining high accuracy (> 0.90). / Doctor of Philosophy / Microbes are present in every environment on the planet and have been on Earth for billions of years. While some microbes are beneficial, others can cause diseases. To differentiate those that cause diseases from those that do not, and to classify and identify them correctly, it is crucial to look into the evolutionary forces that make them different. Although microorganisms cause diseases in humans and animals, the ones causing diseases in plants are one of the biggest threats to plant health and food security worldwide.
In a perfect world, plant diseases would be diagnosed by eye or with simple procedures. However, when a plant disease is present, it is not always obvious which organism, if any, is causing the disease, making it hard for outbreaks to be detected and contained promptly. With technological advances, it is now possible to obtain all the genetic information of not only one organism but all the organisms living in an environment at a time. This genetic information can then be used to precisely identify which organism is causing a disease in a plant, for faster disease diagnosis and, consequently, more efficient disease prevention and control.
In this dissertation, we used the bacterial group, called Ralstonia solanacearum species complex, which can cause different diseases in more than 200 crops, to investigate and understand the evolution and diversity of the members of this group. We also used newly developed technologies to obtain the genetic material of all the organisms living in multiple important plants including tomato, grapevine, and the ornamental bush, boxwood. Using this genetic material, we developed a methodology for the detection of bacteria and a fungus causing plant diseases.
While this works well when the suspected organism or a similar one is available for comparison, the detection of plant diseases in cases where this information is not available is challenging. Machine learning models, where computers can learn complex patterns from data, have the potential to detect pathogens without the need to compare the sequences to sequences of other pathogens. Here we also used the genetic material to train and compare different machine learning models to classify plants as either being infected or healthy.
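A reference-free read classifier of the kind described above needs each read converted into a fixed-length numeric vector before any model, such as a random forest, can be trained on it. The abstract does not state how reads were encoded, so the following is only a minimal sketch under the assumption that k-mer frequencies are used as features. The function name `kmer_profile` and the choice of k are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_profile(read, k=3):
    """Return relative k-mer frequencies of a read, in fixed alphabetical k-mer order."""
    read = read.upper()
    # All 4**k possible k-mers over A, C, G, T, so every read maps to the same vector length.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    # Only k-mers made of A/C/G/T are counted; windows containing other symbols (e.g. N) are ignored.
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


# Toy read, for illustration only.
profile = kmer_profile("ACGTACGT", k=2)
print(len(profile), round(sum(profile), 6))
```

Vectors produced this way for reads labelled as coming from healthy or infected plants could then be passed to a classifier of one's choice. A fixed k-mer order keeps the features comparable across reads of different lengths.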
|
9 |
Genetic basis and timing of a major mating system shift in Capsella
Bachmann, J.A., Tedder, Andrew, Laenen, B., Fracassetti, M., Désamoré, A., Lafon-Placette, C., Steige, K.A., Callot, C., Marande, W., Neuffer, B., Bergès, H., Köhler, C., Castric, V., Slotte, T. 13 September 2019
Yes / A crucial step in the transition from outcrossing to self-fertilization is the loss of genetic self-incompatibility (SI). In the Brassicaceae, SI involves the interaction of female and male specificity components, encoded by the genes SRK and SCR at the self-incompatibility locus (S-locus). Theory predicts that S-linked mutations, and especially dominant mutations in SCR, are likely to contribute to loss of SI. However, few studies have investigated the contribution of dominant mutations to loss of SI in wild plant species. Here, we investigate the genetic basis of loss of SI in the self-fertilizing crucifer species Capsella orientalis, by combining genetic mapping, long-read sequencing of complete S-haplotypes, gene expression analyses and controlled crosses. We show that loss of SI in C. orientalis occurred < 2.6 Mya and maps as a dominant trait to the S-locus. We identify a fixed frameshift deletion in the male specificity gene SCR and confirm loss of male SI specificity. We further identify an S-linked small RNA that is predicted to cause dominance of self-compatibility. Our results agree with predictions on the contribution of dominant S-linked mutations to loss of SI, and thus provide new insights into the molecular basis of mating system transitions. / Work at Uppsala Genome Center is funded by RFI / VR and Science for Life Laboratory, Sweden. The SNP&SEQ Platform is supported by the Swedish Research Council and the Knut and Alice Wallenberg Foundation. V.C. acknowledges support by a grant from the European Research Council (NOVEL project, grant #648321). The authors thank the French Ministère de l’Enseignement Supérieur et de la Recherche, the Hauts de France Region and the European Funds for Regional Economical Development for their financial support to this project.
This work was supported by a grant from the Swedish Research Council (grant #D0432001) and by a grant from the Science for Life Laboratory, Swedish Biodiversity Program to T.S. The Swedish Biodiversity Program is supported by the Knut and Alice Wallenberg Foundation.
|
10 |
Long-Read RNA-Seq: Quality Control and Benchmarking
Pardo Palacios, Francisco José 18 November 2024
[ES] This thesis shows how long reads can be used to overcome the limitations associated with conventional RNA-Seq, presenting significant innovations in this field. Long reads make it possible to capture complete transcripts and detect new splicing variants, improving on the accuracy of results obtained with short reads because there is no need for read assembly, which could give rise to chimeric isoforms.
As part of this work, the SQANTI3 tool was developed for the evaluation and filtering of transcriptomes. SQANTI3 classifies long-read transcript models into structural categories based on their splice junctions (SJ) and annotates various quality features, such as the presence of non-canonical SJs or the reliability of the annotated transcription start and termination sites (TSS and TTS), using orthogonal data. It also offers an artifact filtering module based on machine learning and user-defined rules, as well as a "rescue" module to avoid losing complete genes through excessive filtering. Finally, SQANTI3 integrates the functional annotation of transcriptomes with isoAnnot Lite, facilitating the analysis of changes in isoform expression and their functional implications.
SQANTI3 was used in challenges 1 and 3 of the LRGASP project (Long-read RNA-seq Genome Annotation Assessment Project), an international, multi-centre effort to benchmark bioinformatics tools for long-read RNA-Seq. Both challenges focused on the correct identification of transcripts, in highly annotated organisms (challenge 1) and in non-model organisms with limited prior information (challenge 3). LRGASP provided participants with data from different technologies and protocols so that they could submit the results obtained with their bioinformatics tools. These results were evaluated and compared using SQANTI3, making evident the differences between the transcriptomes obtained for the same sample depending on the data and methods used.
In summary, the work in this thesis highlights the importance that long reads may have for RNA-Seq in the future and how SQANTI3 is and will be a key tool for evaluating and improving transcriptome quality. / [CA] This thesis shows how long reads can be used to overcome the limitations associated with conventional RNA-Seq, presenting significant innovations in this field. Long reads make it possible to capture complete transcripts and detect new splicing variants, improving on the accuracy of results obtained with short reads, since there is no need for read assembly, which could give rise to chimeric isoforms.
As part of this work, the SQANTI3 tool was developed for the evaluation and filtering of transcriptomes. SQANTI3 classifies long-read transcript models into structural categories based on their splice junctions (SJ) and annotates various quality features, such as the presence of non-canonical SJs or the reliability of the annotated transcription start and termination sites (TSS and TTS), using orthogonal data. It also offers an artifact filtering module based on machine learning or user-defined rules, as well as a "rescue" module to avoid losing complete genes through excessive filtering. Finally, SQANTI3 integrates the functional annotation of transcriptomes with isoAnnot Lite, facilitating the analysis of changes in isoform expression and their functional implications.
SQANTI3 es va utilitzar en els reptes 1 i 3 del projecte LRGASP (Long-read RNA-seq Genome Annotation Assessment Project), un esforç internacional i multicèntric per al benchmarking d'eines bioinformàtiques de lectures llargues en ARN-Seq. Ambdós reptes es van centrar en la identificació correcta de transcrits en organismes altament anotats (repte 1) i en organismes no model amb limitacions d'informació a priori (repte 3). LRGASP va proporcionar dades de diferents tecnologies i protocols als participants perquè presentaren els resultats obtinguts amb les seues eines bioinformàtiques. Aquests resultats es van avaluar i comparar utilitzant SQANTI3, deixant patent les diferències de transcriptomes obtinguts per a una mateixa mostra depenent de les dades i mètodes emprats.
En resum, aquesta tesi ressalta la importància que la utilització de lectures llargues per a ARN-Seq pot tindre en el futur i com SQANTI3 és i serà una eina clau per a l'avaluació i millora de la qualitat dels transcriptomes. / [EN] This thesis presents the use of long-read sequencing to overcome the limitations associated with conventional RNA-Seq, introducing significant innovations in this field. Long-read sequencing enables the capture of full-length transcripts and the detection of novel splicing variants, improving the accuracy of results compared to short-read sequencing, as there is no need for read assembly, which could otherwise lead to chimeric isoforms.
As part of this work, the SQANTI3 tool has been designed and developed for the evaluation and filtering of transcriptomes. SQANTI3 classifies long-read transcript models into structural categories based on their splice junctions (SJs) and annotates a wide variety of quality features, such as the presence of non-canonical SJs or the reliability of the annotated Transcription Start and Termination Sites (TSS and TTS), assessed using orthogonal data. It also includes an artifact filtering module based on machine learning or user-defined rules, as well as a "rescue" module to prevent the loss of complete genes due to excessive filtering. Finally, SQANTI3 integrates the functional annotation of transcriptomes with isoAnnot Lite, facilitating the analysis of isoform expression changes and their functional implications.
SQANTI3 was used in challenges 1 and 3 of the Long-read RNA-seq Genome Annotation Assessment Project (LRGASP), an international and multicenter effort to benchmark bioinformatics tools for long-read RNA-Seq data. Both challenges focused on the correct identification of transcripts: challenge 1 in well-annotated organisms and challenge 3 in non-model organisms with limited prior information. LRGASP provided participants with data from different sequencing technologies and protocols so that they could submit the results obtained with their bioinformatics tools. These results were evaluated and compared using SQANTI3, highlighting the differences in transcriptomes obtained from the same sample depending on the data and methods used.
In summary, the work in this thesis emphasizes the importance that long-read RNA-Seq can have in the future and how SQANTI3 is and will continue to be a key tool for the evaluation and improvement of transcriptome quality. / The project is supported by the following grants: Pew Charitable Trust, NIGMS R35GM138122, NHGRI R21HG011280, Spanish Ministry of Science PID2020-119537RB-10, NIGMS R35GM142647, NIGMS R35GM133569, NHGRI U41HG007234, NHGRI F31HG010999, and UM1 HG009443, NHGRI R01HG008759 and R01HG011469, NHGRI R01HG007182, NHGRI UM1HG009402, NHMRC Investigator Grant GNT2017257, Comunitat Valenciana Grant ACIF/2018/290, Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation, Grant No. 2019-002443, an institutional fund from the Department of Biomedical Informatics, The Ohio State University, an institutional fund from the Department of Computational Medicine and Bioinformatics, University of Michigan, SPBU 73023672, AMED 22kk0305013h9903, 23kk0305024h0001, Wellcome Trust [WT222155/Z/20/Z], and European Molecular Biology Laboratory. We acknowledge the support of the Spanish Ministry of Science and Innovation to the EMBL partnership, Centro de Excelencia Severo Ochoa, and CERCA Programme / Generalitat de Catalunya and the support of the German Federal Ministry of Education and Research with the grant 161L0242A. This work has also been funded by NIH grant R21HG011280, by the Spanish Ministry of Science grants BES-2016-076994 and PID2020-119537RB-100, and by the Comunitat Valenciana grant ACIF/2018/290. / Pardo Palacios, FJ. (2024). Long-Read RNA-Seq: Quality Control and Benchmarking [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/212027