81

Expression tissulaire des gènes paralogues : application au cerveau humain et à son état pathologique / Tissue Expression of Paralogous Genes : application on human Brain and its Pathological state

Julien, Solène 19 December 2017
In evolutionary history, two paralogous genes originate from a duplication event in their common ancestor gene. Paralogous genes are characterized by the type of duplication that produced them, whole-genome (WGD) or small-scale (SSD), and by their duplication date. WGDs happened twice at the base of the vertebrate lineage. SSD events have occurred at many points in evolutionary history and can be younger than, older than, or contemporary with the WGD events. Retention of paralogs in the genome, combined with divergence of their spatial expression, is an important contributor to the increase in organism complexity through evolution. Some studies have found that old duplications are more often associated with diseases. The objective of the first part of the thesis is to create a resource on paralogs by collecting and analyzing different annotations. We built a robust resource of human paralogs from published lists of paralogous genes and from external annotations. Exploring these annotations allowed us to identify a high sequence identity between paralogous genes, which can bias gene expression measurement from RNA-seq data and decrease their expression. The objective of the second part is to explore the spatial expression and co-expression of paralogs in the human brain, using RNA-seq data from the GTEx consortium. The GTEx expression data for 13 brain tissues allowed us to show that both recent duplication and the SSD type contribute to more tissue-specific expression. We used co-expression analysis (WGCNA) to group paralogs with similar expression across tissues, and the results suggest co-expression of younger SSDs. Our disease studies showed that younger SSDs accumulate mutations associated with brain diseases. Finally, we found that paralog co-expression and tissue specificity across brain regions could enrich our knowledge of genes associated with brain diseases.
82

Dynamic changes of RNA-sequencing expression for precision medicine: N-of-1-pathways Mahalanobis distance within pathways of single subjects predicts breast cancer survival

Schissler, Grant A., Li, Qike, Gardeux, Vincent, Achour, Ikbel, Li, Haiquan, Piegorsch, Walter W., Lussier, Yves A. 24 February 2016
Poster exhibited at GPSC Student Showcase, February 24th, 2016, University of Arizona. / Motivation: The conventional approach to personalized medicine relies on molecular data analytics across multiple patients. The path to precision medicine lies with molecular data analytics that can discover interpretable single-subject signals (N-of-1). We developed a global framework, N-of-1-pathways, for a mechanistic-anchored approach to single-subject gene expression data analysis. We previously employed a metric that could prioritize the statistical significance of a deregulated pathway in single subjects; however, it lacked quantitative interpretability (e.g. the equivalent of a gene expression fold-change). Results: In this study, we extend our previous approach by applying the statistical Mahalanobis distance (MD) to quantify personal pathway-level deregulation. We demonstrate that this approach, N-of-1-pathways Paired Samples MD (N-OF-1-PATHWAYS-MD), detects deregulated pathways (empirical simulations) while not inflating the false-positive rate, using a study with biological replicates. Finally, we establish that N-OF-1-PATHWAYS-MD scores are biologically significant, clinically relevant and predictive of breast cancer survival (P < 0.05, n = 80 invasive carcinoma; TCGA RNA-sequences). Conclusion: N-of-1-pathways MD provides a practical approach towards precision medicine. The method generates the magnitude and the biological significance of personal deregulated pathway results derived solely from the patient’s transcriptome. These pathways offer opportunities for deriving clinically actionable decisions that have the potential to complement the clinical interpretability of personal polymorphisms obtained from DNA (acquired or inherited polymorphisms and mutations). In addition, the method offers an opportunity for application to diseases in which DNA changes may not be relevant, and thus expands the ‘interpretable omics’ of single subjects (e.g. personalome).
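The Mahalanobis distance that this abstract applies to pathway-level deregulation can be illustrated with a minimal sketch. It is a generic two-dimensional Mahalanobis distance, not the authors' N-OF-1-PATHWAYS-MD scoring. It assumes, purely for illustration, that each gene in a pathway is a (baseline, case) expression pair; the function name and the toy values are hypothetical.

```python
import math


def mahalanobis_2d(point, cloud):
    """Mahalanobis distance from `point` to the centroid of `cloud` (2-D points)."""
    n = len(cloud)
    mx = sum(x for x, _ in cloud) / n
    my = sum(y for _, y in cloud) / n
    # Entries of the sample covariance matrix (denominator n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in cloud) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in cloud) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in cloud) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = v^T S^{-1} v, using the closed-form inverse of a 2x2 matrix.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical (baseline, case) expression pairs for the genes of one pathway.
pathway = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(mahalanobis_2d((3.0, 1.0), pathway))  # sqrt(3), about 1.732
```

Unlike a plain Euclidean distance, the result is scaled by the spread and correlation of the reference points, which is what makes it comparable across pathways with different variances.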
83

Integration of RNA and protein expression profiles to study human cells

Danielsson, Frida January 2016
Cellular life is highly complex. In order to expand our understanding of the workings of human cells, in particular in the context of health and disease, detailed knowledge about the underlying molecular systems is needed. The unifying theme of this thesis concerns the use of data derived from sequencing of RNA, both within the field of transcriptomics itself and as a guide for further studies at the level of protein expression. In paper I, we showed that publicly available RNA-seq datasets are consistent across different studies, requiring only light processing for the data to cluster according to biological, rather than technical characteristics. This suggests that RNA-seq has developed into a reliable and highly reproducible technology, and that the increasing amount of publicly available RNA-seq data constitutes a valuable resource for meta-analyses. In paper II, we explored the ability to extrapolate protein concentrations by the use of RNA expression levels. We showed that mRNA and corresponding steady-state protein concentrations correlate well by introducing a gene-specific RNA-to-protein conversion factor that is stable across various cell types and tissues. The results from this study indicate the utility of RNA-seq also within the field of proteomics. The second part of the thesis starts with a paper in which we used transcriptomics to guide subsequent protein studies of the molecular mechanisms underlying malignant transformation. In paper III, we applied a transcriptomics approach to a cell model for defined steps of malignant transformation, and identified several genes with interesting expression patterns whose corresponding proteins were further analyzed with subcellular spatial resolution. Several of these proteins were further studied in clinical tumor samples, confirming that this cell model provides a relevant system for studying cancer mechanisms. 
In paper IV, we continued to explore the transcriptional landscape in the same cell model under moderate hypoxic conditions. To conclude, this thesis demonstrates the usefulness of RNA-seq data, from a transcriptomics perspective and beyond, to guide analyses of protein expression, with the ultimate goal of unravelling the complexity of the human cell from a holistic point of view.
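The gene-specific RNA-to-protein conversion factor mentioned for paper II can be sketched as follows. The abstract does not say how the factor is estimated, so the median protein/RNA ratio used here is an assumption, and the function names and numbers are hypothetical.

```python
from statistics import median


def conversion_factor(rna_levels, protein_levels):
    """Estimate one gene's RNA-to-protein factor as the median protein/RNA ratio.

    The median is an assumed estimator, chosen because it is robust to outliers.
    """
    ratios = [p / r for r, p in zip(rna_levels, protein_levels) if r > 0]
    if not ratios:
        raise ValueError("no samples with non-zero RNA level")
    return median(ratios)


def predict_protein(rna_level, factor):
    """Extrapolate a protein concentration from an RNA level."""
    return rna_level * factor


# Hypothetical measurements of one gene across three cell types.
rna = [10.0, 20.0, 40.0]
protein = [100.0, 210.0, 380.0]
factor = conversion_factor(rna, protein)
print(factor, predict_protein(5.0, factor))
```

If the factor really is stable across cell types, as the abstract reports, the per-sample ratios should cluster tightly around the estimate.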
84

Sequencing, de novo assembly and annotation of a pink bollworm larval midgut transcriptome

Tassone, Erica E., Zastrow-Hayes, Gina, Mathis, John, Nelson, Mark E., Wu, Gusui, Flexner, J. Lindsey, Carrière, Yves, Tabashnik, Bruce E., Fabrick, Jeffrey A. 22 June 2016
Background: The pink bollworm Pectinophora gossypiella (Saunders) (Lepidoptera: Gelechiidae) is one of the world's most important pests of cotton. Insecticide sprays and transgenic cotton producing toxins of the bacterium Bacillus thuringiensis (Bt) are currently used to manage this pest. Bt toxins kill susceptible insects by specifically binding to and destroying midgut cells, but they are not toxic to most other organisms. Pink bollworm is useful as a model for understanding insect responses to Bt toxins, yet advances in understanding at the molecular level have been limited because basic genomic information is lacking for this cosmopolitan pest. Here, we have sequenced, de novo assembled and annotated a comprehensive larval midgut transcriptome from a susceptible strain of pink bollworm. Findings: A de novo transcriptome assembly for the midgut of P. gossypiella was generated containing 46,458 transcripts (average length of 770 bp) derived from 39,874 unigenes. The size of the transcriptome is similar to published midgut transcriptomes of other Lepidoptera and includes up to 91 % annotated contigs. The dataset is publicly available in NCBI and GigaDB as a resource for researchers. Conclusions: Foundational knowledge of protein-coding genes from the pink bollworm midgut is critical for understanding how this important insect pest functions. The transcriptome data presented here represent the first large-scale molecular resource for this species, and may be used for deciphering relevant midgut proteins critical for xenobiotic detoxification, nutrient digestion and allocation, as well as for the discovery of protein receptors important for Bt intoxication.
85

Investigating the Role of the Synaptic Transcriptome in Ethanol-Responsive Behaviors

O'Brien, Megan A 01 January 2014
Alcoholism is a complex neurological disorder characterized by loss of control in limiting intake, compulsion to seek and imbibe ethanol, and chronic craving and relapse. It is suggested that the characteristic behaviors associated with the escalation of drug use are caused by long-term molecular adaptations precipitated by the drug’s continual administration. These lasting activity-dependent changes that underlie addiction-associated behavior are thought, in part, to depend on new protein synthesis and remodeling at the synapses. It is well established that mRNA can be transported to neuronal distal processes, where it can undergo localized translation that is regulated in a spatially restricted manner in response to stimulation. Through two avenues of investigation, the research herein demonstrates that behavioral responses to ethanol result, at least in part, from alterations in the synaptic transcriptome which contribute to synaptic remodeling and plasticity. The synaptoneurosome preparation was utilized to enrich for RNAs trafficked to the synapse. Two complementary methods of genomic profiling, microarrays and RNA-Seq, were used to survey the synaptic transcriptome of DBA/2J mice subjected to ethanol-induced behavioral sensitization. A habituating expression profile, characteristic of glucocorticoid-responsive genes, was observed for a portion of synaptically targeted genes determined to be sensitive to repeated ethanol exposure. Other ethanol-responsive genes significantly enriched for at the synapse were related to biological functions such as protein folding and extra-cellular matrix components, suggesting a role for local regulation of synaptic functioning by ethanol. In a separate series of experiments, it was shown that altered trafficking of Bdnf, an ethanol-responsive gene, resulted in aberrant ethanol behavioral phenotypes. 
In particular, mice lacking dendritically targeted Bdnf mRNA exhibited enhanced sensitivity to low, activating doses and high, sedating doses of ethanol. Together these experiments suggest that ethanol has local regulatory effects at the synapse and lay the foundation for further investigations into the role of the synaptic transcriptome in ethanol-responsive behaviors. Supported by NIAAA grants R01AA014717, U01 AA016667 and P20AA017828 to MFM, F31AA021035 to MAO, and NIDA T32DA007027 to WLD.
86

Signalling circuitry controlling fungal virulence in the rice blast fungus Magnaporthe oryzae

Oses-Ruiz, Miriam January 2014
Rice blast disease is caused by the filamentous ascomycete fungus Magnaporthe oryzae and is the most destructive disease of cultivated rice. The pathogen elaborates a specialized infection structure called the appressorium. The morphological and physiological transitions that lead to appressorium formation of M. oryzae are stimulated through perception of environmental signals and are tightly regulated by cell cycle checkpoints. External stimuli are internalized by a variety of intracellular MAP kinase signaling pathways, and the major pathway regulating appressorium morphogenesis and plant infection is the Pmk1 MAP kinase signaling pathway. The central kinase, Pmk1, is required for appressorium morphogenesis, and the homeobox and C2/H2 Zn-finger domain transcription factor Mst12 is required for appressorium formation and tissue invasion. The Mst12 null mutant is able to form melanised appressoria, but it is non-pathogenic. To understand the mechanism of appressorium morphogenesis and penetration peg formation, genome-wide comparative transcriptional profiling analysis was performed for the Δpmk1 and Δmst12 mutants using RNA-seq and HiSeq 2000 sequencing. This thesis reports the identification of gene sets regulated by the Pmk1 signalling pathway and defines the sub-set of these genes regulated by Mst12. I show that a hierarchy of transcription factors is likely to operate downstream of Pmk1 to regulate the main processes required for appressorium morphogenesis and plant infection. I also report the role of Mst12 in cytoskeletal re-organisation and show that it is necessary for septin-dependent F-actin polymerisation at the base of the appressorium prior to plant infection. This is consistent with the major transcriptional changes observed by RNA-seq. 
The thesis also reports experiments that strongly suggest that appressorium mediated plant penetration is regulated by an S-phase checkpoint which operates independently of the conventional DNA damage and repair response, and the Cds1 and Chk1 checkpoint kinases. Transcriptional profiling results are consistent with the S-phase checkpoint operating downstream of the Pmk1 MAP kinase signalling pathway. An integrated model for the operation of the Pmk1/Mst12 signalling pathways and the hierarchical control of appressorium morphogenesis in the rice blast fungus is presented.
87

Development of a simple artificial intelligence method to accurately subtype breast cancers based on gene expression barcodes

Esterhuysen, Fanechka Naomi January 2018
Magister Scientiae - MSc / INTRODUCTION: Breast cancer is a highly heterogeneous disease. The complexity of achieving an accurate diagnosis and an effective treatment regimen lies within this heterogeneity. Subtypes of the disease are not simply molecular, i.e. hormone receptor over-expression or absence, but the tumour itself is heterogeneous in terms of tissue of origin, metastases, and histopathological variability. Accurate tumour classification vastly improves treatment decisions, patient outcomes and 5-year survival rates. Gene expression studies supported by transcriptomic technologies such as microarrays and next-generation sequencing (e.g. RNA-Sequencing) have aided oncology researchers' and clinicians' understanding of the complex molecular portraits of malignant breast tumours. Mechanisms governing cancers, which include tumorigenesis, gene fusions, gene over-expression and suppression, and cellular process and pathway involvement, have been elucidated through comprehensive analyses of the cancer transcriptome. Over the past 20 years, gene expression signatures discovered with both microarray and RNA-Seq have reached clinical and commercial application through the development of tests such as Mammaprint®, OncotypeDX®, and FoundationOne® CDx, all of which focus on chemotherapy sensitivity, prediction of cancer recurrence, and tumour mutational level. The Gene Expression Barcode (GExB) algorithm was developed to allow for easy interpretation and integration of microarray data through data normalization with frozen RMA (fRMA) preprocessing and conversion of relative gene expression to a sequence of 1's and 0's. Unfortunately, the algorithm has not yet been developed for RNA-Seq data. However, implementation of the GExB with feature-selection would contribute to a robust machine-learning-based breast cancer and subtype classifier. METHODOLOGY: For microarray data, we applied the GExB algorithm to generate barcodes for normal breast and breast tumour samples. 
A two-class classifier for malignancy was developed through feature-selection on barcoded samples by selecting for genes with 85% stable absence or presence within a tissue type, and differentially stable between tissues. A multi-class feature-selection method was employed to identify genes with variable expression in one subtype, but 80% stable absence or presence in all other subtypes, i.e. 80% in n-1 subtypes. For RNA-Seq data, a barcoding method needed to be developed which could mimic the GExB algorithm for microarray data. A z-score-to-barcode method was implemented, together with differential gene expression analysis that selected the top 100 genes as informative features for classification purposes. The accuracy and discriminatory capability of both the microarray-based and the RNA-Seq-based gene signatures were assessed through unsupervised and supervised machine-learning algorithms, i.e., K-means and hierarchical clustering, as well as binary and multi-class Support Vector Machine (SVM) implementations. RESULTS: The GExB-FS method for microarray data yielded an 85-probe and a 346-probe informative set for the two-class and multi-class classifiers, respectively. The two-class classifier predicted samples as either normal or malignant with 100% accuracy, and the multi-class classifier predicted molecular subtype with 96.5% accuracy with SVM. Combining RNA-Seq DE analysis for feature-selection with the z-score-to-barcode method resulted in a two-class classifier for malignancy, and a multi-class classifier for normal-from-healthy, normal-adjacent-tumour (from cancer patients), and breast tumour samples with 100% accuracy. Most notably, a normal-adjacent-tumour gene expression signature emerged, which differentiated it from normal breast tissues in healthy individuals. CONCLUSION: A potentially novel method for microarray and RNA-Seq data transformation, feature selection and classifier development was established. 
The universal application of the microarray signatures and the validity of the z-score-to-barcode method were demonstrated by the 95% accurate classification of RNA-Seq barcoded samples with a microarray-discovered gene expression signature. The results from this comprehensive study into the discovery of robust gene expression signatures hold immense potential for further R&D towards implementation at the clinical endpoint, and translation to simpler and cost-effective laboratory methods such as qtPCR-based tests.
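The z-score-to-barcode idea and the stability-based feature selection described in this abstract can be sketched as below. The abstract gives no cut-offs for the z-score step, so the threshold of 1.0 is an assumption. The function names are hypothetical, and the 85% stability rule is read here as "present or absent in at least 85% of samples of one tissue type".

```python
from statistics import mean, stdev


def zscore_barcode(values, threshold=1.0):
    """Convert one gene's expression across samples into a 0/1 barcode.

    A sample gets 1 when its z-score exceeds `threshold` (assumed cut-off).
    """
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation; needs at least 2 values
    if sd == 0:
        return [0] * len(values)  # constant gene: no sample stands out
    return [1 if (v - mu) / sd > threshold else 0 for v in values]


def is_stable(bits, fraction=0.85):
    """True when a gene is present (1) or absent (0) in >= `fraction` of samples."""
    share_on = sum(bits) / len(bits)
    return share_on >= fraction or share_on <= 1 - fraction


# Hypothetical expression of one gene in four samples.
print(zscore_barcode([1, 1, 1, 9]))  # [0, 0, 0, 1]
print(is_stable([1] * 9 + [0]))      # True: present in 90% of samples
```

Genes that pass `is_stable` in each tissue type but with opposite states between tissues would be candidates for a two-class signature.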
88

An 'AID' to understanding links between splicing and transcription

Reid, Jane Elizabeth Anne January 2015
This study seeks to address one of the simplest questions that can be asked about an interconnected system; what happens to one process in the absence of the other process? This is a more difficult task than it would appear at first, due to the absence of small molecule inhibitors that can inhibit splicing globally in yeast cells. The first results chapter describes the adaptation of a system called the auxin induced degron (AID) to the task of inhibiting pre-mRNA splicing. This system appears to have several advantages over previous methods of inhibiting splicing and has many potential applications. Another hurdle to understanding what happens to transcription in the absence of splicing is the differential stability of pre-mRNA versus mRNA. At steady state the vast majority of transcripts of a specified gene will be mRNA transcripts. This means that even if you could rapidly inhibit splicing it would be a long time before all the pre-existing mRNA would turn-over. If you waited until specified mRNAs turned over it is likely that the cells would be very sick making it difficult to separate primary and secondary effects. The second results chapter shows the use of a metabolic labelling technique using a uracil analogue called 4-thiouracil (4SU). 4SU is added for an extremely short amount of time (1.5 min, 2.5 min, and 5 min) and the RNA produced during the labelling time is isolated by affinity purification. This allows us to study the kinetics of pre-mRNA splicing in wild-type cells and to seek correlations between splicing kinetics and gene architecture. The third results chapter combines the methods used in the previous two chapters to give a new technique called AID4U-seq. AID4U-seq allows for rapid inhibition of splicing and then the ability to isolate only the transcripts that were created after this inhibition came into effect. 
This should allow for examination of the primary consequences of blocking pre-mRNA splicing at multiple stages during spliceosome assembly. Additionally AID4U-seq is immediately applicable to the study of other areas of RNA processing. Defining the effects on the transcriptome of inhibiting splicing at multiple stages of assembly is an ambitious aim likely to require many more years of research. Therefore this thesis chiefly seeks to illustrate a novel strategy to begin dissecting a complex issue in which splicing, transcription, degradation and the post-transcriptional modification of histones are all likely to have roles.
89

Transcriptomic Insights into the Morphological Variation Present in Bromeliaceae

Gilkison, Victoria A. 01 May 2015
The Bromeliaceae family utilizes a wide range of adaptations to inhabit a variety of environments, including dry ones. Many attribute the large adaptive radiation of Bromeliaceae throughout the Neotropics to three main features: absorptive trichomes, tank reservoirs, and CAM photosynthesis. Based on leaf morphology and arrangement, root type, and nutrient acquisition, Pittendrigh (1948) conservatively separated bromeliads into four main classes, designated Type I, Type II, Type III and Type IV bromeliads. We used RNA-sequencing of leaf mRNA to investigate similarities and differences in gene expression that can be related back to the four distinct leaf morphologies in the Bromeliaceae family. We found several transcripts relating to the presence of a tank and absorptive trichomes. In addition, we found evidence of varying forms of carbohydrate synthesis for carbon storage during CAM photosynthesis. Lastly, transcriptomic differences indicate different drought survival strategies, with the most extreme differences occurring between Aechmea nudicaulis and Tillandsia gardneri. This study identified transcripts related to the morphological gradient and highlighted how each ecological type has a particular set of adaptations and strategies for surviving in a particular regime.
90

Network-based visualisation and analysis of next-generation sequencing (NGS) data

Wan Mohamad Nazarie, Wan Fahmi Bin January 2017
Next-generation sequencing (NGS) technologies have revolutionised research into nature and diversity of genomes and transcriptomes. Since the initial description of these technology platforms over a decade ago, massively parallel RNA sequencing (RNA-seq) has driven many advances in the characterization and quantification of transcriptomes. RNA-seq is a powerful gene expression profiling technology enabling transcript discovery and provides a far more precise measure of the levels of transcripts and their isoforms than other methods e.g. microarray. However, the analysis of RNA-seq data remains a significant challenge for many biologists. The data generated is large and the tools for its assembly, analysis and visualisation are still under development. Assemblies of reads can be inspected using tools such as the Integrative Genomics Viewer (IGV) where visualisation of results involves ‘stacking’ the reads onto a reference genome. Whilst sufficient for many needs, when the underlying variance of the genome or transcript assemblies is complex, this visualisation method can be limiting; errors in assembly can be difficult to spot and visualisation of splicing events may be challenging. Data visualisation is increasingly recognised as an essential component of genomic and transcriptomic data analysis, enabling large and complex datasets to be better understood. An approach that has been gaining traction in biological research is based on the application of network visualisation and analysis methods. Networks consist of nodes connected by edges (lines), where nodes usually represent an entity and edge a relationship between them. These are now widely used for plotting experimentally or computationally derived relationships between genes and proteins. The overall aim of this PhD project was to explore the use of network-based visualisation in the analysis and interpretation of RNA-seq data. 
In chapter 2, I describe the development of a data pipeline designed to go from ‘raw’ RNA-seq data to a file format that supports data visualisation as a ‘DNA assembly graph’. In DNA assembly graphs, nodes represent sequence reads and edges denote homology between reads above a defined threshold. Following the mapping of reads to a reference sequence and defining which reads map to a given locus, pairwise sequence alignments are performed between reads using MegaBLAST. This provides a weighted similarity score that is used to define edges between reads. Visualisation of the resulting networks is then carried out using BioLayout Express3D, which can render large networks in 3-D, thereby allowing a better appreciation of the often-complex network structure. This pipeline has formed the basis for my subsequent work on exploring and analysing alternative splicing in human RNA-seq data. In the second half of this chapter, I provide a series of tutorials aimed at different types of users, allowing them to perform such analyses. The first tutorial is aimed at computational novices who might want to generate networks using a web browser and pre-prepared data. Other tutorials are designed for more advanced users, who can access the code for the pipeline through GitHub or via an Amazon Machine Image (AMI). In chapter 3, the utility of network-based visualisations of RNA-seq data is explored using data processed through the pipeline described in chapter 2. The aim of the work described in this chapter was to better understand the basic principles and challenges associated with network visualisation of RNA-seq data, in particular how it could be used to visualise transcript structure and splice variation. These analyses were performed on data generated from four samples of human fibroblasts taken at different time points during their entry into cell division. 
One of the first challenges encountered was that the existing network layout algorithm (Fruchterman-Reingold) implemented within BioLayout Express3D did not produce an optimal layout of the unusual graph structures generated by these analyses. Following the implementation of the more advanced layout algorithm FMMM within the tool, network structure could be far better appreciated. Using this layout method, the majority of genes sequenced to an adequate depth assemble into networks with a linear ‘corkscrew’ appearance and, when representing single-isoform transcripts, add little to existing views of these data. However, in a small number of cases (~5%), the networks generated from transcripts expressed in human fibroblasts possess more complex structures, with ‘loops’, ‘knots’ and multiple ends being observed. In the majority of cases examined, these loops were associated with alternative splicing events, a fact confirmed by RT-PCR analyses. Other DNA assembly networks representing the mRNAs for genes such as MKI67 showed knot-like structures, which were found to be due to the presence of repetitive sequence within an exon of the gene. In another case, CENPO, the unusual structure observed was due to reads derived from the overlapping gene ADCY3, present on the opposite strand, being wrongly mapped to CENPO. Finally, I explored the use of a network reduction strategy as an approach to visualising highly expressed genes such as GAPDH and TUBA1C. Having successfully demonstrated the utility of networks in analysing transcript isoforms in data derived from a single cell type, I set out to explore their utility in analysing transcript variation in tissue data, where multiple isoforms expressed by different cells within the tissue might be present in a given sample. In chapter 4, I explore the analysis of transcript variation in an RNA-seq dataset derived from human tissue. 
The first half of this chapter describes the quality control of these data, again using a network-based approach but this time based on the correlation in expression between genes and samples. Of the 95 samples derived from 27 human tissues, 77 passed quality control. A network was constructed using a correlation threshold of r ≥ 0.9, comprising 6,109 nodes (genes) and 1,091,477 edges (correlations), and was then clustered. Subsequently, the profile and gene content of each cluster were examined and enrichment of GO terms analysed. In the second half of this chapter, the aim was to detect and analyse alternative splicing events between different tissues using the rMATS tool. Using a false-discovery rate (FDR) cut-off of < 0.01, I found that in comparisons of brain vs. heart, brain vs. liver and heart vs. liver, the program reported 4,992, 4,804 and 3,990 splicing events, respectively. Of these events, only 78 (52 genes) had an exon inclusion level above 50% and an expression level above FPKM 30. To further explore the sometimes-complex structure of transcript diversity in tissue, RNA-seq assembly networks for KLC1, SORBS2, GUK1 and TPM1 were explored. Each of these networks showed different types of alternative splicing events, and it was sometimes difficult to determine the isoforms expressed between tissues using other approaches. For instance, visualising the read assembly of long genes such as KLC1 and SORBS2 using Sashimi plots or even Vials is problematic because of the number of exons and the size of their genomic loci. In the case of GUK1, tissue-specific isoform expression was observed when the networks of three tissues were combined. Arguably the most complex analysis is the network of TPM1, where the uniquification step was employed for this highly expressed gene. 
In chapter 5, I perform usability testing of the NGS Graph Generator web application and of visualising RNA-seq assemblies as networks using BioLayout Express3D. This test was important to ensure that the application is well received and used. Almost all participants in the usability test agreed that the application would encourage biologists to visualise and understand alternative splicing alongside existing tools. The participants agreed that Sashimi plots are rather difficult to view and that interesting features might be lost in them. However, reviewers also identified areas for improvement, such as the ability to analyse large networks in a short time and side-by-side analysis of networks with Sashimi plots and Ensembl. Additional information about the network would be necessary to improve the understanding of alternative splicing. In conclusion, this work demonstrates the utility of network visualisation of RNA-seq data, where the unusual structure of these networks can be used to identify issues in assembly, repetitive sequences within transcripts and splice variation. As such, this approach has the potential to significantly improve our understanding of transcript complexity. Overall, this thesis demonstrates that network-based visualisation provides a new and complementary approach to characterising alternative splicing from RNA-seq data and has the potential to be useful for the analysis and interpretation of other kinds of sequencing data.
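The correlation network used for quality control in chapter 4 of this thesis (genes as nodes, an edge wherever the correlation reaches r ≥ 0.9) can be sketched in a few lines. This is a minimal illustration rather than the thesis pipeline: Pearson correlation is assumed as the measure, and the gene names and expression values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(vx * vy)


def correlation_network(profiles, threshold=0.9):
    """Return edges (gene1, gene2, r) for every gene pair with r >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles of three genes across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
edges = correlation_network(profiles)
print([(a, b) for a, b, _ in edges])  # [('geneA', 'geneB')]
```

The all-pairs loop is quadratic in the number of genes, which is fine for a sketch but would need a vectorised implementation at the scale of the 6,109-node network reported above.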
