
Analysis of nonsense-mediated decay targeted RNA (nt-RNA) in high-throughput sequencing data / CUHK electronic theses & dissertations collection

Nonsense-mediated mRNA decay (NMD) is an important protective mechanism that guards against erroneous transcripts, particularly mRNA transcripts containing premature termination codons (PTC). In classical teaching, such erroneous transcripts (called nonsense-mediated decay targeted RNA, nt-RNA, here) are considered incidental, non-specific side products of the cellular transcription machinery; they are rapidly cleared by NMD and therefore exist only in scanty quantities inside a cell (i.e. at a very low steady-state abundance). As side products of stochastic transcriptional error, they are also commonly considered to carry no biological function. / Analysis of a large collection of RNA-seq data from TCGA (over 4000 samples, occupying over 50 TB of hard disk storage) showed that nt-RNA were produced in large amounts for some genes; sometimes they were even more abundant than the normal transcripts of the corresponding genes. / Based on the hypothesis that some nt-RNA are specifically produced by a biological process (rather than arising by chance), the aims of this work are: 1) to quantify the expression of nt-RNA (survey of the spectrum); 2) to examine the relationship between nt-RNA and protein expression (biological roles); 3) to detect nt-RNAs that affect the prognosis of cancer (biological roles); 4) to apply nt-RNA as diagnostic biomarkers for cancer (application); 5) to identify nt-RNAs that classify tumors in cancer of unknown primary (CUP, application). / Firstly, nt-RNA were defined from gene databases, and all PTC-containing transcripts were compared with their corresponding normal transcripts to locate specific signature tags (both short sequence segments and splice junctions) for each nt-RNA. The presence and counts of these nt-RNA signature tags were then searched for in all reads of the RNA-seq datasets. This search produced the read count of each nt-RNA signature tag; all RNA reads containing such tags are targets for NMD. 
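The tag search described above can be illustrated with a minimal sketch. It assumes reads and tags are plain nucleotide strings and that a tag may match either strand; the function names and the toy sequences are hypothetical and are not taken from the thesis, which does not state how the search was implemented.

```python
# Minimal sketch (assumption): count how many reads contain each signature tag.
# A read is counted at most once per tag, whichever strand the tag appears on.

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(table)[::-1]


def count_signature_tags(reads, tags):
    """Map each tag name to the number of reads containing that tag.

    reads: iterable of read sequences (strings)
    tags:  dict of hypothetical tag name -> tag sequence
    """
    counts = {name: 0 for name in tags}
    # Precompute both orientations of every tag once.
    patterns = {
        name: (seq.upper(), reverse_complement(seq)) for name, seq in tags.items()
    }
    for read in reads:
        read = read.upper()
        for name, (forward, reverse) in patterns.items():
            if forward in read or reverse in read:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    toy_tags = {"tagA": "ACGTAC", "tagB": "TTTGGG"}      # made-up tags
    toy_reads = ["AAACGTACGG", "CCGTACGTCC", "ATTTGGGA"]  # made-up reads
    print(count_signature_tags(toy_reads, toy_tags))
```

A plain substring scan like this is only practical for small inputs; for datasets of the size mentioned in the abstract, an indexed or k-mer based search would be the more likely choice.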
RNA-seq datasets used in this study included TCGA normal samples, TCGA tumor samples and cancer cell lines for 13 cancer types. / In the example of KIRC (kidney renal clear cell carcinoma), most differentially expressed nt-RNA (tumor vs control) were related to differential expression of the corresponding normal transcripts. However, 900 genes produced nt-RNA independently of higher production of the normal transcripts. Among these highly produced nt-RNA, a collection of 12 genes in the proteasome ubiquitination pathway stood out. This finding is notable because VHL-HIF1A is a key oncogenesis mechanism in KIRC and normal HIF1A degradation requires the proteasomal ubiquitination pathway. GO analysis was highly significant (p-value < 4.11E-05). The nt-RNA-producing genes were PSMB4, PSMD14, PSMC6, PSMD13, PSMB1, VCP, ANAPC5, PSMA4, PSMD3, ANAPC7, OS9 and GCLC. / Secondly, some nt-RNA retarded translation of the normal transcripts. Using proteome data, the relationship between the quantity of nt-RNA unique tags and the normal protein product was analyzed by ANOVA comparison of linear models. In total, 422 nt-RNA unique tags influenced protein expression, which suggests a potential biological action of these nt-RNA. PTEN also produced nt-RNA in KIRC, and tumor cells with higher PTEN nt-RNA had a lower PTEN protein level (p-value of ANOVA comparison of linear models: 0.017). Survival analysis showed that PTEN nt-RNA levels affected survival, which suggests that they can be used as a prognostic biomarker. Furthermore, survival analysis using clinical data was done for the other nt-RNA unique tags that affected protein expression. / Thirdly, the application of nt-RNA as diagnostic markers and as markers to define tumor origin in CUP was examined. nt-RNA were identified in different types of tumors. Here, only nt-RNA that were independent of the normal gene transcripts in terms of differential expression were used as biomarkers. 
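The "ANOVA comparison of linear models" mentioned above is, in general terms, an F-test between two nested least-squares fits. The sketch below shows that test under stated assumptions: the reduced model is taken to be protein ~ normal transcript and the full model protein ~ normal transcript + nt-RNA tag count. The abstract does not specify which models were compared, so this model pair, the function names and the toy numbers are all hypothetical.

```python
# Minimal sketch (assumption): F statistic comparing two nested linear models.

def ols_rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    p = len(X[0])
    # Build the augmented normal equations [X'X | X'y].
    M = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    return sum(
        (yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y)
    )


def nested_f_statistic(y, reduced_cols, full_cols):
    """Return (F, df_num, df_den) for reduced vs full model; an intercept is added."""
    n = len(y)
    X0 = [[1.0] + [col[i] for col in reduced_cols] for i in range(n)]
    X1 = [[1.0] + [col[i] for col in full_cols] for i in range(n)]
    rss0, rss1 = ols_rss(X0, y), ols_rss(X1, y)
    df_num = len(X1[0]) - len(X0[0])
    df_den = n - len(X1[0])
    return ((rss0 - rss1) / df_num) / (rss1 / df_den), df_num, df_den


if __name__ == "__main__":
    normal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # made-up normal transcript levels
    nt_tag = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]      # made-up nt-RNA tag counts
    protein = [0.1, 3.9, -0.95, 2.95, -1.9, 7.9]  # made-up protein levels
    print(nested_f_statistic(protein, [normal], [normal, nt_tag]))
```

The p-value is the upper tail of the F distribution with (df_num, df_den) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(F, df_num, df_den)` gives it.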
By comparing tumor samples with normal samples, nt-RNAs serving as diagnostic markers were detected. Unsupervised clustering was performed on these nt-RNAs, and heat maps showed a high degree of separation between tumor and normal samples. For studying tumor origin in CUP, a highly discriminating set of nt-RNAs (400 nt-RNA sequence tags) was defined for most cancers examined, in both a cross-validation study in the training dataset (N=541) and external validation in an independent sample set (N=2462). Unsupervised clustering was performed on the 400 nt-RNA sequence tags, and heat maps showed their power to define tumor origin in CUP. The significance of the classifier formed by the 400 nt-RNA sequence tags was then measured by resampling the training set 100 times. Across the 100 resamplings, the rate of correctly classified instances was 96.4895% ± 0.75% (mean ± standard deviation) for the training set and 91.0239% ± 1.032611% for the validation set. / In conclusion, this study showed that nt-RNA can have important biological functions and can be used for various applications. They are potential biomarkers for the diagnosis and prognosis of disease. They can also be used to determine the site of origin of tumors, which indicates that nt-RNA could provide valuable information for the diagnosis of cancer and for determining the origin of cancer of unknown primary site (CUP). 
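The resampling summary above (mean ± standard deviation of the correctly classified rate over 100 rounds) can be sketched as follows. The abstract does not name the classifier or the resampling scheme, so this sketch substitutes a nearest-centroid classifier and bootstrap resampling of the training set purely for illustration; all function names and data are hypothetical.

```python
# Minimal sketch (assumption): resample the training set, refit a stand-in
# classifier each round, and summarise validation accuracy as mean and SD.
import random
import statistics


def fit_centroids(X, y):
    """Compute the per-class mean vector (nearest-centroid stand-in classifier)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )


def accuracy(centroids, X, y):
    """Percentage of samples whose predicted class equals the true class."""
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return 100.0 * hits / len(y)


def resample_accuracy(train_X, train_y, valid_X, valid_y, rounds=100, seed=0):
    """Return (mean, standard deviation) of validation accuracy over the rounds."""
    rng = random.Random(seed)
    n = len(train_y)
    scores = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: draw with replacement
        centroids = fit_centroids([train_X[i] for i in idx], [train_y[i] for i in idx])
        scores.append(accuracy(centroids, valid_X, valid_y))
    return statistics.mean(scores), statistics.stdev(scores)
```

Here the rows of `train_X` and `valid_X` would hold tag counts per sample (one column per sequence tag) and the labels would be tumor types; the reported figures in the abstract come from the thesis, not from this sketch.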
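[With diagram] / Abstract in Chinese (English translation): Nonsense-mediated mRNA decay (NMD) is an important protective mechanism that guards against erroneous transcripts, especially those containing premature termination codons. In classical teaching, such erroneous transcripts (here called transcripts targeted by nonsense-mediated mRNA decay, denoted nt-RNA) are considered incidental, non-specific by-products of cellular transcription; they are rapidly cleared by NMD, so their expression within the cell is low (i.e. their steady-state abundance is low). As by-products of random transcriptional error, they are generally considered to have no biological function. / By analyzing a large amount of RNA-seq data from TCGA (over 4000 samples, over 50 TB of storage), we found that the nt-RNA of some genes are highly expressed, in some cases exceeding the expression of the normal transcripts of the same gene. / Our hypothesis is that some nt-RNA are produced specifically by a biological process rather than by chance. Based on this hypothesis, the aims of this study are: (1) to quantify nt-RNA expression (survey of the expression spectrum); (2) to explore the relationship between nt-RNA and protein expression (biological function); (3) to find nt-RNA that affect cancer prognosis (biological function); (4) to use nt-RNA as biomarkers for cancer diagnosis (application); (5) to identify nt-RNA that can distinguish cancers of unknown primary (application). / First, nt-RNA were defined from gene databases and compared with their corresponding normal transcripts to find the tags specific to each nt-RNA (including sequence segments and splice junctions). These nt-RNA-specific tags were then searched for and counted in all reads of the RNA-seq data. This search and counting produced the number of reads for each nt-RNA-specific tag, and reads containing these tags are NMD targets. The RNA-seq data used in this study comprise TCGA normal and tumor samples for 13 cancer types, as well as cancer cell line samples. / In the kidney cancer example, most differentially expressed nt-RNA (tumor versus normal) were associated with differential expression of the corresponding normal transcripts. However, the nt-RNA produced by 900 genes were independent of high expression of the normal transcripts. We found that 12 genes related to the proteasome ubiquitination pathway highly expressed nt-RNA. This finding is interesting because VHL-HIF1A is an important oncogenic mechanism in KIRC, and normal HIF1A degradation requires the proteasome ubiquitination pathway. The proteasome ubiquitination pathway was significant in gene enrichment analysis (p-value < 4.11E-05). The 12 genes are PSMB4, PSMD14, PSMC6, PSMD13, PSMB1, VCP, ANAPC5, PSMA4, PSMD3, ANAPC7, OS9 and GCLC. / Second, some nt-RNA can reduce translation of the normal transcripts. Using proteome data, we studied the relationship between nt-RNA-specific tags and the normal protein product by ANOVA comparison of linear models. We found that 422 nt-RNA-specific tags affect protein expression, which indicates that nt-RNA have a potential biological role. PTEN also produces nt-RNA in KIRC; samples with higher PTEN nt-RNA expression contained less PTEN protein product (p-value of ANOVA comparison of linear models = 0.017). Survival analysis showed that PTEN nt-RNA affects survival, indicating that PTEN nt-RNA can serve as a biomarker for cancer prognosis. Survival analysis was further performed for the other nt-RNA-specific tags that affect protein expression. / Finally, I examined two major applications of nt-RNA: as diagnostic markers and as markers for defining the origin of cancer of unknown primary (CUP). Only nt-RNA that were independent of the normal transcripts in terms of differential expression were used as biomarkers. By comparing tumor and normal samples, nt-RNA that could serve as diagnostic markers were identified. Unsupervised clustering and heat maps showed that these nt-RNA clearly separate tumor from normal samples. In studying the origin of CUP, a set of nt-RNA tags that identifies most cancer samples (400 nt-RNA-specific segment tags) was defined through cross-validation on the training set (N=541) and an independent external validation set (N=2462). Unsupervised clustering and heat maps showed the ability of these nt-RNA to define the origin of CUP. Subsequently, the significance of the classifier composed of the 400 nt-RNA-specific segment tags was examined by randomly sampling from the training set 100 times. The results of the 100 random samplings showed that, for the training set, the mean and standard deviation of the correct classification rate were 96.4895% and 0.75%; for the validation set, they were 91.0239% and 1.032611%. / In conclusion, this study shows that nt-RNA have important biological functions and multiple applications. They are potential biomarkers for cancer diagnosis and prognosis. They can also be used to determine the primary site of a cancer, which means that nt-RNA will provide useful information for these potential applications: cancer diagnosis and determining the primary site of cancers of unknown primary. [With diagram] / Hu, Fuyan. / Thesis (Ph.D.), Chinese University of Hong Kong, 2015. 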
/ Includes bibliographical references (leaves 173-211). / Abstracts also in Chinese. / Title from PDF title page (viewed on 12, October, 2016). / Detailed summary in vernacular field only.

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_1291523
Date: January 2015
Contributors: Hu, Fuyan (author.), Tang, Nelson L. S. (thesis advisor.), Chinese University of Hong Kong Graduate School. Division of Chemical Pathology. (degree granting institution.)
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, bibliography, text
Format: electronic resource, electronic resource, remote, 1 online resource (xvi, 211 leaves) : illustrations (some color), computer, online resource
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons "Attribution-NonCommercial-NoDerivatives 4.0 International" License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
