
Development of bioinformatics platforms for methylome and transcriptome data analysis.

High-throughput massively parallel sequencing technologies, also known as Next-Generation Sequencing (NGS) technologies, have greatly accelerated biological and medical research. With the ever-growing throughput and complexity of NGS technologies, bioinformatics methods and tools are urgently needed to analyze the large amounts of data and to extract the meaningful information they contain. In this thesis, I mainly worked on developing bioinformatics algorithms for two research fields: DNA methylation data analysis and large intergenic noncoding RNA (lincRNA) discovery. NGS technologies are widely employed in both fields, and both need effective methods for processing the resulting data.

DNA methylation is an important epigenetic modification that mainly regulates gene expression. Whole-genome bisulfite sequencing (BS-seq) is one of the most precise methods for studying DNA methylation, and it allows whole-methylome research at single-base resolution. To analyze the large amount of data generated by BS-seq experiments, I co-developed and optimized Methy-Pipe, an integrated bioinformatics pipeline that performs both sequencing read alignment and methylation state decoding. Furthermore, building on Methy-Pipe, I developed a novel algorithm for mining Differentially Methylated Regions (DMRs), which can be used for large-scale methylation marker discovery. Methy-Pipe has been routinely used in our laboratory for methylomic studies, including plasma-based non-invasive prenatal diagnosis (NIPD) and early cancer detection.

Large intergenic noncoding RNAs, or lincRNAs, are an important novel family of gene regulators involved in many biological processes, such as post-transcriptional regulation, RNA splicing, and cellular aging. Because lincRNA expression is highly tissue-specific, a large proportion of lincRNAs remains undiscovered. Whole Transcriptome Shotgun Sequencing, also known as RNA-seq, combined with de novo or ab initio assembly, offers a powerful way to discover novel lincRNAs at scale and thereby build a complete transcriptome catalog. However, efficiently and accurately identifying genuine novel lincRNAs from large transcriptome datasets remains a bioinformatics challenge. To fill this gap, I developed two bioinformatics tools: I) iSeeRNA, for distinguishing lincRNAs from protein-coding RNAs (mRNAs), and II) sebnif, for comprehensive filtering to obtain high-quality lincRNA lists. Both tools have been used in various biological systems and have shown satisfactory performance.

In summary, I have developed several bioinformatics algorithms that help researchers take advantage of the strengths of NGS technologies, particularly in methylome and transcriptome studies, and explore the biology behind the large amounts of sequencing data.

Detailed summary in vernacular field only. / Sun, Kun. / Thesis (Ph.D.), Chinese University of Hong Kong, 2014. / Includes bibliographical references (leaves 118-126). / Abstracts also in Chinese.

Identifier oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_1077735
Date January 2014
Contributors Sun, Kun (author.), Sun, Hao, active 2014 (thesis advisor.), Chinese University of Hong Kong Graduate School. Division of Chemical Pathology. (degree granting institution.)
Source Sets The Chinese University of Hong Kong
Language English, Chinese
Detected Language English
Type Text, bibliography, text
Format electronic resource, electronic resource, remote, 1 online resource (xvii, 126 leaves) : illustrations (some color), computer, online resource
Rights Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
