Extensive intra- and inter-ethnic transcriptome variation reflects human adaptation and natural selection. Gene expression is the primary mechanism that translates genomic information into functional gene products, which lead to physiological phenotypes. Aberrant gene expression has been associated with the pathogenesis of diseases. The genome revolution has offered a unique opportunity for a comprehensive interrogation of the complexity of the human transcriptome. Analysis of the transcriptome using RNA-Seq requires sophisticated bioinformatics approaches. From a technical perspective, an empirical model based on the geometric-tail distribution of intron lengths in mammalian genomes was developed to accurately determine splice junctions from junction reads and the locations of non-uniquely mapped reads. This method handles non-uniquely mapped reads better than BWA. It also detects more experimentally confirmed splice junctions than other tools. Expressional phenotyping was employed to explore global transcriptomic variation between Beijing Han Chinese and Western Europeans. In addition to inter-ethnic variation in gene expression, ethnic-specific splice junctions were found. Further, ethnic-specific traits manifest in differential relative isoform abundance. Lastly, this spectrum of variation differed between ethnic groups, suggesting alternative splicing as another molecular phenotype that delineates ethnic diversity. 
Expressional phenotyping was then used in a case-control study to explore the molecular abnormalities of a complex disease: Type 2 Diabetes (T2DM). T2DM manifested in widespread repression of gene expression. The study (1) confirmed previously reported Genome-wide Association Study (GWAS) loci; (2) curated poorly characterized GWAS loci; and (3) discovered novel T2DM-associated genes. By integrating various alteration signals and interpreting them in a high-confidence gene-gene interaction network, this study augmented the understanding of expressed abnormalities in T2DM. The results were validated in a broader 69 x 79 case-control group. / Detailed summary in vernacular field only. / Li, Jing Woei. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2012. / Includes bibliographical references (leaves 118-130). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. 
Abstract --- p.v
Abstract in Chinese --- p.vi
Thesis/Assessment Committee --- p.ix
Acknowledgement --- p.ix
List of figures --- p.x
List of tables --- p.xii
List of Abbreviations --- p.xiii
Scientific contributions --- p.xv
List of Publication(s) related to this thesis --- p.xvi
Conference presentations --- p.xvii
Chapter 1: --- Introduction and Literature Reviews --- p.1
Chapter 1.1 --- The variable human transcriptome --- p.1
Chapter 1.2 --- Significance of variation in gene expression and transcript variants --- p.2
Chapter 1.3 --- Transcriptomic study in a technological perspective --- p.8
Chapter 1.3.1 --- Microarray: Probing what was designed to be probed --- p.8
Chapter 1.3.2 --- RNA-Seq: the ab initio decoder of biological sequences --- p.9
Chapter 1.4 --- Analysis of RNA-Seq data --- p.10
Chapter 1.4.1 --- The bioinformatics challenges prevail --- p.10
Chapter 1.4.2 --- Identifying changes in gene expression --- p.16
Chapter 1.4.3 --- Identifying splice site, quantification of isoform level expression --- p.17
Chapter 1.5 --- Conclusion --- p.19
Chapter 1.6 --- Aims of this study --- p.20
Chapter 1.6.1 --- Splice junction determination --- p.20
Chapter 1.6.2 --- Expressional phenotyping in ethnical context --- p.20
Chapter 1.6.3 --- Expressional phenotyping in a disease context --- p.20
Chapter 2: --- Detection of splicing events --- p.21
Chapter 2.1 --- Abstract --- p.21
Chapter 2.2 --- Introduction --- p.22
Chapter 2.3 --- Methods and workflow --- p.25
Chapter 2.4 --- Algorithm --- p.29
Chapter 2.5 --- Geometric-tail distribution --- p.32
Chapter 2.6 --- Insert-size distribution --- p.33
Chapter 2.7 --- Multiread analysis --- p.34
Chapter 2.7.1 --- GT model probably places multiread more accurately than BWA --- p.35
Chapter 2.8 --- Splice-site comparison --- p.37
Chapter 2.8.1 --- GT model discovers more experimentally confirmed splice junction --- p.37
Chapter 2.8.2 --- GT model is highly accurate --- p.39
Chapter 2.9 --- Discussion --- p.40
Chapter 2.10 --- Limitation --- p.40
Chapter 3: --- Transcriptomic variation in a ethnicity context --- p.41
Chapter 3.1 --- Abstract --- p.41
Chapter 3.2 --- Introduction --- p.42
Chapter 3.3 --- Materials and Methods --- p.46
Chapter 3.3.1 --- HapMap lymphoblastoid cell-lines --- p.46
Chapter 3.3.2 --- Sequenced samples --- p.48
Chapter 3.3.3 --- Paired-end RNA-Seq, dataset and reads processing --- p.48
Chapter 3.3.4 --- Genome reference and annotation --- p.49
Chapter 3.3.5 --- Strategies for reads mapping --- p.49
Chapter 3.3.6 --- Pathway and Gene Ontology analysis --- p.50
Chapter 3.3.7 --- Differential gene expression analysis --- p.50
Chapter 3.3.8 --- Ethnic specific splice junction --- p.51
Chapter 3.3.9 --- Junction sites saturation analysis --- p.51
Chapter 3.3.10 --- Ethnical novel transcribed regions --- p.52
Chapter 3.3.11 --- Isoform dynamics and meta-analysis --- p.53
Chapter 3.4 --- Result --- p.54
Chapter 3.4.1 --- Paired-end RNA-Seq --- p.54
Chapter 3.4.2 --- Differential gene expression and meta-analysis --- p.56
Chapter 3.4.3 --- Ethnic specific splice junction is rare --- p.58
Chapter 3.4.4 --- Saturation of discovery of highly confident annotated junctions --- p.59
Chapter 3.4.5 --- Novel transcribed regions --- p.62
Chapter 3.4.6 --- Isoform dynamics and meta-analysis --- p.63
Chapter 3.5 --- Discussion --- p.66
Chapter 3.6 --- Limitations --- p.67
Chapter 3.6.1 --- HapMap LCLs may not reflect the entire spectrum of natural variation --- p.67
Chapter 3.6.2 --- Sequencing depth and the usefulness of published dataset --- p.67
Chapter 3.6.3 --- Knowledge gap in understanding of the human genome --- p.69
Chapter 4: --- Transcriptomic investigation of complex disease: Type 2 Diabetes --- p.70
Chapter 4.1 --- Abstract --- p.70
Chapter 4.2 --- Introduction --- p.72
Chapter 4.3 --- Materials and Methods --- p.75
Chapter 4.3.1 --- Subjects --- p.75
Chapter 4.3.2 --- Strand-specific RNA-Seq Library Construction --- p.77
Chapter 4.3.3 --- Genome annotation sequencing reads processing --- p.81
Chapter 4.3.4 --- Reads mapping for expression analysis --- p.82
Chapter 4.3.5 --- Differential Gene expression analysis --- p.82
Chapter 4.3.6 --- GWAS candidate genes --- p.83
Chapter 4.3.7 --- Individual network, pathway and Gene Ontology analysis --- p.83
Chapter 4.3.8 --- Alternative Splicing Variation --- p.83
Chapter 4.3.9 --- Reads mapping and processing for expressed genomic variants discovery --- p.84
Chapter 4.3.10 --- Expressed and functional genomic variants --- p.85
Chapter 4.3.11 --- Screening for gene fusion --- p.86
Chapter 4.3.12 --- Sense and Antisense analysis --- p.86
Chapter 4.3.13 --- Integrated multi-level T2DM alternations gene interaction network --- p.87
Chapter 4.3.14 --- Validation of selected genes --- p.87
Chapter 4.4 --- Results --- p.88
Chapter 4.4.1 --- High quality strand-specific pair-ended RNA-Seq facilitated downstream analyses --- p.88
Chapter 4.4.2 --- Definition of significance --- p.91
Chapter 4.4.3 --- Wide-spread repressed gene expression in T2DM --- p.91
Chapter 4.4.4 --- Confirmation and curation of T2DM GWAS loci by RNA-Seq --- p.92
Chapter 4.4.5 --- Global expression alteration on T2DM associated genes --- p.97
Chapter 4.4.6 --- Alteration of relative splicing isoforms variations and T2DM specific isoforms --- p.100
Chapter 4.4.7 --- Rare and deleterious SNPs --- p.100
Chapter 4.4.8 --- Absence of alteration in Sense/Antisense ratio and expressed fusion gene --- p.101
Chapter 4.4.9 --- T2DM manifests a broad spectrum of expressed abnormalities --- p.101
Chapter 4.4.10 --- Pathway-based integration of multiple levels of alteration expanded the T2DM network --- p.103
Chapter 4.4.11 --- Validation of selected genes --- p.107
Chapter 4.5 --- Discussion --- p.108
Chapter 5: --- Conclusions and future perspectives --- p.115
Chapter 5.1 --- Conclusions --- p.115
Chapter 5.2 --- Future perspective --- p.115
Chapter 5.2.1 --- Splicing detection --- p.115
Chapter 5.2.2 --- Studies related to ethnicity --- p.116
Chapter 5.2.3 --- Complex diseases --- p.116
References --- p.118
Appendix --- p.131
Identifier | oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_328138 |
Date | January 2012 |
Contributors | Li, Jing Woei., Chinese University of Hong Kong Graduate School. Division of Life Sciences. |
Source Sets | The Chinese University of Hong Kong |
Language | English, Chinese |
Detected Language | English |
Type | Text, bibliography |
Format | electronic resource, remote, 1 online resource (xvii, 177 leaves) : ill. (some col.) |
Rights | Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/) |