
Pangenome modeling for analyzing the evolution of Mycobacterium tuberculosis.

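The concept of the pangenome arose from comparative analysis of multiple genomes of the same microbial species. Pangenome analysis has been used to study genomic variation in pathogenic microorganisms and has revealed specific genes associated with strain evolution and host adaptation. Current pangenome research focuses mainly on estimating the pangenome size of different species. However, bioinformatic methods for evolutionary analysis of pangenome structure remain to be developed, so that the evolution of different genes across strains can be studied, such as gene gain or loss, or coevolving gene clusters. Such methods can link genes to phenotypes across strains and provide leads for further biological experiments. / To study the patterns of pangenome evolution in the species Mycobacterium tuberculosis and the genus Mycobacterium and to reveal their biological significance, this thesis developed two bioinformatic methods for pangenome data analysis. The first is an ancestral state reconstruction algorithm based on local maximum parsimony. It was used to analyze the evolution of genome-wide insertion/deletion sequences (indels) in the Mycobacterium tuberculosis Beijing genotype. The analysis shows that genome degradation shaped not only the formation of different subspecies of this species but also the divergence of different subtypes within the same subspecies, such as the Beijing genotype. The analysis also identified genome-wide RD regions and various disrupted genes in the Beijing genotype, which may be related to the evolution of its virulence. The algorithm also offers another perspective for understanding parsimony analysis, one that allows statistical analysis to be introduced into the algorithm. The second model proposed in this thesis is a gene clustering model based on pangenome evolution. By computing the distribution frequencies of different gene families in the pangenome and combining them with a graph-theoretic clustering algorithm, the model can identify coevolving gene clusters in the pangenome. Cluster analysis of the pangenome of the genus Mycobacterium revealed different classes of gene clusters that are associated with the phenotypic evolution of different mycobacterial species. These results indicate that, on the one hand, Mycobacterium tuberculosis lost a large number of environment-related genes during its evolution; on the other hand, it may have acquired some genes through horizontal gene transfer, particularly the PE/PPE gene families. Mycobacterium tuberculosis may therefore have evolved, through continuous genome contraction, from an environmental species into a pathogen that coevolves with its host. / In summary, the two methods above can be applied effectively to pangenome analysis of the species Mycobacterium tuberculosis and the genus Mycobacterium. Future work could further introduce stochastic models; a pangenome database of mycobacteria also needs to be built to meet the demands of large-scale sequencing. / Comparative analysis of multiple genomes of the same microbial species has led to the concept of the pangenome, which characterizes variations in gene content among strains and supports the study of their relationship to variations in strain phenotype. Pangenome studies of microbial pathogens have identified strain-specific genes that may play roles in the evolution and adaptation of the pathogens. Previous studies paid much attention to estimating the size of the pangenome of different microbial species. It is also important, however, to develop bioinformatic methods for analyzing the evolution of the pangenome of a species, such as gene gain and loss or the coevolution of clusters of genes. Such methods may help to associate genotype variations with phenotype variations of a microbial species and thus provide biological insights for further studies. / In this thesis, two bioinformatic approaches were developed to analyze a pangenome consisting of complete mycobacterial genomes from public databases and five additional Mycobacterium tuberculosis (MTB) Beijing genotype genomes sequenced by our own project. 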
The first is a local parsimony ancestral state reconstruction method, which was used to analyze the evolution of genome-wide indels in the MTB Beijing genotype. The key finding was that reductive evolution shaped the formation not only of different MTB species but also of different subspecies or genotypes, such as the Beijing genotype, for which genome-wide deletions of large RDs and disruptions of individual genes were identified. This finding may have implications for the virulence evolution of the Beijing genotype. The method also offers an alternative perspective on parsimony analysis in phylogenetics, one that can be used to incorporate statistical analysis into the method. / The second approach is a pangenome phyletic model for analyzing the coevolution of genes in the pangenome of a microbial species. The model calculates coevolution scores from gene frequencies in a pangenome, and graph-based clustering is then used to identify coevolved clusters of genes. Applying this method to the genus Mycobacterium identified various gene clusters, from conserved core clusters of housekeeping genes to species-specific clusters, including genes related to pathogenesis. The key finding was that the different MTB species arose from their mycobacterial ancestor mainly through the loss of many environment-related genes. On the other hand, gene gain has also occurred within the MTB genomes, especially in the clusters of PE/PPE genes. This finding implies that the MTB species have undergone reductive evolution from an environmental species to adapt to and coevolve with their specific hosts. / In conclusion, the two methods proved powerful for analyzing the pangenome of the MTB species and of the genus Mycobacterium, and they provide useful insights into genome and virulence evolution for further studies, including both pathogenesis-related genes and genetic markers for genotyping. 
Future work in this direction should introduce stochastic models of gene evolution into the two methods. Finally, this work indicates that pangenome modeling is critical and can provide a good starting point for comprehensive pangenome sequencing of mycobacteria. A database of mycobacterial genomes for integrative pangenome annotation and evolutionary analysis should therefore be developed. / Detailed summary in vernacular field only. / Zhou, Haokui. / Thesis (Ph.D.) Chinese University of Hong Kong, 2014. / Includes bibliographical references (leaves 121-135). / Abstracts also in Chinese.
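
The abstract names two techniques without giving their details: parsimony-based ancestral state reconstruction and graph-based clustering of gene families by their distribution across genomes. The Python sketch below only illustrates the general ideas and is not the thesis's implementation. It assumes the classic Fitch bottom-up pass as a stand-in for the "local parsimony" method, Jaccard similarity of presence/absence profiles as a stand-in for the coevolution score, and connected components of a thresholded similarity graph as a stand-in for the graph-based clustering. All names (`fitch_states`, `cluster_genes`, the strain and family labels, the 0.8 threshold) are hypothetical.

```python
from itertools import combinations


def fitch_states(tree, leaf_states):
    """Bottom-up Fitch pass on a rooted binary tree given as nested 2-tuples.

    Leaves are strain names (str); leaf_states maps each strain to 0 or 1
    (absence or presence of a region). Returns the candidate state set of
    every node and the minimum number of state changes on the tree.
    """
    sets = {}
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):                 # leaf: observed state
            sets[node] = {leaf_states[node]}
            return sets[node]
        left, right = visit(node[0]), visit(node[1])
        common = left & right
        if common:                                # children agree: keep shared states
            sets[node] = common
        else:                                     # children disagree: one change needed
            sets[node] = left | right
            changes += 1
        return sets[node]

    visit(tree)
    return sets, changes


def jaccard(a, b):
    """Jaccard similarity of two sets of genomes (assumed coevolution score)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_genes(profiles, threshold=0.8):
    """Link gene families whose profiles have similarity >= threshold and
    return the connected components of that graph as a sorted list of sets."""
    parent = {g: g for g in profiles}

    def find(g):                                  # union-find with path halving
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for g, h in combinations(profiles, 2):
        if jaccard(profiles[g], profiles[h]) >= threshold:
            parent[find(g)] = find(h)

    clusters = {}
    for g in profiles:
        clusters.setdefault(find(g), set()).add(g)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Toy data (hypothetical): a region present in strains C and D, absent in A and B.
example_tree = ((("A", "B"), "C"), "D")
example_states = {"A": 0, "B": 0, "C": 1, "D": 1}
node_sets, n_changes = fitch_states(example_tree, example_states)
print(node_sets[example_tree], n_changes)

# Toy phyletic profiles (hypothetical): gene family -> genomes that carry it.
example_profiles = {
    "famA": {"g1", "g2", "g3", "g4"},
    "famB": {"g1", "g2", "g3", "g4"},
    "famC": {"g1", "g2"},
    "famD": {"g1", "g2"},
    "famE": {"g4"},
}
print(cluster_genes(example_profiles))
```

On the toy tree, the parsimony pass infers that the region was present at the root and needs a single change, consistent with one deletion on the branch leading to strains A and B. On the toy profiles, the clustering groups `famA` with `famB` and `famC` with `famD` and leaves `famE` on its own, which mirrors in miniature the distinction the abstract draws between conserved core clusters and lineage-specific clusters.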

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_1077698
Date: January 2014
Contributors: Zhou, Haokui (author.), Zhao, Guoping, 1948- (thesis advisor.), Chinese University of Hong Kong Graduate School. Division of Microbiology, (degree granting institution.)
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, bibliography, text
Format: electronic resource, electronic resource, remote, 1 online resource (ix, 135 leaves) : illustrations (some color), computer, online resource
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
