A pan-genome is the full complement of genes in a species. It consists of a 'core genome', containing the genes present in all individuals, and a 'dispensable genome', containing the genes present in only some individuals together with the genes specific to a single individual. In this study, the complete genome sequences of 30 Escherichia coli strains were compared from a pan-genome perspective to analyze their gene content, genome composition, and evolutionary characteristics. Core genes accounted for only about 50% of the genes in each strain, and each strain carried an average of 146 strain-specific genes. These data suggest that the E. coli pan-genome is vast and that new genes will continue to be discovered as more E. coli genomes are sequenced. Comparing the conservation of genes across strains with their GC content and selection pressure showed that the more conserved a gene is, the narrower its range of GC content and the stronger the selection pressure it has experienced during evolution. These results will help clarify the evolutionary characteristics of the E. coli genome and the dynamic changes in its gene composition. They also provide a theoretical basis for the prevention and control of epidemic diseases caused by pathogenic E. coli, and a reference for methods of analyzing large-scale pathogen genome data.
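The core, dispensable, and strain-specific categories defined above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: it assumes that genes have already been clustered into gene families, and the function names, strain names, and gene-family labels are all hypothetical toy values. The GC content helper simply computes the fraction of G and C bases in a sequence.

```python
from collections import Counter


def partition_pan_genome(strain_genes):
    """Split a pan-genome into core, dispensable, and strain-specific sets.

    strain_genes: dict mapping a strain name to the set of gene-family IDs
    found in that strain (hypothetical input format).
    """
    n_strains = len(strain_genes)
    # Count the number of strains in which each gene family occurs.
    occurrence = Counter(g for genes in strain_genes.values() for g in genes)
    pan = set(occurrence)
    # Core: present in every strain.
    core = {g for g, n in occurrence.items() if n == n_strains}
    # Dispensable: everything outside the core, including strain-specific genes.
    dispensable = pan - core
    # Strain-specific: present in exactly one strain (assumes n_strains > 1).
    strain_specific = {g for g, n in occurrence.items() if n == 1}
    return pan, core, dispensable, strain_specific


def gc_content(seq):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Toy data only: three made-up strains and five made-up gene families.
toy_strains = {
    "strain_1": {"famA", "famB", "famC"},
    "strain_2": {"famA", "famB", "famD"},
    "strain_3": {"famA", "famC", "famE"},
}

pan, core, dispensable, strain_specific = partition_pan_genome(toy_strains)
print("pan-genome size:", len(pan))
print("core:", sorted(core))
print("dispensable:", sorted(dispensable))
print("strain-specific:", sorted(strain_specific))
```

In this toy input only `famA` is shared by all three strains, so it alone forms the core, while `famD` and `famE` each occur in a single strain. Adding a fourth strain to the dictionary can only shrink or preserve the core and grow or preserve the pan-genome, which is the behavior the abstract describes for newly sequenced genomes.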