Genomic oligonucleotide usage patterns and their biases are important compositional characteristics of DNA sequences, and their study has been widely applied to the analysis of prokaryotic genomes. However, it remains unclear whether the bias in oligonucleotide usage is species-specific and whether it reflects species function. Based on a first-order Markov chain model, we propose a new index for measuring oligonucleotide usage bias: the characteristic vector of genomic trinucleotide (tri-) transition probability bias (TPB), also called the distribution of maximum trinucleotide transition probability bias. We analyzed and compared the tri-TPB characteristic vectors of 727 representative prokaryotic genome sequences. The results show that the genomic tri-TPB characteristic vector is species-specific: the closer the phylogenetic relationship between species, the more similar their tri-TPB characteristic vectors. Different strains of the same species have almost identical tri-TPB characteristic vectors, independent of genomic GC content. In addition, the similarity of genomic tri-TPB characteristic vectors correlates with the pathogenicity of the strains. These results offer a new perspective and method for analyzing the evolution of species and of their pathogenicity using whole-genome oligonucleotide composition and distribution.
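The abstract does not give the exact definition of the tri-TPB vector. The sketch below shows one hypothetical reading, and every concrete choice in it is an assumption. It treats each of the 64 trinucleotides XYZ as a state of a first-order Markov chain whose successor is the overlapping trinucleotide YZW. It estimates P(YZW | XYZ) from 4-mer counts and defines the "bias" as that probability minus the background frequency of base W. For each trinucleotide, it records the base W with the largest bias. The helper names `tpb_vector` and `tpb_similarity` are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# The 64 trinucleotide states, in lexicographic order (AAA, AAC, ..., TTT).
TRINUCS = ["".join(p) for p in product(BASES, repeat=3)]


def base_freqs(seq):
    """Background single-base frequencies (assumed null model for the bias)."""
    counts = Counter(c for c in seq.upper() if c in BASES)
    total = sum(counts.values())
    if total == 0:
        return {b: 0.25 for b in BASES}
    return {b: counts[b] / total for b in BASES}


def tpb_vector(seq):
    """Hypothetical tri-TPB vector: for each trinucleotide XYZ, the base W whose
    transition XYZ -> YZW is most over-represented relative to background.
    '-' marks trinucleotides never observed with a successor."""
    s = seq.upper()
    fourmers = Counter()
    for i in range(len(s) - 3):
        w = s[i:i + 4]
        if set(w) <= set(BASES):  # skip windows containing N or other symbols
            fourmers[w] += 1
    bg = base_freqs(s)
    vector = []
    for tri in TRINUCS:
        row = {b: fourmers[tri + b] for b in BASES}
        total = sum(row.values())
        if total == 0:
            vector.append("-")
            continue
        # Assumed bias: transition probability minus background base frequency.
        bias = {b: row[b] / total - bg[b] for b in BASES}
        # max() returns the first maximal base in ACGT order on ties.
        vector.append(max(BASES, key=lambda b: bias[b]))
    return "".join(vector)


def tpb_similarity(v1, v2):
    """Fraction of positions (both defined) where two vectors agree."""
    pairs = [(a, b) for a, b in zip(v1, v2) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    v = tpb_vector("ACGT" * 50)
    print(v[TRINUCS.index("ACG")])  # successor base most favoured after ACG
    print(tpb_similarity(v, tpb_vector("ACGT" * 10)))
```

Under these assumptions, comparing two genomes reduces to comparing two 64-character strings. This fits the abstract's observation that strains of one species share almost identical vectors regardless of GC content, because subtracting the background base frequency partly removes compositional effects. The study's actual bias measure and similarity metric may differ.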