To characterize the frequency distribution of k-mers in genomes, the genome sequences of several typical species were analyzed statistically. Distinguishing between the word domain and the frequency domain, and using two measures of information, Shannon entropy and Fisher information, five functionals of k-mer frequency were defined. For each species, the four functionals derived from Shannon entropy each show a good linear relationship with k. Moreover, these linear relationships are largely universal across the species studied.
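The abstract does not give the definitions of the five functionals. The sketch below is therefore only an illustration under stated assumptions. It counts overlapping k-mers and computes a Shannon entropy in two hypothetical senses. In the "word domain", the distribution is taken over the k-mer words themselves. In the "frequency domain", it is taken over the occurrence counts of distinct k-mers. It then fits a least-squares line of entropy against k. The helper names (`kmer_counts`, `word_domain_entropy`, `frequency_domain_entropy`, `linear_fit`) are hypothetical and are not the authors' method. The demo also uses a synthetic random sequence, not real genome data.

```python
import math
import random
from collections import Counter

VALID = frozenset("ACGT")


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping windows that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= VALID:
            counts[word] += 1
    return counts


def shannon_entropy(counts) -> float:
    """Shannon entropy in bits of the empirical distribution given by counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def word_domain_entropy(seq: str, k: int) -> float:
    # Hypothetical word-domain functional: p(w) = n_w / N over observed k-mers w.
    return shannon_entropy(kmer_counts(seq, k).values())


def frequency_domain_entropy(seq: str, k: int) -> float:
    # Hypothetical frequency-domain functional: p(n) = fraction of distinct
    # observed k-mers that occur exactly n times.
    hist = Counter(kmer_counts(seq, k).values())
    return shannon_entropy(hist.values())


def linear_fit(xs, ys):
    """Ordinary least-squares slope, intercept and Pearson correlation r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r


if __name__ == "__main__":
    # Synthetic stand-in for a genome: uniform random bases, not real data.
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(100_000))
    ks = list(range(1, 7))
    hs = [word_domain_entropy(genome, k) for k in ks]
    slope, intercept, r = linear_fit(ks, hs)
    print(f"slope={slope:.3f} bits/k, intercept={intercept:.3f}, r={r:.5f}")
```

For a uniform random sequence, the word-domain entropy is close to 2k bits while k-mers remain well sampled, so the fit is nearly linear with slope near 2. Real genomes would be expected to deviate from this baseline. Computing the functionals across k, as the abstract describes, is what exposes how they depend on k.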