Promoters have the following characteristics: (1) promoter regions contain consensus sequences, but individual bases of a consensus sequence vary from one promoter to another, so these sequences are diverse; (2) the position of a consensus sequence is not fixed and instead fluctuates within a certain range; (3) most eukaryotic promoters are associated with CpG islands. Based on these characteristics, a new promoter prediction method is proposed. It adopts a new statistical modeling strategy and introduces, for the first time, the Interval Position Weight Matrix (IPWM) probability model. Tests on large-scale sequence data show that the new promoter prediction system achieves good sensitivity and specificity.
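The abstract does not spell out how the IPWM is defined, so the following is only a minimal sketch of the general idea behind characteristics (1) and (2): a log-odds position weight matrix tolerates base-level variation in a consensus sequence, and scanning it over an interval of start positions tolerates positional fluctuation. The function names (`build_pwm`, `score_at`, `best_in_interval`), the pseudocount, the uniform background, and the toy TATA-like sites are all illustrative assumptions, not the authors' model or data.

```python
import math

BASES = "ACGT"  # assumption: sequences are uppercase and contain only A, C, G, T


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    pwm = []
    for i in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def score_at(pwm, seq, start):
    """Sum the per-column log-odds scores for the window beginning at `start`."""
    return sum(column[seq[start + i]] for i, column in enumerate(pwm))


def best_in_interval(pwm, seq, lo, hi):
    """Return (score, offset) of the best window whose start lies in [lo, hi].

    The interval is clipped to valid start positions; None is returned when
    no valid start position remains.
    """
    last_start = len(seq) - len(pwm)
    lo, hi = max(lo, 0), min(hi, last_start)
    if lo > hi:
        return None
    return max((score_at(pwm, seq, p), p) for p in range(lo, hi + 1))


# Hypothetical aligned sites that differ at individual bases.
sites = ["TATAAA", "TATATA", "TATAAA", "TATAAT"]
pwm = build_pwm(sites)

# Toy sequence; the motif may start anywhere in the interval [3, 9].
seq = "GCGCGCTATAAAGCGC"
score, offset = best_in_interval(pwm, seq, 3, 9)
```

Taking the maximum over the interval is one simple way to handle a motif whose position varies; a probabilistic model could instead weight each offset by how often the motif occurs there, which this sketch does not attempt.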