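Building on splice site identification based on the conserved-sequence signal, several additional features useful for identifying splice sites were mined, and models describing them were built. These features include the base composition of the sequences upstream and downstream of splice sites, and statistical features such as how the splice site signal and the upstream and downstream base composition vary with the C+G content of the sequence neighboring the site. A log-linear model that effectively integrates these features for splice site identification was designed, and the splice site identification program SpliceKey was developed. Test results show that the accuracy of SpliceKey in identifying splice sites is not only significantly higher than that of the WAM method but also better than that of DGSplice, the most recently released splice site identification software internationally. SpliceKey is available as a web service: http://infosci.hust.edu.cn/SpliceKey/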
Besides the conserved signal sequences around splice sites, other features for identifying splice sites were exploited, including the relationship between the conserved signals and the C+G content of sequences around splice sites, the compositional features of the upstream and downstream sequences of splice sites, and the dependence of these compositional features on the C+G content of sequences around splice sites. Different models were constructed to describe these features, and a log-linear model was created to integrate them. Finally, a new program, SpliceKey, was developed for the prediction of splice sites. Test results demonstrate that the prediction accuracy of SpliceKey is not only significantly higher than that of WAM, but also better than that of DGSplice, a recently released splice site prediction program. SpliceKey is available at http://infosci.hust.edu.cn/SpliceKey/
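The idea of integrating several feature scores with a log-linear model can be illustrated with a minimal sketch. It is not SpliceKey's implementation: the abstract does not give the feature definitions, weights, or training procedure, so the function names, the two-bin C+G split, the 0.5 threshold, and all frequency tables below are hypothetical values chosen only for illustration.

```python
import math


def gc_content(seq):
    """Fraction of C and G bases in a DNA sequence."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "CG" for base in seq.upper()) / len(seq)


def composition_log_odds(seq, site_freq, background_freq):
    """Sum of per-base log-odds under two independent base-composition models."""
    return sum(math.log(site_freq[b] / background_freq[b]) for b in seq.upper())


def log_linear_score(feature_scores, weights, bias=0.0):
    """Weighted sum of per-feature scores (a log-linear combination)."""
    if len(feature_scores) != len(weights):
        raise ValueError("need exactly one weight per feature")
    return bias + sum(w * s for w, s in zip(weights, feature_scores))


# Hypothetical toy parameters (assumptions, not values from the paper).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
UPSTREAM_FREQ = {
    "low": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "high": {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
}
DOWNSTREAM_FREQ = {
    "low": {"A": 0.28, "C": 0.22, "G": 0.22, "T": 0.28},
    "high": {"A": 0.22, "C": 0.28, "G": 0.28, "T": 0.22},
}


def score_candidate(signal_log_odds, upstream, downstream, weights,
                    gc_threshold=0.5):
    """Combine a signal score with C+G-dependent composition scores."""
    # Choose the parameter set from the C+G content of the flanking sequence.
    flank_gc = gc_content(upstream + downstream)
    gc_bin = "high" if flank_gc >= gc_threshold else "low"
    up = composition_log_odds(upstream, UPSTREAM_FREQ[gc_bin], BACKGROUND)
    down = composition_log_odds(downstream, DOWNSTREAM_FREQ[gc_bin], BACKGROUND)
    return log_linear_score([signal_log_odds, up, down], weights)
```

Here `signal_log_odds` stands in for whatever score a signal model (for example, a WAM) assigns to the candidate site. A call such as `score_candidate(1.2, "GGCCGC", "GCGGTA", [1.0, 0.5, 0.5])` returns a single score that could be compared against a threshold. In practice the weights would be fitted on labeled true and false sites.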