Bacterial small RNAs (sRNAs) are an emerging class of regulatory RNAs, 40 to 500 nt in length, that play key roles in the interaction between bacteria and their environment, so identifying them is of considerable importance. Unlike protein-coding genes, which have readily recognizable features, sRNA genes remain difficult to predict. Here we report a strategy for predicting sRNA genes using a base frequency matrix derived from the transcription termination sites of known bacterial sRNAs, and we apply it to E. coli K-12 MG1655 to demonstrate its performance. On an independent test set, the model shows high specificity and positive predictive value, both higher than those of previous models. The method should therefore provide useful bioinformatic support for the experimental discovery of bacterial sRNAs.
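To make the idea of a base frequency matrix concrete, the following is a minimal sketch rather than the authors' implementation. It assumes equal-length, pre-aligned DNA windows around known termination sites, a pseudocount of 1, a uniform background of 0.25 per base, and a log-odds score with a user-chosen threshold; none of these details are given in the abstract, and all function names and the toy sequences are hypothetical. The two helper functions at the end give the standard definitions of specificity and positive predictive value.

```python
import math

BASES = "ACGT"


def build_frequency_matrix(aligned_seqs, pseudocount=1.0):
    """Per-position base frequencies from equal-length aligned sequences.

    The pseudocount (an assumption) avoids zero frequencies.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for pos in range(length):
        counts = {b: pseudocount for b in BASES}
        for seq in aligned_seqs:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: counts[b] / total for b in BASES})
    return matrix


def log_odds_score(matrix, window, background=0.25):
    """Sum of log2(frequency / background) over the window (uniform background assumed)."""
    return sum(math.log2(col[base] / background) for col, base in zip(matrix, window))


def scan(matrix, sequence, threshold):
    """Return (start, score) for every window scoring at or above the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = log_odds_score(matrix, sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits


def specificity(tn, fp):
    """Fraction of true negatives among all actual negatives."""
    return tn / (tn + fp)


def positive_predictive_value(tp, fp):
    """Fraction of true positives among all positive predictions."""
    return tp / (tp + fp)


# Toy, made-up windows standing in for known sRNA termination sites.
known_sites = ["GCTTTT", "GCTTTT", "GGTTTT", "ACTTTT"]
matrix = build_frequency_matrix(known_sites)
hits = scan(matrix, "AAAAGCTTTTAAAA", threshold=5.0)
```

In practice the matrix would be built from curated termination sites, and the threshold would be tuned on held-out data before scanning a genome.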