Background: Predicting the strength of a prokaryotic promoter from its sequence is of great importance, not only in fundamental life-science research but also in the applied field of synthetic biology. Much progress has been made in building quantitative models for strength prediction; in particular, the introduction of machine learning methods such as the artificial neural network (ANN) has significantly improved prediction accuracy. As one of the most important machine learning methods, the support vector machine (SVM) is more powerful at learning from small-sample datasets and is therefore expected to perform well on this problem. Methods: To confirm this, we constructed SVM-based models to quantitatively predict promoter strength. A library of 100 promoter sequences and their strength values was randomly divided into two datasets: a training set (≥ 10 sequences) for model training and a test set (≥ 10 sequences) for model testing. Results: The results indicate that prediction performance improves as the size of the training set increases, and the best performance was achieved with a training set of 90 sequences. After optimization of the model parameters, a high-performance model was finally trained, with a high squared correlation coefficient for fitting both the training set (R^2 > 0.99) and the test set (R^2 > 0.98), both of which are better than those obtained with the ANN in our previous work. Conclusions: Our results demonstrate that SVM-based models can be employed for the quantitative prediction of promoter strength.
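The evaluation protocol described in the Methods and Results can be illustrated with a minimal sketch. It shows only the random division of a 100-entry library into training and test sets and the computation of the squared correlation coefficient; the SVM regression step itself is left out. The function names, the one-hot sequence encoding, and the placeholder library are assumptions made for illustration and are not taken from the study.

```python
import random


def split_library(library, n_train, seed=0):
    """Randomly divide (sequence, strength) pairs into a training and a test set."""
    if not 0 < n_train < len(library):
        raise ValueError("n_train must leave at least one entry in each set")
    shuffled = list(library)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def one_hot(sequence):
    """Encode a DNA sequence as a flat vector with four indicators per base.

    This encoding is an assumption; the abstract does not state which
    sequence features were used.
    """
    alphabet = "ACGT"
    return [1.0 if base == letter else 0.0
            for base in sequence.upper()
            for letter in alphabet]


def squared_correlation(observed, predicted):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_o * var_p)


if __name__ == "__main__":
    # Placeholder library of 100 random sequences with random strengths.
    # These values are synthetic and stand in for real measurements.
    rng = random.Random(1)
    library = [("".join(rng.choice("ACGT") for _ in range(20)), rng.random())
               for _ in range(100)]

    # Split used for the best-performing model: 90 training, 10 test sequences.
    train, test = split_library(library, n_train=90)
    features = [one_hot(seq) for seq, _ in train]
    print(len(train), len(test), len(features[0]))
```

In practice the encoded training vectors would be passed to an SVM regression implementation, and `squared_correlation` would then be applied to the measured and predicted strengths of the training and test sets separately.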