Nonstructural protein 5B (NS5B), the RNA-dependent RNA polymerase of the hepatitis C virus (HCV), plays an important role in viral gene replication and protein maturation. Inhibiting NS5B polymerase blocks HCV RNA replication and is therefore an effective strategy for treating hepatitis C. Computational methods for virtual screening and prediction of NS5B inhibitors (NS5BIs) are becoming increasingly important. This work explores three machine learning (ML) methods, namely support vector machine (SVM), k-nearest neighbor (k-NN), and C4.5 decision tree (C4.5 DT), for building classification models that distinguish NS5BIs from non-NS5BIs. The prediction system was tested using 1248 compounds (552 NS5BIs and 696 non-NS5BIs), which are significantly more diverse in chemical structure than those used in other studies. Recursive feature elimination was used to select the molecular descriptors relevant to distinguishing NS5BIs from non-NS5BIs and to improve prediction accuracy. On the independent validation set, the prediction accuracies of the three methods were 81.4%-91.7% for NS5BIs, 78.2%-87.2% for non-NS5BIs, and 84.1%-85.0% overall. SVM gave the best accuracy for NS5BIs (91.7%), C4.5 DT gave the best accuracy for non-NS5BIs (87.2%), and k-NN gave the best overall accuracy (85.0%). These results suggest that machine learning methods can help predict potential NS5BIs in unknown sets of compounds and help identify the molecular descriptors associated with NS5B inhibition.
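The three accuracies reported above (for NS5BIs, for non-NS5BIs, and overall) can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional "descriptor" vectors, the labels, the choice of k = 3, and the function names `knn_predict` and `class_accuracies` are all hypothetical and chosen only for illustration. The sketch classifies a few toy compounds with a plain k-nearest-neighbor vote and then computes the per-class and overall accuracies.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def class_accuracies(y_true, y_pred, positive=1):
    """Return accuracy on the positive class, the negative class, and overall."""
    pairs = list(zip(y_true, y_pred))
    n_pos = sum(1 for t, _ in pairs if t == positive)
    n_neg = len(pairs) - n_pos
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "inhibitor": tp / n_pos,        # fraction of true inhibitors recovered
        "non_inhibitor": tn / n_neg,    # fraction of true non-inhibitors recovered
        "overall": (tp + tn) / len(pairs),
    }


# Toy data (assumption): label 1 = inhibitor, label 0 = non-inhibitor.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]

# Held-out toy compounds; the third is an inhibitor that sits near the
# non-inhibitor cluster, so the vote is expected to misclassify it.
test_X = [(0.95, 1.0), (1.05, 0.95), (0.3, 0.3), (0.05, 0.1), (0.15, 0.05)]
test_y = [1, 1, 1, 0, 0]

preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
acc = class_accuracies(test_y, preds)

for name, value in acc.items():
    print(f"{name}: {value:.1%}")
```

Because the overall accuracy is a count-weighted average of the two per-class accuracies, it always lies between them, which is why a method can lead on one class without leading overall.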