Influenza is a major respiratory infection associated with significant morbidity in the general population and with considerable mortality in elderly and high-risk patients. Research has shown that inhibiting neuraminidase (NA) blocks viral RNA replication, which makes NA an important drug target for treating H1N1 influenza. Structure-based rational drug design that targets the enzyme's active site to develop H1N1 NA inhibitors has become an active area of drug research, and computational virtual screening and prediction of NA inhibitors are becoming increasingly important. In this work, we used several machine learning methods (support vector machine (SVM), k-nearest neighbor (k-NN), and C4.5 decision tree (C4.5 DT)) to build classification models that distinguish known NA inhibitors (NAIs) from non-inhibitors (non-NAIs). The models were tested using 227 compounds (72 NAIs and 155 non-NAIs), which were significantly more diverse in chemical structure than those used in other studies. Recursive feature elimination was used to select the molecular descriptors relevant to distinguishing NAIs from non-NAIs and thereby to improve prediction accuracy. On an independent validation set, the prediction accuracies were 75.9%-92.6% for all compounds, 64.3%-78.6% for NAIs, and 77.5%-97.5% for non-NAIs. SVM gave the best overall accuracy (92.6%) of all the methods. This work suggests that machine learning methods such as SVM can be useful for predicting potential NAIs in unknown sets of compounds and for identifying the molecular descriptors associated with NAIs.
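As a rough illustration of the kind of classification and scoring summarized above, the following minimal sketch implements a plain k-NN majority vote and the three accuracies reported (all compounds, NAIs, non-NAIs). It is not the authors' implementation. The function names, the two-value descriptor vectors, and the toy labels are all hypothetical. The abstract does not state the composition of the validation set, so the counts in the second example (14 NAIs, 40 non-NAIs) are an assumption chosen only because they reproduce the best reported percentages.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query descriptor vector.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def class_accuracies(y_true, y_pred, positive="NAI"):
    """Return (overall, positive-class, negative-class) accuracy as fractions."""
    pairs = list(zip(y_true, y_pred))
    pos = [(t, p) for t, p in pairs if t == positive]
    neg = [(t, p) for t, p in pairs if t != positive]
    overall = sum(t == p for t, p in pairs) / len(pairs)
    pos_acc = sum(t == p for t, p in pos) / len(pos)
    neg_acc = sum(t == p for t, p in neg) / len(neg)
    return overall, pos_acc, neg_acc


# Hypothetical toy data: two made-up descriptors per compound.
train_X = [(0.9, 0.8), (0.8, 0.9), (1.0, 1.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.0)]
train_y = ["NAI", "NAI", "NAI", "non-NAI", "non-NAI", "non-NAI"]
test_X = [(0.85, 0.9), (0.15, 0.1), (0.7, 0.7), (0.3, 0.2)]
predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]

# Hypothetical validation counts (assumption, not stated in the abstract):
# 14 NAIs with 11 predicted correctly, 40 non-NAIs with 39 predicted correctly.
y_true = ["NAI"] * 14 + ["non-NAI"] * 40
y_pred = ["NAI"] * 11 + ["non-NAI"] * 3 + ["non-NAI"] * 39 + ["NAI"] * 1
overall, nai_acc, non_nai_acc = class_accuracies(y_true, y_pred)
print(f"overall={overall:.1%} NAI={nai_acc:.1%} non-NAI={non_nai_acc:.1%}")
```

Reporting the per-class accuracies next to the overall figure matters here because the data set is imbalanced (72 NAIs against 155 non-NAIs): a model can reach a high overall accuracy while still missing many of the inhibitors.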