A total of 1559 molecular descriptors, covering constitutional, charge-distribution, topological, geometrical, and physicochemical properties, were calculated to characterize acetylcholinesterase inhibitors. A subset of 37 descriptors was selected with a hybrid filter/wrapper approach that combined Fisher score ranking with Monte Carlo simulated annealing. Classification models for acetylcholinesterase inhibitors were then built using support vector machine (SVM), artificial neural network (ANN), and k-nearest neighbor (k-NN) methods. For the 515 samples in the training set, five-fold cross-validation gave average prediction accuracies across the machine learning methods of 87.3%-92.7%, 67.0%-81.0%, and 79.4%-88.2% for the positive, negative, and total samples, respectively. To check the SVM model for chance correlation, we applied y-scrambling, which gave average prediction accuracies of 72.7%-82.5%, 41.0%-53.0%, and 62.1%-69.1% for the positive, negative, and total samples, respectively. These values are clearly lower than those of the actual models, indicating that the models are not the result of chance correlation. On an external test set of 172 samples that were not used for model building, the methods gave prediction accuracies of 93.3%-100.0%, 74.6%-89.6%, and 86.1%-95.9% for the positive, negative, and total samples, respectively. Among the models, the SVM model had the highest prediction accuracy, which was also clearly higher than previously reported results.
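The validation protocol described above (five-fold cross-validation scored separately on positive, negative, and total samples, followed by a y-scrambling check) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy two-descriptor data set, the choice of a plain k-NN classifier with k = 3, the number of scrambling rounds, and the pooling of out-of-fold predictions (rather than averaging per-fold accuracies) are all assumptions made for illustration.

```python
import random

def class_accuracies(y_true, y_pred):
    """Return (positive, negative, total) accuracy as fractions; labels are 1 or 0."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    acc_pos = sum(p == 1 for p in pos) / len(pos)
    acc_neg = sum(p == 0 for p in neg) / len(neg)
    acc_all = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return acc_pos, acc_neg, acc_all

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0

def cross_validate(X, y, n_folds=5, k=3, seed=0):
    """n-fold cross-validation; pooled out-of-fold predictions are scored per class."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    y_pred = [None] * len(X)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            y_pred[i] = knn_predict(train_X, train_y, X[i], k)
    return class_accuracies(y, y_pred)

def y_scramble(X, y, n_rounds=10, seed=0):
    """Repeat cross-validation on randomly permuted labels and average the results."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)  # breaks any real descriptor/activity relationship
        results.append(cross_validate(X, y_perm))
    return tuple(sum(r[j] for r in results) / n_rounds for j in range(3))

def make_toy_data(n_pos=60, n_neg=40, seed=1):
    """Synthetic two-descriptor data (hypothetical, NOT the paper's data set)."""
    rng = random.Random(seed)
    X = [[rng.gauss(6.0, 1.0), rng.gauss(6.0, 1.0)] for _ in range(n_pos)]
    X += [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_neg)]
    y = [1] * n_pos + [0] * n_neg
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    print("real labels      (pos, neg, total):", cross_validate(X, y))
    print("scrambled labels (pos, neg, total):", y_scramble(X, y))
```

If the accuracies obtained with scrambled labels fall well below those obtained with the real labels, as reported above for the SVM model, the original model is unlikely to reflect a chance correlation between descriptors and activity.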