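Identifying the targets with which active ingredients interact is a key step in elucidating the mechanisms of action of Traditional Chinese Medicine (TCM) at the molecular level. This paper proposes a new approach to the target identification problem for TCM chemical constituents. Starting from the structures of small-molecule compounds, the method calculates molecular descriptors that characterize molecular composition, charge distribution, topology, geometry, and physicochemical properties, and then selects the descriptors relevant to target-binding activity by combining the BestFirst search strategy with the CfsSubsetEval evaluation strategy. Three machine learning methods (radial basis function neural networks, naive Bayes, and random forests) are used to build identification models for a series of targets; the models are subsequently integrated into a target identification system that predicts the targets of TCM active ingredients. Under 10-fold cross-validation, the overall prediction accuracies of the three methods were 83.33%-95.71%, 84.62%-96.43%, and 82.14%-95.59%, respectively, and identification completed in 0.02-0.19 s. The experimental results show that the method is not only simple and effective but, more importantly, meets the efficiency requirements of target identification for TCM chemical constituents.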
Revealing the interactions between target proteins and the chemical constituents of Traditional Chinese Medicine (TCM) helps to explain the active substances and mechanisms of TCM at the molecular level. Identifying the targets of TCM chemical constituents is a central challenge in constructing TCM constituent-target interaction networks, and it calls for methods that are effective, fast, and scalable. Here a new method is proposed for identifying the targets of TCM chemical constituents. The main steps are as follows. First, molecular descriptors (constitutional, charge distribution, topological, geometrical, and physicochemical) are calculated to characterize the structure of small-molecule drugs. Second, the descriptors relevant to drug-target interaction activity are selected by combining the BestFirst search strategy with the CfsSubsetEval evaluator. Third, target identification models are built with three machine learning methods. The models were validated by 10-fold cross-validation (CV); the overall prediction accuracies of the three methods were 83.33%-95.71%, 84.62%-96.43%, and 82.14%-95.59%, respectively, and identification completed in 0.02-0.19 s. These results indicate that the proposed method is reliable and addresses the efficiency problem of target identification. The method may therefore be a useful strategy for exploring the molecular mechanisms of TCM.
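The descriptor selection step can be illustrated with a minimal sketch of correlation-based feature selection (CFS), the heuristic behind CfsSubsetEval. The subset merit is k·r_cf / sqrt(k + k(k-1)·r_ff), where r_cf is the mean descriptor-activity correlation and r_ff is the mean descriptor-descriptor correlation. Several points here are assumptions rather than the paper's implementation: Pearson correlation is substituted for the correlation measure, a plain greedy forward search stands in for BestFirst (which also backtracks), and the function names and toy descriptor values are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(columns: List[Sequence[float]], labels: Sequence[float],
              subset: List[int]) -> float:
    """CFS merit of a descriptor subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute correlation between each descriptor and the activity label.
    r_cf = sum(abs(pearson(columns[i], labels)) for i in subset) / k
    # Mean absolute correlation between all pairs of selected descriptors.
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    r_ff = (sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns: List[Sequence[float]], labels: Sequence[float]) -> List[int]:
    """Greedy forward search: add the best descriptor until merit stops improving.

    Assumption: a simplification of best-first search, with no backtracking.
    """
    selected: List[int] = []
    remaining = list(range(len(columns)))
    best = 0.0
    while remaining:
        score, idx = max((cfs_merit(columns, labels, selected + [i]), i)
                         for i in remaining)
        if score <= best:
            break
        selected.append(idx)
        remaining.remove(idx)
        best = score
    return selected


# Hypothetical toy data: three descriptors for six compounds (not from the paper).
descriptors = [
    [1, 2, 1, 5, 6, 5],   # tracks the activity label closely
    [1, 3, 2, 4, 6, 3],   # noisier and largely redundant with the first
    [3, 1, 2, 3, 1, 2],   # uncorrelated with the activity label
]
activity = [0, 0, 0, 1, 1, 1]  # 1 = active against the target, 0 = inactive

chosen = greedy_cfs(descriptors, activity)
print("selected descriptor indices:", chosen)
```

On this toy data the search keeps only the first descriptor: adding the redundant or the uninformative descriptor lowers the merit, which is the behavior CFS is designed to reward. The selected columns would then be passed to the classifiers and evaluated by 10-fold cross-validation.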