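To gain deeper insight into the relationship between the biological activity of fluorine-containing pesticides and their structures, and to build a sound QSAR model, this work starts from seven molecular structure descriptors, including the oil-water partition coefficient. Based on support vector regression (SVR) and the minimum-MSE principle, multiple K-nearest neighbor (KNN) prediction sub-models were constructed after automatic selection of the optimal kernel function and nonlinear screening of the descriptors. The sub-models were then screened nonlinearly, and the retained sub-models were used for combinatorial prediction (Multi-KNN-SVR). Leave-one-out combinatorial prediction of the bioactivities of 33 fluorine-containing compounds against 5 different diseases shows that nonlinear descriptor screening and KNN sub-models effectively improve prediction accuracy, and that the nonlinear combination of multiple KNN sub-models further improves predictive performance. Multi-KNN-SVR combinatorial prediction has broad application prospects in QSAR and other related prediction research.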
To further understand the quantitative structure-activity relationship (QSAR) of fluorine-containing pesticides and to improve the prediction precision of QSAR models, a novel nonlinear combinatorial forecast method, Multi-KNN-SVR (multiple K-nearest neighbor sub-models based on support vector regression), is proposed. The method comprises the following key steps: first, automatically selecting the best kernel according to the minimum mean square error (MSE); second, screening descriptors nonlinearly by F-test; finally, carrying out the combinatorial forecast with multiple KNN sub-models. Multi-KNN-SVR was applied to the QSAR for the antibacterial bioactivities of 33 fluorine-containing pesticides against 5 different plant diseases. The leave-one-out test results show that screening both descriptors and sub-models was essential, and that the combinatorial forecast after sub-model screening achieved better precision than a single KNN model. The prediction results also indicate that, among all reference models, Multi-KNN-SVR offered high prediction precision (MSE = 0.005-0.015, MAPE = 2.136-3.164), high stability, strong generalization ability, structural risk minimization, nonlinear modeling capability, and resistance to overfitting. Multi-KNN-SVR can therefore be widely used in QSAR and other related fields.
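The leave-one-out evaluation of several KNN sub-models and their combination can be illustrated with a minimal sketch. It is not the authors' implementation: the local SVR fit and the SVR-based nonlinear combination are replaced here by a plain neighbor mean and a simple average, purely to keep the example dependency-free. All function names, the choice of Euclidean distance, and the expression of MAPE as a percentage are assumptions made for illustration.

```python
import math


def knn_predict(train_X, train_y, x, k):
    """Predict the activity of x as the mean activity of its k nearest
    training samples (stand-in for a local SVR fit; an assumption)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loo_predictions(X, y, k):
    """Leave-one-out: predict each sample from all the other samples."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(train_X, train_y, X[i], k))
    return preds


def combined_loo(X, y, ks):
    """Combine the KNN sub-models (one per k) by simple averaging
    (stand-in for the nonlinear SVR combination; an assumption)."""
    per_k = [loo_predictions(X, y, k) for k in ks]
    return [sum(col) / len(col) for col in zip(*per_k)]


def mse(y_true, y_pred):
    """Mean square error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up toy data: one descriptor per compound, not from the paper.
    X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    y = [1.0, 2.0, 3.0, 4.0, 5.0]
    single = loo_predictions(X, y, k=2)
    combined = combined_loo(X, y, ks=[1, 2, 3])
    print("single KNN  MSE:", mse(y, single), "MAPE:", mape(y, single))
    print("combination MSE:", mse(y, combined), "MAPE:", mape(y, combined))
```

Comparing the two printed error pairs mirrors, in miniature, the comparison between a single KNN model and a combination of sub-models; which one wins on toy data says nothing about the results reported here.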