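Environmental carcinogens can induce tumors in humans and other mammals, so in silico models for predicting them are of considerable importance for environmental risk assessment and ecological safety. A dataset of 3780 compounds was constructed; 3024 of them were randomly selected as the training set, and the remaining 756 served as the external validation set. Mathematical models were built with the quantitative structure-activity relationship (QSAR) approach, using stepwise discriminant analysis and principal component analysis. With stepwise discriminant analysis, the training-set prediction accuracy was 86.0% for non-carcinogens and 88.0% for possible carcinogens, whereas modeling with principal components gave accuracies of 74.2% and 73.1% for non-carcinogens and possible carcinogens, respectively. This indicates that stepwise discriminant analysis outperforms principal component discriminant analysis. The molecular structural parameters of possible carcinogens and non-carcinogens were also identified, and the structural differences between the two classes were clarified. These results provide a reference for predicting and assessing environmental carcinogens.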
Establishing in silico models for predicting environmental carcinogens and non-carcinogens is helpful for environmental risk evaluation. A dataset of 3 780 diverse compounds was built, from which 3 024 compounds were randomly selected as the training set and the remaining 756 compounds were used as the test set. Stepwise discriminant analysis (SDA) and principal component analysis (PCA) were applied, resulting in several reliable models based on quantitative structure-activity relationships (QSAR). With SDA, the training-set accuracy was 86.0% for non-carcinogens and 88.0% for possible carcinogens, while PCA gave accuracies of 74.2% and 73.1% for non-carcinogens and possible carcinogens, respectively. The results show that SDA is superior to PCA for the investigated data. The obtained results should be helpful for environmental safety evaluation.
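The data partition and the class-wise accuracies reported above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `split_dataset` and `per_class_accuracy`, the fixed random seed, and the use of integer compound identifiers are assumptions made only for illustration, and the discriminant models themselves are not reproduced here.

```python
import random


def split_dataset(items, n_train, seed=0):
    """Randomly partition items into a training set of size n_train and the rest.

    The seed is an assumption for reproducibility; the source does not state one.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def per_class_accuracy(y_true, y_pred):
    """Return, for each true class, the fraction of its samples predicted correctly."""
    totals, hits = {}, {}
    for truth, pred in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        hits[truth] = hits.get(truth, 0) + (truth == pred)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical compound identifiers standing in for the 3780 compounds.
compound_ids = range(3780)
train_ids, test_ids = split_dataset(compound_ids, n_train=3024)
```

Reporting accuracy separately for each class, as the abstract does, prevents a good result on one class from hiding a poor result on the other.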