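HIV-1 protease inhibitors play an important role in anti-AIDS therapy, and studying the cleavage sites of HIV-1 protease can help identify new therapeutic targets. To predict HIV-1 protease-specific cleavage sites, this study represents the structure of each peptide sample directly with the 531 amino acid physicochemical property parameters of the Amino Acid Index database (AAIndex). A two-stage feature selection then reduces the 4248 descriptor parameters to 57. Support vector machine (SVM) models of HIV-1 protease-specific sites were built with four different kernel functions, and their accuracy was validated by 10-fold cross-validation and by an external test set. The SVM model using the NormalizePolyKernel kernel outperformed the models using the other kernels (PolyKernel, PUK, RBFKernel): it reached a prediction accuracy of 93.947% in 10-fold cross-validation on the training set and 93.684% on the external test set.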
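The peptide encoding can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each residue is replaced by its value for every property, and the per-residue values are concatenated in residue order. Because 4248 = 8 × 531, the descriptor count suggests 8-residue peptides, but the abstract does not state the window length, so treat that as an assumption. Only one real AAIndex property is included (the Kyte-Doolittle hydropathy index, entry KYTJ820101); the function name `encode_peptide` and the residue-major feature ordering are hypothetical.

```python
from typing import Dict, List, Sequence

# Kyte-Doolittle hydropathy index (AAIndex entry KYTJ820101).
# One real property used for illustration; the study uses all 531 AAIndex properties.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_peptide(peptide: str, properties: Sequence[Dict[str, float]]) -> List[float]:
    """Hypothetical encoder: for each residue, append its value under every property.

    A peptide of length L encoded with P property tables yields L * P features;
    an (assumed) 8-residue peptide with P = 531 gives 4248 features.
    The residue-major ordering is an assumption, not taken from the paper.
    """
    features: List[float] = []
    for residue in peptide.upper():
        for table in properties:
            if residue not in table:
                raise ValueError(f"no property value for residue {residue!r}")
            features.append(table[residue])
    return features
```

For example, `encode_peptide("AR", [KYTE_DOOLITTLE])` returns `[1.8, -4.5]`. Feature selection (the paper's two-stage procedure, whose details the abstract does not give) would then keep a subset of these columns.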
HIV-1 protease inhibitors play an important role in AIDS therapy, and research on the cleavage sites of HIV-1 protease will be useful for finding new therapeutic targets. To predict HIV-1 protease-specific sites, we use the 531 physicochemical amino acid parameters of the Amino Acid Index (AAIndex) to represent the structure of peptide samples. A two-stage feature selection method then selects 57 features from the original 4248. SVM models of HIV-1 protease-specific sites are built with four kernel functions. Modeling with the NormalizePolyKernel kernel function gave a higher prediction accuracy than the other three kernel functions, reaching 93.947% in cross-validation and 93.684% on an independent test set.
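The four kernels compared above can be written out as below. This is a sketch using the standard textbook definitions, not the paper's implementation. The kernel names match the kernel classes of Weka's SMO classifier (PolyKernel, NormalizedPolyKernel, RBFKernel, Puk), which suggests, but does not confirm, that Weka was used. All parameter values (exponent, `gamma`, `omega`, `sigma`) are illustrative assumptions, since the abstract does not report them. The normalized polynomial kernel divides the polynomial kernel by the square root of each sample's self-similarity, giving K(x, x) = 1 for every sample.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def sq_dist(x: Vector, y: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def poly_kernel(x: Vector, y: Vector, exponent: int = 2, lower_order: bool = False) -> float:
    # (x.y)^d, or (x.y + 1)^d when lower-order terms are included
    base = dot(x, y) + (1.0 if lower_order else 0.0)
    return base ** exponent


def normalized_poly_kernel(x: Vector, y: Vector, exponent: int = 2,
                           lower_order: bool = False) -> float:
    # K(x, y) / sqrt(K(x, x) * K(y, y)); raises ZeroDivisionError for an all-zero vector
    kxy = poly_kernel(x, y, exponent, lower_order)
    kxx = poly_kernel(x, x, exponent, lower_order)
    kyy = poly_kernel(y, y, exponent, lower_order)
    return kxy / math.sqrt(kxx * kyy)


def rbf_kernel(x: Vector, y: Vector, gamma: float = 0.01) -> float:
    # exp(-gamma * ||x - y||^2)
    return math.exp(-gamma * sq_dist(x, y))


def puk_kernel(x: Vector, y: Vector, omega: float = 1.0, sigma: float = 1.0) -> float:
    # Pearson VII universal kernel:
    # 1 / (1 + (2 * ||x - y|| * sqrt(2^(1/omega) - 1) / sigma)^2)^omega
    factor = 2.0 * math.sqrt(sq_dist(x, y)) * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + factor ** 2) ** omega
```

Any of these functions can fill a Gram matrix for an SVM library that accepts precomputed kernels (for instance scikit-learn's `SVC(kernel="precomputed")`), and the resulting models can then be compared by 10-fold cross-validation as described in the abstract.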