To address the limited accuracy of the 3-state hidden Markov model (HMM) in protein secondary structure prediction, 7-state and 15-state HMMs are proposed. The study used 492 protein sequences selected from the CB513 dataset, randomly divided into 7 even subsets. The 7-state and 15-state HMMs were each applied to predict secondary structure on this dataset, prediction accuracy was evaluated by 7-fold cross-validation, and the results were compared with those of the 3-state HMM. Relative to the 3-state HMM, the 7-state HMM improved Q3 by 3.11%, SOV by 6.15%, and QE by 6.49%; the 15-state HMM improved QE by a further 5.74% over the 7-state HMM. When homology information from multiple sequence alignments was added to the 15-state HMM, Q3 was 8.76% higher than with single-sequence 15-state prediction. These results indicate that the 7-state HMM predicts better than the 3-state HMM, and that the 15-state HMM is comparable to the 7-state HMM in overall prediction ability but superior in β-strand prediction.
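The evaluation protocol summarized above (a random split into 7 subsets, 7-fold cross-validation, and Q3 scoring) can be illustrated with a minimal Python sketch. The function names, the fixed seed, and the encoding of structures as strings over H/E/C are assumptions made for illustration and are not taken from the study; HMM training and decoding are omitted entirely.

```python
import random


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or C)
    matches the observed state. Assumes non-empty, equal-length strings."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def k_fold_splits(n_items: int, k: int = 7, seed: int = 0):
    """Yield (train_indices, test_indices) for k random folds whose
    sizes differ by at most one. The seed is an arbitrary choice."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled list gives k near-equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


if __name__ == "__main__":
    # 492 sequences cannot be split into 7 exactly equal folds,
    # so the fold sizes come out as 70 or 71.
    for train, test in k_fold_splits(492):
        print(len(train), len(test))
```

In each round, a model would be trained on the `train` indices and scored on the `test` indices, with the per-fold scores then averaged.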