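A key research topic in bioinformatics is understanding the molecular mechanisms of the cell, which depends on understanding the meaning or function of every protein determined by the genes. The function of an unknown protein is usually inferred by comparing its similarity with one or more proteins of known function, and several algorithms based on support vector machines (SVMs) have achieved very good results on this task. SVM-pairwise is one of the best current SVM-based algorithms; it uses the similarity between pairs of sequences to convert protein sequences into fixed-length vectors. This paper proposes a new method for classifying protein sequences with SVMs that uses site evolutionary distances in place of pairwise alignment scores. The method improves significantly on SVM-pairwise, and experiments on the Structural Classification of Proteins (SCOP) database show that it achieves better classification performance than SVM-pairwise.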
An important research topic in bioinformatics is understanding the meaning and function of each protein encoded in the genome. One of the most successful approaches to this problem infers function from sequence similarity to one or more proteins whose functions are known, and SVM-based methods are among the most successful of these. Currently, one of the most accurate homology detection methods is SVM-pairwise, which combines pairwise sequence similarity with a Support Vector Machine (SVM). This paper presents an alternative for SVM-based protein classification. The method, SVM-PSV, uses a new sequence similarity kernel, the Position Specific Values (PSV) kernel, with SVMs to solve the protein classification problem. In homology detection experiments based on the SCOP database, the resulting algorithm achieves higher recognition accuracy than state-of-the-art methods, including SVM-pairwise. It is also significantly more computationally efficient than SVM-pairwise.
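The vectorization step that SVM-pairwise relies on can be illustrated with a minimal sketch: each sequence is represented by its similarity scores against a fixed set of reference sequences, so every sequence maps to a vector of the same length. The sketch below assumes a plain Smith-Waterman local alignment score with illustrative match, mismatch, and gap values; the function names, the scoring parameters, and the toy sequences are hypothetical and are not taken from the paper. It shows only the pairwise-score baseline, not the PSV kernel.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not the paper's settings.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def pairwise_feature_vector(seq, reference_seqs):
    """Map a sequence to a fixed-length vector of scores against the references."""
    return [local_alignment_score(seq, ref) for ref in reference_seqs]


# Hypothetical toy sequences standing in for a labeled training set.
reference_seqs = ["MKVLAAGIV", "MKVLSAGIV", "GGSPWWTR"]
vectors = [pairwise_feature_vector(s, reference_seqs) for s in reference_seqs]
```

Each row of `vectors` has one entry per reference sequence, and these rows are what an SVM would be trained on. Building them requires one alignment per pair of sequences, which is the step a cheaper similarity measure would replace.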