Traditional approaches to protein subnuclear localization have two drawbacks: a single sequence representation carries too little information, and the representation is built independently of the prediction step, so the available information is not fully exploited. To address this, a fused representation is constructed by combining pseudo-amino acid composition with the position-specific scoring matrix, which contribute physicochemical information about the amino acids and evolutionary information about the protein, respectively. On this basis, supervised locality preserving projection is used to learn the low-dimensional manifold of the data, yielding low-dimensional discriminant features that separate different classes while preserving the structure within each class. A nearest neighbor classifier is then applied to this data distribution to predict the subnuclear location. Finally, 10-fold cross-validation on standard data sets shows that the proposed method achieves a considerable improvement in accuracy over existing methods.
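The pipeline summarized above (feature fusion, supervised locality preserving projection, nearest neighbor prediction, 10-fold cross-validation) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: the two representations are fused by simple concatenation without rescaling, the affinity graph connects only same-class samples with a heat-kernel weight, the kernel width `t` and the regularizer `reg` are heuristic choices, and the folds are not stratified. All function names are hypothetical, and the toy data are synthetic stand-ins for real pseudo-amino acid composition and PSSM-derived features.

```python
import numpy as np


def fuse_features(pse_aac, pssm_feat):
    """Fuse two per-protein feature matrices by concatenation (assumed fusion rule)."""
    return np.hstack([pse_aac, pssm_feat])


def supervised_lpp(X, y, n_components=2, t=None, reg=1e-6):
    """Learn a supervised LPP projection; returns (training mean, (d, k) matrix)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum((Xc[:, None, :] - Xc[None, :, :]) ** 2, axis=2)
    if t is None:
        t = sq.mean() + 1e-12  # heuristic kernel width (assumption)
    # Heat-kernel affinity, kept only between samples of the same class.
    W = np.exp(-sq / t) * (y[:, None] == y[None, :])
    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian
    A = Xc.T @ L @ Xc
    B = Xc.T @ D @ Xc + reg * np.eye(X.shape[1])  # regularized for stability
    # Reduce A v = lam B v to a standard symmetric problem via Cholesky B = C C^T.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2  # remove round-off asymmetry
    _, vecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    # The smallest eigenvalues give the locality-preserving directions.
    P = C_inv.T @ vecs[:, :n_components]
    return mean, P


def predict_1nn(Z_train, y_train, Z_test):
    """Label each test point with the class of its nearest training point."""
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d2.argmin(axis=1)]


def cross_validate(X, y, k=10, n_components=2, seed=0):
    """Plain (non-stratified) k-fold accuracy; the projection is refit per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    correct = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        mean, P = supervised_lpp(X[train_idx], y[train_idx], n_components)
        Z_train = (X[train_idx] - mean) @ P
        Z_test = (X[test_idx] - mean) @ P
        pred = predict_1nn(Z_train, y[train_idx], Z_test)
        correct += int(np.sum(pred == y[test_idx]))
    return correct / len(y)


def make_toy_data(n_per_class=30, n_classes=3, seed=0):
    """Synthetic stand-ins for the two feature sets (not real protein data)."""
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(n_classes), n_per_class)
    centers_a = rng.normal(scale=3.0, size=(n_classes, 10))
    centers_b = rng.normal(scale=3.0, size=(n_classes, 10))
    pse = centers_a[y] + rng.normal(size=(len(y), 10))
    pssm = centers_b[y] + rng.normal(size=(len(y), 10))
    return pse, pssm, y


if __name__ == "__main__":
    pse, pssm, y = make_toy_data()
    X = fuse_features(pse, pssm)
    print("10-fold accuracy on toy data:", cross_validate(X, y, k=10))
```

Refitting the projection inside each fold keeps the test labels out of the learned subspace, which matters because the projection is supervised. On real data the two feature sets usually differ in scale, so some normalization fitted on the training folds would likely be needed before concatenation.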