Protein function prediction is one of the core problems of bioinformatics in the post-genomic era. Functional annotation databases mainly record positive examples, that is, proteins known to carry out a given function, and rarely record negative examples, that is, proteins known not to carry out that function. Current computational methods typically use only the positive examples and seldom exploit the scarce but informative negative examples, even though it is well recognized that both kinds are needed to obtain a discriminative predictor. To bridge this gap, we propose a protein function prediction approach using positive and negative examples (ProPN). ProPN first constructs a directed signed hybrid graph that describes the known positive and negative associations between proteins and functional labels, the interactions between proteins, and the correlations between functional labels; it then predicts protein functions by label propagation on this graph. Experiments on publicly available yeast, mouse, and human protein datasets show that ProPN not only outperforms state-of-the-art algorithms in predicting negative examples for proteins whose functional annotations are partially known, but also achieves higher accuracy than other related methods in predicting the functions of proteins whose functional annotations are completely unknown.
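The two steps named in the abstract, building a signed hybrid graph and propagating labels over it, can be illustrated with a minimal sketch. This is not the ProPN algorithm as published: the abstract does not give the edge weighting, normalization, or update rule, so everything below is an assumption made for illustration. The function names `build_signed_hybrid_graph` and `propagate`, the damping parameter `alpha`, the row normalization by absolute weight, and the toy data are all hypothetical.

```python
import numpy as np

def build_signed_hybrid_graph(ppi, annotations, func_sim):
    """Assemble one adjacency matrix over n protein nodes and m function nodes.

    ppi:         (n, n) nonnegative protein-protein interaction weights
    annotations: (n, m) with +1 (positive example), -1 (negative example), 0 (unknown)
    func_sim:    (m, m) nonnegative function-function correlations
    Assumption: edges point from proteins to functions only (one possible
    reading of "directed"); the paper may define the direction differently.
    """
    n, m = annotations.shape
    W = np.zeros((n + m, n + m))
    W[:n, :n] = ppi            # protein -> protein edges
    W[:n, n:] = annotations    # signed protein -> function edges
    W[n:, n:] = func_sim       # function -> function edges
    return W

def propagate(W, Y, alpha=0.5, n_iter=100):
    """Generic damped label propagation: F <- alpha * S @ F + (1 - alpha) * Y.

    S is W with each row divided by the sum of its absolute weights, so edge
    signs are kept and the iteration is a contraction for alpha < 1.
    """
    scale = np.abs(W).sum(axis=1, keepdims=True)
    scale[scale == 0] = 1.0    # leave isolated nodes untouched
    S = W / scale
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F

# Toy data (hypothetical): 3 proteins, 2 functions.
# Protein 0 has function 0 and is known NOT to have function 1.
# Protein 2 has no annotations and interacts only with protein 0.
ppi = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
annotations = np.array([[1, -1],
                        [1,  0],
                        [0,  0]], dtype=float)
func_sim = np.eye(2)           # each function is correlated only with itself

W = build_signed_hybrid_graph(ppi, annotations, func_sim)
Y = np.vstack([annotations, np.eye(2)])   # function nodes are labeled with themselves
scores = propagate(W, Y)[:3]              # keep the protein rows only
print(scores)
```

In this toy setup the unannotated protein 2 receives a positive score for function 0 and a negative score for function 1, both inherited from its interaction partner. This is the kind of signal a method restricted to positive examples cannot produce, since it has no way to push a score below zero.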