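Analyzing the functional module structure of protein-protein interaction networks (PPINs) by combining multiple kinds of biological data is one of the pressing open problems in the computational analysis of protein function. This paper proposes a multi-view consistent functional module detection method based on collective non-negative matrix factorization (CoNMF). The method approximates the multi-view data simultaneously and seeks a single unified solution that best approximates all of the original data sources. Functional module assignments are derived from this unified solution, and the method can also find overlapping functional modules. Experimental results show that, by fusing gene ontology, gene expression profiles and PPIN data, the proposed algorithm improves module detection accuracy to some extent, and that the detected protein functional modules are biologically meaningful.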
Detecting functional modules from protein-protein interaction networks (PPINs) is an active research area with many practical applications. To date, multiple biological data sources are available, such as gene expression data and gene ontology (GO). These data explain the biological roles of proteins from different views and provide additional information that helps mitigate false information in PPINs. This work focuses on extracting consistent information from these diverse data sources. To address this problem, it proposes a collective non-negative matrix factorization (CoNMF) method that efficiently integrates the views of gene ontology, gene expression data and PPINs. In our method, the integration problem is reduced to finding optimal approximations of the multi-view data by the products of a common matrix factor with view-specific basis matrices. As a result, the common matrix factor provides an intuitive interpretation as a soft clustering. Extensive experiments show that CoNMF outperforms most of the baseline methods listed in the paper and is an effective method for extracting functional modules from PPINs.
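The idea of approximating several views with one shared factor can be illustrated with a minimal sketch. It is not the paper's exact formulation: it assumes equally weighted views, a squared Frobenius loss, and standard multiplicative updates, and the names `conmf`, `soft_modules` and the `threshold` parameter are hypothetical. Each view `X_v` (rows are proteins) is approximated as `W @ H_v`, where `W` is the common factor and `H_v` is the basis matrix of view `v`.

```python
import numpy as np


def conmf(views, k, n_iter=200, eps=1e-9, seed=0):
    """Sketch of collective NMF: X_v ~= W @ H_v with one shared W.

    Assumptions (not taken from the paper): equal view weights,
    squared Frobenius loss, multiplicative update rules.
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]  # number of proteins, shared by all views
    W = rng.random((n, k)) + eps
    Hs = [rng.random((k, X.shape[1])) + eps for X in views]
    losses = []
    for _ in range(n_iter):
        # Update each view-specific basis matrix with W held fixed.
        for v, X in enumerate(views):
            Hs[v] *= (W.T @ X) / (W.T @ W @ Hs[v] + eps)
        # Update the common factor using all views at once.
        num = sum(X @ H.T for X, H in zip(views, Hs))
        den = W @ sum(H @ H.T for H in Hs) + eps
        W *= num / den
        losses.append(
            sum(np.linalg.norm(X - W @ H) ** 2 for X, H in zip(views, Hs))
        )
    return W, Hs, losses


def soft_modules(W, threshold=0.3):
    """Row-normalize W and keep memberships at or above a (hypothetical) threshold.

    A protein may pass the threshold for several columns, which yields
    overlapping modules.
    """
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    return P >= threshold


# Synthetic stand-ins for the three views (random data, for illustration only).
rng = np.random.default_rng(1)
ppin = rng.random((30, 30))        # e.g. PPIN adjacency, proteins x proteins
go_sim = rng.random((30, 30))      # e.g. GO-based similarity, proteins x proteins
expression = rng.random((30, 12))  # e.g. expression profiles, proteins x conditions
views = [ppin, go_sim, expression]

W, Hs, losses = conmf(views, k=4)
membership = soft_modules(W)
```

Row `i` of `membership` marks the modules to which protein `i` is assigned. Because the rows of `W` are only normalized and thresholded, not forced to a single label, the assignment is soft and modules can overlap.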