Clustering protein-protein interaction (PPI) networks is an important way to understand the organization and function of molecules and to identify protein functional modules. Many traditional clustering algorithms perform poorly on PPI networks. The functional flow simulation algorithm is a recent clustering approach, but it ignores the effect of distance and requires a manually set merging threshold, which makes it subjective. This paper proposes a novel information flow clustering model and algorithm based on the Artificial Bee Colony (ABC) optimization mechanism. During data pre-processing, the method sorts nodes by their network comprehensive feature value to initialize the cluster centers. The position of a nectar source in the ABC algorithm corresponds to a cluster center, and the profitability of a nectar source corresponds to the similarity between modules. All nodes adjacent to an employed-bee node are then sorted in descending order of network comprehensive feature value and serve as the search neighborhood of the scout bees. Clustering quality is evaluated objectively using precision, recall and other criteria, and several key parameters of the algorithm are simulated, compared and analyzed. The experimental results show that the new algorithm not only overcomes the shortcomings of the original functional flow simulation algorithm but also greatly improves the harmonic mean of precision and recall, and that it can effectively identify protein functional modules.
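The ranking and evaluation steps mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the network comprehensive feature value, so node degree is used here purely as a stand-in assumption. The overlap-based definitions of precision and recall are likewise assumed, and every function name (`build_graph`, `feature_value`, `initial_centers`, `search_neighborhood`, `precision_recall_f`) is hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from a list of interaction pairs."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def feature_value(graph: Graph, node: str) -> float:
    # ASSUMPTION: node degree stands in for the paper's
    # "network comprehensive feature value", which the abstract does not define.
    return float(len(graph[node]))


def initial_centers(graph: Graph, k: int) -> List[str]:
    """Pick the k highest-scoring nodes as initial cluster centers."""
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(graph, key=lambda n: (-feature_value(graph, n), n))
    return ranked[:k]


def search_neighborhood(graph: Graph, center: str) -> List[str]:
    """Neighbors of a center, in descending order of feature value."""
    return sorted(graph[center], key=lambda n: (-feature_value(graph, n), n))


def precision_recall_f(predicted: Set[str],
                       reference: Set[str]) -> Tuple[float, float, float]:
    """Overlap-based precision, recall and their harmonic mean (ASSUMED definitions)."""
    overlap = len(predicted & reference)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy network with made-up protein labels, for illustration only.
toy = build_graph([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")])
centers = initial_centers(toy, 2)
neighborhood = search_neighborhood(toy, "D")
scores = precision_recall_f({"A", "B", "C"}, {"A", "B", "D", "E"})
```

On the toy network, `centers` holds the two highest-degree nodes and `neighborhood` lists the neighbors of `D` from highest to lowest degree. Swapping in a different `feature_value` function changes both rankings without touching the rest of the sketch.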