To address the exponential growth of the label space, an overlapping community detection algorithm based on latent features was proposed. First, a generative model for networks containing overlapping communities was proposed. Based on this model, the latent features of each node were inferred by maximizing the probability that the model generates the observed network, and the corresponding objective function was derived. Next, the network was induced into a bipartite graph, from which a lower bound on the number of latent features was derived; this bound was then used to optimize the label space. Experiments show that, compared with the BigClam algorithm, the proposed algorithm noticeably improves recall while keeping execution efficiency and precision essentially unchanged. This indicates that the algorithm can effectively cope with the exponential growth of the label space in community detection.
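The abstract does not give the form of the generative model. As a point of reference only, the sketch below assumes a BigClam-style affiliation model (BigClam being the baseline in the experiments). Under this assumption, nodes u and v are linked with probability 1 - exp(-F_u · F_v), where F_u and F_v are nonnegative latent feature vectors. The features are fitted by maximizing the network log-likelihood with projected gradient ascent and a per-node backtracking line search. The number of latent features K is simply fixed by the caller, and the bipartite-graph lower bound is not reproduced. Memberships are read off with BigClam's rule: each feature is thresholded at delta = sqrt(-log(1 - eps)), where eps is the background edge density. All function names are hypothetical.

```python
import math
import random

# Hypothetical sketch of a BigClam-style model (an assumption, not the paper's model):
# P(u ~ v) = 1 - exp(-F_u . F_v), with F_u a nonnegative K-dimensional feature vector.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pair_loglik(fu, fv, is_edge):
    """Log-probability of observing (or not observing) the edge u-v."""
    d = dot(fu, fv)
    if is_edge:
        return math.log(1.0 - math.exp(-max(d, 1e-10)))  # guard against log(0)
    return -d

def loglik(F, adj):
    """Log-likelihood of the whole network, summed over unordered node pairs."""
    n = len(F)
    return sum(pair_loglik(F[u], F[v], v in adj[u])
               for u in range(n) for v in range(u + 1, n))

def node_loglik(u, F, adj):
    """The terms of loglik() that involve node u (all others are constant in F[u])."""
    return sum(pair_loglik(F[u], F[v], v in adj[u]) for v in range(len(F)) if v != u)

def node_grad(u, F, adj):
    """Gradient of node_loglik() with respect to F[u]."""
    g = [0.0] * len(F[u])
    for v in range(len(F)):
        if v == u:
            continue
        if v in adj[u]:
            d = max(dot(F[u], F[v]), 1e-10)
            w = math.exp(-d) / (1.0 - math.exp(-d))
            g = [gk + w * fk for gk, fk in zip(g, F[v])]
        else:
            g = [gk - fk for gk, fk in zip(g, F[v])]
    return g

def fit(adj, n, K, iters=50, seed=0):
    """Block-coordinate projected gradient ascent; never decreases loglik()."""
    rng = random.Random(seed)
    F = [[rng.uniform(0.1, 1.0) for _ in range(K)] for _ in range(n)]
    for _ in range(iters):
        for u in range(n):
            g = node_grad(u, F, adj)
            old_fu, old_ll = F[u], node_loglik(u, F, adj)
            step = 1.0
            for _ in range(20):  # backtracking line search
                # Project onto the nonnegative orthant after the gradient step.
                F[u] = [max(0.0, x + step * gk) for x, gk in zip(old_fu, g)]
                if node_loglik(u, F, adj) >= old_ll:
                    break
                step *= 0.5
            else:
                F[u] = old_fu  # no non-decreasing step found; keep old features
    return F

def communities(F, adj):
    """Node u joins community k if F[u][k] >= delta (BigClam's threshold rule)."""
    n = len(F)
    m = sum(len(nb) for nb in adj) // 2
    eps = 2.0 * m / (n * (n - 1))  # background edge density
    delta = math.sqrt(-math.log(1.0 - eps))
    return [[u for u in range(n) if F[u][k] >= delta] for k in range(len(F[0]))]

# Usage: two 4-cliques sharing node 3, which should be an overlapping member.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
adj = [set() for _ in range(7)]
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
F = fit(adj, 7, K=2)
print(communities(F, adj))
```

Because each node update is accepted only if it does not lower the terms of the likelihood that involve that node, the total log-likelihood is non-decreasing across iterations. This mirrors the "maximize the generative probability of the whole network" step, but only under the BigClam-style assumption stated above.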