Link prediction is one of the major problems in data mining research. Because of network complexity and data diversity, predicting links between different types of objects in a heterogeneous network from its structure and the available information has become increasingly complicated. For bi-typed heterogeneous information networks, this paper proposes CDTLinks, a link prediction method based on clustering and decision trees. The two object types in the network serve as features of each other, which yields a feature representation for every object, and each type is then clustered separately. Three heuristic rules are proposed for constructing decision trees on bi-typed heterogeneous networks, and the branches of the tree are selected according to information gain. Finally, whether a link exists between any two nodes of different types is determined from the cluster assignments and the decision tree model. In addition, we define potential link nodes and introduce the notion of layers, which reduces the running time of the algorithm while improving its accuracy. CDTLinks is evaluated on the DBLP and AMiner datasets, and the results show that it performs link prediction effectively in bi-typed heterogeneous networks.
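The abstract does not specify the clustering algorithm, the three heuristic rules, or how potential link nodes and layers are defined, so the following is only a minimal sketch of the general idea under stated assumptions, not the CDTLinks implementation. It assumes a toy author-venue graph, binary link vectors as the "mutual" features, plain k-means for clustering, and a single split on (author cluster, venue cluster) scored by information gain. All function names and data are hypothetical.

```python
import math
from collections import Counter


def mutual_features(edges, rows, cols):
    """Hypothetical helper: describe each row node by its links to the column
    nodes and each column node by its links to the row nodes (binary vectors)."""
    linked = set(edges)
    row_feat = {r: [1.0 if (r, c) in linked else 0.0 for c in cols] for r in rows}
    col_feat = {c: [1.0 if (r, c) in linked else 0.0 for r in rows] for c in cols}
    return row_feat, col_feat


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means over a {name: vector} dict (an assumed choice).
    Centroids start at the first k distinct vectors, so the result is deterministic."""
    names = list(points)
    centroids = []
    for n in names:
        if points[n] not in centroids:
            centroids.append(list(points[n]))
        if len(centroids) == k:
            break
    assign = {}
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for n in names:
            assign[n] = min(
                range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(points[n], centroids[c])),
            )
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [points[n] for n in names if assign[n] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def entropy(labels):
    """Shannon entropy (bits) of a list of 0/1 link labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((m / total) * math.log2(m / total) for m in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy before a split minus the size-weighted entropy of the groups."""
    total = len(labels)
    return entropy(labels) - sum(len(g) / total * entropy(g) for g in groups)


def cluster_pair_groups(edges, rows, cols, row_cluster, col_cluster):
    """Group every (row, col) candidate pair by its (row cluster, col cluster)
    and record 1 if the pair is linked, else 0."""
    linked = set(edges)
    groups = {}
    for r in rows:
        for c in cols:
            key = (row_cluster[r], col_cluster[c])
            groups.setdefault(key, []).append(1 if (r, c) in linked else 0)
    return groups


if __name__ == "__main__":
    # Toy author-venue graph (made-up data for illustration only).
    authors = ["a1", "a2", "a3", "a4"]
    venues = ["v1", "v2", "v3"]
    edges = [("a1", "v1"), ("a2", "v1"), ("a3", "v2"),
             ("a3", "v3"), ("a4", "v2"), ("a4", "v3")]
    a_feat, v_feat = mutual_features(edges, authors, venues)
    a_cl, v_cl = kmeans(a_feat, 2), kmeans(v_feat, 2)
    groups = cluster_pair_groups(edges, authors, venues, a_cl, v_cl)
    labels = [y for g in groups.values() for y in g]
    print(a_cl, v_cl)
    print(information_gain(labels, list(groups.values())))  # prints 1.0
```

In this toy graph every (author cluster, venue cluster) block is either fully linked or fully unlinked, so the split removes all label uncertainty and the gain equals the initial entropy of 1 bit. A decision tree built along these lines would prefer such a split; how CDTLinks actually chooses among candidate splits is governed by its three heuristic rules, which this sketch does not reproduce.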