For the sparse relational data among multiple entities in large-scale microblogs, an efficient co-clustering algorithm for multi-entity sparse relational data is proposed. To make full use of the multi-relational data, the algorithm introduces a robust constraint-information embedding method for constructing the relation matrix. This reduces the sparsity of the matrix and further improves clustering accuracy. Within a sparsity-constrained block coordinate descent framework, non-negative matrix tri-factorization of the relation matrix yields the cluster indicator matrices of the different entities simultaneously. During the non-negative matrix factorization, an efficient projection algorithm provides a fast solution and guarantees that the clustering results have a sparse structure. Experiments on synthetic and real data sets show that the proposed algorithm clearly outperforms all the baselines on three metrics, and the improvement is most pronounced on extremely sparse data.
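The abstract does not give the update rules, the exact sparsity constraint, or which projection algorithm is used, so the following is only a minimal sketch under stated assumptions. It assumes the objective ½‖X − F S Gᵀ‖²_F for a given non-negative relation matrix X (the constraint-information embedding that builds X is not shown), updates the blocks F, S and G in turn with projected-gradient steps of size 1/L (L being each block's Lipschitz constant), and models the sparsity constraint, hypothetically, by requiring every row of the indicator matrices F and G to lie on the probability simplex. Euclidean projection onto the simplex is a standard efficient (sort-based, O(k log k) per row) projection that sets small entries exactly to zero. The function names `project_rows_to_simplex` and `nmtf_bcd` are illustrative, not taken from the paper.

```python
import numpy as np


def project_rows_to_simplex(V):
    """Project each row of V onto {x : x >= 0, sum(x) = 1} (Euclidean projection).

    Sort-based method: small entries are clipped to exactly zero, so the
    projected rows are sparse. Used here as an assumed sparsity constraint.
    """
    n, k = V.shape
    U = -np.sort(-V, axis=1)                     # rows sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0             # cumulative sums minus radius 1
    idx = np.arange(1, k + 1)
    cond = U - css / idx > 0                     # holds for the first rho entries
    rho = k - np.argmax(cond[:, ::-1], axis=1)   # 1-based index of last True
    theta = css[np.arange(n), rho - 1] / rho     # per-row threshold
    return np.maximum(V - theta[:, None], 0.0)


def nmtf_bcd(X, k1, k2, n_iter=200, seed=0):
    """Hypothetical block coordinate descent for X ~ F S G^T (all non-negative).

    F (m x k1) and G (n x k2) are row-cluster and column-cluster indicators;
    each block takes one projected-gradient step with step size 1/L.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = project_rows_to_simplex(rng.random((m, k1)))
    G = project_rows_to_simplex(rng.random((n, k2)))
    S = rng.random((k1, k2))
    eps = 1e-12                                  # guards against a zero step denominator
    losses = []
    for _ in range(n_iter):
        # Block F: gradient F (S G^T G S^T) - X G S^T
        A = S @ G.T @ G @ S.T
        F = project_rows_to_simplex(F - (F @ A - X @ G @ S.T) / (np.linalg.norm(A, 2) + eps))
        # Block G: gradient G (S^T F^T F S) - X^T F S
        B = S.T @ F.T @ F @ S
        G = project_rows_to_simplex(G - (G @ B - X.T @ F @ S) / (np.linalg.norm(B, 2) + eps))
        # Block S: gradient F^T F S G^T G - F^T X G, projected onto S >= 0
        FtF, GtG = F.T @ F, G.T @ G
        L_S = np.linalg.norm(FtF, 2) * np.linalg.norm(GtG, 2) + eps
        S = np.maximum(S - (FtF @ S @ GtG - F.T @ X @ G) / L_S, 0.0)
        losses.append(0.5 * np.linalg.norm(X - F @ S @ G.T) ** 2)
    return F, S, G, losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy relation matrix with two row clusters and two column clusters
    X = np.kron(np.eye(2), np.ones((5, 4))) + 0.01 * rng.random((10, 8))
    F, S, G, losses = nmtf_bcd(X, 2, 2, n_iter=100)
    print("row labels:", F.argmax(axis=1))
    print("column labels:", G.argmax(axis=1))
```

Because each block subproblem is a convex quadratic over a convex set, a projected-gradient step with step size at most 1/L never increases the objective, so the recorded loss is non-increasing. Cluster labels can be read off as the row-wise argmax of F and G; the simplex projection keeps those rows sparse, in the spirit of the sparse clustering structure the abstract describes.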