Topic models built on the bag-of-words assumption represent topics as sets of individual words, which are often ambiguous and hard to read. To address this problem, this paper proposes using events as the basic elements for describing both documents and topics in topic modeling. Because events are sparse, we adopt a Biterm-based topic model and, during topic inference, combine it with the generalized Pólya urn (GPU) model to incorporate prior knowledge about the correlations between events as guidance. This weakens the adverse effect of event sparsity on topic generation at two levels: co-occurrence and semantics. Experimental results show that the proposed algorithm yields interpretable and coherent topics and clearly improves clustering performance.
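The two ingredients named in the abstract, biterms over events and GPU-style promotion of related events, can be illustrated with a minimal sketch. It is not the paper's implementation: the event strings, the `related` map, the `weight` value, and both function names are hypothetical, and a biterm is assumed here to be any unordered pair of event occurrences within one short document.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(doc_events):
    """Return every unordered pair of event occurrences in one document.

    Each pair is sorted so that (a, b) and (b, a) map to the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(doc_events, 2)]


def gpu_increment(topic_counts, event, related, weight=0.3):
    """GPU-style count update for one topic (illustrative assumption).

    The sampled event gains a full count, and every event listed as
    related to it gains a fractional count `weight`, so correlated
    events are pulled toward the same topic.
    """
    topic_counts[event] += 1.0
    for other in related.get(event, ()):
        topic_counts[other] += weight
    return topic_counts


# Hypothetical events standing in for extracted event tuples.
doc = ["quake_strikes", "rescue_starts", "aid_arrives"]
biterms = extract_biterms(doc)

# Hypothetical prior knowledge about event correlation.
related = {"quake_strikes": ["rescue_starts"]}
counts = gpu_increment(Counter(), "quake_strikes", related, weight=0.5)
```

In this sketch the biterms address sparsity at the co-occurrence level by pooling pairs across the corpus, while the fractional increments address it at the semantic level by letting related events share evidence.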