社会化标签中普遍存在标签的主题粒度和文档不一致以及部分标签和文档内容无关这两个问题,而现有基于主题模型的社会化标签推荐算法并没有同时对二者进行建模.针对这两点,提出了一种新的主题模型,该模型不仅允许标签和文档具有各自的主题粒度,而且允许标签来自与文档无关的噪声主题.在两个不同的社会化标签语料上的实验结果表明,所提出的模型相比内容相关模型和标签的隐含狄利克雷分配模型,在混淆度和平均正确率均值这两个指标上均有所提高.
Social tags often differ from their documents in topic granularity, and some tags are unrelated to the document content. Existing social tag recommendation algorithms based on topic models do not model both problems simultaneously. To address both, a new topic model is proposed that allows tags and documents to have their own topic granularities and also allows tags to be drawn from a noise topic, i.e. a general distribution unrelated to the document content. Experimental results on two different social tagging corpora show that the proposed model improves on the content relevance model (CRM) and the tag latent Dirichlet allocation model (tag-LDA) in both perplexity and mean average precision.
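For readers unfamiliar with the two evaluation measures named above, the following is a minimal sketch of how perplexity and mean average precision are commonly computed. It is not the authors' evaluation code: the function names and the toy inputs are hypothetical, and the choice to normalize average precision by the number of relevant tags is an assumption, since the abstract does not state which variant was used.

```python
import math
from typing import Sequence, Set


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity of held-out tokens: exp of the negative mean log-likelihood.

    `token_log_probs` holds natural-log probabilities that a model assigns
    to each held-out token (hypothetical input; lower perplexity is better).
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def average_precision(ranked_tags: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked tag list against the true tag set.

    Assumption: the sum of precision values at each hit is divided by the
    number of relevant tags, which is one common convention.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, tag in enumerate(ranked_tags, start=1):
        if tag in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Sequence[Sequence[str]], relevants: Sequence[Set[str]]
) -> float:
    """Mean of the per-document average precision values."""
    scores = [average_precision(r, s) for r, s in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    rankings = [["a", "b", "c"], ["x", "y"]]
    relevants = [{"a", "c"}, {"y"}]
    print(mean_average_precision(rankings, relevants))
    print(perplexity([math.log(0.25)] * 4))
```

A lower perplexity indicates that a model assigns higher probability to held-out data, while a higher mean average precision indicates that the true tags appear nearer the top of the recommended list.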