Topic models are an important means of identifying what users focus on across social network platforms such as blogs, online communities, and microblogs. Because recognizing the topics of the short texts typical of these platforms poses particular difficulties, this paper exploits the contextual relevance between short texts and proposes AW-LDA, a model based on a mixed-weight merging strategy. The model virtually merges short texts that satisfy contextual-relevance conditions and assigns each short text a weight according to its degree of contextual relevance, yielding a new method for recognizing the topics of short texts. Experiments on two datasets, one from an online BBS community and one from the Weibo microblog community, show that the model effectively identifies social network users' focuses under different topics, offering a new approach to the topic recognition problem for short texts.
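The abstract does not give AW-LDA's relevance conditions or weighting formula, so the following Python is only a minimal sketch of the general idea of weighted virtual merging. It assumes that each short text in a thread already carries a precomputed contextual-relevance weight and that a simple threshold stands in for the paper's relevance conditions. The function `virtual_merge`, its `threshold` parameter, and the example weights are hypothetical illustrations, not the authors' algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def virtual_merge(
    texts: Sequence[List[str]],
    weights: Sequence[float],
    threshold: float = 0.0,
) -> Dict[str, float]:
    """Pool tokenized short texts into one weighted pseudo-document.

    Hypothetical assumptions: each short text has a precomputed
    contextual-relevance weight, and texts whose weight is below
    `threshold` fail the relevance condition and are left out.
    The texts are never concatenated; only their weighted term
    counts are accumulated, which is what makes the merge "virtual".
    """
    if len(texts) != len(weights):
        raise ValueError("texts and weights must have the same length")
    pooled: Dict[str, float] = defaultdict(float)
    for tokens, weight in zip(texts, weights):
        if weight < threshold:
            continue  # skip texts that fail the (hypothetical) relevance test
        for term in tokens:
            pooled[term] += weight  # each occurrence contributes its text's weight
    return dict(pooled)


if __name__ == "__main__":
    # Made-up toy thread: an opening post followed by two replies.
    thread = [["phone", "battery", "battery"], ["battery", "charger"], ["weather"]]
    weights = [1.0, 0.5, 0.1]
    print(virtual_merge(thread, weights, threshold=0.3))
```

With these toy inputs the off-topic third reply falls below the threshold and is dropped, while "battery" accumulates weight from both the post and the first reply. How the resulting real-valued counts would be consumed by the LDA inference step is outside what the abstract describes, so the sketch stops at the merged pseudo-document.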