This paper proposes a topic detection method that applies K-means clustering to document representations produced by a latent Dirichlet allocation (LDA) model, and evaluates it on online food safety issues. The method models the document space with LDA, takes each document's probability distribution over topics as that document's vector, and clusters these vectors with K-means to obtain the detected topics. As a control, a K-means experiment based on the traditional vector space model (VSM) was also carried out. Experiments were run on a data set of 1,920 news reports and Tencent Weibo posts covering 43 food safety categories; results were recorded for six different iteration counts and averaged. The results show that the proposed method improves on the traditional method by 20% on each of the three evaluation metrics: precision (P), recall (R), and F-measure (F).
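The clustering stage described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the matrix `theta` is a hypothetical stand-in for the document-topic distributions an LDA model would produce, the function name `kmeans` and its parameters are chosen here for illustration, and the centroids are initialised from the first `k` vectors purely to keep the example deterministic.

```python
def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's K-means with squared Euclidean distance.

    Assumption: centroids start at the first k vectors (a simplification;
    practical implementations usually use random or k-means++ seeding).
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each document to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical document-topic distributions (each row sums to 1),
# standing in for the per-document output of a fitted LDA model.
theta = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
    [0.10, 0.10, 0.80],
]

labels, centroids = kmeans(theta, k=3)
print(labels)
```

Because each document is reduced to a vector whose length equals the number of topics, K-means operates in a much lower-dimensional space than it would on VSM term vectors. The `iterations` parameter corresponds to the iteration count that the experiments vary.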