Because the volume of online public opinion text is very large, it is difficult to identify opinion hotspots manually. The Latent Dirichlet Allocation (LDA) topic model reduces the dimensionality of text and clusters words, so it can automatically extract the focal topic words from massive collections of public opinion text. However, because LDA lacks a mechanism for modeling how distributions change over time, it has difficulty capturing hotspot word chains that vary with time. This paper proposes DTD-LDA (Dynamic Time Distribution LDA), a model that adds a dynamic time layer to LDA. By introducing dynamic document-time and time-topic distributions, DTD-LDA makes the topic words more sensitive to temporal change and can effectively extract hotspot word chains from rapidly changing public opinion text. Experiments show that DTD-LDA achieves better precision and recall than comparable topic models in extracting dynamic hotspot word chains.