随着互联网的高速发展,网络信息呈现爆炸性增长态势,主题演化分析能够帮助人们从海量的互联网数据中获取更有价值的信息。分析主题的演化发展轨迹有利于人们了解主题事件发生的前因后果,并对主题事件发展趋势进行更好地预测,进而辅助管控。针对单个主题演化分析方法中阈值设定和主题漂移的问题,提出一种LDA—AP主题演化模型。该方法利用LDA模型对不同时间窗口内的新闻文本分别进行建模,得到相应的主题。利用AP聚类算法对不同时间窗口内的多个主题进行聚类,其中计算主题相似度采用加入时间衰减因子的JS散度来度量。最后对多个主题内容进行演化分析。通过相关的实验分析和对比,结果表明该方法可以改善主题演化的性能,并能较好地分析多个新闻主题事件随时间的演化趋势。
With the rapid development of the Internet, online information is growing explosively, and topic evolution analysis can help people extract more valuable information from massive Internet data. Analyzing the evolutionary trajectory of a topic helps people understand the causes and consequences of topic events and better predict how those events will develop, which in turn supports monitoring and control. To address the problems of threshold setting and topic drift in single-topic evolution analysis methods, an LDA-AP topic evolution model is proposed. The method uses the LDA (latent Dirichlet allocation) model to model the news texts in each time window separately and obtain the corresponding topics. The AP (affinity propagation) clustering algorithm is then used to cluster the topics from different time windows, with topic similarity measured by the JS (Jensen-Shannon) divergence combined with a time decay factor. Finally, evolution analysis is performed on the content of multiple topics. Experimental analysis and comparison show that the method improves topic evolution performance and can effectively analyze how multiple news topic events evolve over time.
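The similarity step described above can be illustrated with a minimal sketch. The abstract does not give the form of the time decay factor, so the exponential decay, the `decay_rate` parameter, the conversion of divergence to similarity as `1 - JS`, and all function names below are assumptions made for illustration only, not the formulation used in the paper.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    With base-2 logarithms the value lies in [0, 1].
    """
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def decayed_similarity(p, q, t_p, t_q, decay_rate=0.1):
    """Similarity of two topic-word distributions from time windows t_p, t_q.

    ASSUMPTION: similarity is (1 - JS) scaled by exp(-decay_rate * gap),
    where gap is the distance between window indices. The paper's actual
    decay factor is not stated in the abstract.
    """
    decay = math.exp(-decay_rate * abs(t_p - t_q))
    return (1.0 - js_divergence(p, q)) * decay


# Hypothetical topic-word distributions over a four-word vocabulary.
topic_a = [0.70, 0.20, 0.05, 0.05]   # from time window 0
topic_b = [0.65, 0.25, 0.05, 0.05]   # from time window 1
topic_c = [0.05, 0.05, 0.20, 0.70]   # from time window 1

print(decayed_similarity(topic_a, topic_b, 0, 1))  # close topics, higher score
print(decayed_similarity(topic_a, topic_c, 0, 1))  # distant topics, lower score
```

A pairwise matrix of such scores over all topics from all windows could then be handed to any clustering routine that accepts precomputed similarities, for example scikit-learn's `AffinityPropagation` with `affinity="precomputed"`. Because affinity propagation selects exemplars itself, this design avoids fixing the number of clusters in advance.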