Using Latent Dirichlet Allocation (LDA) to analyze how the topics of scientific literature evolve, based on how those topics changed in previous years, is currently a focus of topic-evolution research. Because the topic evolution of scientific papers has no aftereffect (it is memoryless, i.e., it has the Markov property), a Markov chain is used to predict topic evolution. The method first uses LDA to obtain the topics in each time window. It then links topics in adjacent time windows using similarity and related measures. According to their intensity, topics are divided into three states: hot, normal, and cold. Finally, a Markov chain yields the transition probability matrix between topic-intensity states, which is used to analyze and forecast trends in topic intensity. Experiments on the NIPS proceedings show that after long-term evolution the proportion of topics in each state stabilizes, with hot, normal, and cold topics remaining at roughly 30%, 60%, and 10%, respectively. This indicates that the method can effectively predict topic evolution over the next few years from existing evolution results.
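The pipeline summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Topic-word distributions are assumed to come from an already-fitted LDA model. Topics in adjacent windows are linked by cosine similarity with a hypothetical threshold of 0.5. The hot/normal/cold intensity cut-offs (`hot_min`, `cold_max`) are hypothetical parameters. The toy state sequences are invented for illustration and are not data from the NIPS experiment.

```python
from collections import Counter
import math

STATES = ("hot", "normal", "cold")


def cosine(u, v):
    """Cosine similarity of two topic-word probability vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_topics(prev_topics, next_topics, threshold=0.5):
    """Pair topics in adjacent windows whose similarity reaches a
    hypothetical threshold; returns (prev_index, next_index) pairs."""
    return [(i, j)
            for i, u in enumerate(prev_topics)
            for j, v in enumerate(next_topics)
            if cosine(u, v) >= threshold]


def classify(intensity, hot_min, cold_max):
    """Map a topic's intensity to a state using hypothetical cut-offs."""
    if intensity >= hot_min:
        return "hot"
    if intensity < cold_max:
        return "cold"
    return "normal"


def transition_matrix(state_sequences):
    """Row-normalized counts of state changes between consecutive windows."""
    counts = {s: Counter() for s in STATES}
    for seq in state_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    matrix = []
    for s in STATES:
        total = sum(counts[s].values())
        # Assumption: a state never observed as a source stays put.
        matrix.append([counts[s][t] / total if total else float(s == t)
                       for t in STATES])
    return matrix


def stationary(matrix, max_iter=10_000, tol=1e-12):
    """Long-run state proportions by power iteration: pi <- pi P."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


if __name__ == "__main__":
    # Invented toy data: one state sequence per linked topic chain.
    chains = [
        ["hot", "hot", "normal", "normal"],
        ["normal", "normal", "hot", "normal"],
        ["cold", "normal", "normal", "cold"],
    ]
    P = transition_matrix(chains)
    print(dict(zip(STATES, stationary(P))))
```

Each row of the matrix is the maximum-likelihood estimate from observed transitions. The chain is iterated until the state distribution stops changing, which mirrors the long-run stabilization of state proportions described in the abstract. For the invented chains above, the proportions settle near 0.2, 0.67, and 0.13 for hot, normal, and cold. Power iteration keeps the sketch dependency-free. A k-window forecast is the same update applied k times to the current distribution.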