Topic mining of scientific literature can help researchers quickly and accurately capture the structure of a discipline's topics, track how those topics evolve, and forecast the discipline's development trends. This paper combines the LDA (Latent Dirichlet Allocation) topic model with the life cycle theory of scientific literature and proposes a method for mining the semantic information of a subject area's life cycle. To capture the dynamic semantic information of the field's research topics, the literature is first partitioned by publication time, and the life cycle of the subject area is then characterized by the growth law of its literature. On this basis, topics are extracted, and their evolution is traced, at different stages and levels of the subject life cycle. Experiments on a corpus of scientific literature from China's new energy research field show that the method can monitor and track research hotspots and development trends in new energy, and can therefore provide decision support for scientific research and for science and technology policy making.
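The abstract does not say which growth law or which stage criteria are used, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes that cumulative publication counts follow a logistic curve N(t) = K / (1 + a·e^(-bt)). It also assumes the life cycle is cut into four stages at the curve's inflection point t_m = ln(a)/b and at the two points of fastest change in the growth rate, t_m ± ln(2+√3)/b. The stage labels and every function name (`fit_logistic`, `stage_boundaries`, `stage_of`, `lda_gibbs`, `topics_by_stage`) are hypothetical. LDA is implemented here as a plain collapsed Gibbs sampler so the example has no dependencies; a library implementation such as gensim's `LdaModel` could be substituted. The "levels" of topics and the linking of topics across stages are not covered.

```python
import math
import random
from collections import defaultdict
from itertools import accumulate

# Hypothetical four-stage labels; the paper's own stage scheme may differ.
STAGES = ("emergence", "growth", "maturity", "saturation")


def fit_logistic(years, cumulative, steps=400):
    """Fit N(t) = K / (1 + a*exp(-b*t)), t = year - years[0], to positive counts.

    K is grid-searched in (max N, 3*max N]; for each K, ln(K/N - 1) = ln(a) - b*t
    is linear in t and solved by least squares. Returns the (K, a, b) with least SSE.
    """
    ts = [y - years[0] for y in years]
    t_mean = sum(ts) / len(ts)
    sxx = sum((t - t_mean) ** 2 for t in ts)
    n_max = max(cumulative)
    best = None
    for i in range(1, steps + 1):
        K = n_max * (1 + 2.0 * i / steps)
        zs = [math.log(K / n - 1) for n in cumulative]
        z_mean = sum(zs) / len(zs)
        slope = sum((t - t_mean) * (z - z_mean) for t, z in zip(ts, zs)) / sxx
        a, b = math.exp(z_mean - slope * t_mean), -slope
        sse = sum((n - K / (1 + a * math.exp(-b * t))) ** 2
                  for t, n in zip(ts, cumulative))
        if best is None or sse < best[0]:
            best = (sse, K, a, b)
    return best[1:]


def stage_boundaries(first_year, a, b):
    """Max-acceleration point, inflection (peak growth), max-deceleration point."""
    t_m = math.log(a) / b
    d = math.log(2 + math.sqrt(3)) / b
    return first_year + t_m - d, first_year + t_m, first_year + t_m + d


def stage_of(year, bounds):
    """Assign a publication year to the stage whose interval contains it."""
    for label, edge in zip(STAGES, bounds):
        if year < edge:
            return label
    return STAGES[-1]


def lda_gibbs(docs, n_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word distribution per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: v for v, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]  # topic counts per document
    nkw = [[0] * V for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics  # total tokens per topic
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]  # random init
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                ndk[d][k] -= 1  # remove the current assignment
                nkw[k][v] -= 1
                nk[k] -= 1
                # p(z=j | rest) is proportional to (n_dj + alpha)(n_jv + beta)/(n_j + V*beta)
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                           for j in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return [{w: (nkw[k][v] + beta) / (nk[k] + V * beta) for w, v in index.items()}
            for k in range(n_topics)]


def topics_by_stage(docs_with_years, counts_by_year, n_topics=2, top_n=3):
    """Partition documents by life-cycle stage, then fit a separate LDA per stage."""
    years = sorted(counts_by_year)
    cumulative = list(accumulate(counts_by_year[y] for y in years))
    _, a, b = fit_logistic(years, cumulative)
    bounds = stage_boundaries(years[0], a, b)
    groups = defaultdict(list)
    for year, tokens in docs_with_years:
        groups[stage_of(year, bounds)].append(tokens)
    topics = {}
    for stage in STAGES:
        if groups[stage]:
            phi = lda_gibbs(groups[stage], n_topics)
            topics[stage] = [sorted(p, key=p.get, reverse=True)[:top_n] for p in phi]
    return bounds, topics
```

The grid search over K keeps the sketch free of third-party dependencies. With SciPy available, `scipy.optimize.curve_fit` would be a more usual way to fit the logistic curve. The first year must already have at least one publication, because the linearization takes the logarithm of K/N - 1.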