Accurately identifying hot topics in online news helps governments understand social trends, enterprises gauge consumer demand, and researchers track emerging research themes. To this end, this paper proposes a hot-topic mining model for online news text based on Latent Dirichlet Allocation (LDA) and Social Network Analysis (SNA). First, an LDA topic model extracts topic words from news texts of a given domain within the same period, and these words form a topic-word co-occurrence network. Next, SNA methods are applied to the co-occurrence network: a social network structure map of the topic words is constructed, and centrality analysis, core-periphery analysis, and cohesive subgroup analysis are performed. The field of "sustainable development" serves as a case study for identifying hot topics. Finally, the proposed method is compared with topic extraction based on TF-IDF and on LDA alone and is checked against the Baidu Index; the results indicate that it effectively reflects the importance and distribution of words and has strong portability.
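As a rough illustration of the co-occurrence and centrality steps, the following minimal sketch assumes the LDA stage has already produced one word list per topic. The `topics` data, the function names, and the choice of normalized degree centrality are illustrative assumptions, not the paper's actual data or implementation; core-periphery and cohesive subgroup analysis are not covered.

```python
from collections import Counter
from itertools import combinations

# Hypothetical topic-word lists standing in for the output of an LDA model.
topics = [
    ["environment", "energy", "economy", "policy"],
    ["energy", "carbon", "policy"],
    ["economy", "policy", "green"],
]


def cooccurrence_edges(topic_words):
    """Count, for each unordered word pair, the number of topics containing both."""
    edges = Counter()
    for words in topic_words:
        # sorted(set(...)) removes duplicates and gives each pair a canonical order
        for a, b in combinations(sorted(set(words)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Normalized degree centrality: number of distinct neighbors / (n - 1)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {word: 0.0 for word in neighbors}
    return {word: len(nbrs) / (n - 1) for word, nbrs in neighbors.items()}


edges = cooccurrence_edges(topics)
centrality = degree_centrality(edges)

# Rank words by centrality (ties broken alphabetically); high-ranking words
# are candidate hot-topic terms in this simplified setting.
ranked = sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0]))
for word, score in ranked:
    print(f"{word}: {score:.2f}")
```

In this toy input, a word that co-occurs with every other word receives a centrality of 1.0, which mirrors the idea that central nodes in the topic-word network point to hot topics.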