Inspired by the CURE clustering algorithm, and based on an analysis of how topics evolve dynamically, this paper proposes a double-centroid topic model oriented to dynamic evolution, in order to reduce the negative effect of topic evolution on topic detection. The model dynamically establishes a division point and uses it to represent a topic as two centroids: an initial centroid and a current centroid. The initial centroid represents the content the topic focused on before the division point, and the current centroid represents the content it has focused on between the division point and the current time. Two methods for determining the division point are proposed, one based on time and the other based on word-distribution density. The creation and updating of the division point, the initial centroid, and the current centroid are described in detail. Finally, an English topic detection algorithm based on the double-centroid topic model is studied, and experiments demonstrate its effectiveness.
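The double-centroid representation can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class name `DoubleCentroidTopic`, the fixed `division_time`, the use of raw term counts, and the rule of scoring a story by its higher cosine similarity to either centroid are all assumptions made for illustration. In particular, the abstract says the division point is set dynamically (by time or by word-distribution density), which this sketch does not attempt.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class DoubleCentroidTopic:
    """Hypothetical topic holding two centroids split at a division point."""

    def __init__(self, division_time):
        # Assumption: the division point is given and fixed; the paper
        # establishes and updates it dynamically.
        self.division_time = division_time
        self.initial = Counter()  # term counts of stories before the division point
        self.current = Counter()  # term counts of stories from the division point on

    def add(self, tokens, timestamp):
        """Fold a story into the centroid on its side of the division point."""
        target = self.initial if timestamp < self.division_time else self.current
        target.update(tokens)

    def similarity(self, tokens):
        """Assumed scoring rule: best match against either centroid."""
        story = Counter(tokens)
        return max(cosine(story, self.initial), cosine(story, self.current))


# Usage: an early phase and a later phase of the same (made-up) topic.
topic = DoubleCentroidTopic(division_time=10)
topic.add(["earthquake", "rescue"], timestamp=1)
topic.add(["rebuild", "aid"], timestamp=12)
late_score = topic.similarity(["rebuild", "aid"])
early_score = topic.similarity(["earthquake", "rescue"])
unrelated_score = topic.similarity(["sports"])
```

Because cosine similarity ignores vector length, summed term counts behave the same as averaged ones here. Under this assumed scoring rule, a story that matches either the early or the late phase of the topic scores highly, whereas a single merged centroid would dilute both.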