Existing topic models cannot accurately capture how topics vary periodically over time or how they are distributed across space. Because Internet content usually carries contextual data such as its publishing time and location, this paper proposes a spatiotemporal contextual topic model for topic tracking. First, the multi-topic distribution of the data set is associated with spatiotemporal information to build the model, which describes the cycle and strength of each topic. Then, the model parameters are estimated with the EM algorithm and used to compute topic snapshots and topic cycles. Finally, a time-series similarity measure is used to identify subsequent topic information, thereby realizing topic tracking. Tracking experiments on food safety events show that, compared with topic tracking methods that rely on text features alone, the proposed method clearly improves both tracking efficiency and the tracking accuracy for multiple topics, which helps to achieve more precise topic information retrieval.
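The estimate-then-track pipeline summarized above can be illustrated with a minimal sketch. It is not the proposed model: it assumes plain PLSA with no time or location variables as a stand-in for the EM step, and plain cosine similarity against a single topic snapshot as a stand-in for the time-based similarity. The function names `plsa_em` and `track`, the toy count matrix, and the threshold value are all hypothetical.

```python
import numpy as np

def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Fit plain PLSA by EM (assumption: no spatiotemporal variables).

    counts: (n_docs, n_words) term-count matrix.
    Returns P(z|d) with shape (n_docs, n_topics) and
    P(w|z) with shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialisation; each row is normalised to a distribution.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w), shape (n_docs, n_words, n_topics).
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        resp = joint / (joint.sum(axis=2, keepdims=True) + eps)
        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, :, None] * resp
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_z_d, p_w_z

def track(doc_counts, snapshot, threshold=0.5):
    """Cosine similarity between a new document and a topic snapshot.

    Assumption: the snapshot is just a word distribution P(w|z);
    the threshold is an illustrative value, not one from the paper.
    """
    freq = doc_counts / doc_counts.sum()
    denom = np.linalg.norm(freq) * np.linalg.norm(snapshot)
    sim = float(freq @ snapshot / denom)
    return sim, sim >= threshold

# Toy corpus (hypothetical): 4 documents over a 6-word vocabulary.
counts = np.array([
    [5, 4, 3, 0, 0, 0],
    [4, 5, 2, 0, 0, 0],
    [0, 0, 0, 4, 5, 3],
    [0, 0, 0, 3, 4, 5],
], dtype=float)
p_z_d, p_w_z = plsa_em(counts, n_topics=2)

# A later document is compared against every learned topic snapshot.
new_doc = np.array([3, 3, 1, 0, 0, 0], dtype=float)
results = [track(new_doc, p_w_z[k]) for k in range(p_w_z.shape[0])]
for k, (sim, matched) in enumerate(results):
    print(f"topic {k}: similarity={sim:.3f}, tracked={matched}")
```

In a spatiotemporal variant, one would presumably keep a separate snapshot per time slice or location and compare each incoming document against the snapshots of nearby slices; that extension is left out here because the abstract does not specify its form.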