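Microblog data are a typical source of short text for event detection. Because microblog content is diverse, sparse, and fragmented, existing event detection methods rely on a single, noisy data source and recover spatio-temporal information only at coarse granularity, which leads to poor accuracy. To address this, a dynamic context window algorithm is proposed that constructs candidate microblogs for event detection, improving both the efficiency and the precision of detection. An algorithm is also proposed that uses microblog content to discover the geographic location of a specific event, improving the precision with which the event's spatio-temporal information is obtained. Finally, the approach is applied to the automatic detection of foodborne disease events; compared with previous event detection methods, it broadens the data sources and significantly improves accuracy in both the temporal and spatial dimensions.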
Microblogs are a typical short-text data source for event detection. Because microblog content is diverse, sparse, and fragmented, existing event detection methods are ineffective and the spatio-temporal information they recover for events is inaccurate. To this end, a dynamic context window algorithm is proposed, which improves the efficiency and precision of microblog-based detection of foodborne disease events. Moreover, an algorithm is developed that extracts spatio-temporal information from microblogs more accurately. Finally, extensive experiments on event detection of foodborne diseases show that the proposed method helps to expand the data source and improves accuracy in the time and space dimensions.