With the growth of the Internet, web data has become increasingly heterogeneous, tag-based in its text, and imbalanced, which makes traditional online web event detection methods designed for long text progressively less effective. This paper adopts an improved Single Pass method for online event detection in heterogeneous web media. First, the similarity formula is redesigned based on an analysis of the imbalance in web data. Second, a sliding time window is introduced to improve the efficiency of the Single Pass algorithm. Finally, experiments are conducted on the Flickr SED2014 dataset. The results indicate that the proposed algorithm is effective and practical.
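As a rough illustration of the general technique, Single Pass clustering restricted to a sliding time window can be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm evaluated in the paper: the abstract does not give the redesigned similarity formula, so plain Jaccard similarity over tag sets is used as a stand-in, and the names `Cluster`, `jaccard`, `single_pass`, `threshold`, and `window` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A candidate event: the union of member tags plus the latest timestamp."""
    tags: set = field(default_factory=set)
    last_seen: float = 0.0
    members: list = field(default_factory=list)


def jaccard(a, b):
    """Stand-in similarity on tag sets (NOT the paper's redesigned formula)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def single_pass(items, threshold=0.3, window=3600.0):
    """Cluster (item_id, timestamp, tags) tuples in one pass, in time order.

    Only clusters updated within `window` seconds of the incoming item are
    compared, which bounds the number of similarity computations per item.
    The default threshold and window are arbitrary illustrative values.
    """
    active, closed = [], []
    for item_id, ts, tags in sorted(items, key=lambda x: x[1]):
        tags = set(tags)
        # Slide the window: retire clusters that have not been updated recently.
        still_active = []
        for c in active:
            (still_active if ts - c.last_seen <= window else closed).append(c)
        active = still_active
        # Find the most similar cluster among those still inside the window.
        best, best_sim = None, 0.0
        for c in active:
            sim = jaccard(tags, c.tags)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            # Similar enough: merge the item into the existing event.
            best.tags |= tags
            best.last_seen = ts
            best.members.append(item_id)
        else:
            # Otherwise the item seeds a new candidate event.
            active.append(Cluster(tags=tags, last_seen=ts, members=[item_id]))
    return closed + active
```

In this sketch, a cluster that receives no new item for longer than `window` seconds is closed and never compared again, so a later item with the same tags starts a new event. Widening the window trades runtime for the ability to merge temporally distant items.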