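Microblog messages often contain a large amount of real-time traffic information, which is expected to complement existing methods of collecting real-time traffic information. To address the semantic ambiguity of microblog messages and the differences in how users describe traffic, this paper proposes a method based on D-S evidence theory for extracting the traffic information contained in microblog messages. The method first builds an evaluation system for the traffic-state information contained in microblog messages, using encyclopedic knowledge to improve evaluation accuracy. It then defines a basic probability assignment function for microblog message sources and, through evidence combination and evidence decision, identifies and fuses the real-time traffic information contained in the messages. Experimental results show that the method can effectively judge the credibility of the real-time traffic information contained in microblog messages, can make the fullest use of the content of messages published by different microblog users, and achieves higher accuracy than traditional text-clustering fusion methods.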
Microblog messages often contain a large amount of real-time traffic information, which can be expected to become an important data source for city traffic. In this paper, we propose an approach based on D-S evidence theory for extracting traffic information from massive microblog data, in order to solve the data fusion problem caused by the highly dynamic, uncertain, and ambiguously worded nature of microblog messages. First, an evaluation index system is built for the traffic information collected from microblog messages, and its accuracy is enhanced by a Wikipedia semantic model. Second, a basic probability assignment function is defined for the microblog messages with the help of word similarity. Finally, D-S theory is used to judge and fuse the extracted traffic information through evidence combination and decision. An experiment on the Beijing road network and the Sina Micro-blog platform shows that the proposed approach can effectively judge the reliability of the traffic information contained in microblog messages and can make the fullest use of the message contents posted by different microblog users. Compared with a traditional text clustering algorithm, the proposed approach is also more accurate.
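The evidence combination and decision step can be illustrated with a minimal sketch of Dempster's rule of combination. This is not the paper's implementation: the frame of discernment (`congested`, `slow`, `free`), the mass values assigned to the two messages, the function names `combine` and `decide`, and the decision threshold of 0.5 are all assumptions made for illustration. The abstract does not specify the actual traffic states, the basic probability assignment function, or the decision rule.

```python
from itertools import product

# Hypothetical frame of discernment: possible traffic states of one road segment.
THETA = frozenset({"congested", "slow", "free"})


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of states to its mass; masses sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass falling on the empty set measures the conflict between sources.
            conflict += mass_a * mass_b
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict and cannot be combined")
    norm = 1.0 - conflict
    return {subset: mass / norm for subset, mass in combined.items()}


def decide(m, threshold=0.5):
    """Return the single state with the largest mass, or None if below threshold.

    The threshold is an illustrative assumption, not the paper's decision rule.
    """
    singletons = {next(iter(k)): v for k, v in m.items() if len(k) == 1}
    if not singletons:
        return None
    best = max(singletons, key=singletons.get)
    return best if singletons[best] >= threshold else None


# Two hypothetical messages about the same road segment; mass on THETA
# expresses how much of each message is left uncommitted (uncertain).
message_1 = {
    frozenset({"congested"}): 0.6,
    frozenset({"slow"}): 0.1,
    THETA: 0.3,
}
message_2 = {
    frozenset({"congested"}): 0.5,
    frozenset({"free"}): 0.2,
    THETA: 0.3,
}

fused = combine(message_1, message_2)
state = decide(fused)
```

In this sketch, mass assigned to the whole frame `THETA` models a message's uncertainty, so a vague message weakens rather than contradicts a specific one, and the normalization by one minus the conflict redistributes the mass lost to contradictory reports.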