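Micro-blog messages may contain a large amount of information about urban road traffic, such as traffic conditions, traffic incidents and traffic controls. Extracting this information can effectively complement conventional collection methods based on fixed sensors and floating cars. However, micro-blog messages are ambiguous, inconsistent and unstructured, which makes it difficult to extract and verify traffic information quickly and accurately from massive volumes of them. We propose a method for rapidly extracting and fusing traffic information from micro-blog messages. First, the collected messages undergo word segmentation and parsing, and are matched to the road network. Then, a neural-network-based fuzzy C-means clustering method is applied to the quantified results of the messages that describe road-segment traffic states, yielding the traffic-state description with the highest confidence for each road segment. Finally, the traffic smoothness level of each road segment is derived. Experiments using Sina Weibo messages and the Beijing road network verify the effectiveness of the proposed method.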
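The first stage (word segmentation, road-network matching and quantification) can be illustrated with a minimal sketch. It is not the authors' implementation: the road-name dictionary `ROAD_NAMES`, the traffic-state lexicon `STATE_TERMS` and its congestion scores in [0, 1] are hypothetical, segmentation uses simple greedy forward maximum matching instead of a full Chinese segmenter, and each state term is attached to the most recently mentioned road.

```python
from dataclasses import dataclass

# Hypothetical lexicons (assumptions for illustration only):
# road names, and traffic-state terms mapped to an assumed
# congestion score in [0, 1] (0 = free-flowing, 1 = jammed).
ROAD_NAMES = {"长安街", "二环", "三环", "西直门桥"}
STATE_TERMS = {"畅通": 0.0, "缓行": 0.5, "拥堵": 0.8, "堵死": 1.0}
LEXICON = ROAD_NAMES | set(STATE_TERMS)
MAX_WORD_LEN = max(len(w) for w in LEXICON)


@dataclass
class TrafficRecord:
    road: str     # matched road name (stands in for a road-segment ID)
    term: str     # traffic-state term found in the message
    score: float  # quantified congestion score for that term


def forward_max_match(text: str) -> list[str]:
    """Greedy forward maximum matching against LEXICON.

    Characters not covered by any lexicon entry are emitted one at a time.
    """
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in LEXICON:
                tokens.append(piece)
                i += size
                break
    return tokens


def extract_records(message: str) -> list[TrafficRecord]:
    """Pair each traffic-state term with the most recently mentioned road."""
    records, current_road = [], None
    for tok in forward_max_match(message):
        if tok in ROAD_NAMES:
            current_road = tok
        elif tok in STATE_TERMS and current_road is not None:
            records.append(TrafficRecord(current_road, tok, STATE_TERMS[tok]))
    return records
```

For instance, `extract_records("早高峰长安街拥堵,二环缓行")` yields one record for 长安街 with score 0.8 and one for 二环 with score 0.5. Records like these, accumulated per road segment, are the kind of quantified input the clustering step works on.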
Micro-blog messages usually contain a great deal of traffic information, such as traffic conditions, traffic events and traffic controls, which can be used to complement conventional traffic information collection technologies like fixed sensors and floating cars. However, due to ambiguous wording, uncertainty, and the unstructured nature of micro-blog messages, extracting traffic information from them is rather difficult. In this paper, we propose an approach for extracting traffic information from a large number of micro-blog messages. First, we build a traffic information table by semantically extracting traffic-related words from micro-blog messages and matching each word to the corresponding road segment of the road network. Then, based on the traffic information table, we apply a neural-network-based Fuzzy C-Means (FCM) clustering method to determine, for each road segment, the traffic condition with the highest confidence level. Experiments on the Beijing road network with a large number of Sina micro-blog messages verify the effectiveness of the presented approach.
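The clustering step can be sketched with standard fuzzy C-means applied to the congestion scores collected for one road segment. This is plain FCM, not the neural-network-based variant described in the paper, and it assumes scores on a hypothetical [0, 1] scale. Treating the cluster with the largest mean membership as the "most confident" traffic state is likewise an assumption made for illustration.

```python
import random


def fuzzy_c_means(xs, c=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means for 1-D data; returns (centers, memberships)."""
    rng = random.Random(seed)
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in xs:
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = [0.0] * c
    for _ in range(iters):
        # Center update: mean of the data weighted by u_ij^m.
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(xs))]
            centers[j] = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        new_u = []
        p = 2.0 / (m - 1.0)
        for x in xs:
            d = [abs(x - cj) for cj in centers]
            zeros = [j for j, dj in enumerate(d) if dj == 0.0]
            if zeros:
                # Point coincides with one or more centers: share membership among them.
                row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                row = [1.0 / sum((d[j] / d[k]) ** p for k in range(c)) for j in range(c)]
            new_u.append(row)
        # Stop once memberships change by less than tol.
        shift = max(abs(a - b) for r1, r2 in zip(u, new_u) for a, b in zip(r1, r2))
        u = new_u
        if shift < tol:
            break
    return centers, u


def most_confident_state(scores, c=2):
    """Return (center, confidence) of the cluster with the largest mean membership."""
    centers, u = fuzzy_c_means(scores, c=c)
    mean_membership = [sum(row[j] for row in u) / len(u) for j in range(c)]
    best = max(range(c), key=mean_membership.__getitem__)
    return centers[best], mean_membership[best]
```

With scores such as `[0.8, 0.75, 0.85, 0.8, 0.2]` for one segment, the dominant cluster sits close to 0.8, so the conflicting 0.2 report is outvoted. One hypothetical way to turn the result into a smoothness level on this assumed scale would be `1 - center`.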