
A Cross-Step Word Segmentation Algorithm for Traffic Information Expressed in Chinese Natural Language

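Journal: 武汉大学学报(信息科学版) [Journal of Wuhan University (Information Science Edition)]
Date: 0
Pages: 943-947
Language: Chinese
Classification: P208 [Astronomy and Earth Sciences - Cartography and Geographic Information Engineering; Astronomy and Earth Sciences - Surveying and Mapping Science and Technology]
Author affiliations: [1] State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, A11 Datun Road, Chaoyang District, Beijing 100101; [2] School of Resources and Safety Engineering, China University of Mining and Technology (Beijing), D11 Xueyuan Road, Haidian District, Beijing 100083; [3] Fujian Provincial Spatial Information Engineering Research Center, Fuzhou University, 523 Gongye Road, Fuzhou 350002
Funding: National 863 Program of China (2006AA12Z209, 2007AA12Z241); National Natural Science Foundation of China (40871184); Key Direction Project of the Knowledge Innovation Program of the Chinese Academy of Sciences (KZCX2-YW-308).
Related project: Research on fusion and application technologies for urban traffic information expressed in natural language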

Keywords: traffic information, Chinese natural language processing, word segmentation, cross-step algorithm

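Chinese abstract (English translation):

Based on an analysis of Chinese word segmentation algorithms and of how traffic information is expressed in natural language, a cross-step matching word segmentation algorithm for traffic information expressed in natural language is proposed, to meet the pressing demand of dynamic travel information services for real-time traffic information in structured digital form. The algorithm takes full account of the record-length characteristics of the lexicons used to describe traffic information in natural language. By setting a corresponding segmentation order (step count) for each lexicon, it replaces the one-step advance of the string pointer in traditional Chinese word segmentation with a multi-step advance that varies with the properties of the lexicon, and handles candidate word strings as a whole, which greatly improves the efficiency of real-time segmentation and understanding of traffic information expressed in natural language. In an experimental comparison with an improved MM (maximum matching) algorithm, the method is more than 10 times as efficient as the MM segmentation algorithm at the same understanding success rate and fault tolerance.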

English abstract:

A novel cross-step word segmentation algorithm is proposed to process real-time traffic information expressed in natural-language Chinese, meeting the urgent need of real-time travel information services for dynamic traffic information. Considering the record length distribution of the word libraries that describe real-time traffic information, the algorithm sets a corresponding segmentation step for the address, direction, and event libraries, and replaces the one-step advance of the string pointer used in classical Chinese word segmentation with a flexible multi-step advance, so that possible Chinese words are aggregated efficiently. A case study shows that the proposed algorithm runs 10 times faster than an improved MM algorithm while keeping similar accuracy and robustness. The authors argue that the algorithm greatly helps the automatic and intelligent processing of real-time traffic information and facilitates the development of travel information services.
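The abstract does not give the algorithm's details, so the following is only a rough sketch of the general idea of matching against per-lexicon entry lengths instead of shrinking a window one character at a time. It is not the authors' implementation. The function name `cross_step_segment`, the sample `LEXICONS` entries, and the definition of a lexicon's "steps" as the set of entry lengths that occur in it are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical sample lexicons; the entries are invented for illustration.
LEXICONS: Dict[str, Set[str]] = {
    "address": {"中关村大街", "学院路", "北四环"},
    "direction": {"由南向北", "由东向西"},
    "event": {"拥堵", "事故", "缓行"},
}


def cross_step_segment(
    text: str, lexicons: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Segment text into (lexicon label, token) pairs.

    Assumption: a lexicon's "steps" are the entry lengths that actually
    occur in it, so only windows of those lengths are ever tested.
    """
    steps = {
        name: sorted({len(word) for word in words}, reverse=True)
        for name, words in lexicons.items()
    }
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        label, size = "unknown", 0
        for name, words in lexicons.items():
            for n in steps[name]:
                if n <= size:
                    break  # remaining lengths cannot beat the current match
                if i + n <= len(text) and text[i:i + n] in words:
                    label, size = name, n
                    break
        size = max(size, 1)  # no match: emit one unmatched character
        tokens.append((label, text[i:i + size]))
        i += size  # the pointer jumps over the whole matched word
    return tokens


print(cross_step_segment("中关村大街由南向北拥堵", LEXICONS))
```

Characters that match no lexicon are emitted one at a time with the label "unknown", which is a simplification chosen here and not something the abstract specifies. The sketch makes no claim about the 10-fold speedup reported above.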
