Chinese Toponym Recognition Based on Conditional Random Fields

ISSN: 1671-8860
Journal: Geomatics and Information Science of Wuhan University (《武汉大学学报：信息科学版》)
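Classification: P208 [Astronomy and Earth Sciences: Cartography and Geographic Information Engineering; Astronomy and Earth Sciences: Surveying and Mapping Science and Technology] TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: Institute of Remote Sensing and Geographic Information System, Peking University, Beijing 100871
Funding: National Natural Science Foundation of China (41271385); Open Research Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing ((16)重02)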

Keywords: toponym recognition, conditional random field, natural language processing, Chinese toponym

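Abstract (translated from the Chinese):

In today's information society, with the internet developing rapidly, a large amount of geographic information exists as unstructured text, and toponym recognition is an essential basis for mining it. Existing toponym recognition methods mainly approach the task from the perspective of natural language processing and do not fully consider how toponyms are composed and customarily used, which leads to low recognition rates or overfitting. This paper introduces linguistic knowledge and analyzes the characters used in Chinese toponyms. Starting from the traditional "specific name + generic name" structure, it subdivides the morpheme types of toponyms more finely, summarizes the features of each morpheme type, and incorporates these features into a conditional random field (CRF) approach, turning toponym recognition into a sequence labeling problem. Based on the features of Chinese toponyms, formal rules are defined and a character-based labeling scheme is designed. On this basis, feature templates for Chinese toponyms are designed, and Chinese toponyms in natural language text are recognized through CRF model training and prediction. Experiments on an annotated People's Daily corpus of 1.7 million characters show that the method achieves a recall, precision, and F value of 92.69%, 96.73%, and 94.67% respectively for Chinese toponym recognition. These results outperform existing work, and the method can provide more effective toponym services for research and applications in geographic information science.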
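The sequence-labeling formulation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual tag set, morpheme classes, or feature templates, so everything below is an assumption made for illustration: a plain B/I/O tag set, a hypothetical `GENERIC_CHARS` set of generic-name characters, and a one-character context window. It only shows the kind of per-character input a CRF toolkit would consume; no CRF is trained here.

```python
# Hypothetical generic-name characters (province, city, county, ...).
# The paper's real morpheme classes are finer-grained and are not given in the abstract.
GENERIC_CHARS = set("省市县区镇村山河湖江")


def bio_tags(sentence, toponyms):
    """Label each character B (begins a toponym), I (inside one), or O (outside)."""
    tags = ["O"] * len(sentence)
    for name in toponyms:
        start = sentence.find(name)
        while start != -1:
            tags[start] = "B"
            for i in range(start + 1, start + len(name)):
                tags[i] = "I"
            start = sentence.find(name, start + len(name))
    return tags


def char_features(sentence, i):
    """Features for character i: the character, its neighbours, and a generic-name flag."""
    prev_char = sentence[i - 1] if i > 0 else "<BOS>"
    next_char = sentence[i + 1] if i < len(sentence) - 1 else "<EOS>"
    return {
        "char": sentence[i],
        "prev": prev_char,
        "next": next_char,
        "is_generic": sentence[i] in GENERIC_CHARS,
    }


sentence = "我在武汉市工作"  # "I work in Wuhan City"
tags = bio_tags(sentence, ["武汉市"])
features = [char_features(sentence, i) for i in range(len(sentence))]
print(list(zip(sentence, tags)))
```

Each character's feature dictionary, paired with its tag, would form one training position for a linear-chain CRF. A richer template in the spirit of the paper would replace the single `is_generic` flag with one feature per morpheme type.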

English abstract:

With the rapid development of the World Wide Web, a huge quantity of geographic information is hidden in unstructured text. Toponym recognition is the foundation for mining the potential geographic information in these texts. Traditional toponym recognition methods based on natural language processing ignore the structure of Chinese toponyms and the conventions of their usage, which results in low recall and precision. In this paper, linguistic knowledge is introduced to analyze Chinese toponyms, and more specific morpheme categories are identified. The process of toponym recognition is then transformed into an equivalent sequence labeling problem based on the conditional random field. A suitable labeling scheme for Chinese toponyms is also designed to improve recognition accuracy. In the experiments, a tagged People's Daily corpus of 1.7 million characters is used to test the proposed method. The recall, precision, and F value of the result are 92.69%, 96.73%, and 94.67% respectively, which are better than those of other machine learning models. The results show that the proposed method is effective at recognizing Chinese toponyms. This research can provide more precise toponym services for geographic information applications.
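As a quick consistency check on the reported figures, the short sketch below recomputes the F value from the stated precision and recall. It assumes the F value is the standard balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is purely illustrative.

```python
def f_measure(precision, recall):
    """Balanced F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f = f_measure(precision=0.9673, recall=0.9269)
print(f"{f:.2%}")  # prints 94.67%
```

Under that assumption, the recomputed value agrees with the 94.67% given in the abstract.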

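Project associated with this article

Models and Methods for Qualitative Geographic Information Retrieval

Journal articles: 5; Patents: 1

Journal articles from the same project

Qualitative Geographic Information Retrieval Methods and Their Implementation

Big-Data-Driven Research on Human Mobility Patterns and Models

A Geocoding Engine Based on Storm

A Link-Analysis-Based Method for Extracting Core Toponyms from Web Page Text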

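Journal information

Geomatics and Information Science of Wuhan University (《武汉大学学报：信息科学版》)
China Science and Technology Core Journal

Governing body: Ministry of Education of China
Sponsor: Wuhan University
Editor-in-chief: Liu Jingnan
Address: Luojia Hill, Wuhan, Hubei
Postal code: 430072
Email: whuxxb@vip.163
Telephone: 027-68778045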

International standard serial number: ISSN 1671-8860
Domestic unified serial number: CN 42-1676/TN
Postal distribution code: 38-317

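Awards:
National Outstanding Science and Technology Journal; First Prize, National Outstanding University Natural Science Journals; Hubei Province Outstanding Journal

Indexed in domestic and international databases:
Abstract Journal (Russia), geoscience database (Netherlands), abstract and citation database (Netherlands), Engineering Index (USA), Cambridge Scientific Abstracts (USA), Science Abstracts database (UK), Japan Science and Technology Agency database (Japan), China Science and Technology Core Journals (China), Peking University Core Journals (China; 2004, 2008, 2011, and 2014 editions)

Times cited: 24217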