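This paper discusses the toponym resolution workflow for Chinese text and proposes a toponym recognition method based on conditional random fields and discourse-level toponym relations, a toponym standardization method based on partial fuzzy matching, and a geocoding method based on cognitive salience. A prototype toponym resolution system is also built. Experiments show that the system achieves fairly satisfactory precision, recall, and F-1 scores. The paper also discusses the main factors that affect toponym resolution performance: the completeness of the gazetteer, the accuracy of toponym recognition, and the disambiguation of toponym semantics.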
This paper explores approaches to toponym resolution in Chinese text. It proposes a geo-parsing approach based on conditional random fields and discourse toponym relations, and a geo-coding approach based on partial fuzzy matching and cognitive salience calculation. The proposed geo-parsing approach handles toponym recognition in three major steps. The experiment shows that the key factors influencing the performance of toponym resolution in Chinese text are the coverage of the gazetteer, the performance of geo-parsing, and the performance of semantic disambiguation of toponyms. In our experiment, about 17% of the toponyms could not be matched to an entry in the gazetteer. Ambiguity in geo-parsing and geo-coding is the next most prominent factor affecting the overall performance of toponym resolution.
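The geo-coding idea summarized above (fuzzy matching against a gazetteer, then choosing among ambiguous referents by salience) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy gazetteer, the salience scores, the approximate coordinates, the 0.6 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy gazetteer: standard name -> list of
# (referent, salience score, approximate (lat, lon)).
# All values are illustrative, not taken from the paper.
GAZETTEER = {
    "北京市": [("Beijing municipality", 0.95, (39.90, 116.40))],
    "朝阳区": [
        ("Chaoyang District, Beijing", 0.80, (39.92, 116.44)),
        ("Chaoyang District, Changchun", 0.40, (43.83, 125.29)),
    ],
    "朝阳市": [("Chaoyang, Liaoning", 0.50, (41.57, 120.45))],
}


def standardize(mention, threshold=0.6):
    """Return gazetteer names whose string similarity to the mention
    meets the (assumed) threshold."""
    return [
        name
        for name in GAZETTEER
        if SequenceMatcher(None, mention, name).ratio() >= threshold
    ]


def geocode(mention):
    """Pick the most salient referent among all fuzzy-matched names.

    Returns (standard name, referent, coordinates), or None when the
    mention has no sufficiently similar gazetteer entry.
    """
    best = None
    for name in standardize(mention):
        for referent, salience, coords in GAZETTEER[name]:
            if best is None or salience > best[0]:
                best = (salience, name, referent, coords)
    return None if best is None else best[1:]


if __name__ == "__main__":
    for mention in ["北京", "朝阳", "上海"]:
        print(mention, geocode(mention))
```

In this sketch a mention missing from the gazetteer, such as "上海" here, returns `None`, which corresponds to the gazetteer coverage problem the abstract reports. The ambiguous mention "朝阳" matches several entries and is resolved purely by the assumed salience score.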