An experimental place-name database and a geographic corpus were built, and on this basis a statistical model of the credibility of the characters used in place names was constructed. By analyzing how place names are habitually used in Chinese documents, the auxiliary characters and words that frequently accompany place names and signal their presence were identified, and these were used to build an auxiliary-word lexicon and a rule base for place-name recognition. The characters in the place-name database and the geographic corpus were analyzed statistically. Potential place names in a text were then screened into candidates by setting a probability threshold on character credibility and by using the indicative role of the auxiliary words, which ensured a higher recall rate. Finally, the candidates produced by this coarse screening were confirmed against the place-name recognition rules to improve recognition accuracy.
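The two-stage procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the credibility estimate (the share of a character's corpus occurrences that fall inside a known place name), the threshold of 0.5, the indicator words, the suffix rule, and all function names and sample data are assumptions made only for illustration.

```python
from collections import Counter

def char_credibility(place_names, corpus):
    """Assumed estimate: fraction of each character's corpus occurrences
    that lie inside a known place name."""
    covered = [False] * len(corpus)
    for name in set(place_names):
        if not name:
            continue
        start = corpus.find(name)
        while start != -1:
            for i in range(start, start + len(name)):
                covered[i] = True
            start = corpus.find(name, start + 1)
    total = Counter(corpus)
    inside = Counter(ch for ch, flag in zip(corpus, covered) if flag)
    return {ch: inside[ch] / total[ch] for ch in total}

def screen_candidates(text, cred, threshold=0.5, indicators=("位于", "在")):
    """Coarse screening: maximal runs of two or more high-credibility
    characters, with a flag for a preceding indicator (auxiliary) word."""
    candidates = []
    i = 0
    while i < len(text):
        if cred.get(text[i], 0.0) >= threshold:
            j = i
            while j < len(text) and cred.get(text[j], 0.0) >= threshold:
                j += 1
            if j - i >= 2:
                preceded = any(text[:i].endswith(w) for w in indicators)
                candidates.append((text[i:j], i, preceded))
            i = j
        else:
            i += 1
    return candidates

def confirm(candidates, suffixes=("市", "省", "县")):
    """Hypothetical rule base: keep a candidate if an indicator word
    precedes it or if it ends with an administrative suffix."""
    return [name for name, _, preceded in candidates
            if preceded or name.endswith(suffixes)]

# Toy data, invented for this example only.
place_names = ["北京", "上海", "杭州"]
corpus = "我在北京工作。上海位于长江口。他去了杭州。北京很大。"
cred = char_credibility(place_names, corpus)
text = "公司位于杭州,他在北京读书。"
found = confirm(screen_candidates(text, cred))
print(found)
```

Lowering the threshold admits more candidates at the screening stage, which favors recall; the rule-based confirmation step then removes candidates that lack supporting evidence, which favors precision.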