Named entity recognition for Mongolian has so far addressed personal names, but no comparable work exists for place names. This paper presents the first Mongolian place-name recognizer based on a conditional random field (CRF) model. Starting from the agglutinative nature of Mongolian, we analyze the forms in which place names occur in a Mongolian corpus and the characteristics of each class of place name. Guided by these characteristics, we add part-of-speech features to a base set of lexical, indicator-word, and feature-word features. A place-name dictionary is then used to recover place names that the model failed to recognize. Trained on the third-level annotated corpus of about 1,000,000 words developed by Inner Mongolia University, the model achieves a precision of 94.68%, a recall of 84.40%, and an F-score of 89.24%.
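The pipeline summarized above (per-token features for a CRF, a dictionary pass over the CRF output, and the F-score) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the window size of one token, the `LOC`/`O` label set, and the cue words in `INDICATOR_WORDS` and `FEATURE_WORDS` are all assumptions made for illustration, and the dictionary pass handles single-token names only.

```python
from typing import Dict, List, Set

# Hypothetical cue lists (illustrative placeholders, not taken from the paper).
INDICATOR_WORDS: Set[str] = {"ind_a", "ind_b"}  # words assumed to neighbor place names
FEATURE_WORDS: Set[str] = {"qota"}              # words assumed to end place names


def token_features(tokens: List[str], pos_tags: List[str], i: int) -> Dict[str, object]:
    """Build the feature dictionary for token i using a window of one token."""
    feats: Dict[str, object] = {
        "word": tokens[i],                              # lexical feature
        "pos": pos_tags[i],                             # part-of-speech feature
        "is_feature_word": tokens[i] in FEATURE_WORDS,  # feature-word cue
    }
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
        feats["prev_pos"] = pos_tags[i - 1]
        feats["prev_is_indicator"] = tokens[i - 1] in INDICATOR_WORDS
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
        feats["next_pos"] = pos_tags[i + 1]
        feats["next_is_indicator"] = tokens[i + 1] in INDICATOR_WORDS
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def supplement_with_dictionary(
    tokens: List[str], labels: List[str], gazetteer: Set[str]
) -> List[str]:
    """Relabel tokens the model left as 'O' when they appear in the place-name dictionary."""
    return [
        "LOC" if label == "O" and token in gazetteer else label
        for token, label in zip(tokens, labels)
    ]


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tokens = ["name_x", "qota", "ind_a"]
    pos_tags = ["N", "N", "P"]
    print(token_features(tokens, pos_tags, 1))
    print(supplement_with_dictionary(tokens, ["O", "O", "O"], {"name_x"}))
    print(round(f_score(94.68, 84.40), 2))
```

With the reported precision and recall as inputs, `f_score` returns 89.24 after rounding to two decimals, which matches the reported F-score. Feature dictionaries of this shape, one per token, are a common input format for CRF toolkits, so the dictionary pass can stay a separate post-processing step that only raises recall.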