Same-class homographs are an important component of the Mongolian vocabulary. These are words that are identical in pronunciation, written form, and part of speech. Statistics show that Mongolian homographs account for 5.1% of the headword entries of a dictionary in the static setting, and for 11.6% of the running words of a corpus in the dynamic setting. The Mongolian homograph knowledge base contains the following components:

- a homograph information dictionary, with its management and maintenance tool;
- a one-million-word database of modern Mongolian in which homographs have been manually identified and tagged;
- a collocation base, a co-occurrence base, and a thesaurus (synonym) base for homographs;
- a statistical tool for co-occurring components;
- a tool for automatic homograph recognition and tagging.

In this paper, we use the co-occurrence base to automatically recognize and tag homographs in a test set. A preliminary evaluation gives a recall of 99.8% and a precision of 81.7%.
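The abstract does not say how the co-occurrence base is used to choose among the readings of a homograph. The following is a minimal sketch that assumes each reading stores counts of words previously seen near it. Each context word within a fixed window then votes for a reading, and the reading with the largest summed count wins. The names `tag_homographs` and `precision_recall`, the window size of 3, and the English toy data are all hypothetical and stand in for Mongolian forms. The sketch also shows one common way to compute recall and precision over (position, reading) pairs.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical co-occurrence base: homograph form -> reading label ->
# counts of words observed near that reading in a tagged corpus.
CoocBase = Dict[str, Dict[str, Counter]]


def tag_homographs(tokens: List[str], cooc: CoocBase,
                   window: int = 3) -> List[Tuple[int, str]]:
    """Return (position, reading) for every token listed in the co-occurrence base."""
    tags = []
    for i, tok in enumerate(tokens):
        readings = cooc.get(tok)
        if not readings:
            continue  # not a known homograph
        # Context = up to `window` tokens on each side of the homograph.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        # Score each reading by summing co-occurrence counts of context words
        # (Counter returns 0 for unseen words). With no evidence at all,
        # max() keeps the first reading listed: an assumed fallback.
        best = max(readings, key=lambda r: sum(readings[r][w] for w in context))
        tags.append((i, best))
    return tags


def precision_recall(predicted, gold) -> Tuple[float, float]:
    """Precision and recall of predicted (position, reading) pairs against gold tags."""
    pred, ref = set(predicted), set(gold)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall


# Toy example: an English "bank" stands in for a same-class Mongolian homograph.
cooc = {"bank": {"finance": Counter({"money": 5, "loan": 3}),
                 "river": Counter({"water": 4, "fish": 2})}}
print(tag_homographs("she took a loan from the bank to buy a boat".split(), cooc))
# → [(6, 'finance')]
print(tag_homographs("fish swim near the bank in the water".split(), cooc))
# → [(4, 'river')]
```

A window vote is only one possible design. A real system could weight context words by distance or leave a homograph untagged when no context word is found in the base. Either choice would shift the balance between recall and precision.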