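Building on the naive Bayes classifier, this paper incorporates semantic knowledge from HowNet to construct a Chinese person-name recognition model that combines statistics with semantics. The basic idea is to first use the Bayesian classifier to locate and roughly identify Chinese personal names, and then to refine the result using HowNet semantics. The model keeps the simple formulas and learning ability of the Bayesian algorithm while avoiding heavy reliance on name rules, and it overcomes the difficulty, common to statistical methods, of determining name boundaries. Experimental results show a precision of 95.67% and a recall of 97.78%.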
Chinese personal names are the most frequent type of unknown (out-of-vocabulary) word in Chinese text. The accuracy of Chinese name recognition affects applications such as syntactic analysis, machine translation, information retrieval, information extraction, and automatic question answering, which makes it both a key and a difficult problem. The difficulty is that names take many forms, lack morphological markers, and may contain rare characters. Despite these obstacles, the characters of a name are relatively independent of one another, apart from a small number of characters that can combine into words. This property fits the independence assumption of naive Bayes well, and a Bayesian classifier does achieve good recognition results. In complex contexts, however, its performance is not satisfactory for applications, because name boundaries are hard to determine and boundary errors occur easily. To solve this problem, this paper constructs a Chinese name recognition model that combines HowNet with a Bayesian classifier. The basic idea is to locate and roughly recognize Chinese names with the Bayesian classifier and then to correct them using HowNet. The model retains the simple formula and learning ability of the Bayesian approach, avoids extensive use of rules, and overcomes the difficulty of defining name boundaries. Experimental results show that the precision and recall rates are 95.67% and 97.78%, respectively.
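The two-stage idea (a naive Bayes locator followed by a boundary correction) can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the paper's method. It assumes position-tagged character features, Laplace smoothing, and a tiny hand-made training set. The hypothetical `LEXICON` word list stands in for the HowNet-based correction, whereas the actual model consults HowNet semantics rather than a flat word list. The names `NaiveBayesNameClassifier`, `refine_boundary`, and `recognize` are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesNameClassifier:
    """Two-class naive Bayes over (position, character) features.

    Each character of a candidate is treated as conditionally independent
    of the others given the class, the property the abstract appeals to.
    The feature design and smoothing here are illustrative assumptions.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.class_counts = Counter()             # label -> number of samples
        self.feature_counts = defaultdict(Counter)  # label -> Counter of features
        self.vocab = set()                        # all features seen in training

    @staticmethod
    def _features(text):
        # Tag each character with its position in the candidate string.
        return [(i, ch) for i, ch in enumerate(text)]

    def fit(self, samples):
        # samples: iterable of (string, label), label in {"name", "other"}
        for text, label in samples:
            self.class_counts[label] += 1
            for feat in self._features(text):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def log_score(self, text, label):
        # log P(label) + sum of smoothed log P(feature | label)
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        n_feats = sum(self.feature_counts[label].values())
        v = len(self.vocab)
        for feat in self._features(text):
            count = self.feature_counts[label][feat]
            score += math.log((count + self.alpha) / (n_feats + self.alpha * v))
        return score

    def is_name(self, text):
        return self.log_score(text, "name") > self.log_score(text, "other")


# Hypothetical stand-in for the HowNet check: a tiny set of ordinary words.
LEXICON = {"为人", "正直", "今天"}


def refine_boundary(sentence, start, end, lexicon=LEXICON):
    """Trim a 3-character candidate whose last character forms a word with the next one."""
    if end - start == 3 and end < len(sentence):
        if sentence[end - 1:end + 1] in lexicon:
            return start, end - 1
    return start, end


def recognize(sentence, clf, lexicon=LEXICON):
    """Stage 1: naive Bayes proposes candidates; stage 2: lexicon trims boundaries."""
    results = []
    i = 0
    while i < len(sentence):
        found = None
        for length in (3, 2):  # try the longer span first, as a rough locator might
            cand = sentence[i:i + length]
            if len(cand) == length and clf.is_name(cand):
                found = (i, i + length)
                break
        if found:
            start, end = refine_boundary(sentence, *found, lexicon=lexicon)
            results.append(sentence[start:end])
            i = end
        else:
            i += 1
    return results


# Toy training data (assumed, not from the paper).
SAMPLES = [
    ("张华", "name"), ("王明", "name"), ("李小红", "name"), ("张伟", "name"),
    ("今天", "other"), ("为人", "other"), ("正直", "other"), ("我们", "other"),
]
clf = NaiveBayesNameClassifier().fit(SAMPLES)

if __name__ == "__main__":
    print(recognize("张华为人正直", clf))
```

On the classic ambiguous string 张华为人正直, the classifier alone proposes 张华为, and the lexicon check trims it to 张华 because 为人 is an ordinary word. This mirrors the boundary-error problem the abstract describes, though on a scale far too small to say anything about real accuracy.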