HowNet-Based Bayesian Chinese Person Name Recognition
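  • ISSN: 0469-5097
  • Journal: Journal of Nanjing University (Natural Sciences) (南京大学学报(自然科学版))
  • Year: 2012
  • Pages: 147-153
  • Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
  • Affiliation: [1] School of Computer and Information, Hefei University of Technology, Hefei 230009
  • Funding: National Natural Science Foundation of China (61070131, 61175051); National Key Basic Research Program of China (973 Program) (2009CB326203)
  • Related project: Research on causal relationship discovery and robust reasoning for complex systems in dynamic environments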
Chinese abstract (translated):

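Building on the naive Bayes classifier, this paper incorporates HowNet semantic information to construct a Chinese person name recognition model that combines statistics with semantics. The basic idea is to first use the Bayesian classifier to locate and roughly recognize Chinese person names, and then to refine the result with HowNet semantics. The model retains the simple formulas and learning ability of the Bayesian algorithm while avoiding heavy reliance on hand-written name rules, and it overcomes the difficulty that statistical methods have in determining name boundaries. Experimental results show a precision of 95.67% and a recall of 97.78%.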
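The two reported figures are read here under the usual definitions, assuming the paper uses the standard ones (the abstract does not spell them out). As a quick consistency check, the balanced F-measure can be computed from them; this F1 value is derived arithmetic, not a result reported by the authors.

```latex
P = \frac{\#\,\text{names correctly recognized}}{\#\,\text{names output by the system}}, \qquad
R = \frac{\#\,\text{names correctly recognized}}{\#\,\text{names in the test corpus}}

F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.9567 \times 0.9778}{0.9567 + 0.9778} \approx 0.9671
```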

English abstract:

Chinese person names are the most frequent type of unknown word in Chinese text. The accuracy of Chinese name recognition affects applications such as syntactic analysis, machine translation, information retrieval, information extraction, and automatic question answering, which makes it both a key and a difficult problem. The difficulty is that Chinese names come in many varieties, lack morphological features, and sometimes contain uncommon characters. Despite these obstacles, the characters within a name are relatively independent of one another, except for a small number of characters that can also form words on their own. This property fits the independence assumption of naive Bayes well, and in practice the Bayesian classifier does achieve good recognition results. In complex contexts, however, its performance is not good enough for applications, because name boundaries are hard to determine and boundary errors occur easily. To solve this problem, this paper constructs a Chinese name recognition model that combines HowNet with a Bayesian classifier. The basic idea is to locate and roughly recognize Chinese names with the Bayesian classifier and then correct them using HowNet. The model keeps the advantages of simple formulas and the ability to learn, while avoiding the extensive use of rules and overcoming the difficulty of defining name boundaries. Experimental results show precision and recall rates of 95.67% and 97.78%, respectively.
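The two-stage idea described above (naive Bayes to locate a name, then a semantic lexicon to fix its boundary) can be illustrated with a minimal Python sketch. It is not the authors' model: the toy training data, the add-one smoothing, the greedy longest-first search, and the `NON_NAME_SENSE` set are all assumptions made for illustration. In particular, `NON_NAME_SENSE` is a hypothetical stand-in for HowNet semantic lookups, not the HowNet API.

```python
import math
from collections import Counter

# Toy training data (hypothetical, for illustration only).
TRAIN_NAMES = ["张伟", "王芳", "李娜", "刘洋", "陈明", "王小明", "李建国", "张丽华"]
TRAIN_OTHER = "今天我们在学校和老师一起讨论问题说明了很多事情"

# Hypothetical stand-in for HowNet: characters whose lexicon entry gives a
# common non-name sense (function words, speech verbs). Not the real HowNet API.
NON_NAME_SENSE = {"和", "说", "在", "的", "是", "了"}


class NaiveBayesNameScorer:
    """Scores a candidate as name vs. non-name, treating its characters as
    conditionally independent given the class (the naive Bayes assumption)."""

    def __init__(self, names, other_text, alpha=1.0):
        self.alpha = alpha
        self.surname = Counter(n[0] for n in names)            # first char | name
        self.given = Counter(c for n in names for c in n[1:])  # later chars | name
        self.other = Counter(other_text)                       # char | non-name
        self.vocab_size = len(set(self.surname) | set(self.given) | set(self.other))

    def _log_prob(self, counter, ch):
        # Add-alpha (Laplace) smoothing; the +1 reserves mass for unseen characters.
        total = sum(counter.values())
        return math.log((counter[ch] + self.alpha) /
                        (total + self.alpha * (self.vocab_size + 1)))

    def log_odds(self, cand):
        """log P(cand | name) - log P(cand | non-name), assuming equal class priors."""
        name_lp = self._log_prob(self.surname, cand[0])
        name_lp += sum(self._log_prob(self.given, c) for c in cand[1:])
        other_lp = sum(self._log_prob(self.other, c) for c in cand)
        return name_lp - other_lp


def fix_boundary(cand):
    """Stage 2 stand-in: drop a trailing character that the lexicon marks as a
    non-name word, e.g. the conjunction 和 in 王芳和."""
    if len(cand) == 3 and cand[-1] in NON_NAME_SENSE:
        return cand[:2]
    return cand


def recognize(sentence, scorer, threshold=0.0):
    """Stage 1: greedy longest-first location with the naive Bayes scorer.
    Stage 2: boundary correction with the (hypothetical) lexicon."""
    results, i = [], 0
    while i < len(sentence):
        found = None
        for length in (3, 2):  # most Chinese names are 2-3 characters long
            cand = sentence[i:i + length]
            if (len(cand) == length and cand[0] in scorer.surname
                    and scorer.log_odds(cand) > threshold):
                found = cand
                break  # longest-first preference is what produces over-long spans
        if found is None:
            i += 1
            continue
        name = fix_boundary(found)
        results.append(name)
        i += len(name)  # a trimmed character is re-scanned as ordinary text
    return results


scorer = NaiveBayesNameScorer(TRAIN_NAMES, TRAIN_OTHER)
print(recognize("今天王芳和李建国在学校", scorer))  # → ['王芳', '李建国']
```

With this toy data, stage 1 alone would return the over-long span 王芳和, because the extra character still leaves the log-odds positive; the lexicon check then trims it. This mirrors the boundary error the abstract attributes to the purely statistical step, though the real correction in the paper relies on HowNet semantics rather than a fixed character set.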

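Journal information
  • Journal of Nanjing University (Natural Sciences) (《南京大学学报:自然科学版》)
  • China Science and Technology Core Journal
  • Supervising authority: Ministry of Education of the People's Republic of China
  • Sponsor: Nanjing University
  • Editor-in-chief: 龚昌德 (Gong Changde)
  • Address: Editorial Office of Journal of Nanjing University (Natural Sciences), Nanjing University, 22 Hankou Road, Nanjing
  • Postal code: 210093
  • Email: xbnse@netra.nju.edu.cn
  • Phone: 025-83592704
  • International standard serial number: ISSN 0469-5097
  • Domestic unified serial number: CN 32-1169/N
  • Postal distribution code: 28-25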
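  • Awards: China Natural Science Core Journal; "Double-Effect" journal of the China Journal Matrix
  • Indexed in: Chemical Abstracts (online), Mathematical Reviews (online), Zentralblatt MATH, China Science and Technology Core Journals, Peking University Core Journals (2000, 2004, 2008, 2011, and 2014 editions)
  • Citations: 9316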