Chinese personal name recognition is an important research topic in information extraction, and it also matters for other natural language processing applications. Based on the general regularities and characteristics of how Chinese names are composed, this paper proposes a surname-driven hybrid algorithm for Chinese name recognition that combines statistical methods with rules. The algorithm uses surname characters as cues and recognizes a name by evaluating how likely the Chinese character strings in the preceding and following context are to form words. Experiments on 144 K of text verified the effectiveness of the algorithm.
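The abstract gives no implementation details, so the following is only a minimal sketch of what a surname-driven hybrid recognizer might look like, not the paper's actual algorithm. The surname set, the per-character given-name probabilities, the lexicon, the threshold, and all function names are hypothetical toy values chosen for illustration. A binary lexicon lookup stands in for the paper's evaluation of how likely the context strings are to form words.

```python
# Toy resources. Every value here is an illustrative assumption,
# not data or parameters taken from the paper.
SURNAMES = {"王", "李", "张"}
GIVEN_CHAR_PROB = {"小": 0.02, "明": 0.03, "伟": 0.04, "华": 0.03}
LEXICON = {"明天", "小说", "今天", "北京"}
DEFAULT_PROB = 1e-4   # assumed probability for unseen given-name characters
THRESHOLD = 0.01      # assumed acceptance threshold


def name_score(given):
    """Statistical part: geometric mean of per-character given-name probabilities."""
    product = 1.0
    for ch in given:
        product *= GIVEN_CHAR_PROB.get(ch, DEFAULT_PROB)
    return product ** (1.0 / len(given))


def crosses_word(text, start, end):
    """Rule part: True if a boundary character of text[start:end]
    forms a lexicon word together with its outside neighbour."""
    left = text[start - 1:start + 1] if start > 0 else ""
    right = text[end - 1:end + 1] if end < len(text) else ""
    return left in LEXICON or right in LEXICON


def find_names(text):
    """Scan the text; each surname character triggers candidate evaluation."""
    names = []
    i = 0
    while i < len(text):
        best = None
        if text[i] in SURNAMES:
            for given_len in (2, 1):          # try two-character given names first
                end = i + 1 + given_len
                if end > len(text):
                    continue
                if crosses_word(text, i, end):
                    continue                  # context suggests an ordinary word
                score = name_score(text[i + 1:end])
                if score >= THRESHOLD and (best is None or score > best[1]):
                    best = (text[i:end], score)
        if best is not None:
            names.append(best[0])
            i += len(best[0])                 # skip past the accepted name
        else:
            i += 1
    return names


print(find_names("王小明说今天去北京"))
print(find_names("李明天去北京"))
```

In the second call, the candidate 李明 is rejected because 明 and the following 天 form the lexicon word 明天, which shows how context can suppress a false match in this sketch. A fuller implementation would presumably replace the lexicon lookup with graded word-formation scores estimated from a corpus.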