Candidate foreign person names (FP-names) and Chinese person names (CP-names) are generated from the characteristic structure of each type of name. Every candidate is then added as a vertex to the segmentation digraph of the sentence, and each edge of the digraph is assigned a weight computed from contextual statistics derived from the training corpus, so that the shortest path through the digraph corresponds to the correct segmentation of the sentence. FP-names and CP-names are recognized at the same time as this segmentation is selected. Because names are identified while the segmentation is being chosen, rather than extracted from a segmentation that has already been fixed, the method avoids losing person names to segmentation errors and so improves name-recognition recall. In a closed test on a real-world corpus, the method achieves a combined F-value of 97.30% for Chinese and foreign person name recognition.
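The idea of putting name candidates into the segmentation digraph and taking the shortest path can be illustrated with a minimal Python sketch. It is not the authors' model. The lexicon costs, the surname list, the transliteration character set and the per-candidate costs are all hypothetical placeholders. The context-based edge weights described in the abstract are replaced by a fixed cost per word, which is a unigram-style simplification. The sketch also uses a simpler encoding than the paper: character positions are the vertices, and words or name candidates are the weighted edges.

```python
import math
from typing import Dict, List, Optional, Tuple

# All costs and character sets below are hypothetical placeholders.
# A cost stands in for -log P(word); lower means more likely.
LEXICON: Dict[str, float] = {
    "访问": 4.0, "了": 2.0, "北京": 4.5, "来到": 4.5,
    "张": 7.0, "伟": 8.0, "史": 8.0, "密": 8.0, "斯": 8.0,
}
SURNAMES = {"张", "王", "李"}        # hypothetical CP-name surname list
TRANSLIT = set("史密斯琼克莉")        # hypothetical FP-name transliteration characters
CP_NAME_COST = 6.0                   # hypothetical cost of a CP-name candidate
FP_NAME_COST = 6.5                   # hypothetical cost of an FP-name candidate
UNKNOWN_CHAR_COST = 20.0             # fallback cost for a character not in LEXICON

# An edge spans characters [start, end) and carries (text, tag, cost).
Edge = Tuple[int, int, str, str, float]


def build_digraph(s: str) -> List[Edge]:
    """Return every lexicon word and person-name candidate as a weighted edge."""
    n = len(s)
    edges: List[Edge] = []
    for i in range(n):
        ch = s[i]
        # Single-character edge keeps every position reachable.
        edges.append((i, i + 1, ch, "w", LEXICON.get(ch, UNKNOWN_CHAR_COST)))
        # Multi-character lexicon words starting at i.
        for j in range(i + 2, n + 1):
            if s[i:j] in LEXICON:
                edges.append((i, j, s[i:j], "w", LEXICON[s[i:j]]))
        # CP-name candidates: surname followed by one or two given-name characters.
        if ch in SURNAMES:
            for k in (2, 3):
                if i + k <= n:
                    edges.append((i, i + k, s[i:i + k], "CP", CP_NAME_COST))
        # FP-name candidates: runs of two or more transliteration characters.
        j = i
        while j < n and s[j] in TRANSLIT:
            j += 1
            if j - i >= 2:
                edges.append((i, j, s[i:j], "FP", FP_NAME_COST))
    return edges


def segment(s: str) -> List[Tuple[str, str]]:
    """Cheapest path from position 0 to len(s); returns (text, tag) pairs."""
    n = len(s)
    out_edges: Dict[int, List[Edge]] = {}
    for e in build_digraph(s):
        out_edges.setdefault(e[0], []).append(e)
    best = [math.inf] * (n + 1)
    back: List[Optional[Edge]] = [None] * (n + 1)
    best[0] = 0.0
    # Positions are already in topological order, so one left-to-right pass suffices.
    for i in range(n):
        for e in out_edges.get(i, []):
            j, cost = e[1], e[4]
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                back[j] = e
    # Follow back-pointers from the end to recover the chosen path.
    path: List[Tuple[str, str]] = []
    j = n
    while j > 0:
        e = back[j]
        assert e is not None
        path.append((e[2], e[3]))
        j = e[0]
    return path[::-1]


print(segment("张伟访问了北京"))
# → [('张伟', 'CP'), ('访问', 'w'), ('了', 'w'), ('北京', 'w')]
print(segment("史密斯来到北京"))
# → [('史密斯', 'FP'), ('来到', 'w'), ('北京', 'w')]
```

Ordinary words and name candidates are edges of the same digraph. A name is therefore recognized only when the path through it is cheaper than the competing splits, such as 张/伟. This reflects the abstract's point that segmentation and name recognition are decided together.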