This paper introduces, for the first time within the conditional random fields (CRF) framework, small-scale hint-character-list (SSHCL) features, i.e. short lists of characters that commonly end a name, for the two hardest categories of named entities: location names and organization names. Experiments show that SSHCL features are complementary to part-of-speech features, and that using the two together significantly improves proper-noun recognition at a small training cost, with the largest gains in precision for organization names. The features also avoid much of the construction and training cost that a conventional large-scale feature set demands. On the Simplified Chinese named entity recognition evaluation corpus of the 863 Program (2004), the integrated system reaches an overall F1 of 88.76% on proper nouns (person, location, and organization names), 8.63 percentage points above the best system in that evaluation. Its results on the SIGHAN 2006 named entity recognition corpus also rank among the top.
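As a rough illustration of the idea, the sketch below shows how a hint-character feature could be attached to each token alongside its part-of-speech tag before the sequence is passed to a CRF trainer. Everything in it is an assumption made for illustration: the two character lists, the feature names, and the functions `hint_feature`, `token_features`, and `sentence_features` are hypothetical and are not taken from the system described here, whose actual lists and feature templates are not given in this abstract.

```python
# Hypothetical hint-character lists (assumed for illustration only).
# Each set holds characters that often end a location or organization name.
LOC_HINT_CHARS = {"省", "市", "县", "区", "镇", "村", "山", "河"}
ORG_HINT_CHARS = {"司", "学", "局", "院", "厂", "部", "会", "社"}


def hint_feature(word):
    """Map a word to a coarse hint label based on its final character."""
    last = word[-1:]  # slicing keeps this safe for an empty string
    if last in ORG_HINT_CHARS:
        return "ORG_HINT"
    if last in LOC_HINT_CHARS:
        return "LOC_HINT"
    return "NONE"


def token_features(tokens, i):
    """Build a feature dict for token i from (word, pos) pairs."""
    word, pos = tokens[i]
    hint = hint_feature(word)
    feats = {
        "word": word,
        "pos": pos,
        "hint": hint,
        # Conjunction feature: lets the model weigh POS and hint jointly.
        "pos+hint": pos + "|" + hint,
    }
    # Look one token ahead, since the hint character usually closes the name.
    if i + 1 < len(tokens):
        feats["next_hint"] = hint_feature(tokens[i + 1][0])
    return feats


def sentence_features(tokens):
    """Feature dicts for a whole sentence, one per token."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    sentence = [("北京", "ns"), ("大学", "n"), ("位于", "v"), ("海淀", "ns"), ("区", "n")]
    for feats in sentence_features(sentence):
        print(feats)
```

Because the lists hold only a handful of characters, the extra feature adds a small, fixed number of values per token, which is one plausible reading of why such a feature would be cheap to build and train compared with a large gazetteer.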