针对目前中文医疗机构名识别问题,提出一种基于层叠条件随机场模型的中文医疗机构名识别方法;该方法第一层条件随机场(CRF)模型基于词粒度,结合自定义词典,实现人名、地名以及简单机构名识别,将最终的结果传递到第二层CRF模型;第二层CRF模型通过词性、词界以及上下文等特征最终完成对复合嵌套的医疗机构名实体的识别。结果表明:在封闭实验中,该方法识别正确率达到94.6%,召回率达到96.2%;在开放实验中,该方法识别正确率达到92.3%,召回率达到90.2%。本文模型相比于结合规则的单层CRF模型,F值分别提高1.99%、2.8%,总体结果得到显著改善。
To address the recognition of Chinese medical institution names, a method based on a cascaded conditional random field (CRF) model is proposed. The first-layer CRF model works at word granularity and, combined with a custom dictionary, recognizes person names, place names, and simple institution names; its final results are passed to the second-layer CRF model. The second-layer CRF model uses features such as part of speech, word boundary, and context to complete the recognition of compound, nested medical institution name entities. In the closed experiment, the method achieves a precision of 94.6% and a recall of 96.2%; in the open experiment, it achieves a precision of 92.3% and a recall of 90.2%. Compared with a single-layer CRF model combined with rules, the F-measure increases by 1.99% and 2.8%, respectively, and the overall results are significantly improved.
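The cascade described above can be illustrated with a minimal Python sketch of how first-layer output might be turned into input features for the second layer. This is not the authors' implementation: the function names, feature keys, tag sets, and window size are hypothetical, the word-boundary feature is omitted for brevity, and the F-measure is assumed to be the balanced F1 score, which the abstract does not state explicitly.

```python
from typing import Dict, List


def second_layer_features(
    tokens: List[str],
    pos_tags: List[str],
    first_layer_labels: List[str],
    i: int,
    window: int = 1,
) -> Dict[str, str]:
    """Build a feature dict for token i (hypothetical feature names).

    The first-layer labels (person, place, simple institution) are
    treated as ordinary observed features, alongside the word, its
    part of speech, and a small context window.
    """
    feats = {
        "word": tokens[i],
        "pos": pos_tags[i],
        "l1_label": first_layer_labels[i],
    }
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"word[{offset:+d}]"] = tokens[j]
            feats[f"pos[{offset:+d}]"] = pos_tags[j]
            feats[f"l1_label[{offset:+d}]"] = first_layer_labels[j]
        else:
            # Mark positions that fall outside the sentence.
            feats[f"pad[{offset:+d}]"] = "BOS" if j < 0 else "EOS"
    return feats


def f1(precision: float, recall: float) -> float:
    """Balanced F-measure (assumed definition): 2PR / (P + R)."""
    return 2 * precision * recall / (precision + recall)


# Illustrative input only; the tags below are assumptions, not the paper's tag set.
tokens = ["北京", "协和", "医院"]
pos_tags = ["ns", "nz", "n"]
first_layer_labels = ["B-LOC", "O", "O"]
features = [
    second_layer_features(tokens, pos_tags, first_layer_labels, i)
    for i in range(len(tokens))
]
closed_f1 = f1(94.6, 96.2)  # from the reported closed-experiment precision and recall
open_f1 = f1(92.3, 90.2)    # from the reported open-experiment precision and recall
```

Feature dictionaries of this shape could be passed, one list per sentence, to any CRF toolkit that accepts per-token feature mappings; the choice of toolkit is left open here.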