Entity extraction is a fairly mature task in natural language processing. With the rapid growth of electronic medical texts, medical entity extraction has attracted increasing attention in the medical field. However, generic entity extraction methods generally achieve low accuracy on specialized medical terminology. This paper applies linguistic rules to extract diseases, symptoms, and pathogens from drug package inserts and evaluates the accuracy of the extraction. First, the text is segmented and part-of-speech tagged using an existing term list, and an initial entity extraction is performed. Second, linguistic rules are used to identify further medical entities, which improves extraction accuracy. Experimental results show that accuracy for each category of medical entity (diseases, symptoms, and pathogens) reaches above 80%, indicating that the proposed approach is effective.
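The two-stage idea described above (term-list lookup first, linguistic rules second) can be illustrated with a minimal sketch. This is not the paper's implementation: the term list `TERM_LIST`, the single pattern `CAUSE_RULE` ("由 X 引起的 Y", roughly "Y caused by X"), and all function names are hypothetical, chosen only for illustration. The part-of-speech tagging step is omitted for brevity.

```python
import re

# Hypothetical term list (term -> entity type); the paper's real vocabulary is not given.
TERM_LIST = {"肺炎": "disease", "咳嗽": "symptom"}

# Hypothetical linguistic rule: "由 <pathogen> 引起的/所致的 <disease>".
CAUSE_RULE = re.compile(
    r"由(?P<pathogen>[^,,。;;、]+?)(?:引起|所致)的(?P<disease>[^,,。;;、的]+)"
)


def dictionary_match(text, terms):
    """Forward maximum matching: at each position take the longest known term."""
    max_len = max(map(len, terms))
    spans, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in terms:
                spans.append((i, i + size, candidate, terms[candidate]))
                i += size
                break
        else:
            i += 1  # no term starts here; move one character forward
    return spans


def rule_match(text):
    """Apply the context rule and return one span per named group."""
    spans = []
    for m in CAUSE_RULE.finditer(text):
        for label in ("pathogen", "disease"):
            spans.append((m.start(label), m.end(label), m.group(label), label))
    return spans


def extract_entities(text, terms=TERM_LIST):
    """Combine both stages; rule spans override overlapping dictionary spans."""
    rule_spans = rule_match(text)
    dict_spans = [
        d for d in dictionary_match(text, terms)
        if not any(d[0] < r[1] and r[0] < d[1] for r in rule_spans)
    ]
    return sorted(rule_spans + dict_spans)


if __name__ == "__main__":
    sample = "本品用于由肺炎链球菌引起的中耳炎,也可缓解咳嗽。"
    for start, end, surface, label in extract_entities(sample):
        print(start, end, surface, label)
```

In this sample, the term list alone would label "肺炎" (pneumonia) inside "肺炎链球菌" (Streptococcus pneumoniae) as a disease. The rule recognizes the full pathogen name and the previously unknown disease "中耳炎" (otitis media), and the overlapping dictionary hit is discarded, which is the kind of correction a rule stage is meant to provide.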