Drawing on the data storage characteristics of the HL7 (Health Level Seven) standard, we analyze the content and structure of current electronic medical records in depth. We propose a five-tuple pattern for medical information, together with a finer-grained two-tuple pattern and semantic class descriptions. On this basis, we present a series of algorithms for pattern generalization, pattern acquisition, and automatic medical information extraction. Experiments on 312 real inpatient medical records show that the system achieves good precision and recall. Because the system learns automatically, its overall performance is expected to improve further as the training corpus grows.
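For readers unfamiliar with the two evaluation measures named above, the following is a minimal sketch of how precision and recall are conventionally computed for an extraction task. It is not the authors' evaluation code: the function name `precision_recall` and the sample items are hypothetical, and the example assumes that an extracted item counts as correct only when it exactly matches a gold-standard item.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for a set of extracted items.

    precision = correct extractions / all extractions
    recall    = correct extractions / all gold-standard items
    """
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)  # items that exactly match the gold standard
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (category, value) items, invented purely for illustration;
# they are not taken from the 312 records used in the experiments.
gold = {
    ("symptom", "cough"),
    ("symptom", "fever"),
    ("drug", "aspirin"),
    ("test", "chest X-ray"),
}
extracted = {
    ("symptom", "cough"),
    ("symptom", "fever"),
    ("drug", "ibuprofen"),  # a spurious extraction
}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

In this toy case two of the three extracted items are correct (precision 2/3) and two of the four gold items are found (recall 1/2), which illustrates why the two measures are reported together.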