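Detecting negated terms in Chinese electronic medical records (EMRs) provides a basis for building concept indexes over unstructured EMR text. For term extraction, combining the classic forward maximum matching algorithm with mutual information can effectively prevent covering ambiguity from affecting the extraction results. For determining negation, combining a rule-based algorithm with a word co-occurrence rate model effectively reduces the probability of false-positive terms caused by punctuation entry errors. Experiments show that, compared with the traditional rule-based algorithm, the proposed method improves the negative predictive value by 6.85%.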
A method for detecting negated terms in Chinese electronic medical records (EMRs) is useful as a basis for constructing concept indexes. To extract terms from EMRs, we adopted an improved method that combines forward maximum matching with mutual information; this method overcomes the influence of covering (overlay) ambiguity. To determine negation semantics, we adopted an improved method that combines a rule-based method with word co-occurrence; it reduces the probability of false-positive terms caused by punctuation input errors. The results showed that the negative predictive value was 7.85% higher than that of the rule-based method.
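The abstract does not spell out how maximum matching and mutual information are combined, so the following is only a minimal sketch under stated assumptions. Forward maximum matching (FMM) takes the longest lexicon entry starting at each position. When that match can also be split into two lexicon terms (a covering ambiguity), we assume the pointwise mutual information (PMI) of the two parts decides the outcome: the long match is kept when PMI is at or above a threshold, otherwise the two parts are emitted. PMI is estimated here from raw substring counts in a reference corpus. The names `pmi`, `split_in_two`, and `fmm_with_mi`, the `max_len` and `threshold` parameters, and the counting scheme are all hypothetical, not the authors' formulation.

```python
import math
from typing import List, Optional, Sequence, Set, Tuple


def pmi(a: str, b: str, corpus: Sequence[str]) -> float:
    """Pointwise mutual information of `a` immediately followed by `b`.

    Assumption: probabilities are raw str.count() frequencies divided by
    the total number of characters in the corpus.
    Returns -inf when `a + b` never occurs.
    """
    n = sum(len(s) for s in corpus)
    c_a = sum(s.count(a) for s in corpus)
    c_b = sum(s.count(b) for s in corpus)
    c_ab = sum(s.count(a + b) for s in corpus)
    if n == 0 or c_ab == 0 or c_a == 0 or c_b == 0:
        return float("-inf")
    return math.log2((c_ab / n) / ((c_a / n) * (c_b / n)))


def split_in_two(word: str, lexicon: Set[str]) -> Optional[Tuple[str, str]]:
    """Return the first split of `word` into two lexicon terms, if any."""
    for k in range(1, len(word)):
        if word[:k] in lexicon and word[k:] in lexicon:
            return word[:k], word[k:]
    return None


def fmm_with_mi(text: str, lexicon: Set[str], corpus: Sequence[str],
                max_len: int = 8, threshold: float = 0.0) -> List[str]:
    """Hypothetical FMM term extractor with a PMI check on covering ambiguity."""
    terms: List[str] = []
    i = 0
    while i < len(text):
        # Forward maximum matching: longest lexicon entry starting at i.
        match = next((text[i:j]
                      for j in range(min(len(text), i + max_len), i, -1)
                      if text[i:j] in lexicon), None)
        if match is None:
            i += 1  # no term starts here; advance one character
            continue
        parts = split_in_two(match, lexicon)
        if parts is not None and pmi(parts[0], parts[1], corpus) < threshold:
            # Weakly associated parts: emit them as two separate terms.
            terms.extend(parts)
        else:
            terms.append(match)
        i += len(match)
    return terms
```

Characters that start no lexicon entry are skipped, so the output contains only extracted terms. Raising `threshold` makes the extractor split ambiguous matches more often; a usable value would have to be tuned on annotated EMR text.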
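The negation step can be sketched in the same hedged spirit. We assume a NegEx-style rule: a term is marked as negated when a trigger word appears earlier in the same clause, where clauses are delimited by punctuation. Before the rule's verdict is accepted, a co-occurrence rate must reach `min_rate`. We assume this rate is the share of reference-corpus clauses containing the term that also contain the trigger. The trigger list, the scope-breaking punctuation, this rate definition, and the names `cooccurrence_rates` and `detect_negation` are hypothetical. The check targets the failure the abstract describes: when a missing clause break lets a trigger's scope run into the next clause, a term that rarely co-occurs with that trigger is not flagged.

```python
import re
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical negation triggers and clause-closing punctuation; the
# paper's actual rule set is not given in the abstract.
TRIGGERS: Tuple[str, ...] = ("否认", "未见", "没有", "无")
BREAKS = re.compile(r"[。；;，,！!？?]")


def cooccurrence_rates(clauses: Sequence[str],
                       terms: Iterable[str]) -> Dict[Tuple[str, str], float]:
    """rate(trigger, term) = clauses containing both / clauses containing term."""
    rates: Dict[Tuple[str, str], float] = {}
    for term in terms:
        with_term = [c for c in clauses if term in c]
        if not with_term:
            continue  # unseen term: no entry, treated as rate 0.0 later
        for trig in TRIGGERS:
            both = sum(1 for c in with_term if trig in c)
            rates[(trig, term)] = both / len(with_term)
    return rates


def detect_negation(sentence: str, terms: Iterable[str],
                    rates: Dict[Tuple[str, str], float],
                    min_rate: float = 0.5) -> Dict[str, bool]:
    """Map each term found in `sentence` to True if it is judged negated."""
    result: Dict[str, bool] = {}
    for term in terms:
        pos = sentence.find(term)  # first occurrence only
        if pos < 0:
            continue
        # Rule step: the scope runs from the last clause break up to the term.
        prefix = sentence[:pos]
        ends = [m.end() for m in BREAKS.finditer(prefix)]
        scope = prefix[ends[-1]:] if ends else prefix
        # First trigger (in TRIGGERS order) that appears in the scope.
        trigger = next((t for t in TRIGGERS if t in scope), None)
        if trigger is None:
            result[term] = False
            continue
        # Co-occurrence step: accept the rule's verdict only when the trigger
        # and the term share a clause often enough in the reference corpus.
        result[term] = rates.get((trigger, term), 0.0) >= min_rate
    return result
```

Consider `否认高血压病史胸痛三天`, which lacks a comma after 病史. The rule alone would mark both 高血压 and 胸痛 as negated. With a toy corpus in which 胸痛 never shares a clause with 否认, only 高血压 stays negated.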
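The reported metric has a standard definition: negative predictive value is NPV = TN / (TN + FN), the fraction of items predicted negative that are truly negative. The abstract does not say which class is counted as negative. It also does not say whether the reported gain is in percentage points or relative. The sketch below therefore fixes only the definition. The function name is hypothetical, and any counts passed to it are illustrative, not the paper's data.

```python
def negative_predictive_value(tn: int, fn: int) -> float:
    """NPV = TN / (TN + FN): the share of negative predictions that are correct."""
    if tn + fn == 0:
        raise ValueError("NPV is undefined when there are no negative predictions")
    return tn / (tn + fn)
```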