This paper proposes a conditional random field (CRF) based term extraction method for the Traditional Chinese Medicine (TCM) domain. The method treats TCM term extraction as a sequence labeling problem, quantifies the distributional characteristics of domain terms as training features, trains a domain term model with a CRF toolkit, and then applies that model to extract terms. In an experiment using Classified Medical Records of Distinguished Physicians (《名医类案》) as the TCM corpus, the method performed well, achieving a precision of 83.11%, a recall of 81.04%, and an F-measure of 82.06%.
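As an illustration of the sequence labeling view, the minimal sketch below shows how per-character labels can be turned back into extracted terms, and how the F-measure follows from precision and recall. The B/I/O tag set, the example sentence, and the function names `decode_terms` and `f_measure` are assumptions made for illustration; the abstract does not specify the paper's tag scheme or feature templates, and no CRF training is performed here.

```python
from typing import List


def decode_terms(chars: List[str], tags: List[str]) -> List[str]:
    """Collect terms from per-character tags.

    Assumed (hypothetical) tag scheme: B = first character of a term,
    I = character inside a term, O = character outside any term.
    """
    terms: List[str] = []
    current: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "B":
            # A new term starts; close any term still open.
            if current:
                terms.append("".join(current))
            current = [ch]
        elif tag == "I" and current:
            # Continue the term that is currently open.
            current.append(ch)
        else:
            # "O", or a stray "I" with no open term: close and reset.
            if current:
                terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical tagger output for a short sentence (not taken from the corpus).
chars = list("患者头痛发热")
tags = ["O", "O", "B", "I", "B", "I"]
extracted = decode_terms(chars, tags)

# Harmonic mean of the precision and recall reported in the abstract.
reported_f = round(f_measure(83.11, 81.04), 2)
```

In this sketch, a trained CRF model would supply the `tags` sequence. The reported F-measure of 82.06% is consistent with the harmonic mean of the reported precision and recall.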