Term frequency-inverse document frequency (TF-IDF) is a common method for weighting document features; it evaluates how important a word is to one document in a corpus. This paper improves the TF-IDF model by establishing a reasonable mapping between the vocabulary of a case library and that of a corpus, and applies the improved model, MAPTF-IDF, to mining disease symptom weights from acupuncture and moxibustion prescriptions, computing a weight for each symptom associated with a disease. The experiment covered 106 diseases, of which the results for 84 agreed with clinical diagnosis and treatment experience, an accuracy of 79.2%. The results show that the improved model identifies disease symptom weights well.