In this paper, we propose a multiple-feature approach to the normalization task, which maps each disorder mention in the text to a unique Unified Medical Language System (UMLS) Concept Unique Identifier (CUI). We develop a two-step method: first, we acquire a list of candidate CUIs and their associated preferred names using the UMLS API; second, we choose the closest CUI by calculating the similarity between the input disorder mention and each candidate. The similarity calculation step is formulated as a classification problem, and multiple features (string features, ranking features, similarity features, and contextual features) are used to normalize the disorder mentions. The results show that the multiple-feature approach improves the accuracy of the normalization task from 32.99% to 67.08% compared with the MetaMap baseline.
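The second step of the method can be illustrated with a minimal sketch. It assumes the candidate list has already been retrieved, so no UMLS API call is made here. The function names, the specific features chosen for each of the four feature groups, and the hand-set weights are all hypothetical stand-ins: the weights replace a trained classifier purely for illustration, and the identifiers in the usage note are placeholders rather than real CUIs.

```python
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard overlap between two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def extract_features(mention, context, name, rank):
    """Build one feature vector for a (mention, candidate name) pair."""
    m, n = mention.lower(), name.lower()
    return [
        1.0 if m == n else 0.0,                   # string feature: exact match
        1.0 / (rank + 1),                         # ranking feature: position in candidate list
        SequenceMatcher(None, m, n).ratio(),      # similarity feature: character level
        jaccard(tokens(mention), tokens(name)),   # similarity feature: token level
        jaccard(tokens(context), tokens(name)),   # contextual feature: overlap with the sentence
    ]


# Hypothetical weights standing in for a trained classifier.
WEIGHTS = [2.0, 0.5, 1.0, 1.0, 0.5]


def normalize(mention, context, candidates):
    """Return the identifier of the best-scoring candidate, or None if there are none.

    `candidates` is an ordered list of (identifier, preferred_name) pairs.
    """
    best_cui, best_score = None, float("-inf")
    for rank, (cui, name) in enumerate(candidates):
        feats = extract_features(mention, context, name, rank)
        score = sum(w * f for w, f in zip(WEIGHTS, feats))
        if score > best_score:
            best_cui, best_score = cui, score
    return best_cui
```

For example, given the placeholder candidates `[("CUI-0001", "Chest pain"), ("CUI-0002", "Headache")]`, calling `normalize("headache", "patient reports severe headache since monday", candidates)` selects the second entry, because the exact-match and similarity features outweigh the first entry's better list position. In a setting closer to the one described above, the weighted sum would be replaced by a classifier trained on labeled mention and candidate pairs.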