The Traditional Chinese Medicine Language System (TCMLS) is a language system platform for traditional Chinese medicine and its related disciplines, led by the TCM discipline and similar to the UMLS in structure and function. Development began in 2002 and the system has now taken initial shape, but its data quality remains low and it is still far from practical application. The work in this part approaches the problem from two angles, semantic types and semantic relations: it cleans the low-quality data in the system and proposes a data cleaning strategy based on the semantic network. The aim is to improve the data quality of the language system step by step, so that TCMLS can be put to use in research and clinical practice as soon as possible.