Processing traditional Chinese medicine (TCM) syndrome data is an important step in clinical research, mathematical statistics, experience mining, and literature retrieval. Because syndrome terms occur in a large number of non-standard word forms, however, this step often becomes a major bottleneck in the research process. Based on more than 50,000 records in the "Chinese Medical Records Database", this study applied an improved method of processing syndrome element data to count and analyze how frequently each syndrome element occurs, and supplemented the existing set with a number of additional syndrome elements and their common variant forms. It also explored a retrieval method for TCM syndrome data which, to a certain extent, achieves retrieval across word forms based on the underlying meaning of a syndrome.
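As a rough illustration of the idea described above, the sketch below maps variant word forms onto shared syndrome elements, counts element frequencies, and retrieves records by meaning rather than by literal wording. It is not the method used in the study: the `VARIANTS` table, the sample `RECORDS`, and the function names are all hypothetical, and simple substring matching stands in for whatever matching procedure the authors actually used.

```python
from collections import Counter

# Hypothetical variant table (illustrative only, not taken from the study):
# each surface form maps to the syndrome elements it is assumed to express.
VARIANTS = {
    "脾气虚": ("脾", "气虚"),    # spleen qi deficiency
    "脾虚气弱": ("脾", "气虚"),  # variant wording of the same syndrome
    "脾气不足": ("脾", "气虚"),  # another variant wording
    "肝郁": ("肝", "气滞"),      # liver qi stagnation (short form)
    "肝气郁结": ("肝", "气滞"),  # liver qi stagnation (long form)
}

# Hypothetical syndrome fields, standing in for database records.
RECORDS = ["脾虚气弱", "肝气郁结，脾气不足", "肝郁"]


def extract_elements(text):
    """Return the set of syndrome elements for every known form found in text."""
    found = set()
    for form, elements in VARIANTS.items():
        if form in text:  # naive substring match, an assumption of this sketch
            found.update(elements)
    return frozenset(found)


def element_frequencies(records):
    """Count in how many records each syndrome element appears."""
    counts = Counter()
    for text in records:
        counts.update(extract_elements(text))
    return counts


def search(records, query):
    """Return the records whose elements include all elements of the query."""
    wanted = extract_elements(query)
    if not wanted:
        return []
    return [text for text in records if wanted <= extract_elements(text)]


if __name__ == "__main__":
    print(element_frequencies(RECORDS))
    print(search(RECORDS, "脾气虚"))
```

In this sketch a query for "脾气虚" matches the records written as "脾虚气弱" and "脾气不足", even though neither contains the query string, because all three forms resolve to the same pair of elements. A real variant table would need to be far larger and curated by domain experts.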