The biomedical literature is growing explosively, and it contains a large amount of useful information that has not yet been discovered. Mining this literature can help researchers form biomedical hypotheses. However, traditional methods based on simple co-occurrence produce too many target concepts, which lowers precision. This paper presents a new method for selecting linking concepts. The method represents each linking concept as a vector of statistical and textual features and then classifies it as relevant or irrelevant to the starting concept. Only the relevant linking concepts are used to discover target concepts, which improves the precision of knowledge discovery. We validate the method on Swanson's two classic experiments, Raynaud's disease and fish oil, and migraine and magnesium, using the change in the proportion of effective linking concepts as the evaluation criterion. Finally, we use H1N1 as the starting concept to perform open and closed discovery and obtain good results.
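To make the pipeline concrete, the following is a minimal sketch of open discovery with and without filtering of linking concepts. It rests on several assumptions that are not taken from the paper: the toy corpus is invented for illustration, the two-element feature vector in `linking_features` is hypothetical, and the threshold rule in `is_relevant` merely stands in for a trained classifier over statistical and textual features.

```python
# Toy corpus (invented for illustration): each document is the set of
# concepts it mentions.
DOCS = [
    {"raynaud disease", "blood viscosity"},
    {"raynaud disease", "blood viscosity", "platelet aggregation"},
    {"raynaud disease", "patient"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"patient", "aspirin"},
    {"patient", "surgery"},
    {"patient", "diet"},
]


def docs_with(concept):
    """Return all documents that mention the given concept."""
    return [d for d in DOCS if concept in d]


def linking_features(start, link):
    """Hypothetical feature vector for a linking concept:
    [co-occurrence count with start, share of link's documents containing start]."""
    link_docs = docs_with(link)
    cooc = sum(1 for d in link_docs if start in d)
    return [cooc, cooc / len(link_docs)]


def is_relevant(features, min_ratio=0.5):
    """Placeholder for a trained classifier (assumption): a simple threshold."""
    return features[1] >= min_ratio


def open_discovery(start, use_filter):
    """Return (linking concepts, target concepts) for a starting concept."""
    # Concepts already known to co-occur with the starting concept.
    known = set().union(*docs_with(start))
    links = known - {start}
    if use_filter:
        links = {b for b in links if is_relevant(linking_features(start, b))}
    # Targets co-occur with a linking concept but never with the start.
    targets = set()
    for b in links:
        for d in docs_with(b):
            targets |= d - known
    return links, targets


if __name__ == "__main__":
    print(open_discovery("raynaud disease", use_filter=False))
    print(open_discovery("raynaud disease", use_filter=True))
```

On this toy corpus, the unfiltered run keeps the generic linking concept "patient" and therefore returns several unrelated targets, whereas the filtered run keeps only "blood viscosity" and "platelet aggregation" and returns "fish oil" as the single target.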