The onset and progression of cancer are closely associated with alterations of genes in the body, which manifest clinically as abnormal symptoms or abnormal test indicators. Mining the associations between clinical manifestations and gene alterations can therefore provide clinical decision support for the early diagnosis and precise treatment of cancer, and the conclusive data reported in the biomedical literature are a valuable source for this task. This paper proposes a method for mining associations between biomedical entities based on Medical Subject Headings (MeSH). Using the literature information provided in PubMed and borrowing the idea of the vector space model, the method represents each entity under study as a vector of MeSH terms and introduces the citations between articles as a correction factor, so that association mining is converted into mathematical operations on vectors and can be analyzed quantitatively. We applied the method to the study of clinical manifestations and genes in colorectal cancer (CRC) and obtained 203 CRC-related genes and 462 associations between clinical manifestations and genes. To analyze and verify the results, we used gene function and pathway analysis tools such as g:Profiler and KEGG. The results indicate that the MeSH-based method is robust for association mining. It removes the restriction that traditional co-occurrence methods place on the discovery of indirect, potential associations, and it avoids the heavy computation required by complex semantic analysis, offering a new approach to mining potential associations between biological entities.
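The vector-space idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the term weighting, the citation correction, or the similarity measure, so the log-scaled citation weight in `article_weight` and the use of cosine similarity are assumptions made for illustration. The article identifiers, MeSH annotations, and citation counts are invented toy data, and all function and variable names are hypothetical.

```python
import math
from collections import Counter

# Toy data, invented for illustration: article id -> MeSH headings assigned to it.
GENE_ARTICLES = {
    "pmid_1": ["Colorectal Neoplasms", "Mutation", "Anemia"],
    "pmid_2": ["Colorectal Neoplasms", "Gastrointestinal Hemorrhage"],
}
SYMPTOM_ARTICLES = {
    "pmid_3": ["Gastrointestinal Hemorrhage", "Anemia"],
    "pmid_4": ["Colorectal Neoplasms", "Gastrointestinal Hemorrhage"],
}
UNRELATED_ARTICLES = {
    "pmid_5": ["Zebrafish", "Embryonic Development"],
}

# Hypothetical number of times each article is cited by the others.
CITED_BY = {"pmid_1": 0, "pmid_2": 9, "pmid_3": 3, "pmid_4": 0, "pmid_5": 1}


def article_weight(pmid):
    """Assumed citation correction: more frequently cited articles count more."""
    return 1.0 + math.log1p(CITED_BY.get(pmid, 0))


def mesh_vector(articles):
    """Represent an entity as a sparse vector: MeSH heading -> weighted frequency."""
    vector = Counter()
    for pmid, headings in articles.items():
        weight = article_weight(pmid)
        for heading in headings:
            vector[heading] += weight
    return vector


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(u[h] * v[h] for h in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


gene_vec = mesh_vector(GENE_ARTICLES)
symptom_vec = mesh_vector(SYMPTOM_ARTICLES)
unrelated_vec = mesh_vector(UNRELATED_ARTICLES)

association_score = cosine(gene_vec, symptom_vec)
unrelated_score = cosine(gene_vec, unrelated_vec)

if __name__ == "__main__":
    print(f"gene vs. symptom:   {association_score:.3f}")
    print(f"gene vs. unrelated: {unrelated_score:.3f}")
```

In this toy setup the gene and the symptom never appear in the same article, yet they receive a nonzero score because their articles share MeSH headings. This is the sense in which a vector representation can surface associations that a strict co-occurrence count would miss.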