Biomedical research is one of the most closely watched research fields of the twenty-first century, and it produces an enormous volume of literature, averaging more than 600,000 papers per year. Acquiring relevant knowledge effectively from a literature of this size is a challenge for researchers in the field. Biomedical text mining, a branch of bioinformatics, is a new approach to acquiring such knowledge efficiently and automatically, and it has made considerable progress in recent years. This survey introduces the main research methods and results in biomedical text mining, namely machine-learning-based biomedical named entity recognition, abbreviation and synonym recognition, and named entity relation extraction, together with related resource construction, evaluation campaigns, and academic conferences. It also briefly describes the state of research in China and concludes with an outlook on near-term developments in the field.