Recognizing biological entity names is important for extracting information from the biomedical literature. This paper addresses the recognition of protein names using a primarily dictionary-based method. Approximate string matching and first-word lookup are used to identify candidate protein names, and a trained Naive Bayes classifier then filters the candidates to improve recognition accuracy.
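
The candidate-generation step described above (a dictionary lookup keyed on the first word, followed by an approximate string comparison) might be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the function names, the toy dictionary, the 0.85 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all hypothetical choices, and the Naive Bayes filtering stage is omitted.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def build_first_word_index(dictionary):
    """Map the lowercased first word of each entry to the entries that start with it."""
    index = defaultdict(list)
    for name in dictionary:
        tokens = name.lower().split()
        if tokens:
            index[tokens[0]].append(tokens)
    return index


def find_candidates(text, index, threshold=0.85):
    """Return (start, end, matched_text, dictionary_name) for each candidate.

    start and end are token offsets into the whitespace-split text.
    The threshold is an assumed value, not one taken from the paper.
    """
    tokens = text.split()
    # Normalize tokens for comparison: lowercase, strip surrounding punctuation.
    lowered = [t.lower().strip(".,;:()") for t in tokens]
    candidates = []
    for i, tok in enumerate(lowered):
        # First-word lookup: only entries whose first word equals this token.
        for entry in index.get(tok, []):
            window = lowered[i:i + len(entry)]
            if len(window) < len(entry):
                continue  # not enough tokens left in the text
            # Approximate comparison of the text window with the entry.
            score = SequenceMatcher(None, " ".join(window), " ".join(entry)).ratio()
            if score >= threshold:
                matched = " ".join(tokens[i:i + len(entry)])
                candidates.append((i, i + len(entry), matched, " ".join(entry)))
    return candidates


if __name__ == "__main__":
    toy_dictionary = ["protein kinase C", "tumor necrosis factor alpha"]
    idx = build_first_word_index(toy_dictionary)
    for hit in find_candidates("Activation of protein kinases C was observed.", idx):
        print(hit)
```

Indexing on the first word keeps the number of approximate comparisons small, since each text position is compared only with entries that share its first token. The cost is that a variant whose first word itself differs from the dictionary form is missed by this sketch.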