Recognition of Protein Names in Biological Text

Journal: 计算机应用研究 (Application Research of Computers), core journal
Date: 0
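Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001
Funding: National Natural Science Foundation of China (60302021)
Related project: Research on Ontology-Based Cross-Language Information Retrieval for Specific Domains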

Keywords: Bioinformatics, Named Entity Recognition, Machine Learning, Feature Selection

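Chinese abstract:

With the release of gene and protein sequences and the progress of molecular biology research, the related data are growing exponentially. Obtaining information relevant to biologists' research areas directly from the massive body of related literature has therefore become urgent, and recognizing named entities in biological literature, such as protein, gene, and deoxyribonucleic acid (DNA) names, has become the most basic information extraction task in bioinformatics. This paper surveys the methods of biological named entity recognition used in comparable international research. It focuses on the methods for protein name recognition, the resources used, the experimental results, and a comparison with comparable international research.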

English abstract:

The availability of genome sequences has ushered in an era of exponential growth in data for the biology community. There is thus a clear need for automatic methods that extract the specific information biology researchers are interested in. Recognizing named entities (NEs) such as protein, gene, and DNA names in biological literature is a fundamental information extraction task in bioinformatics. This paper surveys the methods of biological named entity recognition used in international research in this area. It then presents the methods used for protein name recognition, together with the relevant corpus and experimental results. The results are promising compared with other state-of-the-art research.
