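Identification of New Terms and Their Definitions Using Term Definition Patterns and Multiple Features
  • ISSN: 1000-1239
  • Journal: Journal of Computer Research and Development (《计算机研究与发展》)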
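  • Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
  • Author affiliation: [1] Institute of Language Information Processing, Beijing Language and Culture University, Beijing 100083
  • Funding: National High Technology Research and Development Program of China ("863" Program) (2006AA010101); National Natural Science Foundation of China (60572158)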
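Chinese abstract (translated):

Extracting new terms and their definitions is an important research topic in information extraction. Research shows that in scientific and technical literature a new term is often accompanied by its definition, and an examination of real text shows that term definitions have distinctive linguistic features. Drawing on a large-scale real corpus, the authors examine the linguistic patterns from which term definitions are built, together with statistical features of the words inside definitions and of the context surrounding terms. On this basis they propose using the linguistic pattern of terminology definition (LPTD) to produce the set of candidate new terms, combining it with statistical features of the context in which new terms occur, and using an SVM classifier to identify new terms and their definitions in scientific and technical corpora. The method was validated on a large collection of scientific and technical journals; in an open-test evaluation, precision was 90.5% and recall was 78.1%.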
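
The two-stage idea in the abstract (definition patterns propose candidates, then context statistics feed a classifier) can be illustrated with a minimal Python sketch. The regular expressions below are hypothetical stand-ins chosen for illustration; the abstract does not list the paper's actual LPTD inventory. The feature function is likewise an assumption, and the SVM stage itself is not shown.

```python
import re

# Hypothetical definition patterns for illustration only.
# The paper's real LPTD set is not given in this record.
DEFINITION_PATTERNS = [
    # "所谓 <term>,是指 <definition>" ("the so-called <term> refers to ...")
    re.compile(
        r"所谓(?P<term>[^,,。]{2,20}?)[,,]?(?:是指|就是|指的是)(?P<definition>[^。]+)"
    ),
    # "<term> 是指 / 定义为 <definition>" ("<term> refers to / is defined as ...")
    re.compile(
        r"(?P<term>[^,,。]{2,20}?)(?:是指|定义为|指的是)(?P<definition>[^。]+)"
    ),
]


def extract_candidate(sentence):
    """Return (term, definition) for the first matching pattern, else None."""
    for pattern in DEFINITION_PATTERNS:
        match = pattern.search(sentence)
        if match:
            return match.group("term").strip(), match.group("definition").strip()
    return None


def context_features(term, corpus_sentences):
    """Toy statistical features (assumed, not the paper's feature set)."""
    frequency = sum(sentence.count(term) for sentence in corpus_sentences)
    return {"term_length": len(term), "corpus_frequency": frequency}


if __name__ == "__main__":
    sentence = "所谓信息抽取,是指从文本中抽取结构化信息的过程。"
    print(extract_candidate(sentence))
    print(context_features("信息抽取", [sentence]))
```

In a fuller pipeline, feature dictionaries like these would be vectorized and passed to an SVM classifier that accepts or rejects each candidate; that step is left out here because the abstract gives no details about the kernel or the feature encoding.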

English abstract:

Identification of new technical terms and their definitions is an important research topic in information extraction. Providing a scalable solution for large-scale term extraction remains a great challenge, because most previous approaches fail to explicitly define the linguistic constituents of terms and the function of their definition patterns. The authors' research shows that in real corpora, occurrences of new technical terms are in most cases accompanied by their definitions. Based on this intuition, the linguistic constituents of technical terms and the numerical function of their definitions are defined explicitly. The paper also presents a novel statistical approach, based on the linguistic pattern of terminology definition (LPTD), for extracting new Chinese technical terms and their definitions. LPTD is first proposed in this paper to delimit the boundaries of technical terms. In the identification phase, both the statistical information of terms and the LPTD features obtained from the preceding filtering process are taken into account in the SVM classifier, so that they are integrated into one unified framework. The idea can also serve as a reference for collocation extraction (CE) and can easily be extended to other languages. Compared with previously reported outcomes, this approach achieves a competitive result on real large-scale corpora: 90.5% precision and 78.1% recall.
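
For readers who want a single summary figure, the reported precision and recall can be combined into an F1 score (their harmonic mean). The abstract does not report F1, so the value computed below is derived here from the two published numbers and is not a result stated by the authors.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract.
    f1 = f1_score(0.905, 0.781)
    print(round(f1, 3))  # 0.838
```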

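Journal information
  • Journal of Computer Research and Development (《计算机研究与发展》)
  • Chinese Science and Technology Core Journal
  • Supervising body: Chinese Academy of Sciences
  • Sponsor: Institute of Computing Technology, Chinese Academy of Sciences
  • Editor-in-chief: Xu Zhiwei (徐志伟)
  • Address: Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Beijing
  • Postal code: 100190
  • Email: crad@ict.ac.cn
  • Telephone: 010-62620696, 62600350
  • International standard serial number: ISSN 1000-1239
  • Domestic serial number: CN 11-1777/TP
  • Postal distribution code: 2-654
  • Awards:
  • One Hundred Outstanding Chinese Academic Journals, 2001-2007; 2008中国精品科...; "Double-Effect" Journal of the China Periodical Matrix
  • Indexed in domestic and international databases:
  • Abstracts Journal (Russia); abstract and citation database (Netherlands); Engineering Index (USA); Japan Science and Technology Agency database (Japan); Chinese Science and Technology Core Journals (China); Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)
  • Citation count: 40349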