Chinese Entity Relation Extraction Based on Syntactic and Semantic Features

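ISSN: 1000-1239
Journal: Journal of Computer Research and Development (计算机研究与发展)
Date: 2016.2.16
Pages: 284-302
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013; [2] Key Laboratory of Data and Knowledge Engineering of Jiangxi Higher Education Institutions, Jiangxi University of Finance and Economics, Nanchang 330013
Funding: National Natural Science Foundation of China (61173146, 61562032, 61363039, 61363010, 61462037); Science and Technology Landing Program of Jiangxi Higher Education Institutions (KJLD12022); Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ12733, GJJ13249)
Related project: Data Quality Management in Web Data Integration Based on User Feedback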

Keywords: relationship extraction, relationship detection, syntactic feature, semantic feature, support vector machine (SVM)

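Chinese abstract (translated):

As the foundation of semantic networks and ontologies, entity relation extraction has been widely applied in information retrieval, machine translation, and automatic question answering systems. The core problem of entity relation extraction lies in selecting and extracting entity relation features. Long Chinese sentences have complex structures and often contain multiple entities; together with data sparsity, this poses challenges for Chinese relation detection and relation extraction. To address these problems, an entity relation extraction method based on syntactic and semantic features is proposed. The dependency relations of the two entities are combined to obtain a dependency relation combination feature, and dependency parsing and part-of-speech tagging are used to select a nearest syntactically dependent verb feature. These two new features are added to feature-based relation detection and relation extraction using a support vector machine (SVM), and experiments are run on a corpus of real tourism-domain text. The experiments show that the two features, drawn from syntax and semantics, effectively improve the performance of entity relation detection and relation extraction, with precision, recall, and F1 all exceeding those of existing methods. In addition, the nearest syntactically dependent verb feature is highly effective, contributing most to relation types with sparse data, and it outperforms the current classic verb-feature-based methods on both relation detection and relation extraction.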

English abstract:

Named entity relations are a foundation of semantic networks and ontologies, and are widely used in information retrieval, machine translation, and automatic question answering systems. In named entity relation extraction, the selection and extraction of relationship features are the key issues. Chinese long sentences, with their complicated sentence patterns and many entities, together with the data sparsity problem, pose challenges for Chinese entity relationship detection and extraction. To address these problems, a novel method based on syntactic and semantic features is proposed. The dependency relation composition feature is obtained by combining the respective dependency relations of the two entities, and the verb feature with the nearest syntactic dependency is captured from the dependency relations and POS (part-of-speech) tags. These features are incorporated into feature-based relationship detection and extraction using an SVM. Evaluation on a real text corpus in the tourism domain shows that the two features, drawn from syntactic and semantic aspects, effectively improve the performance of entity relationship detection and extraction, and outperform the previously best-reported systems in terms of precision, recall, and F1 value. In addition, the verb feature with the nearest syntactic dependency is highly effective for relationship detection and extraction, contributing most to the performance improvement on data-sparse entity relationships, and it significantly outperforms state-of-the-art methods based on verb features.
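
The two features named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Token` layout, the dependency labels (`SBV`, `VOB`, `HED`), the `V`-prefix test for verbs, and the rule "walk up the head chain and take the verb reached in the fewest hops" are all assumptions made for illustration, since the abstract does not specify them. The toy sentence is a hand-written parse, not data from the paper's corpus.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Token:
    index: int   # 1-based position in the sentence
    word: str
    pos: str     # part-of-speech tag (assumed: verbs start with "V")
    head: int    # index of the syntactic head; 0 marks the root
    deprel: str  # dependency relation to the head (assumed label set)


def dependency_combination(tokens: List[Token], e1: int, e2: int) -> str:
    """Combine the dependency relations of the two entity tokens."""
    by_index = {t.index: t for t in tokens}
    return by_index[e1].deprel + "+" + by_index[e2].deprel


def verb_on_head_path(tokens: List[Token], start: int) -> Optional[Tuple[int, Token]]:
    """Walk from an entity towards the root; return (hops, first verb found)."""
    by_index = {t.index: t for t in tokens}
    current = by_index[start]
    seen = set()
    hops = 0
    # The `seen` set guards against malformed parses that contain a cycle.
    while current.head != 0 and current.index not in seen:
        seen.add(current.index)
        current = by_index[current.head]
        hops += 1
        if current.pos.startswith("V"):
            return hops, current
    return None


def nearest_dependent_verb(tokens: List[Token], e1: int, e2: int) -> str:
    """Pick the verb closest (in hops) to either entity, or a placeholder."""
    paths = (verb_on_head_path(tokens, e1), verb_on_head_path(tokens, e2))
    candidates = [p for p in paths if p is not None]
    if not candidates:
        return "NO_VERB"
    _, verb = min(candidates, key=lambda c: c[0])
    return verb.word


def extract_features(tokens: List[Token], e1: int, e2: int) -> Dict[str, str]:
    """Build the feature dictionary that a classifier such as an SVM could consume."""
    return {
        "dep_combination": dependency_combination(tokens, e1, e2),
        "nearest_verb": nearest_dependent_verb(tokens, e1, e2),
    }


# Toy parse of "Lushan weiyu Jiangxi" (Lushan is located in Jiangxi).
SENTENCE = [
    Token(1, "Lushan", "NR", 2, "SBV"),
    Token(2, "weiyu", "VV", 0, "HED"),
    Token(3, "Jiangxi", "NR", 2, "VOB"),
]

if __name__ == "__main__":
    print(extract_features(SENTENCE, 1, 3))
```

In a feature-based pipeline, dictionaries like this would typically be one-hot encoded alongside other features before training the classifier; that step is omitted here to keep the sketch dependency-free.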
