Entity relation extraction underpins semantic networks and ontologies and is widely used in information retrieval, machine translation, and automatic question answering. Its core problem is the selection and extraction of entity relation features. Long Chinese sentences have complex structures and often contain multiple entities, which, together with data sparsity, makes Chinese relation detection and relation extraction challenging. To address these problems, an entity relation extraction method based on syntactic and semantic features is proposed. A dependency relation composition feature is obtained by combining the dependency relations of the two entities, and a nearest syntactically dependent verb feature is selected using dependency parsing and part-of-speech (POS) tagging. These two new features are added to feature-based relation detection and relation extraction using a support vector machine (SVM), and experiments are conducted on a corpus of real tourism-domain text. The experiments show that the two features, drawn from syntax and semantics, effectively improve the performance of entity relation detection and relation extraction, with precision, recall, and F1 all exceeding those of existing methods. In addition, the nearest syntactically dependent verb feature is highly effective, contributing most to relation types with sparse data, and it outperforms the current classic verb-feature-based methods on both relation detection and relation extraction.
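The two features described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` structure, the function names, the `"v"` verb tag, the SBV/VOB/HED labels, and the rule of walking up the head chain to find the nearest verb are all assumptions made for illustration, and the toy sentence is hand-annotated rather than parsed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    text: str
    pos: str     # POS tag; "v" marks a verb in this assumed tag set
    head: int    # index of the syntactic head, -1 for the root
    deprel: str  # dependency label linking this token to its head


def dependency_composition(tokens: List[Token], e1: int, e2: int) -> str:
    """Combine the dependency labels of the two entity tokens into one feature."""
    return f"{tokens[e1].deprel}+{tokens[e2].deprel}"


def nearest_dependency_verb(tokens: List[Token], entity: int) -> Optional[str]:
    """Walk up the head chain from the entity and return the first verb found.

    Assumption: "nearest" is measured in dependency hops toward the root.
    """
    seen = set()
    i = tokens[entity].head
    while i != -1 and i not in seen:
        if tokens[i].pos == "v":
            return tokens[i].text
        seen.add(i)  # guard against malformed, cyclic parses
        i = tokens[i].head
    return None


def pair_features(tokens: List[Token], e1: int, e2: int) -> Dict[str, Optional[str]]:
    """Build the two illustrative features for one entity pair."""
    verb = nearest_dependency_verb(tokens, e1) or nearest_dependency_verb(tokens, e2)
    return {
        "dep_comp": dependency_composition(tokens, e1, e2),
        "nearest_verb": verb,
    }


# Hand-annotated toy parse of "Kunming weiyu Yunnan" (Kunming is located in Yunnan).
sentence = [
    Token("Kunming", "ns", 1, "SBV"),  # subject of the verb
    Token("weiyu", "v", -1, "HED"),    # root verb
    Token("Yunnan", "ns", 1, "VOB"),   # object of the verb
]
features = pair_features(sentence, 0, 2)
print(features)
```

In a feature-based pipeline, dictionaries like `features` would typically be one-hot encoded together with the baseline features and passed to an SVM classifier, first to detect whether a relation holds and then to assign its type.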