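This paper proposes using a graph-based semi-supervised learning algorithm, label propagation, to guide a computer in automatically identifying relations between entities in unstructured text. The method first builds a graph model of the relation extraction task: labeled and unlabeled examples are represented as nodes of the graph, and the distances between examples serve as the weights of its edges. Relation extraction is then recast as estimating a labeling function on this graph that satisfies the global consistency assumption. Evaluation on the ACE (automatic content extraction) corpus shows that, when only a small number of labeled examples are available, label propagation outperforms supervised relation extraction based on SVM (support vector machine) and also clearly outperforms bootstrapping-based semi-supervised relation extraction.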
This paper investigates a graph-based semi-supervised learning algorithm, namely label propagation, for relation extraction. Labeled and unlabeled examples are represented as nodes in a graph, and the distances between them as the weights of its edges. Relation extraction then amounts to estimating a labeling function on this graph that satisfies the global consistency assumption. Experimental results on the ACE (automatic content extraction) corpus show that this method outperforms SVM (support vector machine) when only very few labeled examples are available, and that it also outperforms bootstrapping on the relation extraction task.
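The graph construction and labeling step described above can be illustrated with a minimal sketch of a standard label propagation loop. This is not necessarily the exact formulation used in the paper. Several details are assumptions made only for illustration: examples are plain numeric feature vectors, edge weights come from a Gaussian kernel over Euclidean distance with a hypothetical bandwidth `sigma`, and the function name `label_propagation` and the toy data are invented for this example.

```python
import math


def label_propagation(points, labels, num_classes, sigma=1.0,
                      max_iter=1000, tol=1e-9):
    """Propagate labels over a fully connected graph.

    points: list of feature vectors (one graph node per example).
    labels: class index for labeled examples, None for unlabeled ones.
    Returns a predicted class index for every node.
    """
    n = len(points)

    # Edge weights: Gaussian kernel over squared Euclidean distance
    # (an assumption; closer examples get larger weights).
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights[i][j] = math.exp(-d2 / sigma ** 2)

    # Row-normalize the weights into transition probabilities.
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total if total > 0 else 0.0 for w in row])

    def clamp(dist):
        # Labeled nodes keep their known label as a one-hot distribution.
        for i, lab in enumerate(labels):
            if lab is not None:
                dist[i] = [1.0 if c == lab else 0.0
                           for c in range(num_classes)]

    # Start unlabeled nodes from a uniform label distribution.
    y = [[1.0 / num_classes] * num_classes for _ in range(n)]
    clamp(y)

    for _ in range(max_iter):
        # Each node takes the weighted average of its neighbors' distributions.
        new_y = [[sum(trans[i][j] * y[j][c] for j in range(n))
                  for c in range(num_classes)]
                 for i in range(n)]
        clamp(new_y)
        delta = max(abs(new_y[i][c] - y[i][c])
                    for i in range(n) for c in range(num_classes))
        y = new_y
        if delta < tol:
            break

    return [max(range(num_classes), key=lambda c: y[i][c]) for i in range(n)]


# Toy data: two well-separated clusters with one labeled seed each.
toy_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
              (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
toy_labels = [0, None, None, 1, None, None]
predicted = label_propagation(toy_points, toy_labels, num_classes=2)
print(predicted)
```

In this toy setting each unlabeled point ends up with the label of the seed in its own cluster, which is the behavior the global consistency assumption asks for: nearby nodes, and nodes on the same dense structure, tend to share a label. A real relation extraction system would replace the toy vectors with features of entity-pair contexts and would likely use a task-specific distance.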