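In research on extracting concept instances and attributes, pattern-based methods tend to have low recall. To address this, this paper proposes a method that extracts concept instances and attributes simultaneously on the basis of coordinate structures. First, coordination patterns are used to extract sets of same-class words from a collection of web pages. Next, a seed-based weakly supervised method learns the context patterns in which instances and attributes co-occur. Finally, these patterns are used to extract candidate instances or candidate attributes. During this process, each time a candidate is extracted, the same-class word set that contains it is merged into the candidate set. Experimental results show that the proposed method greatly improves the recall of the extraction results without reducing precision.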
Most research on extracting concept instances and concept attributes focuses on pattern-based approaches, which usually suffer from low recall. In this paper, we present a method for extracting concept instances and concept attributes based on coordinate structures. Because some candidate instances and attributes extracted by coordination patterns can be placed into similar-concept-phrase sets in advance, we can use these sets to expand the extraction results during co-occurrence pattern-based extraction. Experimental results show that, compared with a baseline that does not use coordination patterns, the method significantly improves coverage without reducing precision.
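
The pipeline summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the English regular expressions, the function names (`coordinate_sets`, `learn_contexts`, `extract`), the toy `CORPUS`, and the seed words are all assumptions chosen for illustration, and the real system works on a web page collection with its own patterns. The sketch only shows the idea of merging a candidate's coordinate set into the results whenever a co-occurrence pattern extracts that candidate.

```python
import re

# Assumed coordination pattern: "A, B and C" style lists of single words.
COORD_RE = re.compile(r"\w+(?:, \w+)+,? (?:and|or) \w+")
SPLIT_RE = re.compile(r",? (?:and|or) |, ")

# Toy corpus (hypothetical data, for illustration only).
CORPUS = [
    "The capital of France is Paris.",
    "The population of France is growing.",
    "Tables report population, area and currency.",
    "The area of Italy is known.",
    "Tourists visit Italy, Spain and Greece.",
]


def coordinate_sets(sentences):
    """Step 1: collect sets of words that occur together in a coordinate list."""
    groups = []
    for sentence in sentences:
        for match in COORD_RE.finditer(sentence):
            groups.append(set(SPLIT_RE.split(match.group(0))))
    return groups


def learn_contexts(sentences, seed_instances, seed_attributes):
    """Step 2: learn the short context between a seed attribute and a seed instance."""
    contexts = set()
    for sentence in sentences:
        for attr in seed_attributes:
            for inst in seed_instances:
                pattern = rf"\b{re.escape(attr)}((?: \w+){{1,2}} ){re.escape(inst)}\b"
                for match in re.finditer(pattern, sentence):
                    contexts.add(match.group(1))
    return contexts


def expand(word, groups):
    """Return the word together with every coordinate set that contains it."""
    result = {word}
    for group in groups:
        if word in group:
            result |= group
    return result


def extract(sentences, contexts, groups, seed_instances, seed_attributes):
    """Step 3: apply the learned contexts; merge coordinate sets for each new candidate."""
    instances, attributes = set(seed_instances), set(seed_attributes)
    changed = True
    while changed:  # repeat until no new candidate is found
        changed = False
        for sentence in sentences:
            for ctx in contexts:
                pattern = rf"\b(\w+){re.escape(ctx)}(\w+)\b"
                for match in re.finditer(pattern, sentence):
                    attr, inst = match.group(1), match.group(2)
                    if inst in instances and attr not in attributes:
                        attributes |= expand(attr, groups)
                        changed = True
                    elif attr in attributes and inst not in instances:
                        instances |= expand(inst, groups)
                        changed = True
    return instances, attributes


if __name__ == "__main__":
    contexts = learn_contexts(CORPUS, {"France"}, {"capital"})
    groups = coordinate_sets(CORPUS)
    print("baseline:", extract(CORPUS, contexts, [], {"France"}, {"capital"}))
    print("expanded:", extract(CORPUS, contexts, groups, {"France"}, {"capital"}))
```

On this toy corpus, the baseline run (with an empty list of coordinate sets) adds only the attribute `population`. The expanded run also reaches `area` and `currency` through the coordinate set of `population`, and then the instances `Italy`, `Spain`, and `Greece`. This mirrors the recall gain the abstract describes, though it says nothing about precision on real data.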