Binary entity relation tuples can be applied in many fields, such as knowledge base construction, data mining, and pattern extraction. This paper proposes a new method for extracting entity relation tuples from the Web, using one tuple and one keyword of a specific relation as the seed. The method combines several low-level natural language processing (NLP) techniques with an improved pattern acquisition method and a bootstrapping iteration strategy. The baseline method achieves an average precision of 78.12%, and the method with filtering measures achieves 98.42%. The experimental results show that entity relation tuples obtained by web mining satisfy the needs of information extraction applications well, and that further processing of the extracted tuples can yield more valuable information.
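To make the seed-driven bootstrapping idea concrete, the following is a minimal sketch in Python and not the method evaluated in this paper. It assumes that sentences are plain strings, that an entity can be approximated by a run of capitalized words, and that a pattern is simply the text between the two entities. The function names (learn_patterns, apply_patterns, bootstrap), the toy corpus, and the keyword check used as a crude pattern filter are all hypothetical; the abstract does not specify the actual pattern representation or filtering measures.

```python
import re

# Assumption: an entity is a run of capitalized words, e.g. "Mountain View".
ENTITY = r"([A-Z]\w*(?: [A-Z]\w*)*)"

def learn_patterns(sentences, tuples, keyword):
    """Collect the text between two known entities as a candidate pattern.

    A pattern is kept only if it contains the seed keyword; this is a
    stand-in for a real filtering measure, not the paper's filter.
    """
    patterns = set()
    for sent in sentences:
        for e1, e2 in tuples:
            m = re.search(re.escape(e1) + r"(.+?)" + re.escape(e2), sent)
            if m and keyword in m.group(1):
                patterns.add(m.group(1))
    return patterns

def apply_patterns(sentences, patterns):
    """Match each learned pattern between two entity-like spans."""
    found = set()
    for sent in sentences:
        for p in patterns:
            for m in re.finditer(ENTITY + re.escape(p) + ENTITY, sent):
                found.add((m.group(1), m.group(2)))
    return found

def bootstrap(sentences, seed_tuple, keyword, max_iters=5):
    """Alternate pattern learning and tuple extraction until nothing new appears."""
    tuples = {seed_tuple}
    for _ in range(max_iters):
        patterns = learn_patterns(sentences, tuples, keyword)
        new_tuples = apply_patterns(sentences, patterns) - tuples
        if not new_tuples:
            break
        tuples |= new_tuples
    return tuples

# Hypothetical toy corpus standing in for Web text.
SENTENCES = [
    "Google is headquartered in Mountain View.",
    "Microsoft is headquartered in Redmond.",
    "Microsoft, whose headquarters is in Redmond, makes software.",
    "Apple, whose headquarters is in Cupertino, makes phones.",
]

result = bootstrap(SENTENCES, ("Google", "Mountain View"), "headquarter")
```

On this toy corpus the first iteration learns one pattern from the seed and finds the Microsoft tuple; that tuple then exposes a second pattern, which in turn reaches the Apple tuple. This illustrates why a single seed tuple plus a keyword can be enough to start the loop, and also why unfiltered patterns can drift, which is the kind of error a filtering step is meant to suppress.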