Automatically extracting useful information from large-scale unstructured text is an important goal of natural language processing (NLP) and artificial intelligence (AI). Open information extraction has become the prevailing trend for mining web text efficiently. According to the number of arguments in a relation, it can be divided into binary and n-ary entity relation extraction. Following this division, this paper analyzes and summarizes the current state of typical open relation extraction methods and their shortcomings. Most current open entity relation extraction methods still perform only shallow semantic processing and rarely address implicit relations. Joint inference methods, such as Markov logic and ontology-structure-based inference, can combine multiple features to infer fine-grained and complete information effectively, opening up new prospects for deeper text understanding.