Deep Web sources contain a large amount of high-quality, query-related structured data. One of the challenges in the Deep Web is extracting the result schemas of Deep Web sources. To address this challenge, this paper describes a novel approach that extracts both the result data and the result schema of a Web database. The approach first models the query interface of a Deep Web source and fills it in with a specific query instance. Then the result pages of the Deep Web source are parsed into a tree structure in order to retrieve the subtrees that contain elements of the query instance. Next, the result schema of the Deep Web source is extracted by matching the subtrees' nodes with the query instance; here a two-phase schema extraction method is adopted to obtain a more accurate result schema. Finally, experiments on real Deep Web sources show the utility of our approach, which achieves high precision and recall.
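The pipeline summarized above (parse a result page into a tree, retrieve the subtrees that contain the query instance, then match subtree nodes against the query instance) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the sample page, the attribute names in `QUERY_INSTANCE`, and the helpers `TreeBuilder`, `minimal_subtrees`, `extract_schema`, and `extract_records` are all hypothetical, the markup is assumed to be well formed with no void elements, and the two-phase schema extraction method is not reproduced because the abstract does not specify it.

```python
from html.parser import HTMLParser

# Hypothetical result page and query instance, invented for illustration.
PAGE = """
<html><body><table>
<tr><td>Database Systems</td><td>Jane Doe</td><td>2004</td></tr>
<tr><td>Web Mining</td><td>John Roe</td><td>2006</td></tr>
</table></body></html>
"""
QUERY_INSTANCE = {"title": "Database Systems", "author": "Jane Doe"}


class Node:
    """One element of the parsed result page."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""

    def full_text(self):
        # Own text followed by the text of all descendants.
        parts = [self.text.strip()] + [c.full_text() for c in self.children]
        return " ".join(p for p in parts if p)


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes well-formed markup without void tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data


def minimal_subtrees(node, values):
    """Return the smallest subtrees whose text contains every query value."""
    text = node.full_text().lower()
    if not all(v.lower() in text for v in values):
        return []
    found = []
    for child in node.children:
        found.extend(minimal_subtrees(child, values))
    return found or [node]


def extract_schema(subtree, query_instance):
    """Label child positions whose text contains a query-instance value."""
    schema = {}
    for pos, child in enumerate(subtree.children):
        cell = child.full_text().lower()
        for attr, value in query_instance.items():
            if value.lower() in cell:
                schema[pos] = attr
    return schema


def extract_records(subtree, schema):
    """Apply the schema to sibling subtrees that share the same tag."""
    rows = [n for n in subtree.parent.children if n.tag == subtree.tag]
    return [
        {attr: row.children[pos].full_text()
         for pos, attr in schema.items() if pos < len(row.children)}
        for row in rows
    ]


if __name__ == "__main__":
    builder = TreeBuilder()
    builder.feed(PAGE)
    record = minimal_subtrees(builder.root, list(QUERY_INSTANCE.values()))[0]
    schema = extract_schema(record, QUERY_INSTANCE)
    print(schema)
    print(extract_records(record, schema))
```

In this sketch only the positions matched by the query instance receive attribute names, so the third column stays unlabeled; a real system would need a further step to name such columns.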