To address the problem of extracting data records from a Deep Web database whose query results are capped at an upper limit k, this paper proposes a data extraction method based on range-type attributes. The method exploits the value-domain characteristics of a range-type attribute and partitions its value domain into multiple subintervals according to a distribution sample of the target database. Experimental results show that, with subintervals partitioned in this way, metrics including query gain, query saturation, and coverage of the extracted data all reach 98.50% or higher.
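
The partitioning idea can be illustrated with a minimal sketch. It is not the paper's algorithm, only one plausible reading of it: the function name `partition_range`, its parameters, and the quantile-style cut rule are all assumptions made for illustration. The sketch assumes the sample is roughly representative of the target database, so that each sample value stands for about `db_size / len(sample)` records, and it places cut points so that each subinterval is expected to return no more than k records.

```python
from typing import List, Tuple


def partition_range(sample: List[float], db_size: int, k: int,
                    lo: float, hi: float) -> List[Tuple[float, float]]:
    """Split the value domain [lo, hi] into subintervals (hypothetical helper).

    Each returned pair (a, b) is read as the half-open range [a, b),
    except the last one, which also includes hi.
    """
    if not sample or db_size <= 0 or k <= 0:
        raise ValueError("sample must be non-empty; db_size and k must be positive")

    values = sorted(sample)
    n = len(values)

    # Assumption: each sample value represents about db_size / n records,
    # so a subinterval may hold at most (k * n) // db_size sample values
    # if it is expected to stay within the result limit k.
    per_interval = max(1, (k * n) // db_size)

    bounds = [lo]
    for i in range(per_interval, n, per_interval):
        cut = values[i]
        # Skip cuts that would create an empty or out-of-range subinterval.
        if bounds[-1] < cut < hi:
            bounds.append(cut)
    bounds.append(hi)

    return list(zip(bounds[:-1], bounds[1:]))
```

Each resulting subinterval would then be submitted as one range query. If the sample is skewed, a subinterval may still return k records (a saturated query) and would need to be split further, a step this sketch leaves out.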