On the Internet, a single topic often has hundreds or even thousands of related Deep Web data sources, so finding suitable sources among them for data integration is becoming increasingly important. Traditional quality-based source selection methods do not take the characteristics of sources in a specific domain into account; instead, they choose a uniform set of quality dimensions based on experience, and as a result their selection accuracy varies considerably across domains. To address this, we propose a Deep Web source selection method based on user feedback: the core quality dimensions of the sources in a particular domain are derived from user feedback and then used to build a quality evaluation model. We validated the method on data sources from three different domains. The experimental results show that, for source selection in each of these domains, the method achieves high accuracy at a low computational cost.