In Web data integration, a widely studied problem is how to select an appropriate number of data sources from a large collection of Web data sources, so that a given query requirement is satisfied while the number of sources that must be accessed is kept as small as possible and the returned results remain of high quality. Against the background of research and practice over the past decade or so, this paper introduces the evolution and current state of research on Web data source selection and classifies Web data source selection methods. It discusses the research motivations, methods, and results of relevance-based and quality-based data source selection, and gives a comparative analysis of the goals, key techniques, advantages, and disadvantages of the related work. Finally, it outlines future research directions for Web data source selection.