With the rapid development of the World Wide Web (WWW), a tremendous amount of information is hidden in the Deep Web, and its volume continues to grow rapidly. This information can be accessed only through the query interfaces provided by back-end Web databases, which return data as dynamically generated Web pages in response to user queries. Because these pages are poorly structured and Deep Web data are heterogeneous, dynamic, and very large in scale, integrating this abundant information automatically and using it effectively is a highly challenging task. Deep Web data integration remains an emerging research field with a number of open problems. A great deal of research has been carried out, but progress across the individual problems is uneven. This paper proposes a framework for Deep Web data integration, classifies and surveys the key research work in the field according to this framework, and then analyzes the current deficiencies and offers suggestions for future research.