结合大数据的特点,提出以标签云改进方案来快速识别网络热搜词,同时考虑到传统的数据仓库在查询、存储结构化数据方面的优势,在目前学者提出的数据仓库与Hadoop平台结合的基础上,提出了协作模式中与以往不同的数据迁移方式,即使用数据中间件,并通过相同数量记录导入Hadoop的时间比较,得出文中所提的数据迁移方法较Sqoop方法更具优势的结论。
Based on the characteristics of big data,the paper proposes quickly recognizing top search queries by the tag cloud. It also introduces the advantages of the traditional data warehouse in query and storage structure,and puts forward a different method of data transfer from the traditional ones in the collaboration mode,which is based on the combination of data warehouse and Hadoop platform. The data middle ware is used,and a comparison between the time taken for the same quantity of records to be introduced into Hadoop shows that the method proposed is superior to Sqoop.