This paper analyzes the characteristics of cloud key-value databases and proposes an information integration model based on the cloud key-value data model. Building on this model, it presents a parallel data integration method. The method first determines the priority of integration activities from the dependency relationships among data sources, and then combines these priorities with the MapReduce algorithm to integrate the data in parallel. Experimental results indicate that the proposed method better supports data integration in a cloud data warehouse.
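
The two steps of the method, dependency-based prioritization followed by MapReduce-style parallel integration, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`priority_levels`, `map_source`, `reduce_by_key`, `integrate`), the record layout, and the use of a local thread pool in place of a MapReduce cluster are all assumptions made for illustration. Sources with no unmet dependencies share a priority level and are mapped in parallel; levels are processed in order.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def priority_levels(deps):
    """Group data sources into priority levels (hypothetical helper).

    deps maps a source name to the sources it depends on. A source with no
    dependencies gets level 0; otherwise its level is one more than the
    highest level among its dependencies.
    """
    levels = {}

    def level(src, stack=()):
        if src in stack:
            raise ValueError("cyclic dependency involving " + src)
        if src not in levels:
            parents = deps.get(src, ())
            levels[src] = 1 + max(
                (level(p, stack + (src,)) for p in parents), default=-1
            )
        return levels[src]

    for src in deps:
        level(src)

    grouped = defaultdict(list)
    for src, lvl in levels.items():
        grouped[lvl].append(src)
    return [sorted(grouped[lvl]) for lvl in sorted(grouped)]


def map_source(name, records):
    """Map step: emit (key, (source, value)) pairs for one source."""
    return [(key, (name, value)) for key, value in records]


def reduce_by_key(pairs, store):
    """Reduce step: merge values that share a key into one integrated row."""
    for key, (name, value) in pairs:
        store.setdefault(key, {})[name] = value


def integrate(sources, deps):
    """Integrate sources level by level; map tasks in a level run in parallel."""
    store = {}
    with ThreadPoolExecutor() as pool:
        for level in priority_levels(deps):
            mapped = pool.map(lambda n: map_source(n, sources[n]), level)
            for pairs in mapped:
                reduce_by_key(pairs, store)
    return store


# Assumed example data: "orders" depends on "customers".
deps = {"customers": [], "products": [], "orders": ["customers"]}
sources = {
    "customers": [("c1", "Alice")],
    "products": [("p1", "Widget")],
    "orders": [("c1", "order-17")],
}
result = integrate(sources, deps)
```

In this sketch `customers` and `products` fall into the same level and are mapped concurrently, while `orders` is integrated only after its dependency has been processed. The reduce step runs sequentially here for simplicity; in a real MapReduce deployment it would be partitioned by key.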