This paper designs and implements a clustered ETL (Extract-Transform-Load) task processing platform. The platform supports multi-source data integration across heterogeneous data sources, and the paper proposes a scheduling algorithm based on predicted time. The algorithm optimizes task scheduling to improve the execution efficiency of data extraction, transformation, and loading tasks. Experimental results from a real-world application show that clustered ETL task scheduling effectively reduces the total time needed to execute multiple ETL tasks in parallel and improves ETL task execution efficiency.
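
The abstract does not spell out the scheduling algorithm, so the following is only a minimal sketch of one way a predicted-time scheduler could work, not the paper's method. It assumes that each ETL task already has a predicted run time, that tasks are independent of each other, and that all cluster nodes are identical. Under those assumptions it uses the well-known longest-processing-time-first greedy heuristic: tasks are sorted by predicted time and each is placed on the node with the smallest predicted load so far. The function name `schedule_by_predicted_time` and the job names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def schedule_by_predicted_time(
    predicted_seconds: Dict[str, float], node_count: int
) -> Tuple[Dict[int, List[str]], float]:
    """Greedy longest-predicted-time-first assignment (illustrative only).

    Returns the task list per node and the predicted total (parallel) time.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")

    # Min-heap of (predicted load so far, node id); least-loaded node on top.
    heap = [(0.0, node) for node in range(node_count)]
    heapq.heapify(heap)
    assignment: Dict[int, List[str]] = {node: [] for node in range(node_count)}

    # Longest predicted tasks first; ties broken by name for determinism.
    ordered = sorted(predicted_seconds.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, seconds in ordered:
        load, node = heapq.heappop(heap)
        assignment[node].append(name)
        heapq.heappush(heap, (load + seconds, node))

    # The busiest node determines the predicted parallel completion time.
    makespan = max(load for load, _ in heap)
    return assignment, makespan


if __name__ == "__main__":
    # Hypothetical predicted run times in seconds (assumed, not from the paper).
    jobs = {"job_a": 40.0, "job_b": 30.0, "job_c": 25.0, "job_d": 20.0, "job_e": 15.0}
    plan, total = schedule_by_predicted_time(jobs, node_count=2)
    print(plan, total)
```

With these made-up predictions, running the five jobs one after another would take 130 seconds, while the sketch spreads them over two nodes so that the predicted parallel time is bounded by the busier node. A real platform would also have to handle dependencies between extract, transform, and load steps and errors in the time predictions, which this sketch ignores.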