ETL (Extract-Transform-Load) is a key step in supplying a data warehouse with high-quality data, and a well-designed, capable ETL tool is important for building a data warehouse with good data quality and a sound structure. This paper first analyzes the limitations of the traditional ETL architecture and the importance of metadata management to the ETL process. It then improves the traditional architecture and, drawing on metadata management, proposes and designs a new metadata-driven ETL architecture. Theoretical analysis and results show that, by adding a data staging area (DSA) and running the whole ETL process under the guidance of metadata, the architecture effectively ensures data quality in the data warehouse, raises data loading efficiency, reduces the load on the data sources and the target database, and makes data transformation more flexible and reliable.
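The idea of a metadata-driven ETL process with a staging area can be illustrated with a minimal sketch. It is not the architecture designed in the paper: the names `MAPPING_METADATA`, `RULES`, `extract`, `transform`, `load` and `run_etl`, the column names, and the use of in-memory lists for the source, the staging area and the target are all assumptions made for illustration. The sketch only shows the general pattern in which the mapping and transformation rules live in metadata rather than in the pipeline code, and in which rows are copied to a staging area before being transformed.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical mapping metadata: target column -> (source column, rule name).
# Changing the mapping means editing this table, not the pipeline code.
MAPPING_METADATA: Dict[str, Tuple[str, str]] = {
    "customer_name": ("name", "strip_upper"),
    "order_total": ("amount", "to_float"),
}

# Hypothetical registry of transformation rules referenced by the metadata.
RULES: Dict[str, Callable[[Any], Any]] = {
    "strip_upper": lambda v: str(v).strip().upper(),
    "to_float": lambda v: float(v),
}


def extract(source: List[Row]) -> List[Row]:
    """Copy raw rows into the staging area so the source is read only once."""
    return [dict(row) for row in source]


def transform(
    staged: List[Row],
    metadata: Dict[str, Tuple[str, str]],
    rules: Dict[str, Callable[[Any], Any]],
) -> Tuple[List[Row], List[Row]]:
    """Apply the metadata-defined rules; rows that fail are set aside."""
    clean: List[Row] = []
    rejected: List[Row] = []
    for row in staged:
        try:
            clean.append(
                {target: rules[rule](row[src])
                 for target, (src, rule) in metadata.items()}
            )
        except (KeyError, ValueError, TypeError):
            # Missing column, unknown rule, or a value the rule cannot convert.
            rejected.append(row)
    return clean, rejected


def load(target: List[Row], rows: List[Row]) -> int:
    """Append the transformed rows to the target and report how many."""
    target.extend(rows)
    return len(rows)


def run_etl(
    source: List[Row],
    target: List[Row],
    metadata: Dict[str, Tuple[str, str]] = MAPPING_METADATA,
    rules: Dict[str, Callable[[Any], Any]] = RULES,
) -> Dict[str, int]:
    """Run extract, transform and load, all steered by the metadata."""
    staging = extract(source)
    clean, rejected = transform(staging, metadata, rules)
    loaded = load(target, clean)
    return {"loaded": loaded, "rejected": len(rejected)}
```

In this sketch the transformation step works only on the staged copy, so the source is touched once during extraction and the target once during loading, and rows that violate the rules are held back instead of reaching the target.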