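Current cost models and query optimization algorithms for relational databases cannot handle massive data stored on tertiary storage. A cost model for estimating the cost of relational algebra operations on tertiary storage is proposed: the costs of relational algebra operations are derived by defining the costs of several basic data access patterns and of two methods for combining patterns. A query optimization method for tertiary storage is also proposed. It can select not only the most efficient implementation algorithm for each relational algebra operation but also the relation copy with the lowest I/O cost, thereby improving query efficiency. Experimental results show that applying the proposed cost model and query optimization method significantly improves the efficiency of queries on data held on tertiary storage. The introduction of relation copies clearly demonstrates the feasibility of trading storage space for query execution time.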
Database management on tertiary storage is becoming increasingly important as applications develop, not only because tertiary devices are used to archive data, but also because the amount of data that applications must handle is growing rapidly. The cost models and query optimization methods of current disk-based database management systems cannot deal with massive data on tertiary storage. A cost model that can evaluate relational operations on tertiary-resident data is proposed. The costs of various relational operations are derived from the cost definitions of several basic data access patterns and the costs of two pattern-combination operators. To further improve query efficiency, multiple copies of each relation are stored on tertiary storage with different organization methods, and the cost model can also evaluate the cost of the same relational operation on different relation copies. Two query optimization methods are also proposed; they can choose not only the most efficient implementation algorithm for each relational operator, but also the most I/O-efficient copy of a relation on tertiary storage. Experimental results show that query efficiency for tertiary-resident data can be greatly improved by adopting the proposed cost model and query optimization methods. The introduction of relation copies demonstrates the feasibility of improving query efficiency at the cost of additional storage space.
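To make the idea of composing costs from basic patterns more concrete, here is a minimal Python sketch. The abstract does not name the basic patterns or the two combination operators, so everything below is an assumption for illustration, not the authors' model: two basic patterns (`mount`, a cartridge exchange, and `sequential`, one head positioning followed by a contiguous read), two combination operators (`then`, where costs add, and `repeat`, where a cost is multiplied by a count), the hypothetical `Device` parameters, and three hypothetical selection algorithms. `choose_plan` enumerates every applicable (algorithm, copy) pair and keeps the cheapest, mirroring the idea of choosing both an implementation algorithm and a relation copy.

```python
import math
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple

# Hypothetical parameters of a tape-like tertiary device (illustrative values only).
@dataclass(frozen=True)
class Device:
    exchange_s: float            # time to load a cartridge into a drive
    seek_s: float                # time to position the head before a read
    transfer_s_per_block: float  # time to transfer one block

# A cost expression maps a device to an estimated I/O time in seconds.
Cost = Callable[[Device], float]

# Basic access patterns (assumed for this sketch).
def mount() -> Cost:
    """Load the cartridge holding a relation copy."""
    return lambda d: d.exchange_s

def sequential(blocks: int) -> Cost:
    """Position once, then read `blocks` contiguous blocks."""
    return lambda d: d.seek_s + blocks * d.transfer_s_per_block

# Two pattern-combination operators (assumed).
def then(*parts: Cost) -> Cost:
    """Sequential composition: the costs of consecutive patterns add."""
    return lambda d: sum(p(d) for p in parts)

def repeat(times: int, part: Cost) -> Cost:
    """Repetition: running a pattern `times` times multiplies its cost."""
    return lambda d: times * part(d)

@dataclass(frozen=True)
class Copy:
    """One stored copy of a relation with its own physical organization."""
    name: str
    blocks: int
    clustered_on: Optional[str] = None      # attribute the copy is sorted on
    indexed_on: FrozenSet[str] = frozenset()  # attributes with an index

# Hypothetical selection algorithms: each returns a cost expression, or None if not applicable.
def full_scan(c: Copy, attr: str, sel: float) -> Optional[Cost]:
    return then(mount(), sequential(c.blocks))

def clustered_range(c: Copy, attr: str, sel: float) -> Optional[Cost]:
    if c.clustered_on != attr:
        return None
    # Matching blocks are contiguous on a copy sorted by `attr`.
    return then(mount(), sequential(math.ceil(sel * c.blocks)))

def index_probe(c: Copy, attr: str, sel: float) -> Optional[Cost]:
    if attr not in c.indexed_on:
        return None
    # Matching blocks are scattered, so each needs its own positioning.
    # Simplification: reading the index itself is assumed to cost nothing.
    return then(mount(), repeat(math.ceil(sel * c.blocks), sequential(1)))

Algorithm = Callable[[Copy, str, float], Optional[Cost]]

def choose_plan(copies: List[Copy], algorithms: List[Algorithm],
                device: Device, attr: str, sel: float) -> Tuple[float, str, str]:
    """Return (estimated seconds, algorithm name, copy name) with the lowest cost."""
    best: Optional[Tuple[float, str, str]] = None
    for c in copies:
        for alg in algorithms:
            cost = alg(c, attr, sel)
            if cost is None:
                continue
            t = cost(device)
            if best is None or t < best[0]:
                best = (t, alg.__name__, c.name)
    if best is None:
        raise ValueError("no applicable plan")
    return best

# Usage: two copies of the same 1000-block relation, selecting 1% of it on "date".
tape = Device(exchange_s=60.0, seek_s=30.0, transfer_s_per_block=0.5)
copies = [
    Copy("heap", 1000, indexed_on=frozenset({"date"})),
    Copy("by_date", 1000, clustered_on="date"),
]
print(choose_plan(copies, [full_scan, clustered_range, index_probe], tape, "date", 0.01))
# → (95.0, 'clustered_range', 'by_date')
```

With only the `heap` copy available, the cheapest plan under these made-up parameters is `index_probe` at 365 s; adding the date-sorted copy cuts the estimate to 95 s, a small-scale illustration of trading storage space for query execution time.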