Creating multiple replicas in a data grid effectively increases download speed and reduces network traffic, but replica creation itself incurs substantial storage and network traffic overhead. Parallel download algorithms based on the GridFTP protocol can further increase download speed, yet they do not remove the storage and network traffic costs of keeping multiple replicas. To address these problems, this paper proposes a data grid storage model that guarantees data integrity and storage reliability while reducing storage space, and, based on this model and the GridFTP protocol, presents a parallel download scheduling algorithm. Experimental results show that the algorithm needs only a small amount of redundancy to reach the ideal download speed attainable by existing parallel download algorithms that operate on full replicas, thereby achieving fast parallel transfer, savings in storage space, and reduced network traffic.