To address the large amount of redundant information in massive data sets, this paper designs and implements a file backup and recovery system based on data deduplication. The system uses an improved Winnowing dynamic chunking algorithm to split files into variable-length data blocks, and combines a digest algorithm, an index table, and data compression to ensure that the server stores only a single copy of each data block, thereby eliminating duplicate data. Experimental results show that the system reduces network traffic more than cwRsync does and, compared with traditional compression techniques, further reduces disk space usage.
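The pipeline summarized above (variable-length chunking, a digest per chunk, an index table, and compression of unique chunks) can be illustrated with a minimal Python sketch. It is not the improved algorithm of this paper: it uses plain Winnowing (the rightmost minimum hash in each sliding window marks a cut point), and the constants `KGRAM` and `WINDOW`, the choice of CRC32 and SHA-256, and the names `winnow_cut_points`, `split_chunks`, and `ChunkStore` are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import zlib

KGRAM = 16   # bytes hashed at each position (assumed value)
WINDOW = 64  # number of consecutive hashes compared per window (assumed value)


def winnow_cut_points(data: bytes, k: int = KGRAM, w: int = WINDOW) -> list[int]:
    """Plain Winnowing: in every window of w k-gram hashes, select the
    rightmost minimum; the selected positions serve as chunk boundaries."""
    hashes = [zlib.crc32(data[i:i + k]) for i in range(len(data) - k + 1)]
    cuts: list[int] = []
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Offset of the rightmost occurrence of the minimum in this window.
        pos = start + (w - 1 - window[::-1].index(smallest))
        if not cuts or pos > cuts[-1]:
            cuts.append(pos)
    return cuts


def split_chunks(data: bytes) -> list[bytes]:
    """Split data at the winnowing cut points into variable-length chunks."""
    bounds = [0] + [c for c in winnow_cut_points(data) if c > 0] + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


class ChunkStore:
    """Toy server-side store: index table mapping digest to compressed chunk."""

    def __init__(self) -> None:
        self.index: dict[str, bytes] = {}

    def backup(self, data: bytes) -> list[str]:
        """Store only unseen chunks; return the list of digests (the recipe)."""
        recipe = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.index:
                self.index[digest] = zlib.compress(chunk)
            recipe.append(digest)
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        """Rebuild a file by concatenating the chunks named in its recipe."""
        return b"".join(zlib.decompress(self.index[d]) for d in recipe)
```

Because the cut points depend on content rather than on fixed offsets, inserting a few bytes at the start of a file changes only the first few chunks. A second `backup` call on the edited file therefore adds very few entries to the index, and a client would only need to transmit those new chunks. The window sizes here produce very small chunks; a practical system would likely choose much larger ones.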