To address the low backup efficiency and wasted storage caused by large amounts of redundant data in traditional remote backup, a remote backup system based on data deduplication is designed and implemented. The system first divides each backup file into variable-length chunks according to its content using Rabin fingerprints, then sends the metadata of each chunk to the backup centre, where duplicate chunks are detected with an index scheme modeled on Google Bigtable and LevelDB, assisted by a Bloom filter; only the unique chunks are transmitted and stored. Experimental results show that the system effectively removes the duplicate data when backing up similar data sets, and that for incremental backups with small incremental changes it generates less network traffic than Rsync.
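The content-defined chunking step can be sketched as follows. This is a minimal illustration only, not the paper's implementation: a simple polynomial rolling hash stands in for a true Rabin fingerprint, and the window size, boundary mask, and chunk-size limits are assumed values the abstract does not give.

import hashlib

WINDOW = 48            # sliding-window size in bytes (assumed)
PRIME = 257            # base of the polynomial rolling hash (assumed)
MOD = 1 << 61          # keeps the hash in a fixed range
MASK = (1 << 13) - 1   # average chunk size of about 8 KiB (assumed)
MIN_CHUNK = 2 * 1024   # forced lower bound on chunk size (assumed)
MAX_CHUNK = 64 * 1024  # forced upper bound on chunk size (assumed)

def chunk_file(data: bytes) -> list:
    """Cut data into variable-length chunks wherever the rolling hash
    over the last WINDOW bytes matches the boundary mask."""
    chunks, start, h = [], 0, 0
    pow_w = pow(PRIME, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i - start >= WINDOW:
            # slide the window: remove the contribution of the oldest byte
            h = (h - data[i - WINDOW] * pow_w) % MOD
        h = (h * PRIME + b) % MOD
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == MASK) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])  # boundary found: emit a chunk
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])           # trailing partial chunk
    return chunks

Because boundaries depend on content rather than on fixed offsets, an insertion into a file shifts only the chunks around the edit, so most chunks of a similar file keep their fingerprints.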
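The duplicate check at the backup centre can be sketched in the same spirit, continuing the previous fragment (it reuses chunk_file). A small Bloom filter screens each chunk fingerprint before the index lookup, so most fingerprints never stored before are rejected without touching the index; a plain dictionary stands in for the Bigtable/LevelDB-style index, and the class, function, and fingerprint choices here are illustrative assumptions.

import hashlib

class BloomFilter:
    """Bit array with k hash probes; 'no' is definite, 'yes' may be false."""
    def __init__(self, size_bits: int = 1 << 20, hashes: int = 4):
        self.size, self.hashes = size_bits, hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: bytes):
        for i in range(self.hashes):
            d = hashlib.sha256(key + bytes([i])).digest()
            yield int.from_bytes(d[:8], "big") % self.size

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(key))

def backup(data: bytes, index: dict, bloom: BloomFilter) -> list:
    """Return only the chunks the backup centre has not stored yet."""
    new_chunks = []
    for chunk in chunk_file(data):
        fp = hashlib.sha1(chunk).digest()   # chunk fingerprint (SHA-1 assumed)
        # A negative Bloom answer means 'definitely new', so the index
        # is consulted only for possible duplicates.
        if bloom.might_contain(fp) and fp in index:
            continue                        # duplicate: neither sent nor stored
        bloom.add(fp)
        index[fp] = len(new_chunks)         # stand-in for the chunk's location
        new_chunks.append(chunk)
    return new_chunks

Running backup() over successive versions of a data set transmits only chunks whose fingerprints are new, which is the behaviour the experiments compare with Rsync.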