Data deduplication for file communication across a wide area network (WAN), in applications such as file synchronization and mirroring in cloud environments, usually achieves significant bandwidth savings at the cost of significant time overheads. These overheads include the time required for data deduplication at two geographically distributed nodes (e.g., the disk access bottleneck) and the duplication query/answer operations between the sender and the receiver, since each query or answer introduces at least one round-trip time (RTT) of latency. In this paper, we present a data deduplication system across WAN with metadata feedback and metadata utilization (MFMU), designed to curb the time overheads of data deduplication. In the proposed MFMU system, selective metadata feedback from the receiver to the sender is introduced to reduce the number of duplication query/answer operations. In addition, to curb the metadata-related disk I/O operations at the receiver, as well as the bandwidth overhead introduced by the metadata feedback, we introduce a metadata utilization component based on a hysteresis hash re-chunking mechanism. Our experimental results show that MFMU achieves an average deduplication acceleration of 20% to 40%, without the metadata feedback reducing the bandwidth saving ratio, as compared with the "baseline" content-defined chunking (CDC) used in LBFS (Low-Bandwidth Network File System) and existing state-of-the-art deduplication solutions based on Bimodal chunking algorithms.
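
To make the baseline concrete, the following is a minimal sketch of content-defined chunking followed by a duplicate lookup. It is an illustration under stated assumptions, not the MFMU system or the LBFS implementation: LBFS uses Rabin fingerprints over a sliding window, whereas this sketch substitutes a simpler Gear-style rolling hash, and the names `cdc_chunks`, `chunks_to_send`, and `receiver_index`, as well as all size parameters, are hypothetical.

```python
import hashlib

# Assumption: a Gear-style rolling hash stands in for the Rabin
# fingerprints used by LBFS. The table is derived deterministically.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big")
        for i in range(256)]


def cdc_chunks(data, mask=0xFFC00000, min_size=64, max_size=4096):
    """Split data into chunks whose boundaries depend on content.

    A boundary is declared when the masked hash bits are all zero
    (after at least min_size bytes) or when max_size is reached.
    """
    chunks = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        # Older bytes shift out of the 32-bit hash, so it acts as a
        # sliding window over roughly the last 32 bytes.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def chunks_to_send(data, receiver_index):
    """Return the chunks whose fingerprints the receiver does not hold.

    receiver_index is a hypothetical in-memory set of SHA-1 digests.
    Across a WAN, each membership check below would correspond to a
    query/answer exchange costing at least one RTT.
    """
    missing = []
    for chunk in cdc_chunks(data):
        fp = hashlib.sha1(chunk).hexdigest()
        if fp not in receiver_index:
            receiver_index.add(fp)
            missing.append(chunk)
    return missing
```

In this sketch, sending the same data a second time yields no chunks to transfer, because every fingerprint is already in the index. The per-chunk lookup is the step whose cost the abstract attributes to query/answer round trips.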