Reliable data storage for the current big data era was studied. To address the problem that existing storage strategies struggle to satisfy the demands of highly reliable storage and high space utilization at the same time, a hierarchical coding storage strategy for big data with high reliability and low redundancy was proposed. Considering that data differ in importance according to their type and life cycle, the strategy allows a fault-tolerance level to be set separately for each type of data; fault-tolerant coding schemes of different redundancy are implemented within a single unified storage architecture, and a small set of parameters selects the appropriate fault-tolerance level for encoding and storing each piece of data; the storage space overhead is further reduced by dynamically lowering the redundancy of historical data. Experiments verified the effectiveness of the strategy. Important small files are encoded and stored as fragments at a high fault-tolerance level, so that even when 95% of the storage nodes in the system fail, all data can be quickly repaired from a subset of the encoded fragments; ordinary files use a suitably relaxed fault-tolerance level, which, while still guaranteeing fast and lossless repair, saves 1.5 times the storage space compared with the traditional three-replica strategy.
Highly reliable data storage in the current big data era was studied, and a novel hierarchical storage strategy for highly reliable, low-redundancy storage of big data was proposed to resolve the contradiction between high reliability and low storage utilization faced by traditional storage strategies such as multi-replication and unified coding. To satisfy the diverse reliability requirements of different storage objects, the strategy uses a unified architecture that provides a variety of fault-tolerant encoding methods. By setting a higher fault-tolerance level for small text files and a lower fault-tolerance level for large media files, the proposed strategy can bring the space overhead down from 200% to 50% compared with the triplication strategy. In addition, the small files remain recoverable even if 95% of the storage nodes fail.
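The overhead figures quoted above follow from simple erasure-coding arithmetic. The sketch below, in Python, shows how a (k, m) fault-tolerance level translates into space overhead and failure tolerance; the specific RS(8,4) and RS(2,38) parameters and the `choose_level` rule are illustrative assumptions for this sketch, not values given in the paper, which only states the 200%, 50%, and 95% figures.

```python
# Minimal sketch of the space-overhead arithmetic behind the abstract's figures.
# The (k, m) parameters below are illustrative assumptions; only the 200% and
# 50% overhead figures and the 95% survivable-failure claim come from the paper.

from dataclasses import dataclass


@dataclass
class FaultToleranceLevel:
    name: str
    data_fragments: int    # k: fragments carrying original data
    parity_fragments: int  # m: redundant fragments; any k of (k + m) suffice to rebuild

    @property
    def space_overhead(self) -> float:
        """Extra storage as a fraction of the original data size."""
        return self.parity_fragments / self.data_fragments

    @property
    def tolerable_loss(self) -> float:
        """Fraction of fragments (one per node) that may be lost without data loss."""
        return self.parity_fragments / (self.data_fragments + self.parity_fragments)


# Triple replication: two extra full copies -> 200% overhead.
triplication = FaultToleranceLevel("3-replica", data_fragments=1, parity_fragments=2)

# Hypothetical relaxed level for ordinary large files, e.g. RS(8,4): 50% overhead.
relaxed = FaultToleranceLevel("RS(8,4)", data_fragments=8, parity_fragments=4)

# Hypothetical high level for important small files: any 2 of 40 fragments rebuild
# the file, so up to 95% of the fragments (nodes) may be lost.
high = FaultToleranceLevel("RS(2,38)", data_fragments=2, parity_fragments=38)


def choose_level(size_bytes: int, important: bool) -> FaultToleranceLevel:
    """Toy selection rule standing in for the paper's parameter-driven choice."""
    if important and size_bytes < 1 << 20:  # small, important file
        return high
    return relaxed


if __name__ == "__main__":
    for level in (triplication, relaxed, high):
        print(f"{level.name:10s} overhead = {level.space_overhead:.0%}, "
              f"tolerates loss of {level.tolerable_loss:.0%} of fragments")
    # 3-replica  overhead = 200%, tolerates loss of 67% of fragments
    # RS(8,4)    overhead = 50%, tolerates loss of 33% of fragments
    # RS(2,38)   overhead = 1900%, tolerates loss of 95% of fragments
```

The two assumed levels mirror the abstract's trade-off: a wide code such as RS(2,38) spends heavily on redundancy so that a small, important file survives the loss of 95% of its fragments, while RS(8,4) keeps ordinary files repairable at only 50% overhead instead of the 200% of triplication.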