Reliable data storage for the current big data era was studied. To address the problem that existing storage strategies struggle to satisfy the demands of highly reliable storage and high space utilization at the same time, a hierarchical coding storage strategy for big data with high reliability and low redundancy was proposed. Considering that data differ in importance according to their type and life cycle, the strategy allows a fault-tolerance level to be set separately for each type of data; fault-tolerant coding schemes of different redundancy are implemented within a single unified storage architecture, and a small set of parameters selects the appropriate fault-tolerance level for encoding and storing each piece of data; the storage space overhead is further reduced by dynamically lowering the redundancy of historical data. Experiments verified the effectiveness of the strategy. Important small files are encoded and stored as fragments at a high fault-tolerance level, so that even when 95% of the storage nodes in the system fail, all data can be quickly repaired from a subset of the encoded fragments; ordinary files use a suitably relaxed fault-tolerance level, which, while still guaranteeing fast and lossless repair, saves 1.5 times the storage space compared with the traditional three-replica strategy.
Highly reliable data storage in the current big data era was studied, and a novel hierarchical storage strategy for highly reliable, low-redundancy storage of big data was proposed to resolve the contradiction between high reliability and low storage utilization faced by traditional storage strategies such as multi-replication and unified coding. To satisfy the diverse reliability requirements of different storage objects, the strategy uses a unified architecture that provides a variety of fault-tolerant encoding methods. By setting a higher fault-tolerance level for small text files and a lower fault-tolerance level for large media files, the proposed strategy can bring the space overhead down from 200% to 50% compared with the triplication strategy. In addition, the small files remain recoverable even if 95% of the storage nodes fail.
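The overhead figures quoted above follow from simple erasure-coding arithmetic. The sketch below, in Python, shows how a (k, m) fault-tolerance level translates into space overhead and failure tolerance; the specific RS(8,4) and RS(2,38) parameters and the `choose_level` rule are illustrative assumptions for this sketch, not values given in the paper, which only states the 200%, 50%, and 95% figures.

```python
# Minimal sketch of the space-overhead arithmetic behind the abstract's figures.
# The (k, m) parameters below are illustrative assumptions; only the 200% and
# 50% overhead figures and the 95% survivable-failure claim come from the paper.

from dataclasses import dataclass


@dataclass
class FaultToleranceLevel:
    name: str
    data_fragments: int    # k: fragments carrying original data
    parity_fragments: int  # m: redundant fragments; any k of (k + m) suffice to rebuild

    @property
    def space_overhead(self) -> float:
        """Extra storage as a fraction of the original data size."""
        return self.parity_fragments / self.data_fragments

    @property
    def tolerable_loss(self) -> float:
        """Fraction of fragments (one per node) that may be lost without data loss."""
        return self.parity_fragments / (self.data_fragments + self.parity_fragments)


# Triple replication: two extra full copies -> 200% overhead.
triplication = FaultToleranceLevel("3-replica", data_fragments=1, parity_fragments=2)

# Hypothetical relaxed level for ordinary large files, e.g. RS(8,4): 50% overhead.
relaxed = FaultToleranceLevel("RS(8,4)", data_fragments=8, parity_fragments=4)

# Hypothetical high level for important small files: any 2 of 40 fragments rebuild
# the file, so up to 95% of the fragments (nodes) may be lost.
high = FaultToleranceLevel("RS(2,38)", data_fragments=2, parity_fragments=38)


def choose_level(size_bytes: int, important: bool) -> FaultToleranceLevel:
    """Toy selection rule standing in for the paper's parameter-driven choice."""
    if important and size_bytes < 1 << 20:  # small, important file
        return high
    return relaxed


if __name__ == "__main__":
    for level in (triplication, relaxed, high):
        print(f"{level.name:10s} overhead = {level.space_overhead:.0%}, "
              f"tolerates loss of {level.tolerable_loss:.0%} of fragments")
    # 3-replica  overhead = 200%, tolerates loss of 67% of fragments
    # RS(8,4)    overhead = 50%, tolerates loss of 33% of fragments
    # RS(2,38)   overhead = 1900%, tolerates loss of 95% of fragments
```

The two assumed levels mirror the abstract's trade-off: a wide code such as RS(2,38) spends heavily on redundancy so that a small, important file survives the loss of 95% of its fragments, while RS(8,4) keeps ordinary files repairable at only 50% overhead instead of the 200% of triplication.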