In large-scale cloud storage systems, a failure of the Master node makes the storage service unavailable. To address this problem, a fault impact analysis model for management-node failures in cloud storage systems has been constructed. The model takes storage service availability, data reliability and data availability as its analysis targets. It analyzes the impact of a fault along three dimensions: the fault status, the real-time status of the management nodes, and the restrictive conditions on management-node failure. This provides an effective methodological basis for fault recovery. Based on this analysis model, a message-based dynamic switching algorithm for the Master node, called DSA-M, is proposed. DSA-M uses a sequence-number-based priority (PRI) policy to let nodes dynamically apply for the Master role and switch to it, which ensures high availability of the cloud storage service. Test results show that when the Master node fails, DSA-M automatically switches to a new Master node, which takes over and restores the cloud storage service. By controlling the fault detection cycle, the performance of DSA-M can be kept within a relatively stable range, and the algorithm adapts well to different failure moments.
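The abstract names the two ingredients of DSA-M (failure detection governed by a fault detection cycle, and Master switching by a sequence-number-based priority policy) but not its message format or exact priority rule. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: every node has a unique sequence number and a lower number means higher priority; liveness is judged by heartbeat timestamps compared against a fixed `detection_period`; and on a detected Master failure each live node sends an `APPLY` message carrying its sequence number, with the lowest one taking over. The names `Node`, `Cluster`, `heartbeat`, `collect_applications` and `switch_master` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    node_id: str
    seq: int                    # assumed: lower sequence number = higher priority
    alive: bool = True
    last_heartbeat: float = 0.0


@dataclass
class Cluster:
    nodes: List[Node]
    detection_period: float     # assumed fault detection cycle, in seconds
    master: Optional[Node] = field(default=None)

    def heartbeat(self, node: Node, now: float) -> None:
        # A crashed node stops refreshing its heartbeat timestamp.
        if node.alive:
            node.last_heartbeat = now

    def _is_live(self, node: Node, now: float) -> bool:
        # A node counts as live if it heartbeated within one detection period.
        return node.alive and now - node.last_heartbeat <= self.detection_period

    def collect_applications(self, now: float) -> List[Tuple[str, int, str]]:
        # Each live node sends an APPLY message carrying its sequence number.
        return [("APPLY", n.seq, n.node_id) for n in self.nodes if self._is_live(n, now)]

    def switch_master(self, now: float) -> Optional[Node]:
        # Keep the current Master while its heartbeat is within the detection period.
        if self.master is not None and now - self.master.last_heartbeat <= self.detection_period:
            return self.master
        applications = self.collect_applications(now)
        if not applications:
            self.master = None  # no live candidate: service stays unavailable
            return None
        _, _, winner_id = min(applications)  # lowest seq wins; node_id breaks ties
        self.master = next(n for n in self.nodes if n.node_id == winner_id)
        return self.master


if __name__ == "__main__":
    nodes = [Node("m0", 0), Node("s1", 1), Node("s2", 2)]
    cluster = Cluster(nodes, detection_period=3.0)
    for n in nodes:
        cluster.heartbeat(n, 0.0)
    print(cluster.switch_master(0.0).node_id)  # m0

    nodes[0].alive = False                     # Master crashes after t = 0
    for t in (2.0, 4.0):
        for n in nodes:
            cluster.heartbeat(n, t)
        # At t = 2 the failure is not yet detected; at t = 4 s1 takes over.
        print(t, cluster.switch_master(t).node_id)
```

In this sketch the detection period sets the trade-off the abstract alludes to: a shorter period detects a failed Master sooner but is more likely to misjudge a briefly delayed heartbeat as a failure, while a longer period delays takeover.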