With the rapid development of the Internet and the arrival of the Big Data era, the limitations of traditional databases have become increasingly apparent, and distributed database systems that support massive data storage and highly concurrent access have grown more and more popular. Against this background, Alibaba Group developed OceanBase, a distributed database system designed for massive data storage, which supports two deployment modes: single cluster and multiple clusters. However, the availability of the multi-cluster mode is low and cannot meet the requirements of critical applications. Specifically, it does not support automatic switching between the master cluster and the slave clusters when a failure occurs, and it cannot guarantee strong log synchronization between the master and slave clusters, so inconsistent logs may be generated during a switch. To address these problems, we analyze the high-availability solutions of traditional databases and, taking the characteristics of the OceanBase architecture into account and drawing on the ideas of the Raft algorithm, design and implement a distributed election module based on log timestamps, an automatic cluster switching module, and a strong log synchronization module based on the QUORUM strategy. Experimental results show that these modules improve the availability of the whole system.
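As a rough illustration of the two ideas named above, the following minimal Python sketch shows a majority (QUORUM) commit rule and a timestamp-based choice of the new master. It is not the OceanBase implementation: the `Cluster` type, the function names, and the tie-break on cluster name are all hypothetical, and the sketch assumes that log timestamps increase monotonically, so the reachable cluster with the newest timestamp holds every entry a majority has acknowledged.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cluster:
    """Hypothetical view of one cluster as seen by the election module."""
    name: str
    last_log_timestamp: int  # timestamp of the newest log entry persisted there
    reachable: bool = True


def quorum_size(cluster_count: int) -> int:
    """Smallest number of clusters that forms a strict majority."""
    return cluster_count // 2 + 1


def is_committed(ack_count: int, cluster_count: int) -> bool:
    """A log entry counts as committed once a majority has persisted it.

    ack_count includes the master's own copy (an assumption of this sketch).
    """
    return ack_count >= quorum_size(cluster_count)


def elect_master(clusters: List[Cluster]) -> Optional[Cluster]:
    """Pick the reachable cluster with the newest log timestamp.

    Returns None when fewer than a majority of clusters are reachable,
    because an election without a quorum could miss committed entries.
    Ties are broken by cluster name (an arbitrary, hypothetical rule).
    """
    live = [c for c in clusters if c.reachable]
    if len(live) < quorum_size(len(clusters)):
        return None
    return max(live, key=lambda c: (c.last_log_timestamp, c.name))


if __name__ == "__main__":
    # The old master "A" has failed; "B" holds the newest log among survivors.
    clusters = [
        Cluster("A", 105, reachable=False),
        Cluster("B", 103),
        Cluster("C", 101),
    ]
    print(elect_master(clusters).name)
    print(is_committed(ack_count=2, cluster_count=3))
```

Under these assumptions, any majority that takes part in an election overlaps with the majority that acknowledged a committed entry, which is why choosing the newest reachable log is enough to avoid losing committed entries during a switch.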