Request load balancing is a core problem in metadata management for distributed file systems. With the goal of maximizing the throughput of the metadata server cluster, this paper designs and implements a distributed cache framework on top of an existing metadata management layer. The cache layer manages only hotspot metadata, rather than all metadata, and balances the constantly changing request load. Compared with existing metadata load balancing architectures, this two-tier architecture is more flexible and more aware of the overall load, and it avoids the damage to the namespace structure that redistributing and migrating hot metadata would otherwise cause. Metadata also differs from data in that individual items are small and numerous, so prefetching metadata that is never used costs far less than prefetching data that is never used. Based on these characteristics, a time period-based prefetching strategy and a prefetching-based adaptive replacement cache algorithm are proposed to improve the performance of the distributed cache layer under changing workloads. The two-tier framework also takes cache consistency into account. Finally, the framework and methods are evaluated on a real distributed file system, a Hadoop distributed file system cluster.
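The idea that a wrong metadata prefetch is cheap can be illustrated with a minimal sketch. It is not the time period-based strategy or the adaptive replacement algorithm of this paper, whose details the abstract does not give. It assumes a plain LRU cache that, on a miss, also loads a few sibling entries from the same directory and places them at the cold end, so that prefetched entries which are never requested are the first to be evicted. The class name `PrefetchingMetadataCache`, the dictionary `store` standing in for the metadata servers, and the parameters `capacity` and `prefetch_limit` are all hypothetical.

```python
from collections import OrderedDict
import posixpath


class PrefetchingMetadataCache:
    """LRU cache that prefetches sibling metadata on a miss (illustrative only)."""

    def __init__(self, store, capacity=4, prefetch_limit=2):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.store = store  # hypothetical backing store: path -> metadata
        self.capacity = capacity
        # Always leave one slot for the entry that was actually requested.
        self.prefetch_limit = min(prefetch_limit, capacity - 1)
        self.entries = OrderedDict()  # coldest entry first, hottest last
        self.hits = 0
        self.misses = 0

    def _siblings(self, path):
        """Pick uncached entries from the same directory as `path`."""
        parent = posixpath.dirname(path)
        found = []
        for candidate in sorted(self.store):
            if len(found) >= self.prefetch_limit:
                break
            if candidate == path or candidate in self.entries:
                continue
            if posixpath.dirname(candidate) == parent:
                found.append(candidate)
        return found

    def get(self, path):
        if path in self.entries:
            self.hits += 1
            self.entries.move_to_end(path)  # mark as most recently used
            return self.entries[path]

        self.misses += 1
        siblings = self._siblings(path)
        # Evict from the cold end until the new entries fit.
        while len(self.entries) + 1 + len(siblings) > self.capacity:
            self.entries.popitem(last=False)
        # Prefetched entries go to the cold end: if never used, they leave first.
        for sibling in siblings:
            self.entries[sibling] = self.store[sibling]
            self.entries.move_to_end(sibling, last=False)
        # The requested entry goes to the hot end.
        self.entries[path] = self.store[path]
        return self.entries[path]


store = {p: {"path": p} for p in ["/a/1", "/a/2", "/a/3", "/b/1", "/b/2"]}
cache = PrefetchingMetadataCache(store, capacity=4, prefetch_limit=2)
cache.get("/a/1")  # miss: also prefetches /a/2 and /a/3
cache.get("/a/2")  # hit, thanks to the prefetch
cache.get("/b/1")  # miss: the unused prefetch /a/3 is evicted first
```

In this run the only entry lost to make room is `/a/3`, which was prefetched but never requested, while the entries that were actually used stay cached. A real deployment would also need the consistency handling mentioned in the abstract, which this sketch leaves out.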