以SSD(solid state drive)为代表的新型存储介质在虚拟化环境下得到了广泛的应用,通常作为虚拟机读写缓存,起到优化磁盘I/O性能的作用.已有研究往往关注SSD缓存的容量规划,依据缓存读写命中率评价SSD缓存分配效果,未能充分考虑SSD的服务能力上限,难以适用于典型的分布式应用场景衍在虚拟机抢占SSD缓存资源,导致虚拟机中应用性能违约的可能.实现了虚拟化环境下面向多目标优化的自适应SSD缓存系统’考虑了SSD的服务能力上限.基于自适应闭环实现对虚拟机和应用状态的动态感知.动态检测局部SSD缓存抢占状态,基于聚类方法生成虚拟机的优化放置方案,依据全局SSD缓存供给能力确定虚拟机迁移顺序和时机,实验结果表明,该方法在应对典型分布式应用场景时可以有效缓解SSD缓存资源的争用,同时满足应用对虚拟机放置的需求,提升应用的性能并兼顾应用的可靠性.在Hadoop应用场景下,平均降低了25%的任务执行时间,对I/O密集型应用平均提升39%的吞吐率.在ZooKeeper应用场景下,以不到5%的性能损失为代价,应对了虚拟化主机的单点失效带来的虚拟机宕机问题.
As a new type of storage media, solid state drive (SSD) is widely used in virtualization environment. SSD is usually used as the read and write cache of the virtual machine (VM) storage to improve the disk I/O performance of the VMs. Existing SSD caching schemes mostly focus on the capacity planning of the SSD cache and use metrics such as cache hit rate to evaluate the effect of SSD cache allocation. Since they do not consider the limitations of service capabilities of SSD, which may lead to the contention of cache resource and the performance degradation and violations among VMs, they are not suitable to be used with some typical distributed applications. This article proposes a self-adaptive SSD caching system for multiobjective optimization in virtualization environment, to reduce the resource contention, and take the limitations of service capabilities of SSD into consideration. With the help of closed loop adaption, it can dynamically detect the status of VMs and applications. Moreover, it continuously detects the contention of SSD cache, generates the migration plan using clustering algorithm and decides the timing and order of the VM migrations according to the capabilities of SSD as well as the characteristics and requirements of applications. The evaluation shows that when facing the scenarios of using some typical distributed applications, the contention of SSD cache resource is reduced and the requirements of applications are considered, which lead to the improvement of performance and reliability of applications. For Hadoop applications, the execution time of jobs is reduced by 25% on average, and the throughput for I/O sensitive applications is improved by 39%. For ZooKeeper applications, the service outage caused by the single point of fault of the hypervisor can be handled at the cost of less than 5% of performance degradation.