To address the inability of multi-tenant clusters to guarantee the Service Level Objectives (SLOs) of jobs, an SLO-based scheduling mechanism for multi-tenant scenarios is proposed, comprising a priority scheduling algorithm and a resource preemption algorithm. The priority scheduling algorithm distinguishes tenants that have exceeded their resource quota from tenants that have not, gives the jobs of the latter a higher priority, and, subject to that ordering, selects the job with the highest urgency and allocates resources to it first. When resources are constrained, the resource preemption algorithm preempts resources on behalf of jobs whose urgency exceeds a threshold: according to the tenants' resource usage, it selects the job with the lowest urgency within the corresponding range of running jobs and preempts its resources. Experimental results show that, compared with Capacity Scheduler, an existing multi-tenant scheduler that guarantees fairness, the proposed mechanism significantly improves the deadline guarantee rate of jobs while maintaining job execution efficiency and fairness among tenants, thereby guaranteeing the service level objectives of the workload.
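The two algorithms can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: the `urgency` formula (remaining work divided by time left before the deadline), the names `Tenant`, `Job`, `pick_next`, and `pick_victim`, and the rule for the "corresponding range" of preemption victims (jobs of over-quota tenants when the requesting tenant is within quota, otherwise the requesting tenant's own jobs).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tenant:
    name: str
    quota: int      # resource units the tenant is entitled to
    used: int = 0   # resource units currently held

    @property
    def over_quota(self) -> bool:
        return self.used > self.quota


@dataclass
class Job:
    name: str
    tenant: Tenant
    remaining_work: float  # estimated seconds of work left
    deadline: float        # absolute deadline, in seconds


def urgency(job: Job, now: float) -> float:
    """Hypothetical urgency measure: work left per second of slack."""
    slack = job.deadline - now
    if slack <= 0:
        return float("inf")  # deadline already reached
    return job.remaining_work / slack


def pick_next(pending: List[Job], now: float) -> Optional[Job]:
    """Priority scheduling: within-quota tenants first, then highest urgency."""
    if not pending:
        return None
    return min(pending, key=lambda j: (j.tenant.over_quota, -urgency(j, now)))


def pick_victim(job: Job, running: List[Job], now: float,
                threshold: float) -> Optional[Job]:
    """Resource preemption: only for jobs whose urgency exceeds the threshold."""
    if urgency(job, now) <= threshold:
        return None
    # Assumed victim range, chosen from the tenants' resource usage.
    if job.tenant.over_quota:
        candidates = [r for r in running if r.tenant is job.tenant]
    else:
        candidates = [r for r in running if r.tenant.over_quota]
    if not candidates:
        return None
    victim = min(candidates, key=lambda r: urgency(r, now))
    # Preempt only if the victim is less urgent than the requesting job.
    return victim if urgency(victim, now) < urgency(job, now) else None
```

In this sketch, sorting on the pair (over-quota flag, negated urgency) encodes the ordering described above: quota status decides first, and urgency only breaks ties among tenants with the same status.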