This paper proposes a Q-learning based method for adaptive dispatching rule selection in a dynamic single-machine scheduling environment. Because the system state space in dynamic scheduling is large, Q-learning converges slowly; the method therefore first extracts state features of the system and clusters the system states with a fuzzy clustering method, effectively reducing the dimensionality of the state space. During learning, the machine agent evaluates the transient state vector's fuzzy membership in each clustered state, selects an appropriate rule based on this aggregate judgment, and, after each iteration, distributes the reward or penalty across the action-value functions of the chosen rule in all clustered states in proportion to those memberships. Simulation results show that the proposed algorithm converges faster than traditional Q-learning and improves the machine agent's ability to select dispatching rules dynamically.
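To make the mechanism concrete, the following is a minimal sketch of membership-weighted Q-learning over clustered states. Everything here is an assumption for illustration, not the paper's implementation: the rule set (`SPT`, `EDD`, `FIFO`), the number of clusters and features, the inverse-distance membership formula (as in fuzzy c-means), and the learning parameters are all hypothetical.

```python
import numpy as np

# Hypothetical setup -- rules, cluster count, and features are assumptions.
RULES = ["SPT", "EDD", "FIFO"]   # candidate dispatching rules
N_CLUSTERS = 4                   # number of fuzzy state clusters
N_FEATURES = 3                   # extracted system-state features

rng = np.random.default_rng(0)
centers = rng.random((N_CLUSTERS, N_FEATURES))  # cluster centers (e.g., from fuzzy c-means)
Q = np.zeros((N_CLUSTERS, len(RULES)))          # one Q-value per (cluster, rule) pair

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning parameters
M = 2.0                                # fuzzifier, as in fuzzy c-means

def memberships(state):
    """Fuzzy membership of a transient state vector in each cluster
    (inverse-distance form; normalized to sum to 1)."""
    d = np.linalg.norm(centers - state, axis=1) + 1e-9
    w = d ** (-2.0 / (M - 1.0))
    return w / w.sum()

def select_rule(state):
    """Aggregate Q-values over clusters by membership, then pick a rule
    epsilon-greedily on the aggregated values."""
    mu = memberships(state)
    q = mu @ Q
    if rng.random() < EPSILON:
        return int(rng.integers(len(RULES))), mu
    return int(np.argmax(q)), mu

def update(mu, action, reward, next_state):
    """Distribute the reward to the chosen rule's Q-value in every
    cluster, weighted by the state's membership in that cluster."""
    mu_next = memberships(next_state)
    target = reward + GAMMA * np.max(mu_next @ Q)
    Q[:, action] += ALPHA * mu * (target - Q[:, action])
```

In a simulation loop, the agent would call `select_rule` at each decision point, apply the chosen dispatching rule to the machine queue, observe the reward, and call `update` with the stored membership vector. Spreading the update across all clusters by membership is what lets nearby states share experience, which is the mechanism the abstract credits for the faster convergence.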