Online learning systems currently select questions blindly when they draw them at random for practice and tests. To address this, this paper clusters questions with the K-means algorithm on online judge data, using different feature subsets and different parameters. On an evaluation data set from an ACM Online Judge system, time fluctuation, average time consumption, and repeat submission rate are used as features, and cluster analysis is used to build a model that classifies questions into difficulty levels. Experiments examine how the number of features and the number of cluster centers affect the classification, in order to determine the best model. The experimental results show that the proposed method is simple and effective, and that the classification results of the model are consistent with empirical classifications.
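The clustering step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the feature values are made up, the function names (`standardize`, `kmeans`, `difficulty_levels`) are hypothetical, and two further assumptions are made that the abstract does not state, namely that features are standardized before clustering and that larger feature values indicate a harder question.

```python
import math
import random


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd-style K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def difficulty_levels(features, k):
    """Cluster questions, then rank clusters from easiest (1) to hardest (k)."""
    labels, centroids = kmeans(standardize(features), k)
    # Assumption: larger feature values mean a harder question, so clusters
    # are ordered by the sum of their standardized centroid coordinates.
    order = sorted(range(k), key=lambda j: sum(centroids[j]))
    rank = {cluster: level + 1 for level, cluster in enumerate(order)}
    return [rank[lab] for lab in labels]


# Hypothetical rows: [time fluctuation, average time, repeat submission rate]
features = [
    [0.10, 1.0, 0.05], [0.20, 1.2, 0.10], [0.15, 0.9, 0.08],
    [2.00, 9.0, 0.60], [2.20, 10.0, 0.70], [1.90, 8.5, 0.65],
]
levels = difficulty_levels(features, k=2)
print(levels)
```

Varying `k` and the columns passed in `features` corresponds to the experiments on the number of cluster centers and the number of features; how the resulting levels are compared with empirical classifications is left to the paper itself.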