Some progress has been made on multi-scale research in data mining. However, multi-scale data mining has not yet been studied in sufficient depth or completeness. Existing work focuses mainly on spatial and image data, and comparatively little addresses multi-scale data mining on general data. With the continued growth of big data applications, this line of research has become particularly important. To address these issues, this paper studies a general scale-conversion method for multi-scale association rules. First, a method for processing frequent itemsets is proposed, based on the similarity theory of inclusion degree. Then, taking the image pyramid as its theoretical basis, the paper proposes the Multi-Scale Association Rules Scaling Up Algorithm (MSARSUA). Finally, the proposed algorithm is validated and analyzed experimentally on a real full-population dataset from H province, public UCI datasets, and IBM datasets. The results show that MSARSUA achieves higher coverage, a higher F1-measure, and a lower average support estimation error. MSARSUA is considerably more efficient than the Apriori and FP-Growth algorithms, and it outperforms SU-ARMA.
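The three quality measures named in the abstract (coverage, F1-measure, and average support estimation error) can be illustrated with a minimal Python sketch. The definitions below are assumptions made for illustration and are not taken from the paper: coverage is treated as the fraction of the truly frequent itemsets that the scaled-up result recovers, F1 is the harmonic mean of precision and that coverage, and the support error is the mean absolute difference in support over itemsets found in both sets. The function name `evaluate_scaling_up` and the toy itemsets are hypothetical.

```python
from typing import Dict, FrozenSet

# Maps each frequent itemset to its support (a fraction in [0, 1]).
Itemsets = Dict[FrozenSet[str], float]


def evaluate_scaling_up(estimated: Itemsets, actual: Itemsets) -> Dict[str, float]:
    """Compare itemsets estimated by a scaling-up step with itemsets mined directly.

    Assumed definitions (illustrative only, not the paper's):
      coverage          = |estimated ∩ actual| / |actual|
      precision         = |estimated ∩ actual| / |estimated|
      f1                = harmonic mean of precision and coverage
      avg_support_error = mean |estimated support - actual support| over shared itemsets
    """
    common = estimated.keys() & actual.keys()
    coverage = len(common) / len(actual) if actual else 0.0
    precision = len(common) / len(estimated) if estimated else 0.0
    denom = precision + coverage
    f1 = 2 * precision * coverage / denom if denom else 0.0
    avg_err = (
        sum(abs(estimated[k] - actual[k]) for k in common) / len(common)
        if common
        else 0.0
    )
    return {"coverage": coverage, "f1": f1, "avg_support_error": avg_err}


# Toy data: itemsets mined directly at the target scale vs. estimated by scaling up.
actual = {
    frozenset({"a"}): 0.60,
    frozenset({"b"}): 0.50,
    frozenset({"a", "b"}): 0.40,
}
estimated = {
    frozenset({"a"}): 0.55,
    frozenset({"b"}): 0.50,
    frozenset({"c"}): 0.30,
}

metrics = evaluate_scaling_up(estimated, actual)
```

In this toy case two of the three true itemsets are recovered and one estimated itemset is spurious, so coverage and precision are both 2/3. A baseline such as Apriori or FP-Growth run directly on the upper-scale data would supply the `actual` itemsets in such a comparison.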