Multi-scale theory has been introduced into data mining, but research on it remains neither deep nor complete, and universal theories and methods are still lacking. As big-data processing applications spread, the need for such research grows more pressing. To address this gap, this paper studies a universal theory and methodology for multi-scale data mining. First, based on concept hierarchy theory, it defines data scale partition and data scale, and characterizes the upper-layer and lower-layer relationships among multi-scale datasets. Next, it sets out the definition and essence of multi-scale data mining and presents a classification of its methods. Finally, it proposes an algorithm framework for multi-scale data mining together with its theoretical basis, applies the framework to association rule mining, and proposes MSARMA (multi-scale association rules mining algorithm), which derives knowledge across scales among multi-scale datasets. Experiments on the IBM T10I4D100K dataset and a real whole-population dataset from H Province show that MSARMA is feasible and effective, with relatively high coverage and accuracy and a relatively low average support estimation error.
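To make the idea of cross-scale derivation concrete, the following is a minimal sketch, not the MSARMA algorithm itself. It assumes a hypothetical two-level concept hierarchy (city below province), in which the upper-scale dataset is the union of the lower-scale datasets. It estimates upper-scale itemset supports using only the frequent itemsets mined at the lower scale, then compares them with the true supports to obtain a support estimation error. The function names, the toy transactions, and the rule that itemsets missing from a partition contribute zero are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def itemset_counts(transactions, max_size=2):
    """Count every itemset of up to max_size items across the transactions."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets whose support >= min_support."""
    n = len(transactions)
    counts = itemset_counts(transactions, max_size)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def estimate_upper_support(lower_frequent, lower_sizes):
    """Estimate upper-scale supports from lower-scale frequent itemsets only.

    Assumption: an itemset that is not frequent in a partition contributes
    a count of zero there, so each estimate is a lower bound on the truth.
    """
    total = sum(lower_sizes.values())
    est_counts = Counter()
    for name, freq in lower_frequent.items():
        for itemset, support in freq.items():
            est_counts[itemset] += support * lower_sizes[name]
    return {s: c / total for s, c in est_counts.items()}


# Hypothetical lower-scale (city) datasets under one upper-scale (province) node.
lower = {
    "city_a": [["milk", "bread"], ["milk"], ["bread", "eggs"], ["milk", "bread"]],
    "city_b": [["milk", "eggs"], ["eggs"], ["bread"], ["eggs"]],
}
sizes = {name: len(tx) for name, tx in lower.items()}
lower_frequent = {name: frequent_itemsets(tx, 0.5) for name, tx in lower.items()}

estimated = estimate_upper_support(lower_frequent, sizes)

# Ground truth: mine the union of the lower-scale datasets directly.
upper = [t for tx in lower.values() for t in tx]
true_counts = itemset_counts(upper)
true_support = {s: true_counts[s] / len(upper) for s in estimated}

# Mean absolute support estimation error over the estimated itemsets.
avg_error = sum(abs(true_support[s] - estimated[s]) for s in estimated) / len(estimated)
```

In this sketch the error arises only because counts for locally infrequent itemsets are discarded at the lower scale; a method that corrects for those missing counts would reduce it.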