Multi-scale theory has been introduced into the field of data mining, but research on multi-scale data mining is still not thorough and lacks universal theories and methods. To address this problem, this paper studies a universal theory of multi-scale data mining and proposes a scaling-up association rules mining algorithm. First, definitions of data-scale partition and data scale are given based on concept hierarchy theory. Second, the essence and research core of multi-scale data mining are clarified according to the key points of multi-scale theory research. Finally, building on this theoretical work, the scaling-up association rules mining algorithm SU-ARMA is proposed. SU-ARMA uses sampling theory and the Jaccard similarity coefficient to process the frequent itemsets in the mining results of datasets, achieving the upward derivation of knowledge across multi-scale data. The algorithm was tested and analyzed on a synthetic dataset and a real full-population demographic dataset from H Province. The experimental results show that it achieves relatively high coverage and precision and relatively low support estimation error, and that it is feasible and effective.
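As an illustration of one ingredient named above, the Jaccard similarity coefficient can be computed between the collections of frequent itemsets mined from two datasets. The following is a minimal sketch under stated assumptions, not the SU-ARMA algorithm itself: the brute-force miner, the function names `frequent_itemsets` and `jaccard`, the toy transactions, and the 0.5 support threshold are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force miner (illustrative only): map each itemset of up to
    max_size items to its support, keeping those at or above min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            # Support = fraction of transactions containing the itemset.
            support = sum(1 for t in transactions if itemset <= t) / n
            if support >= min_support:
                result[itemset] = support
    return result


def jaccard(a, b):
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets are identical
    return len(a & b) / len(a | b)


# Two hypothetical datasets at the same (finer) scale.
data_a = [{"x", "y"}, {"x", "y"}, {"x"}, {"y", "z"}]
data_b = [{"x", "y"}, {"x", "z"}, {"x", "y"}, {"z"}]

freq_a = frequent_itemsets(data_a, min_support=0.5)
freq_b = frequent_itemsets(data_b, min_support=0.5)

# Similarity of the two mining results, compared as sets of itemsets.
similarity = jaccard(freq_a.keys(), freq_b.keys())
print(similarity)
```

Here the two mining results share three of four distinct frequent itemsets, so the coefficient is 0.75. How SU-ARMA combines such similarities with sampling to estimate supports at the coarser scale is not specified in this abstract and is not modeled in the sketch.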