To address the information loss and low efficiency that traditional association rule mining algorithms suffer from when mining large databases with continuous attributes, this paper proposes an improved fuzzy association rule mining algorithm, F-ARMVLQD. The algorithm uses fuzzy c-means (FCM) clustering to avoid the "sharp boundary" problem between the intervals of discretized attributes. It also introduces a directed acyclic graph (DAG) and a byte-vector structure to speed up the computation of frequent itemsets, and borrows the idea of the partition algorithm to reduce the frequent disk I/O incurred while mining such databases, so that the whole algorithm scans the database only twice. Experimental results show that the algorithm runs more efficiently than the traditional algorithm.
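
To illustrate how fuzzy memberships soften the "sharp boundary" between attribute intervals, the following is a minimal sketch and not the F-ARMVLQD implementation itself. It assumes the cluster centers have already been found by FCM, uses the standard FCM membership formula with fuzzifier m = 2, and combines memberships with the minimum operator when computing fuzzy support; the abstract does not specify these choices. The function names `fcm_memberships` and `fuzzy_support`, and the sample attributes and centers, are hypothetical.

```python
def fcm_memberships(x, centers, m=2.0):
    """Membership of scalar x in each cluster, using the standard FCM formula."""
    dists = [abs(x - c) for c in centers]
    # A value lying exactly on a center belongs fully to that cluster.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


def fuzzy_support(records, itemset, centers_by_attr):
    """Mean over records of the minimum membership across the itemset.

    itemset is a list of (attribute, cluster_index) pairs.
    The min operator is an assumption; other t-norms (e.g. product) are possible.
    """
    total = 0.0
    for rec in records:
        degrees = [
            fcm_memberships(rec[attr], centers_by_attr[attr])[idx]
            for attr, idx in itemset
        ]
        total += min(degrees)
    return total / len(records)


# Hypothetical cluster centers, as if produced by a prior FCM run.
centers_by_attr = {"age": [20.0, 40.0], "income": [30.0, 50.0]}
records = [{"age": 25.0, "income": 30.0}, {"age": 40.0, "income": 50.0}]

# Age 25 belongs mostly, but not entirely, to the "young" cluster (index 0).
young, old = fcm_memberships(25.0, centers_by_attr["age"])
support = fuzzy_support(records, [("age", 0), ("income", 0)], centers_by_attr)
```

With crisp intervals, a value near an interval edge would count fully toward one interval and not at all toward its neighbor; here each record contributes a graded degree to every fuzzy item, so support changes smoothly as values move between clusters.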