频繁项集挖掘是关联规则挖掘的重要内容,而现有的频繁项集挖掘算法在数据库扫描和复杂数据结构构建方面消耗过多的时间,效率较低。为克服现有频繁项集挖掘算法的不足,提出了基于随机相遇的频繁项集挖掘算法。在随机相遇过程中,不断从原始事务集中随机挑选两条事务,将其交集作为新事务集中的元素,通过计算新事务集中最小支持度与原事务集中最小支持度的关系,将在原事务集上的频繁项集挖掘转化为在新事务集上的频繁项集挖掘,算法的时间复杂度和空间复杂度大大降低。由于随机样本蕴含原始数据集的主要统计特性,新事务集具有原事务集的统计特性,通过调整参数,算法在新事物集上挖掘结果的准确度可以得到保证。并利用一个零售超市的交易数据对该算法的有效性进行了测试。测试结果表明,该算法能将挖掘速度提升数十倍,同时挖掘结果的准确度和其它算法相差不大。
An association rule mining algorithm based on random meeting is proposed to handle the inefficient problem of algorithms. During random meeting, two transactions are selected randomly from an original set, and then the intersection of the transactions is computed and taken as the elements of a new set. The frequent item mining on the original set can be instead on the new set by mapping the min support degree on the two sets. Due to the fact that the new set is smaller than the original one, and the statistical properties of the new set are similar to those of the original one, the complexity of the new algorithm can be reduced while getting the similar accuracy to existing algorithms. The new algorithm was evaluated based on the transactions of a supermarket. The experiment results showed that the new algorithm can improve the speed of frequent item-set mining by more than ten times while achieve the similar accuracy of the computing results compared with other algorithms.