Existing privacy-preserving association rule mining algorithms fail to strike a good trade-off between efficiency and accuracy. To address this problem, this paper proposes a hybrid algorithm that combines secure multi-party computation with randomized perturbation. Working under the semi-honest model, the algorithm first uses an itemset random perturbation matrix to transform and hide the data at each distributed site, and then applies a recovery method to estimate the global support counts of candidate itemsets. Because whole itemsets are perturbed, the algorithm avoids the defect of traditional methods, which perturb each item independently, destroy the correlations between items, and therefore lose recovery accuracy. Itemsets whose estimated support falls below the threshold are pruned, which reduces the search space of candidate frequent itemsets. Secure multi-party computation is then used to find the exact global frequent itemsets within the pruned space, from which the global association rules are generated. Experimental results show that the proposed algorithm achieves a good trade-off between accuracy and efficiency while maintaining the degree of privacy.
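The two-phase flow described above (estimate from perturbed data, prune, then count exactly with a secure protocol) can be illustrated with a minimal Python sketch. The abstract does not specify the perturbation matrix or the secure protocol, so everything concrete here is an assumption made for illustration: the perturbation keeps a record's true itemset state with probability `keep_prob` and otherwise replaces it with a uniformly random state, the exact counting step uses a textbook ring-based secure sum, and the names `estimate_state_counts`, `secure_sum`, `mine_frequent`, and the pruning margin `slack` are hypothetical.

```python
import random


def estimate_state_counts(observed, keep_prob):
    """Recover expected true state counts from perturbed state counts.

    Assumed perturbation (not taken from the paper): a record keeps its
    true itemset state with probability keep_prob, otherwise its state is
    replaced by one drawn uniformly from all states. The expected observed
    count of state i is then
        keep_prob * true_i + (1 - keep_prob) * total / n_states,
    which is inverted here in closed form.
    """
    total = sum(observed)
    noise = (1.0 - keep_prob) * total / len(observed)
    return [(count - noise) / keep_prob for count in observed]


def secure_sum(local_counts, modulus=2**61 - 1, rng=None):
    """Simulate a ring-based secure sum among semi-honest sites.

    The initiating site hides the running total behind a random mask,
    every site adds its local count modulo `modulus`, and the initiator
    removes the mask at the end. The true sum must be below `modulus`.
    """
    rng = rng or random.SystemRandom()
    mask = rng.randrange(modulus)
    running = mask
    for count in local_counts:
        # Each site only ever sees a masked partial sum.
        running = (running + count) % modulus
    return (running - mask) % modulus


def mine_frequent(perturbed, local_exact, keep_prob, min_count, slack):
    """Prune by estimated support, then confirm survivors exactly.

    perturbed:   itemset -> observed global state counts, where the last
                 state means "all items of the itemset are present".
    local_exact: itemset -> list of true local support counts, one per site.
    slack:       hypothetical safety margin that limits false pruning
                 caused by estimation error.
    """
    frequent = {}
    for itemset, observed in perturbed.items():
        estimate = estimate_state_counts(observed, keep_prob)[-1]
        if estimate < min_count - slack:
            continue  # pruned: the exact protocol is never run for it
        exact = secure_sum(local_exact[itemset])
        if exact >= min_count:
            frequent[itemset] = exact
    return frequent


if __name__ == "__main__":
    perturbed = {("A", "B"): [34, 28, 22, 16], ("A", "C"): [16, 22, 28, 34]}
    local_exact = {("A", "B"): [4, 3, 3], ("A", "C"): [15, 12, 13]}
    print(mine_frequent(perturbed, local_exact, keep_prob=0.6,
                        min_count=30, slack=5))
```

In this sketch the costly secure computation runs only for candidates that survive the cheap estimate-based pruning, which is where the efficiency gain described in the abstract would come from; the final counts are exact because they are computed from unperturbed local supports.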