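Privacy preservation is an important research problem in data mining; its goal is to obtain accurate models and analysis results without precise access to the real original data. To improve both the degree of protection for private data and the accuracy of mining results, an effective method for privacy-preserving association rule mining is proposed. First, the two basic privacy-preserving strategies, data perturbation and query restriction, are combined into a new data randomization method, randomized response with partial hiding (RRPH), which transforms and hides the original data. Then, on this basis, a simple yet efficient frequent itemset generation algorithm is given for data processed by RRPH, which in turn enables privacy-preserving association rule mining. Both theoretical analysis and experimental results show that the RRPH-based method performs well in terms of privacy, accuracy, efficiency, and applicability.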
Privacy preservation is one of the most important topics in data mining. Its purpose is to discover accurate patterns without precise access to the original data. To improve both privacy protection and mining accuracy, this paper presents an effective method for privacy-preserving association rule mining. First, a new data preprocessing approach, Randomized Response with Partial Hiding (RRPH), is proposed. This approach combines two privacy-preserving strategies, data perturbation and query restriction, to transform and hide the original data. Then, a privacy-preserving association rule mining algorithm based on RRPH is presented. Theoretical analysis and experimental results both show that privacy-preserving association rule mining based on RRPH achieves significant improvements in privacy, accuracy, efficiency, and applicability.
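The abstract does not spell out the RRPH procedure, so the following is only a minimal sketch of how perturbation and hiding might be combined for a single binary item. It assumes a three-way randomization: each bit is kept with probability `p_keep`, overwritten with 1 with probability `p_one`, and overwritten with 0 otherwise. The function names, parameter names, and this particular scheme are illustrative assumptions, not the paper's definitions, and the sketch covers only single-item support, not the frequent itemset generation algorithm.

```python
import random


def rrph_perturb(bit, p_keep, p_one, rng):
    """Randomize one binary value (hypothetical three-way scheme).

    With probability p_keep the true bit is reported,
    with probability p_one a constant 1 is reported,
    and with the remaining probability a constant 0 is reported.
    """
    u = rng.random()
    if u < p_keep:
        return bit
    if u < p_keep + p_one:
        return 1
    return 0


def estimate_support(observed_fraction, p_keep, p_one):
    """Recover the true fraction of 1s from the perturbed fraction.

    Under the assumed scheme, E[observed] = p_keep * true + p_one,
    so the true support is estimated by inverting that relation.
    """
    return (observed_fraction - p_one) / p_keep


def perturb_column(bits, p_keep, p_one, seed=0):
    """Perturb every value in a column independently."""
    rng = random.Random(seed)
    return [rrph_perturb(b, p_keep, p_one, rng) for b in bits]
```

In this sketch the miner sees only the output of `perturb_column`, computes the fraction of 1s in it, and passes that fraction to `estimate_support`. The estimate becomes noisier as `p_keep` shrinks, which illustrates the trade-off between privacy and accuracy that the abstract refers to.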