Privacy-preserving data mining aims to discover accurate patterns and rules without precise access to the details of the original data. This paper studies privacy preservation in classification mining and presents a privacy-preserving approach for Naive Bayes classification based on data processing and feature reconstruction. Two methods for processing private data and reconstructing features are proposed: the Extended Randomized Response with Partial Hiding (ERRPH) method for enumerated data, and the Transforming Randomized Response (TRR) method for numerical data. Building on these methods, a complete privacy-preserving Naive Bayes classification algorithm is implemented. Theoretical analysis and experimental results both show that the ERRPH- and TRR-based approach offers good privacy, accuracy, efficiency, and applicability.
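To illustrate the general idea of randomizing data and then reconstructing features, the following minimal sketch shows a basic randomized response scheme for an enumerated attribute. It is not the ERRPH or TRR method, whose details are not given in this abstract. The function names `randomize` and `reconstruct`, the retention probability `p`, and the sample data are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def randomize(values, domain, p, rng):
    """Report the true value with probability p; otherwise report a
    value drawn uniformly from the whole domain (hypothetical scheme)."""
    return [v if rng.random() < p else rng.choice(domain) for v in values]


def reconstruct(reported, domain, p):
    """Estimate the original category frequencies from randomized reports."""
    n = len(reported)
    k = len(domain)
    counts = Counter(reported)
    # Expected observed frequency: p * true_freq + (1 - p) / k.
    # Solving for true_freq gives the estimator below.
    return {c: (counts[c] / n - (1 - p) / k) / p for c in domain}


# Illustrative data only: 60% "a", 30% "b", 10% "c".
rng = random.Random(0)
domain = ["a", "b", "c"]
true_values = ["a"] * 6000 + ["b"] * 3000 + ["c"] * 1000

reported = randomize(true_values, domain, p=0.7, rng=rng)
estimate = reconstruct(reported, domain, p=0.7)
print(estimate)
```

In a Naive Bayes setting, frequencies reconstructed this way within each class could stand in for the conditional probabilities that the classifier would otherwise estimate from the raw data. A smaller `p` hides individual values better but makes the estimates noisier.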