This paper introduces the basic concepts of differential privacy and the methods used to implement it, and proposes a differentially private data publishing algorithm for decision tree analysis. The algorithm first fully generalizes the raw data. It then specializes the data step by step using the exponential mechanism under a given privacy budget. Finally, it adds noise to the data according to the Laplace mechanism, so that the algorithm as a whole satisfies differential privacy. The paper also improves the method the exponential mechanism uses to select among candidate specializations. Compared with existing algorithms, the proposed algorithm generalizes the raw data less under the same privacy budget, so a decision tree model built on the released data achieves higher classification accuracy. Experimental results confirm the algorithm's effectiveness and show that it outperforms existing algorithms for classification analysis.
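The two building blocks named above can be illustrated with a minimal Python sketch. It shows the standard exponential mechanism, which picks a candidate with probability proportional to exp(ε·u / (2Δu)), and the standard Laplace mechanism applied to counts with sensitivity 1. This is not the paper's improved selection method, which the abstract does not specify. The candidate names, utility scores, score sensitivity of 1.0, and the even budget split are all hypothetical.

```python
import math
import random
from typing import Dict, Sequence, TypeVar

T = TypeVar("T")


def exponential_mechanism(candidates: Sequence[T], scores: Sequence[float],
                          epsilon: float, sensitivity: float,
                          rng: random.Random) -> T:
    """Pick one candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    if not candidates or len(candidates) != len(scores):
        raise ValueError("need one score per candidate")
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    top = max(scores)
    # Shifting every score by the maximum avoids overflow in exp()
    # and leaves the selection probabilities unchanged.
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(list(candidates), weights=weights, k=1)[0]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d.
    exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts: Dict[str, int], epsilon: float,
                 rng: random.Random) -> Dict[str, float]:
    """Add Laplace(1/epsilon) noise to each count. A count query has
    sensitivity 1. Assumption: the counts cover disjoint partitions, so by
    parallel composition a single epsilon covers all of them."""
    return {key: c + laplace_noise(1.0 / epsilon, rng) for key, c in counts.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    total_epsilon = 1.0               # hypothetical total privacy budget
    eps_select = total_epsilon / 2    # hypothetical share for one specialization step
    eps_noise = total_epsilon / 2     # hypothetical share for the final noisy counts

    # Hypothetical candidate specializations and their utility scores
    # (assumed to have sensitivity 1.0).
    candidates = ["Age: ANY -> {[0-40), [40-99)}", "Job: ANY -> {Blue, White}"]
    scores = [12.0, 8.0]
    choice = exponential_mechanism(candidates, scores, eps_select, 1.0, rng)
    print("specialize:", choice)

    # Hypothetical class counts in the partitions of the generalized table.
    counts = {"[0-40),Blue,Y": 30, "[0-40),Blue,N": 12, "[40-99),White,Y": 25}
    print(noisy_counts(counts, eps_noise, rng))
```

In the full algorithm the candidate set and scores would be recomputed after each specialization, and the selection budget spread over all steps. The sketch performs a single step to keep the mechanics visible.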