In privacy-preserving collaborative data analysis for data mining, different parts of a dataset may belong to different data owners and may therefore need to be processed with different data distortion methods. This paper proposes a group of new data distortion strategies. By combining attribute partitioning with singular value decomposition (SVD), non-negative matrix factorization (NMF), and the discrete wavelet transform (DWT), four schemes are used to perturb submatrices of the original data matrix to be protected, and several metrics are used to measure the effectiveness of these strategies. Data utility is evaluated by binary classification with a support vector machine (SVM). Experimental results indicate that, compared with the individual data distortion techniques, the proposed schemes are highly effective at achieving a good trade-off between data privacy and data utility, providing a feasible solution for collaborative data analysis.
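The abstract does not specify the four schemes, so the following is only a minimal sketch under assumed choices of what one attribute-partitioned scheme could look like. It splits the attributes (columns) into two hypothetical groups, perturbs one group with a rank-`k` truncated SVD and the other with a one-level Haar DWT whose detail coefficients are discarded, and reports the value difference (VD), a relative-perturbation measure, as an example privacy metric. The column groups, the rank `k`, the Haar wavelet, the SVD/DWT pairing, and all function names are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def svd_distort(block: np.ndarray, k: int) -> np.ndarray:
    """Replace a block by its rank-k truncated SVD approximation."""
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    # Keep only the k largest singular triplets; the remainder is discarded.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


def haar_distort(block: np.ndarray) -> np.ndarray:
    """One-level Haar DWT along each row, zero the detail coefficients, invert.

    Assumes an even number of columns. With the details zeroed, each pair of
    adjacent attributes is replaced by the pair's mean.
    """
    if block.shape[1] % 2:
        raise ValueError("haar_distort needs an even number of columns")
    approx = (block[:, 0::2] + block[:, 1::2]) / np.sqrt(2)  # low-pass coefficients
    rec = np.empty_like(block, dtype=float)
    rec[:, 0::2] = approx / np.sqrt(2)  # inverse transform with detail = 0
    rec[:, 1::2] = approx / np.sqrt(2)
    return rec


def partitioned_distort(x, svd_cols, dwt_cols, k):
    """Hypothetical scheme: SVD on one attribute group, Haar DWT on another."""
    y = np.asarray(x, dtype=float).copy()
    y[:, svd_cols] = svd_distort(y[:, svd_cols], k)
    y[:, dwt_cols] = haar_distort(y[:, dwt_cols])
    return y  # columns in neither group are left unchanged


def value_difference(x, y):
    """VD = ||X - Y||_F / ||X||_F, a relative-perturbation privacy measure."""
    return np.linalg.norm(x - y) / np.linalg.norm(x)


rng = np.random.default_rng(0)
X = rng.random((100, 6))  # synthetic stand-in data, not the paper's datasets
Y = partitioned_distort(X, [0, 1, 2, 3], [4, 5], k=2)
print(f"VD = {value_difference(X, Y):.3f}")
```

A smaller `k` discards more singular components and so raises VD (more distortion) at some cost in utility. The utility side, which the paper measures by SVM binary-classification accuracy, could be checked by training a classifier such as scikit-learn's `SVC` on `Y` and comparing its accuracy with one trained on `X`.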