In privacy-preserving data publishing, existing algorithms do not consider the distance between adjacent data points within a temporary anonymous group when partitioning such groups. The partitioning process therefore easily incurs a large amount of unnecessary information loss, which reduces the utility of the published anonymized data set. To address this problem, the concepts of rectangular projection area, projection area density, and partition characterization coefficient are proposed, with the aim of partitioning temporary anonymous groups sensibly by increasing the projection area density of the record points, so that the resulting anonymous groups incur as little information loss as possible. A k-anonymity algorithm based on projection area density partitioning is then presented. By optimizing the rounding partition function and the attribute dimension selection strategy, it reduces unnecessary information loss during partitioning without reducing the number of anonymous groups, further improving the utility of the published data set. The rationality and effectiveness of the algorithm are verified by theoretical analysis and experiments.
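The motivating problem can be illustrated with a minimal one-dimensional sketch. It is not the algorithm of this paper: the gap-based cut rule, the loss proxy (group size times value range), and all function names are assumptions made for illustration only, standing in for the projection area density and partition characterization coefficient, which the abstract does not define. The sketch compares a split at the middle position, which ignores the distance between adjacent points, with a split placed at the widest gap among the cut positions that still leave at least k records on each side.

```python
def range_loss(groups):
    """Illustrative loss proxy (assumption): sum of group size * value range."""
    return sum(len(g) * (max(g) - min(g)) for g in groups)


def median_split(values, k):
    """Cut the sorted values at the middle position, ignoring neighbour gaps."""
    s = sorted(values)
    if len(s) < 2 * k:
        return [s]  # too few records to form two groups of size >= k
    mid = len(s) // 2
    return [s[:mid], s[mid:]]


def gap_aware_split(values, k):
    """Cut at the widest gap between adjacent points (hypothetical heuristic),
    considering only cuts that keep at least k records on each side."""
    s = sorted(values)
    if len(s) < 2 * k:
        return [s]
    candidates = range(k, len(s) - k + 1)
    cut = max(candidates, key=lambda i: s[i] - s[i - 1])
    return [s[:cut], s[cut:]]


values = [1, 2, 3, 10, 11, 12, 13, 14]
k = 3

median_groups = median_split(values, k)
gap_groups = gap_aware_split(values, k)

median_loss = range_loss(median_groups)
gap_loss = range_loss(gap_groups)
```

On this toy data both strategies produce two groups of at least three records each, but the middle cut places the value 10 in the same group as 1, 2, and 3 and yields a loss of 48, while the gap-aware cut yields 26. This mirrors the claim that information loss can be reduced without reducing the number of anonymous groups.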