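The k-anonymity model is one of the effective methods in data publishing for anonymizing an original dataset before release in order to prevent linking attacks. However, existing k-anonymity models and their refinements do not consider that different application domains place different demands on the data quality of the anonymized published table. In a given application domain, the quasi-identifier (QI) attributes contribute to differing degrees to the utility of data analysis tasks performed on the anonymized table. If the generalization of each QI attribute is not handled according to the intended use of the published table, the generalized table will have poor data utility and will fail to meet the needs of the specific analysis task. Based on an analysis of the characteristics of data analysis tasks in different application domains, this paper first revises the basic ODP directory system to build a concept generalization hierarchy suited to a specific problem domain. It then assigns weights to the generalization paths of the different QI attributes during generalization, to reflect the differing requirements that the specific analysis task places on each attribute. Finally, it designs WAK (QI weight-aware k-anonymity), an attribute-weight-aware algorithm for anonymized data publishing, which offers a flexible privacy-preserving solution that maintains the data utility of the anonymized published table. Example analysis and experimental results show that the generalized anonymized table produced by this scheme achieves the specified privacy protection goal while retaining high data utility, meeting the data quality requirements of specific analysis tasks in the target application domain.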
In recent years, publishing data about individuals without revealing their identities has become an active research issue, and k-anonymity based models are effective techniques for preventing linking attacks. Most previous work, however, focuses on the efficiency and the scope of application of these models. The specific quality requirements that analysis tasks in different scenarios place on published microdata, and the differing contribution of each quasi-identifier (QI) attribute to the analysis result, have not been addressed. If the contributions of the different generalization paths and the generalization order of the QI attributes are not considered, the published microdata may have poor utility in the target application. Accounting for these differences, which determine the utility of the published table, is therefore worthwhile. By analyzing the differences among several application areas, a scheme that provides an effective and secure tradeoff between privacy and utility is proposed. First, the basic ODP directory is revised to reflect the characteristics of a specific domain. Second, weights on the QI attributes are introduced to reflect their effect on the data analysis task. Finally, QI weight-aware k-anonymity (WAK), an algorithm based on attribute weights, is introduced. Theoretical analysis and experimental results show that the scheme is effective: it preserves the privacy of sensitive data well while maintaining better data utility.
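To make the idea of attribute weights concrete, the following is a minimal sketch and not the WAK algorithm itself, whose details the abstract does not give. It assumes two hypothetical QI attributes (`age`, `zip`) with hand-written generalization hierarchies standing in for a revised ODP-style hierarchy, an assumed loss measure (weight times normalized generalization level), and an exhaustive search over level combinations that keeps the k-anonymous combination with the lowest weighted loss. All names, hierarchies, and data are illustrative.

```python
from collections import Counter
from itertools import product

# Hypothetical generalization hierarchies (not from the paper).
# Level 0 is the raw value; each higher level is coarser.
HIERARCHIES = {
    "age": [
        lambda v: str(v),
        lambda v: f"{v // 10 * 10}-{v // 10 * 10 + 9}",  # decade range
        lambda v: "*",                                   # fully suppressed
    ],
    "zip": [
        lambda v: v,
        lambda v: v[:3] + "**",                          # keep 3-digit prefix
        lambda v: "*",
    ],
}

def generalize(rows, levels):
    """Apply the chosen generalization level to every QI attribute."""
    attrs = sorted(levels)
    return [tuple(HIERARCHIES[a][levels[a]](row[a]) for a in attrs) for row in rows]

def is_k_anonymous(gen_rows, k):
    """Every combination of QI values must occur at least k times."""
    return all(count >= k for count in Counter(gen_rows).values())

def weighted_loss(levels, weights):
    """Assumed loss: attribute weight times its normalized generalization level."""
    return sum(weights[a] * levels[a] / (len(HIERARCHIES[a]) - 1) for a in levels)

def weight_aware_k_anonymize(rows, weights, k):
    """Search all level combinations; return the k-anonymous one with least loss."""
    attrs = sorted(weights)
    best = None
    for combo in product(*(range(len(HIERARCHIES[a])) for a in attrs)):
        levels = dict(zip(attrs, combo))
        if is_k_anonymous(generalize(rows, levels), k):
            loss = weighted_loss(levels, weights)
            if best is None or loss < best[0]:
                best = (loss, levels)
    if best is None:
        raise ValueError("no generalization reaches k-anonymity")
    return best[1], generalize(rows, best[1])

# Illustrative microdata: generalizing either age or zip alone gives 2-anonymity.
ROWS = [
    {"age": 23, "zip": "10001"},
    {"age": 27, "zip": "10001"},
    {"age": 23, "zip": "10002"},
    {"age": 27, "zip": "10002"},
]

if __name__ == "__main__":
    # A high weight marks an attribute as important to the analysis task.
    print(weight_aware_k_anonymize(ROWS, {"age": 0.8, "zip": 0.2}, k=2))
    print(weight_aware_k_anonymize(ROWS, {"age": 0.2, "zip": 0.8}, k=2))
```

On this sample data, either attribute can be generalized one level to reach 2-anonymity. When `age` carries the larger weight, the search keeps ages exact and generalizes `zip`; swapping the weights reverses the choice. This is the kind of task-dependent behaviour the attribute weights are meant to produce. Exhaustive search is used only for clarity and would not scale to realistic hierarchies.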