When publishing datasets that contain both relational and transaction attributes (referred to as RT-data for brevity), either type of data may be exposed to linking attacks, so both parts must be anonymized together. Existing anonymization techniques incur severe information loss when applied to RT-data and therefore fail to preserve data utility. To address this problem, an anonymization model, (k,l)-diversity, is proposed; it protects user privacy by enforcing l-diversity on each equivalence class and k-anonymity on the transaction data. On this basis, two heuristic algorithms that satisfy the model, APA and PAA, are designed and implemented; they anonymize the relational and transaction parts of RT-data in different orders. A corresponding method for evaluating information loss is also proposed. Experiments on publicly available real-world data show that, compared with existing anonymization techniques, APA and PAA anonymize RT-data with lower information loss and shorter execution time while still protecting user privacy.
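
The two constraints named above can be illustrated with a minimal sketch. It is not the APA or PAA algorithm and not the formal definition of the model; it only checks an already generalized table under assumptions made here for illustration: an equivalence class is the set of records sharing the same generalized relational quasi-identifiers, each record carries a single sensitive value, and k-anonymity means that every published itemset is shared by at least k records. The names `Record`, `satisfies_kl_diversity`, and `sample` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import FrozenSet, Hashable, Iterable, NamedTuple, Tuple


class Record(NamedTuple):
    """One published row of RT-data (hypothetical layout)."""
    qi: Tuple[Hashable, ...]   # generalized relational quasi-identifiers
    items: FrozenSet[str]      # generalized transaction itemset
    sensitive: Hashable        # assumed single sensitive value per record


def satisfies_kl_diversity(records: Iterable[Record], k: int, l: int) -> bool:
    """Check the two constraints under the assumptions stated in the lead-in."""
    records = list(records)

    # Assumption: records with identical generalized quasi-identifiers
    # form one equivalence class.
    sensitive_by_class = defaultdict(set)
    for r in records:
        sensitive_by_class[r.qi].add(r.sensitive)

    # l-diversity: every class holds at least l distinct sensitive values.
    if any(len(values) < l for values in sensitive_by_class.values()):
        return False

    # Assumption: k-anonymity on transactions means each published itemset
    # occurs in at least k records.
    itemset_counts = Counter(r.items for r in records)
    return all(count >= k for count in itemset_counts.values())


# Toy data, invented purely for illustration.
sample = [
    Record(("20-30", "F"), frozenset({"dairy"}), "flu"),
    Record(("20-30", "F"), frozenset({"dairy"}), "cold"),
    Record(("30-40", "M"), frozenset({"snacks", "dairy"}), "flu"),
    Record(("30-40", "M"), frozenset({"snacks", "dairy"}), "asthma"),
]

ok = satisfies_kl_diversity(sample, k=2, l=2)
```

A checker of this kind only validates a candidate output. How the generalization is chosen, and in what order the relational and transaction parts are processed, is what distinguishes APA from PAA and is not modeled here.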