K-anonymization of data tables is an important method for protecting private information when data are published. Generalization/suppression is the traditional technique for achieving k-anonymity, but it suffers from low efficiency and poor utility of the anonymized data. In recent years, microaggregation algorithms have been applied to the k-anonymization of data tables to make up for these shortcomings. The basic idea is to partition the records into groups by similarity, with each group containing at least k records and being as homogeneous as possible, and then to replace the attribute values of the records in each group with the group's centroid, which yields a k-anonymous table. This paper surveys the core ideas, related techniques, and state of the art of microaggregation algorithms, classifies and analyzes the existing algorithms, and summarizes the methods used to evaluate them. Finally, open problems and future research directions in this area are discussed.
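To make the basic idea concrete, the following is a minimal sketch in Python of one possible greedy grouping heuristic for numeric records. It is an illustration under stated assumptions, not one of the surveyed algorithms: the function names `centroid` and `microaggregate`, the use of Euclidean distance as the similarity measure, and the seed-selection rule (the record farthest from the centroid of the remaining records) are all hypothetical choices made here.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Component-wise mean of a non-empty list of numeric tuples."""
    return tuple(fmean(column) for column in zip(*points))


def microaggregate(records, k):
    """Illustrative greedy microaggregation (an assumption, not a surveyed method).

    Partitions `records` into groups of at least k records and replaces
    every record with the centroid of its group.
    Returns (masked_records, groups), where groups holds record indices.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")

    remaining = list(range(len(records)))
    groups = []

    # While two full groups can still be formed, carve out one group of size k.
    while len(remaining) >= 2 * k:
        center = centroid([records[i] for i in remaining])
        # Hypothetical seed rule: the remaining record farthest from the center.
        seed = max(remaining, key=lambda i: dist(records[i], center))
        # Group the k remaining records closest to the seed.
        nearest = sorted(remaining, key=lambda i: dist(records[i], records[seed]))[:k]
        groups.append(nearest)
        taken = set(nearest)
        remaining = [i for i in remaining if i not in taken]

    # The leftover k .. 2k-1 records form the final group.
    groups.append(remaining)

    # Replace each record with the centroid of its group.
    masked = [None] * len(records)
    for group in groups:
        c = centroid([records[i] for i in group])
        for i in group:
            masked[i] = c
    return masked, groups


if __name__ == "__main__":
    data = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10), (5, 5)]
    masked, groups = microaggregate(data, k=3)
    for original, replaced in zip(data, masked):
        print(original, "->", replaced)
```

Because every group has at least k members and all members share the same centroid, each masked record is indistinguishable from at least k-1 others on the aggregated attributes. The quality of the result depends on how homogeneous the groups are, which is what distinguishes the different microaggregation algorithms.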