To prevent the disclosure of sensitive attributes in data, those attributes must be protected by anonymization. Most algorithms proposed so far for the l-diversity model are built on concept hierarchies, an approach that leads to unnecessary information loss. To address this, this paper proposes a clustering-based algorithm for the anonymous protection of sensitive data attributes. It adopts an improved distance measure taken from the KACA algorithm (achieving k-anonymity by clustering in attribute hierarchical structures), which is based on attribute generalization hierarchy distance, and combines that measure with clustering. The algorithm clusters the data set according to the requirements of the l-diversity model. Experimental results show that the algorithm both protects the anonymity of sensitive attribute values in the data set and reduces the extent of information loss.
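The general idea of clustering under an l-diversity constraint can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the simplest form of the constraint (each cluster holds at least l distinct sensitive values), it assumes numeric quasi-identifiers, and it substitutes a plain range-normalized distance for the hierarchy-based KACA measure. The names `is_l_diverse`, `distance`, and `cluster_l_diverse` are hypothetical and chosen only for illustration.

```python
def is_l_diverse(cluster, l):
    """Distinct l-diversity: the cluster holds at least l different sensitive values."""
    return len({sensitive for _, sensitive in cluster}) >= l


def distance(r1, r2, ranges):
    """Placeholder distance (assumption): sum of range-normalized differences
    over numeric quasi-identifiers. It stands in for the hierarchy-based measure."""
    return sum(abs(a - b) / rng for a, b, rng in zip(r1[0], r2[0], ranges))


def cluster_l_diverse(records, l):
    """Greedily build clusters that each satisfy distinct l-diversity.

    Each record is a pair (quasi_identifier_tuple, sensitive_value).
    """
    if len({s for _, s in records}) < l:
        raise ValueError("fewer than l distinct sensitive values in the data")

    # Per-attribute value range, used to normalize distances (1 avoids division by zero).
    columns = list(zip(*(qi for qi, _ in records)))
    ranges = [(max(col) - min(col)) or 1 for col in columns]

    remaining = list(records)
    clusters = []
    # Keep forming clusters while the leftover records can still supply l distinct values.
    while len({s for _, s in remaining}) >= l:
        seed = remaining.pop(0)
        cluster = [seed]
        while not is_l_diverse(cluster, l):
            seen = {s for _, s in cluster}
            # Only records that add a new sensitive value are candidates.
            candidates = [r for r in remaining if r[1] not in seen]
            nearest = min(candidates, key=lambda r: distance(seed, r, ranges))
            remaining.remove(nearest)
            cluster.append(nearest)
        clusters.append(cluster)

    # Attach each leftover record to the cluster whose seed is closest.
    for record in remaining:
        target = min(clusters, key=lambda c: distance(c[0], record, ranges))
        target.append(record)
    return clusters


records = [
    ((25, 10), "flu"), ((27, 11), "cancer"), ((26, 10), "hiv"),
    ((60, 50), "flu"), ((62, 52), "cancer"), ((61, 51), "hiv"),
    ((63, 50), "flu"),
]
clusters = cluster_l_diverse(records, l=3)
```

Adding only the nearest record that contributes a new sensitive value keeps clusters small and compact, which is the intuition behind reducing information loss when each cluster is later generalized. A hierarchy-based distance for categorical attributes could be substituted for `distance` without changing the rest of the sketch.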