Data mining is one of the problems that must be addressed to improve the utility of data released under the k-anonymity privacy protection model. Our analysis shows that the quasi-identifier attribute values in a k-anonymous table and the attribute values of some non-leaf nodes of a decision tree built from the original (precise) table are both produced by generalization. Based on this correspondence, we propose a decision tree generation algorithm that operates on k-anonymous tables. The algorithm takes the k-anonymous table directly as input, which avoids the data preparation that the classic ID3 algorithm requires before it can run. Experimental results show that the algorithm saves the time otherwise spent building the generalization hierarchy tree and that it is effective on k-anonymous tables.
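The idea of feeding a k-anonymous table straight into tree induction can be illustrated with a minimal sketch. The abstract does not give the details of the proposed algorithm, so the code below is not that algorithm: it is plain ID3 (information gain over categorical attributes) in which each generalized quasi-identifier value, such as an age range or a masked ZIP code, is treated as an ordinary category. The table, the attribute names (`age`, `zip`, `disease`) and the function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def entropy(rows, target):
    """Shannon entropy of the class label over the given rows."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def partition(rows, attr):
    """Group rows by their (possibly generalized) value of attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row)
    return groups

def information_gain(rows, attr, target):
    """Entropy reduction obtained by splitting rows on attr."""
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g, target)
                    for g in partition(rows, attr).values())
    return entropy(rows, target) - remainder

def build_tree(rows, attrs, target):
    """Plain ID3; generalized values are used as categories as-is."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:          # pure node: return the class label
        return labels[0]
    if not attrs:                      # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attrs if a != best]
    children = {value: build_tree(subset, remaining, target)
                for value, subset in partition(rows, best).items()}
    return {"attr": best, "children": children}

def classify(tree, row):
    """Follow branches until a leaf label is reached.
    Raises KeyError if the row has a value not seen during training."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attr"]]]
    return tree

# Hypothetical 2-anonymous table: every (age, zip) combination occurs twice.
table = [
    {"age": "[20-30)", "zip": "130**", "disease": "flu"},
    {"age": "[20-30)", "zip": "130**", "disease": "flu"},
    {"age": "[30-40)", "zip": "130**", "disease": "cancer"},
    {"age": "[30-40)", "zip": "130**", "disease": "cancer"},
    {"age": "[20-30)", "zip": "148**", "disease": "flu"},
    {"age": "[20-30)", "zip": "148**", "disease": "flu"},
]

tree = build_tree(table, ["age", "zip"], "disease")
print(tree)
print(classify(tree, {"age": "[30-40)", "zip": "130**"}))
```

In this sketch no generalization hierarchy is constructed at any point: the ranges and masked codes already present in the anonymized table serve directly as branch labels, which is the kind of saving the abstract refers to.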