东篱科研大数据发现系统（DRDS）

位置：成果数据库 > 期刊 > 期刊详情页

基于KD树最优投影划分的k匿名算法

ISSN号：0469-5097
期刊名称：《南京大学学报：自然科学版》
时间：0
分类：TP391[自动化与计算机技术—计算机应用技术;自动化与计算机技术—计算机科学与技术]
作者机构：[1]安徽师范大学数学计算机科学学院,芜湖241003, [2]安徽师范大学网络与信息安全工程研究中心,芜湖241003
相关基金：基金项目：国家自然科学基金（61672039,61370050）,安徽省自然科学基金（1508085QF133）

作者：汪小寒[1,2], 罗永龙[1,2], 江叶峰[1], 赵传信[1,2], 吴文莉[1], 郭良敏[1,2]

关键词：数据发布, 隐私保护, k匿名, KD树, 划分, data publishing, privacy protection, k-anonymity, KD-tree, partition

中文摘要：

针对现有数据发布隐私保护保护算法中的“局部最优”划分问题,提出了一种基于KD树最优投影划分的k匿名算法.首先,在全局范围内对每一个属性维度进行遍历,根据投影距离方差值衡量每个维度的离散度,并确定最优维度;然后,在最优属性维度上,计算其划分系数值,并确定最优划分点.进一步引入一种改进的KD树结构,与传统的KD树结点是一个数据点不同,新设计的KD树中的每个结点均是一个集合.用经过划分点并垂直于最优维度的超平面将一个结点分成两部分,分别作为其左、右孩子结点.最后通过理论分析证明了本文算法的正确性,用实验比较和验证了算法的性能,实验结果显示所提算法平均概化范围减小10%~22%,能够实现更优的划分和更好的数据集可用性.

英文摘要：

Data should be provided to relevant decision makers and researchers.Privacy protection should be carried out before the release because the data contains personal sensitive and privacy information.k-anonymity algorithm is an important algorithm in privacy protection.And partition is a significant method in k-anonymity mechanism.For sake of solving the problem of“local optimization”in the existing privacy preserving algorithm for data publishing,a k-anonymity algorithm based on optimized proj ection partition of KD-tree is proposed in this paper.Firstly,the optimized attribute dimension is determined by the dispersion degree according to the proj ection distance variance value in the global domain.Then,in the optimal attribute dimension,the value of the partition coefficient is calculated,and the optimal partition point is determined.Furthermore,an improved KD-tree structure is introduced in this paper.And this structure can solve the problem efficiently.The new designed KD-tree node is a collection rather than a data point in＆amp;nbsp;traditional KD-tree.A KD-tree node is divided into two parts as its left child node and its right child node by the hyper plane which pass through the dividing point and is perpendicular to the optimal dimensional.Finally,the validity of the proposed algorithm is verified by theoretical analysis and the performance of the algorithm is demonstrated by comparing experiments.The experimental results show that the proposed algorithm can reduce the average generalization range by 22%to 1 0%,which can achieve a better division and better data set availability.

同期刊论文项目