Differentially private spatial decomposition based on KD-trees has attracted considerable research attention. The size of the spatial data and the amount of Laplace noise directly constrain the accuracy of the decomposition. Existing KD-tree-based methods struggle to balance large-scale spatial data against the amount of noise. To address this problem, this paper proposes SKD-Tree (sampling-based KD-Tree), a KD-tree decomposition method that satisfies differential privacy. SKD-Tree uses Bernoulli random sampling that satisfies differential privacy to draw spatial samples, which then serve as the objects to be partitioned. However, SKD-Tree still relies on the tree height to control the Laplace noise, and choosing a suitable height heuristically is difficult: a large height puts excessive noise in the nodes, while a small height makes the partition too coarse-grained. To remedy this deficiency, the paper proposes a second method, KD-TSS (KD-Tree with sampling and SVT), based on the sparse vector technology (SVT). KD-TSS uses SVT to decide whether a node of the tree should be split further, so the noise in the nodes no longer depends on the height of the KD-tree. SKD-Tree and KD-TSS are compared with existing methods such as KD-Stand and KD-Hybrid on large-scale real spatial datasets. The experimental results show that the two methods outperform their competitors in both decomposition accuracy and range-query answering.
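The control flow described above (sample first, then let a noisy threshold test decide whether each node is split) can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the threshold value, the budget parameters `eps_threshold` and `eps_count`, the noise scales, and the `max_depth` safety cap are all hypothetical, the median split is computed non-privately for brevity, and no privacy accounting is done, so the sketch carries no differential privacy guarantee.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def bernoulli_sample(points, rate, rng):
    # Keep each point independently with probability `rate`.
    return [p for p in points if rng.random() < rate]


def svt_kd_tree(points, noisy_threshold, eps_count, rng, depth=0, max_depth=32):
    # SVT-style test: compare a noisy count against a threshold that was
    # perturbed once, up front. The scale 2/eps_count is an assumption.
    noisy_count = len(points) + laplace(2.0 / eps_count, rng)
    stop = (
        depth >= max_depth            # hypothetical safety cap, not a tuning knob
        or len(points) < 2            # nothing left to split
        or noisy_count < noisy_threshold
    )
    if stop:
        return {"count": noisy_count, "children": None}
    axis = depth % 2                  # alternate between x and y
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2           # assumption: plain (non-private) median
    children = [
        svt_kd_tree(ordered[:mid], noisy_threshold, eps_count, rng, depth + 1, max_depth),
        svt_kd_tree(ordered[mid:], noisy_threshold, eps_count, rng, depth + 1, max_depth),
    ]
    return {"count": noisy_count, "split": (axis, ordered[mid][axis]), "children": children}


def partition(points, rate, threshold, eps_threshold, eps_count, seed=0):
    rng = random.Random(seed)
    sample = bernoulli_sample(points, rate, rng)
    # Perturb the threshold once; every node is then tested against it.
    noisy_threshold = threshold + laplace(1.0 / eps_threshold, rng)
    return svt_kd_tree(sample, noisy_threshold, eps_count, rng)
```

In this sketch the depth of each branch is driven by the data: dense regions keep passing the threshold test and are split further, while sparse regions stop early, which is the behavior the abstract attributes to replacing a fixed tree height with SVT.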